CP1252 > CP1251 (?)


Post by AZJIO »

I generated the table data with this code:

Code:

EnableExplicit
Define s$ = "ЂЃ‚ѓ„…†‡€‰Љ‹ЊЌЋЏђ‘’“”•–—˜™љ›њќћџ ЎўЈ¤Ґ¦§Ё©Є«¬­®Ї°±Ііґµ¶·ё№є»јЅѕїАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя"
Define s1$ = #TAB$ + "Data.i ", z, i
For i = 0 To 127
	s1$ + Str(Asc(Mid(s$, i, 1))) + ", "
	z + 1
	If z > 10
		z = 0
		s1$ = Left(s1$, Len(s1$) - 2) + #CRLF$ + #TAB$ + "Data.i "
; 		If i = 127
; 			s1$ = Left(s1$, Len(s1$) - 8)
; 		EndIf
	EndIf
Next
Debug Left(s1$, Len(s1$) - 2)
I'm trying to convert the encoding, but it isn't working yet. What am I doing wrong?
I originally took the code from here.

Code:

EnableExplicit

Procedure ToCP1251(*s.Unicode)
	Protected i
	Protected *ptr.Unicode

	While *s\u
		If *s\u > 127 And *s\u < 256
			*ptr = ?CP1251 + *s\u - 128
; 			Debug ?CP1251
; 			Debug *s\u - 128
; 			Debug *ptr\u
			*s\u = *ptr\u
		EndIf
		*s + SizeOf(Unicode)
	Wend
EndProcedure


Define i, Text$

For i = 224 To 255
	Text$ + Chr(i)
Next
For i = 192 To 223
	Text$ + Chr(i)
Next
; 224 - 128 = 96
Debug Text$ ; 1252
ToCP1251(@Text$)
Debug Text$ ; 1251

DataSection
	CP1251:
	Data.i 1026, 1026, 1027, 8218, 1107, 8222, 8230, 8224, 8225, 8364, 8240
	Data.i 1033, 8249, 1034, 1036, 1035, 1039, 1106, 8216, 8217, 8220, 8221
	Data.i 8226, 8211, 8212, 152, 8482, 1113, 8250, 1114, 1116, 1115, 1119
	Data.i 160, 1038, 1118, 1032, 164, 1168, 166, 167, 1025, 169, 1028
	Data.i 171, 172, 173, 174, 1031, 176, 177, 1030, 1110, 1169, 181
	Data.i 182, 183, 1105, 8470, 1108, 187, 1112, 1029, 1109, 1111, 1040
	Data.i 1041, 1042, 1043, 1044, 1045, 1046, 1047, 1048, 1049, 1050, 1051
	Data.i 1052, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062
	Data.i 1063, 1064, 1065, 1066, 1067, 1068, 1069, 1070, 1071, 1072, 1073
	Data.i 1074, 1075, 1076, 1077, 1078, 1079, 1080, 1081, 1082, 1083, 1084
	Data.i 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092, 1093, 1094, 1095
	Data.i 1096, 1097, 1098, 1099, 1100, 1101, 1102
EndDataSection
I also have a question: is it possible to take the table from Windows itself, so that the conversion isn't tied to data embedded in the code? In other words, is there a universal way to use the current OS encoding?
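
On the question of using the OS tables instead of embedded data: one possible approach on Windows is to let the WinAPI function MultiByteToWideChar decode the raw bytes with a chosen code page. The following is only a sketch under assumptions: it is Windows-only, it assumes a single-byte code page such as 1251, and `DecodeCodePage` is a hypothetical helper name, not something from the posts in this thread.

```purebasic
EnableExplicit

; Hypothetical helper (Windows only): decodes Length bytes at *Bytes
; using the given Windows code page and returns a normal PureBasic string.
Procedure.s DecodeCodePage(*Bytes, Length.i, CodePage.i)
	; For a single-byte code page, each byte yields at most one UTF-16 unit,
	; so a buffer of Length characters is large enough.
	Protected Result$ = Space(Length)
	Protected Count = MultiByteToWideChar_(CodePage, 0, *Bytes, Length, @Result$, Length)
	ProcedureReturn Left(Result$, Count)
EndProcedure

Define *Buf = AllocateMemory(2)
PokeA(*Buf, 192)     ; byte $C0
PokeA(*Buf + 1, 224) ; byte $E0
Debug DecodeCodePage(*Buf, 2, 1251) ; Cyrillic capital A followed by small a
Debug GetACP_()                     ; current ANSI code page of the OS
FreeMemory(*Buf)
```

Passing the value returned by `GetACP_()` as the code page would follow whatever ANSI code page the system is currently set to, rather than hard-coding 1251.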

Re: CP1252 > CP1251 (?)

Post by infratec »

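You made several mistakes.

1. The data needs to be .u, not .i. Data.i writes integer-sized entries (4 or 8 bytes each), but the procedure reads 2-byte Unicode values.
2. Mid() positions start at 1, not 0.
3. The code that picks a character from the data section needs to multiply the index by 2, the size of one .u entry.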
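
Points 1 and 3 both come down to the size of one table entry. As a minimal sketch (the `TableEntry` helper and the `Demo` label are made up for illustration), indexing a Data.u table by entry number looks like this:

```purebasic
EnableExplicit

; Hypothetical helper: returns entry number Index (0-based) of a Data.u table.
; The byte offset is the index multiplied by the entry size (2 bytes for .u).
Procedure.i TableEntry(*Table, Index.i)
	ProcedureReturn PeekU(*Table + Index * SizeOf(Unicode))
EndProcedure

Debug SizeOf(Unicode)      ; 2
Debug SizeOf(Integer)      ; 4 on x86, 8 on x64, so Data.i entries do not line up
Debug TableEntry(?Demo, 2) ; 8218

DataSection
	Demo:
	Data.u 1026, 1027, 8218, 1107
EndDataSection
```

Writing `SizeOf(Unicode)` instead of a literal 2 keeps the stride tied to the type that is actually read.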

Code:

EnableExplicit

Define s$ = "ЂЃ‚ѓ„…†‡€‰Љ‹ЊЌЋЏђ‘’“”•–—˜™љ›њќћџ ЎўЈ¤Ґ¦§Ё©Є«¬­®Ї°±Ііґµ¶·ё№є»јЅѕїАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя"
Define s1$ = #TAB$ + "Data.u ", z, i

For i = 1 To 128
	Debug Mid(s$, i, 1)
	Debug Asc(Mid(s$, i, 1))
	s1$ + Str(Asc(Mid(s$, i, 1))) + ", "
	z + 1
	If z > 10
		z = 0
		s1$ = Left(s1$, Len(s1$) - 2) + #CRLF$ + #TAB$ + "Data.u "
; 		If i = 127
; 			s1$ = Left(s1$, Len(s1$) - 8)
; 		EndIf
	EndIf
Next
Debug Left(s1$, Len(s1$) - 2)
And

Code:

EnableExplicit

Procedure ToCP1251(*s.Unicode)
	Protected i
	Protected *ptr.Unicode

	While *s\u
		If *s\u > 127 And *s\u < 256
			*ptr = ?CP1251 + (*s\u - 128) * 2 ; each Data.u entry is 2 bytes
; 			Debug *s\u - 128
; 			Debug *ptr\u
			*s\u = *ptr\u
		EndIf
		*s + SizeOf(Unicode)
	Wend
EndProcedure


Define i, Text$

For i = 224 To 255
	Text$ + Chr(i)
Next
For i = 192 To 223
	Text$ + Chr(i)
Next
; 224 - 128 = 96
Debug Text$ ; 1252
ToCP1251(@Text$)
Debug Text$ ; 1251

DataSection
	CP1251:
	Data.u 1026, 1027, 8218, 1107, 8222, 8230, 8224, 8225, 8364, 8240, 1033
	Data.u 8249, 1034, 1036, 1035, 1039, 1106, 8216, 8217, 8220, 8221, 8226
	Data.u 8211, 8212, 152, 8482, 1113, 8250, 1114, 1116, 1115, 1119, 32
	Data.u 1038, 1118, 1032, 164, 1168, 166, 167, 1025, 169, 1028, 171
	Data.u 172, 173, 174, 1031, 176, 177, 1030, 1110, 1169, 181, 182
	Data.u 183, 1105, 8470, 1108, 187, 1112, 1029, 1109, 1111, 1040, 1041
	Data.u 1042, 1043, 1044, 1045, 1046, 1047, 1048, 1049, 1050, 1051, 1052
	Data.u 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1063
	Data.u 1064, 1065, 1066, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074
	Data.u 1075, 1076, 1077, 1078, 1079, 1080, 1081, 1082, 1083, 1084, 1085
	Data.u 1086, 1087, 1088, 1089, 1090, 1091, 1092, 1093, 1094, 1095, 1096
	Data.u 1097, 1098, 1099, 1100, 1101, 1102, 1103
EndDataSection
And CP1251 is the wrong name, isn't it?

Re: CP1252 > CP1251 (?)

Post by AZJIO »

To save the file, I need to do the reverse conversion. When I tested it, the code in my HTML file got corrupted: to be precise, the ordinary space (32) was replaced by a non-breaking space (160). As a workaround I limited the range to 161-255. I'm not sure how correct that is, because if characters in the range 128-160 occur in the file and are also used in the search-and-replace string, there may be problems. The only reassuring thing is that 99% of my files don't use characters 128-160. I could use two ranges, 128-159 and 161-255. Maybe my algorithm is wrong.

I found the mistake. Under the CP1251: label there is a value 32 that should not be there; judging by its position, it should be 160.
Also, some characters map to themselves, so they need no replacement and can be removed from the map.

Code:

; https://www.purebasic.fr/english/viewtopic.php?t=83470
EnableExplicit

Procedure ToCP1251(*s.Unicode)
	Protected i
	Protected *ptr.Unicode

	While *s\u
		If *s\u > 127 And *s\u < 256
			*ptr = ?CP1251 + (*s\u - 128) * 2
			*s\u = *ptr\u
		EndIf
		*s + SizeOf(Unicode)
	Wend
EndProcedure

Global NewMap cp1251()

Procedure CreateArrCP1251ToUTF()
	Protected i
	Protected *ptr.Unicode
	
	For i = 128 To 255
		*ptr = ?CP1251 + (i - 128) * 2
		AddMapElement(cp1251(), Chr(*ptr\u))
		cp1251() = i
	Next
	Debug MapSize(cp1251())
; 	exclude these characters: they map to themselves and need no substitution
	DeleteMapElement(cp1251(), Chr(152))
	DeleteMapElement(cp1251(), Chr(160))
	DeleteMapElement(cp1251(), Chr(164))
	DeleteMapElement(cp1251(), Chr(166))
	DeleteMapElement(cp1251(), Chr(167))
	DeleteMapElement(cp1251(), Chr(169))
	DeleteMapElement(cp1251(), Chr(171))
	DeleteMapElement(cp1251(), Chr(172))
	DeleteMapElement(cp1251(), Chr(173))
	DeleteMapElement(cp1251(), Chr(176))
	DeleteMapElement(cp1251(), Chr(177))
	DeleteMapElement(cp1251(), Chr(181))
	DeleteMapElement(cp1251(), Chr(182))
	DeleteMapElement(cp1251(), Chr(183))
	DeleteMapElement(cp1251(), Chr(187))
	Debug MapSize(cp1251())
EndProcedure

Procedure ToCP1252(*s.Unicode)
	Protected i
; 	Protected *s.Unicode
	Protected *ptr.Unicode

	While *s\u
		If FindMapElement(cp1251(), Chr(*s\u))
; 			*ptr = ?CP1251 + (*s\u - 128) * 2
			*s\u = cp1251()
		EndIf
		*s + SizeOf(Unicode)
	Wend
EndProcedure


Define i, Text$

For i = 128 To 255
	Text$ + Chr(i)
Next

Debug Text$ ; 1252
ToCP1251(@Text$)
Debug Text$ ; 1251
CreateArrCP1251ToUTF()

ToCP1252(@Text$)
Debug Text$ ; 1252


DataSection
	CP1251:
	Data.u 1026, 1027, 8218, 1107, 8222, 8230, 8224, 8225, 8364, 8240, 1033
	Data.u 8249, 1034, 1036, 1035, 1039, 1106, 8216, 8217, 8220, 8221, 8226
	Data.u 8211, 8212, 152, 8482, 1113, 8250, 1114, 1116, 1115, 1119, 160
	Data.u 1038, 1118, 1032, 164, 1168, 166, 167, 1025, 169, 1028, 171
	Data.u 172, 173, 174, 1031, 176, 177, 1030, 1110, 1169, 181, 182
	Data.u 183, 1105, 8470, 1108, 187, 1112, 1029, 1109, 1111, 1040, 1041
	Data.u 1042, 1043, 1044, 1045, 1046, 1047, 1048, 1049, 1050, 1051, 1052
	Data.u 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1063
	Data.u 1064, 1065, 1066, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074
	Data.u 1075, 1076, 1077, 1078, 1079, 1080, 1081, 1082, 1083, 1084, 1085
	Data.u 1086, 1087, 1088, 1089, 1090, 1091, 1092, 1093, 1094, 1095, 1096
	Data.u 1097, 1098, 1099, 1100, 1101, 1102, 1103
EndDataSection
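
Instead of listing every self-mapping code by hand with DeleteMapElement, the reverse map could skip those entries while it is being built. This is only a sketch of that idea: the names `FromUnicode`, `BuildReverseMap` and `Sample` are invented for the example, and the five-entry table stands in for the full CP1251 table above.

```purebasic
EnableExplicit

Global NewMap FromUnicode.i()

; Build the reverse table from a Data.u table that covers codes First..Last,
; skipping entries whose Unicode value already equals the code itself.
Procedure BuildReverseMap(*Table, First.i, Last.i)
	Protected i, u
	ClearMap(FromUnicode())
	For i = First To Last
		u = PeekU(*Table + (i - First) * SizeOf(Unicode))
		If u <> i
			FromUnicode(Chr(u)) = i
		EndIf
	Next
EndProcedure

BuildReverseMap(?Sample, 160, 164)
Debug MapSize(FromUnicode())   ; 3 (codes 160 and 164 map to themselves)
Debug FromUnicode(Chr(1038))   ; 161

DataSection
	Sample:
	Data.u 160, 1038, 1118, 1032, 164 ; entries for codes 160..164
EndDataSection
```

Note that this check would not have caught the earlier 32-instead-of-160 error, since 32 differs from 160 and would still have been added to the map.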