infratec wrote: Sat Jun 21, 2025 8:49 am
Olli wrote: Fri Jun 20, 2025 7:59 pm
Code: Select all
Debug PeekS(@"abc123", -1, #PB_Unicode)
Result (Windows) :
For a long time now, PB has used unicode only to store internal strings.
So your example will always show abc123 as result.
You are exactly right, my friend: since 2015 (around PureBasic version 5.40).
But at the same time (or maybe not), we were given a complete set of string conversion functions, which let us handle the change smoothly.
In PureBasic, the compiler considers a string to be unicode. So a string literal is (by default) "unicoded".
You advise me to look at UTF8 in "the" code.
But... in which code? The OpenSSL source code (currently stored on GitHub)?
If yes, this means "abc123" [UTF8] does not have the same GUID as "abc123" [unicode]. (Here "GUID" is the global numeric code of the string, whatever the string, taken from its first character to its last character, both included.)
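To make the difference between the encodings concrete, here is a minimal sketch that prints how many bytes the same text occupies in each format. It assumes a unicode-only PureBasic (5.50 or later); the variable name s is only for illustration.

```purebasic
; Sketch: byte length of the same text in three encodings.
; StringByteLength() does not count the terminating null character.
Define s.s = "abc123"

Debug StringByteLength(s, #PB_Ascii)    ; one byte per character
Debug StringByteLength(s, #PB_UTF8)     ; same as ASCII for plain ASCII text
Debug StringByteLength(s, #PB_Unicode)  ; two bytes per character
```

For plain ASCII text such as "abc123", the ASCII and UTF8 lengths should match, while the unicode length doubles. Any checksum computed over the raw bytes will therefore differ between UTF8 and unicode.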
Note that PureBasic stores a unicode character as a 16-bit value in the machine's byte order, which is little endian on x86/x64. This means 'A' (0x0041) is stored as the bytes 41 00. But with all native PureBasic statements, this is a detail. To treat a string from PureBasic source code (unicode by default) as UTF8, we can use the same code as above, with a slight difference:
Code: Select all
Debug PeekS(@"abc123", -1, #PB_UTF8)
Normally the result is not "abc123" but a shorter, unreadable string, because the unicode bytes are being interpreted as UTF8.
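The two PeekS() calls above can be checked with a small byte dump. This is only a sketch, assuming a unicode-only PureBasic (5.50 or later) on a little-endian machine; the names s, i and dump are mine.

```purebasic
; Sketch: show the raw bytes behind a string, then read them back two ways.
Define s.s = "abc123"
Define i, dump.s

; Build a hex dump of the string's bytes (terminating null not included).
For i = 0 To StringByteLength(s) - 1
  dump + RSet(Hex(PeekA(@s + i)), 2, "0") + " "
Next

Debug dump                        ; raw bytes as stored in memory
Debug PeekS(@s, -1, #PB_Unicode)  ; bytes read as unicode
Debug PeekS(@s, -1, #PB_UTF8)     ; the same bytes read as UTF8
```

On x86/x64 I would expect each character byte in the dump to be followed by a zero byte. That zero is what makes the UTF8 read stop early: PeekS() with -1 reads until the first null.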
This should be a good approach. But to really understand the problem, I suggest making sure the code pages are the right ones.
Because:
1) ASCII has its own code pages
2) UTF8 has its own code pages
3) Unicode has its own code pages
Fred chose a standard unicode code page. And I suppose the author of OpenSSL has also chosen a specific code page, tied to the hardware and software of the development environment.
My suggestion: never use a string in cryptography, not even for a password.
a) Because what is a strength in UNICODE and ASCII is a weakness in UTF8.
b) What is a strength in ASCII and UTF8 is a weakness in UNICODE.
c) What is a strength in UTF8 and UNICODE is not a weakness in ASCII: it is quite simply impossible in ASCII...
Prefer a variable-sized GUID and a good set of symbols (a code page) to be clicked or tapped; otherwise you will be at the mercy of a failure, sooner or later...
I started coding in ASCII. It is not open to the world, but I do not know a better way. I may be completely wrong, but if there were a better readable key than a non-zero-terminated ASCII string (which uses 100% of the numeric domain, not 0.01% like UNICODE), COBOL would be six feet under...
So PokeS() converts the standard Unicode "abc123" string to a "utf8" buffer. Then Fingerprint() should provide an MD5 signature.
There remains one convention I have not found on GitHub: which character is used for the unused trailing characters (0 or 32)?
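The PokeS() then Fingerprint() steps could be sketched as follows. This is not a definitive implementation: it assumes PureBasic 5.40 or later (where UseMD5Fingerprint() is available), the names text, size and *buf are mine, and hashing only the text bytes without the terminating 0 is my assumption about what the other side expects.

```purebasic
; Sketch: hash the UTF8 bytes of a unicode string with MD5.
UseMD5Fingerprint()                               ; enable the MD5 plugin

Define text.s = "abc123"
Define size = StringByteLength(text, #PB_UTF8)    ; byte count without the terminating 0
Define *buf = AllocateMemory(size + 1)            ; +1 for the 0 that PokeS() writes

If *buf
  PokeS(*buf, text, -1, #PB_UTF8)                 ; unicode string -> UTF8 bytes
  Debug Fingerprint(*buf, size, #PB_Cipher_MD5)   ; hash the text bytes only (assumption)
  FreeMemory(*buf)
EndIf

; Hashing the same text in two formats shows that the digest depends on the bytes.
Debug StringFingerprint(text, #PB_Cipher_MD5, 0, #PB_UTF8)
Debug StringFingerprint(text, #PB_Cipher_MD5, 0, #PB_Unicode)
```

The first two digests should be identical, since both are computed over the same UTF8 bytes, and the unicode one should differ. If the reference implementation pads or terminates its input with 0 or 32, the size passed to Fingerprint() would have to be adjusted to match.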