Code:
string$ = "Mélissa"
Debug PeekS(@string$,StringByteLength(string$),#PB_UTF8)
But should PureBasic really just cut it there?
Oh btw:
http://en.wikipedia.org/wiki/UTF-8#Inva ... _sequences
RFC 3629 states "Implementations of the decoding algorithm MUST protect against decoding invalid sequences."[7] The Unicode Standard requires decoders to "...treat any ill-formed code unit sequence as an error condition. This guarantees that it will neither interpret nor emit an ill-formed code unit sequence." Many UTF-8 decoders throw an exception if a string has an error in it. In recent times this has been found to be impractical: being unable to work with data means you cannot even try to fix it. One example was Python 3.0 which would exit immediately if the command line had invalid UTF-8 in it.[8] A more useful solution is to translate the first byte to a replacement and continue parsing with the next byte.
So "a more useful solution is to translate the first byte to a replacement and continue parsing with the next byte", and I agree. Shouldn't PureBasic do at least that, or else give an error message?
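To make the "replace the first byte and continue with the next byte" idea concrete, here is a minimal sketch. It is in Python rather than PureBasic, purely as an illustration, and it is not how PureBasic or any particular library does it. The function name decode_utf8_lenient is made up for this post, and I'm assuming the bytes in memory are the Latin-1 encoding of "Mélissa", which is what would make the é invalid as UTF-8.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_utf8_lenient(data: bytes) -> str:
    """Decode UTF-8; on an ill-formed sequence, replace only its first
    byte with U+FFFD and resume parsing at the very next byte.
    (Hypothetical helper, written only to illustrate the quoted strategy.)"""
    out = []
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            # Plain ASCII byte.
            out.append(chr(b))
            i += 1
            continue
        # Work out how many continuation bytes the lead byte announces,
        # and the smallest code point allowed for that length
        # (anything below it would be an overlong encoding).
        if 0xC2 <= b <= 0xDF:
            need, cp, lowest = 1, b & 0x1F, 0x80
        elif 0xE0 <= b <= 0xEF:
            need, cp, lowest = 2, b & 0x0F, 0x800
        elif 0xF0 <= b <= 0xF4:
            need, cp, lowest = 3, b & 0x07, 0x10000
        else:
            # Stray continuation byte or a byte that is never a valid lead.
            out.append(REPLACEMENT)
            i += 1
            continue
        tail = data[i + 1:i + 1 + need]
        if len(tail) == need and all(0x80 <= t <= 0xBF for t in tail):
            for t in tail:
                cp = (cp << 6) | (t & 0x3F)
            is_surrogate = 0xD800 <= cp <= 0xDFFF
            if lowest <= cp <= 0x10FFFF and not is_surrogate:
                out.append(chr(cp))
                i += need + 1
                continue
        # Ill-formed: replace just the first byte and carry on.
        out.append(REPLACEMENT)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    broken = "Mélissa".encode("latin-1")  # b'M\xe9lissa', not valid UTF-8
    print(decode_utf8_lenient(broken))
    print(decode_utf8_lenient("Mélissa".encode("utf-8")))
```

With this approach the rest of the name survives: only the offending byte is swapped for the replacement character instead of the string being cut at that point. Python's built-in bytes.decode("utf-8", errors="replace") behaves similarly, though it may group some invalid bytes into a single replacement.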