[4.30] URLDecoder decodes UTF8 chars as ASCII chars

Posted: Mon Dec 22, 2008 8:02 pm
by mback2k
Hello everyone,

I would like to report the following bug:

Currently, URLDecoder cannot decode UTF-8 %-encoded URLs, and URLEncoder cannot produce 2-byte (UTF-8) %-encoded URLs.

Example of the current behaviour, which is correct for single-byte characters:

Code:

URLEncoder("Ä") = %C4 
URLDecoder("%C4") = Ä

Example of what's missing:

Code:

URLEncoder("Ä") = %C3%84
URLDecoder("%C3%84") = Ä

Basically, URLEncoder needs an additional parameter, e.g.:

Code:

URLEncoder("Ä") = %C4
URLEncoder("Ä", #PB_Ascii) = %C4
URLEncoder("Ä", #PB_UTF8) = %C3%84
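
To illustrate the proposed parameter outside of PureBasic, here is a minimal Python sketch. The function name url_encode and the fmt values "ascii" and "utf8" are hypothetical stand-ins for URLEncoder, #PB_Ascii and #PB_UTF8, and Latin-1 is assumed as the single-byte codepage:

```python
from urllib.parse import quote

def url_encode(text: str, fmt: str = "ascii") -> str:
    """Percent-encode text. fmt is a hypothetical stand-in for #PB_Ascii / #PB_UTF8."""
    # Assumption: "ascii" means a single-byte codepage, modelled here as Latin-1.
    codec = {"ascii": "latin-1", "utf8": "utf-8"}[fmt]
    # safe="" so that every reserved character is percent-encoded as well.
    return quote(text, safe="", encoding=codec)

print(url_encode("Ä"))          # %C4
print(url_encode("Ä", "utf8"))  # %C3%84
```

The default stays on the single-byte format, so existing calls would keep their current output.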

And URLDecoder needs to decide automatically whether the character is in ASCII or UTF-8 format:

Code:

URLDecoder("%C4") = Ä
URLDecoder("%C3%84") = Ä
This can be done by checking whether the first %xx value is outside the ASCII range, i.e. greater than 127. If that's the case, %xx%xx represents one character and not two!
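
A rough Python sketch of such an automatic decision, not a proposal for the actual implementation. Because a lone byte above 127 such as %C4 is also a valid single-byte character, the sketch tests whether the decoded bytes form valid UTF-8 and otherwise falls back to a single-byte codepage. The name url_decode_auto is hypothetical, and Latin-1 is assumed as the fallback:

```python
from urllib.parse import unquote_to_bytes

def url_decode_auto(url: str) -> str:
    """Decode %xx escapes, preferring UTF-8 and falling back to Latin-1."""
    raw = unquote_to_bytes(url)  # turn the %xx escapes into raw bytes
    try:
        # Valid UTF-8: a multi-byte sequence such as C3 84 becomes one character.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 (e.g. a lone C4): assume a single-byte codepage.
        return raw.decode("latin-1")

print(url_decode_auto("%C4"))     # Ä
print(url_decode_auto("%C3%84"))  # Ä
```

This guess can still be wrong: a single-byte string that happens to form valid UTF-8 would be read as UTF-8, which is one reason to allow forcing the format.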

Maybe URLDecoder should also get an additional parameter that allows the decoding format to be forced.
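
As a sketch of what a forced format could look like, again in Python: the function name url_decode and the fmt values "ascii" and "utf8" are made up for illustration, and Latin-1 is assumed for the single-byte case. Leaving fmt unset falls back to guessing:

```python
from typing import Optional
from urllib.parse import unquote_to_bytes

def url_decode(url: str, fmt: Optional[str] = None) -> str:
    """Decode %xx escapes. fmt is a hypothetical switch: None, "ascii" or "utf8"."""
    raw = unquote_to_bytes(url)
    if fmt == "utf8":
        return raw.decode("utf-8")    # forced UTF-8, raises on invalid input
    if fmt == "ascii":
        return raw.decode("latin-1")  # forced single-byte (Latin-1 assumed)
    if fmt is not None:
        raise ValueError("unknown format: " + fmt)
    try:
        return raw.decode("utf-8")    # nothing forced: try UTF-8 first
    except UnicodeDecodeError:
        return raw.decode("latin-1")

print(url_decode("%C3%84", "utf8"))         # Ä
print(len(url_decode("%C3%84", "ascii")))   # 2
```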

I hope this is understandable. More information can be found here:
http://purebasic.fr/english/viewtopic.php?t=35747

For testing, run the following code in ASCII or Unicode mode:

Code:

Debug URLDecoder("%C4")
Debug URLDecoder("%C3%84")

The second line returns a string with 2 characters instead of 1.
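
The same effect can be reproduced in Python, which may help to show where the extra character comes from. This assumes the decoder treats every %xx as one single-byte character, with Latin-1 standing in for the actual codepage:

```python
from urllib.parse import unquote

# Each %xx is read as its own single-byte character.
single_byte = unquote("%C3%84", encoding="latin-1")
# Both bytes are read together as one UTF-8 sequence.
utf8 = unquote("%C3%84", encoding="utf-8")

print(len(single_byte), len(utf8))  # prints "2 1"
```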

Thanks in advance!