All times are UTC + 1 hour




Post subject: RegularExpressionMatchString() has problems with emojis
Posted: Fri Apr 12, 2019 10:42 am
Using emojis inside strings with RegularExpressionMatchString() produces wrong (truncated) results.

You can test the code with an emoji here: https://pastebin.com/QhNNR5St
or put your own in place of " :-) Replace with Emoji " (the forum crashes on emojis, so I cannot post one directly).

Quote:
regex_SC = CreateRegularExpression(#PB_Any, "^[\t]*[\ ]*EnablePbCgi([\s\S]*?)\(([\s\S]*?)^[\s]*DisablePbCgi", #PB_RegularExpression_MultiLine | #PB_RegularExpression_NoCase)

content.s = "EnablePbCgi" + #CRLF$ +
~"()\" :-) Replace with Emoji \"" + #CRLF$ +
"DisablePbCgi"

; Debug StringByteLength(content)
; ShowMemoryViewer(@content, StringByteLength(content))
; CallDebugger

ExamineRegularExpression(regex_SC, content)

If NextRegularExpressionMatch(regex_SC)
  Debug RegularExpressionMatchString(regex_SC) ; Last character is missing
EndIf

Post subject: Re: RegularExpressionMatchString() has problems with emojis
Posted: Fri Apr 12, 2019 12:36 pm
I guess the issue here is that PureBasic uses UTF-16 (Unicode) strings, in which each character unit is 16 bits wide. Most emojis have code points higher than 16 bits can hold.
In other words, such a character does not fit into a single 16-bit unit. You can write emojis in your PureBasic source because the file is UTF-8, but after compiling the emoji has to be mapped to the 16-bit string format, or there will be a compiler error.
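
To make the 16-bit limit concrete, here is a minimal sketch of how many 16-bit units UTF-16 needs for a given code point. The helper name Utf16Units() is hypothetical (it is not part of PureBasic), and the code point $1F512 is only an illustrative value from the emoji range. UTF-16 itself stores code points above $FFFF as two units, a so-called surrogate pair.

```purebasic
; Hypothetical helper, not a PureBasic built-in:
; returns how many 16-bit units UTF-16 needs for one code point.
Procedure.i Utf16Units(codePoint.i)
  If codePoint > $FFFF
    ProcedureReturn 2 ; does not fit in 16 bits, stored as a surrogate pair
  EndIf
  ProcedureReturn 1   ; fits in a single 16-bit unit
EndProcedure

Debug Utf16Units($41)    ; "A", a plain ASCII letter
Debug Utf16Units($1F512) ; illustrative code point from the emoji range
```

So a function that counts "characters" gets a different answer depending on whether it counts code points or 16-bit units.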

Post subject: Re: RegularExpressionMatchString() has problems with emojis
Posted: Fri Apr 12, 2019 12:47 pm
But the special character is correctly stored as a surrogate pair:
Quote:
3D D8 12 DD

Which means that the PCRE library probably reads it as one character and returns a shorter (but correct) length, whereas for PureBasic it is two characters, so the last character at the end gets lost.
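
As a rough check of the surrogate reading, the sketch below decodes the four bytes quoted above. It assumes they are two little-endian 16-bit units ($D83D and $DD12); the variable names are only for illustration.

```purebasic
; Assumption: the quoted bytes 3D D8 12 DD are two little-endian 16-bit units.
high.i = $D83D
low.i  = $DD12

; A valid pair is a high surrogate ($D800-$DBFF) followed by a low one ($DC00-$DFFF).
If high >= $D800 And high <= $DBFF And low >= $DC00 And low <= $DFFF
  ; Standard UTF-16 decoding: 10 bits from each unit, plus the $10000 offset.
  codePoint.i = $10000 + ((high - $D800) << 10) + (low - $DC00)
  Debug Hex(codePoint) ; the single code point encoded by the two units
EndIf
```

The two units decode to one code point, so a length counted in code points is one smaller than the same length counted in 16-bit units. If that guess about PCRE is right, this would explain why exactly one character goes missing per emoji in the match.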
