libcurl.pbi with UTF-8

Post by Thorium »

I can't find the original post of libcurl.pbi, so I'm starting a new thread.

libcurl.pbi discards all non-ASCII characters because of the way it converts strings for libcurl.
Replace the str2curl() procedure with the one below to enable UTF-8 encoding.

Code:

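Procedure.s str2curl(string.s)
  
  Protected *curlstring
  Protected PbCurlString.s
  
  ; UTF-8 byte length plus padding, so the zero-filled buffer still ends
  ; on a full zero Character when it is read back below
  *curlstring = AllocateMemory(StringByteLength(string, #PB_UTF8) + 3)
  PokeS(*curlstring,string,-1,#PB_UTF8)
  
  ; no format flag: the raw UTF-8 bytes are copied into a PB string as they are
  PbCurlString = PeekS(*curlstring,-1)
  FreeMemory(*curlstring)
  
  ProcedureReturn PbCurlString
  
EndProcedure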
Edit: Fixed memory leak from original code.
Original code for reference: https://github.com/deseven/pbsamples/bl ... ibcurl.pbi
Last edited by Thorium on Fri Feb 23, 2018 3:01 am, edited 2 times in total.

Re: libcurl.pbi with UTF-8

Post by cas »

Hmm... this code converts a Unicode string to UTF-8 and then converts the UTF-8 back to Unicode. That doesn't make sense to me. If libcurl works with UTF-8 strings, you can use the .p-utf8 pseudotype to do the conversion automatically, or you can use the new UTF8() function to simplify the code (it replaces AllocateMemory() and PokeS()).
Also, the code you posted has a memory leak: FreeMemory() will never be called.

Re: libcurl.pbi with UTF-8

Post by Thorium »

cas wrote: Hmm... this code converts a Unicode string to UTF-8 and then converts the UTF-8 back to Unicode. That doesn't make sense to me. If libcurl works with UTF-8 strings, you can use the .p-utf8 pseudotype to do the conversion automatically, or you can use the new UTF8() function to simplify the code (it replaces AllocateMemory() and PokeS()).
Also, the code you posted has a memory leak: FreeMemory() will never be called.

Yes, it's a strange conversion. I didn't write that code, I just modified it for UTF-8. libcurl.pbi was not written by me.
I think the idea behind this procedure is to have the string in the form libcurl expects while still holding it in a PB string.

It doesn't convert UTF-8 back to Unicode. It reads the bytes back exactly as they were written to memory. The resulting string is actually no longer valid text for PB.
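
The difference between the two read-backs can be seen with a small sketch like the following. It is not part of libcurl.pbi; the sample text and the names text and *buf are made up for illustration, and it assumes a Unicode build of PureBasic where a Character is two bytes.

```purebasic
; Illustrative sketch only; text and *buf are hypothetical names.
Define text.s = "caf" + Chr(233)   ; built with Chr() to avoid source-encoding issues
Define *buf = AllocateMemory(StringByteLength(text, #PB_UTF8) + 3)

If *buf
  PokeS(*buf, text, -1, #PB_UTF8)   ; store the string as UTF-8 bytes plus a zero terminator

  Debug PeekS(*buf, -1, #PB_UTF8)   ; bytes decoded as UTF-8: the original text again
  Debug PeekS(*buf, -1)             ; same bytes taken as native characters: not readable text
  Debug Len(text)                   ; character count of the original string
  Debug Len(PeekS(*buf, -1))        ; character count after the raw read-back

  FreeMemory(*buf)
EndIf
```

The raw read-back is only useful as a container for the bytes, for example when its address is handed to libcurl; it should not be printed or processed as text.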

Re: libcurl.pbi with UTF-8

Post by Thorium »

Thanks for mentioning the memory leak. It's true, I didn't see it, but again, that code wasn't written by me, just adjusted for UTF-8. ^^

I think *curlstring was originally a global variable, because the code checks whether it has a value set and, if so, frees the memory. Probably the author changed it to a local variable for some reason and forgot to release the memory afterwards.

Here is the original code: https://github.com/deseven/pbsamples/bl ... ibcurl.pbi

Re: libcurl.pbi with UTF-8

Post by cas »

Oh, yes. I missed that PeekS() is called without the #PB_UTF8 flag. And yes, making *curlstring Global fixes the memory leak, but then it will not be thread-safe. You need to make it Threaded if multiple threads call libcurl:

Code:

Threaded *curlstring
Procedure str2curl(string.s)
  If *curlstring <> 0 : FreeMemory(*curlstring) : EndIf
  *curlstring=UTF8(string.s)
  ProcedureReturn *curlstring
EndProcedure

;example:
curl_easy_setopt(hcurl,#CURLOPT_USERAGENT,str2curl("Mozilla/5.0"))
But the memory is freed only the next time you call str2curl(), so this is still not an ideal solution.
This is somewhat better (because it does not need FreeMemory()), and it is also thread-safe:

Code:

Procedure.s str2curl(string.s)
  Protected b=StringByteLength(string.s,#PB_UTF8)
  Protected r.s=Space((b+(b%2))/SizeOf(Character))
  PokeS(@r.s,string.s,-1,#PB_UTF8)
  ProcedureReturn r.s
EndProcedure
Or do it manually, without str2curl():

Code:

*agent=UTF8("Mozilla/5.0")
curl_easy_setopt(hcurl,#CURLOPT_USERAGENT,*agent)
FreeMemory(*agent)
But it would be best to use the .p-utf8 pseudotype. I see this function inside libcurl.pbi:

Code:

curl_slist_append(slist.i, string.p-utf8)
It already uses the .p-utf8 pseudotype, which is good. But this next function, for example, does not use .p-utf8:

Code:

curl_easy_setopt(handle.i, option.i, parameter.i)
and from what I understand, the third parameter sometimes expects a string pointer and sometimes a number (depending on the second parameter). This can be solved like this:

Code:

ImportC "libcurl.lib"
  curl_easy_setopt(handle.i, option.i, parameter.i) ;this is already imported in libcurl.pbi
EndImport

;add these 2 lines to libcurl.pbi after EndImport
PrototypeC Proto_curl_easy_setopt(handle.i, option.i, parameter.p-utf8)
Global curl_easy_setopt_str.Proto_curl_easy_setopt=@curl_easy_setopt()
Now you no longer need str2curl(). Instead of this:

Code:

Protected agent.s = str2curl("Mozilla/5.0")
curl_easy_setopt(hcurl,#CURLOPT_USERAGENT,@agent)
curl_easy_setopt(hcurl,#CURLOPT_TIMEOUT,40)
you can now write this:

Code:

curl_easy_setopt_str(hcurl,#CURLOPT_USERAGENT,"Mozilla/5.0")
curl_easy_setopt(hcurl,#CURLOPT_TIMEOUT,40)
And you do not need to worry about freeing memory manually; the compiler does that under the hood.
If other functions currently need str2curl() as well, you can handle them the same way I did here (a PrototypeC plus a Global variable with "_str" or some other suffix appended to the name).

Re: libcurl.pbi with UTF-8

Post by infratec »

Your idea is good.
The only drawback is that you cannot directly convert some C code.
But I think that's a minor point.

Maybe this inside Import looks better:

Code:

curl_easy_setopt_str(handle.i, option.i, parameter.p-utf8) As "_curl_easy_setopt"
Bernd
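
As a sketch of how that line might look in a complete import: the leading underscore in "_curl_easy_setopt" matches the C symbol decoration on 32-bit Windows, so the variant below assumes that other targets, such as 64-bit Windows, need the undecorated name. The library name is taken from the earlier snippet; the CompilerIf split is an assumption and has not been checked against libcurl.pbi or on other platforms.

```purebasic
; Sketch only: the line can also go inside the existing ImportC block of libcurl.pbi.
ImportC "libcurl.lib"
  ; String options: same C function, but PB converts the string to UTF-8 for the call
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    curl_easy_setopt_str(handle.i, option.i, parameter.p-utf8) As "_curl_easy_setopt"
  CompilerElse
    ; assumed: no underscore decoration on this target
    curl_easy_setopt_str(handle.i, option.i, parameter.p-utf8) As "curl_easy_setopt"
  CompilerEndIf
EndImport
```

Numeric and pointer options would keep going through the plain curl_easy_setopt() import, and string options through curl_easy_setopt_str().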

Re: libcurl.pbi with UTF-8

Post by Thorium »

cas wrote: And yes, converting *curlstring to Global fixes memory leak. But it will not be threadsafe.

That wasn't a suggestion for a fix; I actually fixed it in the first post right after you reported the memory leak.
I was just wondering why it was written this way in the first place, because it's odd to check a local variable for a memory pointer before it has ever been used.

Re: libcurl.pbi with UTF-8

Post by cas »

infratec wrote: Maybe this inside Import looks better:

I completely forgot about the "As" keyword. Your solution is the best one and also the simplest. No need to define globals (I hate them) and prototypes separately.