UTF8 Question

Mac OSX specific forum
simberr
User
Posts: 46
Joined: Sun Jun 10, 2018 12:54 pm

Post by simberr »

Does UTF8 return a pointer to an already allocated string or should I set the memory size prior to the call?

In other words, can I use

Code: Select all

  s.s = "This is a test string." + #CRLF$ + "œ∑´®†¥¨^øππ“‘å∂ƒ©˙∆˚¬…æ«"
  *m = UTF8(s)
  ProcedureReturn *m
  
Or, should I use:

Code: Select all

  s.s = "This is a test string." + #CRLF$ + "œ∑´®†¥¨^øππ“‘å∂ƒ©˙∆˚¬…æ«"
  *m = AllocateMemory(StringByteLength(s, #PB_UTF8))
  *m = UTF8(s)
  ProcedureReturn *m
  
Also, I cannot see how I can use

Code: Select all

FreeMemory(*m)
as it advises within the manual.

Some advice would be helpful, thank you.
wilbert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: UTF8 Question

Post by wilbert »

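You should use your first code.
The second one will result in a memory leak: UTF8() allocates its own buffer, so the pointer returned by AllocateMemory() is overwritten and that block can never be freed.
As the manual mentions, you need to free the buffer with FreeMemory() once you no longer need it.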

Code: Select all

Procedure.i LCaseUTF8(s.s)
  ProcedureReturn UTF8(LCase(s))
EndProcedure

*s = LCaseUTF8("This will become UTF8")
Debug PeekS(*s, -1, #PB_UTF8)
FreeMemory(*s)
Windows (x64)
Raspberry Pi OS (Arm64)

Post by simberr »

Wilbert

Thank you, it was as I thought.

My issue with the second question regarding FreeMemory is that the actual procedure is part of a DLL routine. I show it below:

Code: Select all

; Send a String to Program
ProcedureCDLL.i GetString()
  
  s.s = "This is a test string." + #CRLF$ + "œ∑´®†¥¨^øππ“‘å∂ƒ©˙∆˚¬…æ«"
  *m = UTF8(s)
  ProcedureReturn *m
  
EndProcedure
I am a little worried that I will generate a memory leak from this over time as I am never freeing the *m. Do you know how I should do this?

Post by wilbert »

Here’s a similar question; maybe it helps:
http://forums.purebasic.com/english/vie ... 13&t=68761

Post by simberr »

Wilbert

Thank you, again, for your help.

I have now created a new procedure that I can call from my main program to clear the memory allocation from the *m = UTF8(s) statements. I have made *m a global variable.

Everything seems to work but, in my testing, I have found that if I call the routines about three times it crashes my main app. I don't get any indication of where that is happening.

This is not a DLL per se, it is a dylib as I run on a Mac (and in 64-bit mode). I have the following procedures in the dylib:

Code: Select all

; Clear the *ret memory allocation
ProcedureCDLL ClearMemory()

; Send an Integer from Program
ProcedureCDLL PutInteger(i.i)

; Send an Integer to Program
ProcedureCDLL.i GetInteger()

; Send a Double from Program
ProcedureCDLL PutDouble(d.d)

; Send a Double to Program
ProcedureCDLL.d GetDouble()

; Send a String from Program
ProcedureCDLL PutString(s.s)

; Send a String to Program
ProcedureCDLL.i GetString()

; Send all three
ProcedureCDLL PutAll(i.i, d.d, s.s)

; Get all three
ProcedureCDLL.i GetAll()
You can see that this is just a test dylib so I can test both input and output from my calling app. The first and second runs work fine, but after three or so iterations my main app just crashes and I cannot see where.

Do you think it is a memory problem, a problem with the way I have constructed the dylib, or something else?

I am at a total loss at the moment. I really need PB as a backend dylib compiler and it seems just perfect for my needs but I cannot have a dylib that works once then crashes the app!

Simon.

Post by wilbert »

Most likely it’s a memory problem caused by your code, but it’s hard to tell without seeing the code.
Can you post the code, or at least the string procedures and the procedure that frees the memory?

Post by simberr »

Sure.

Code (whole dylib):

Code: Select all

; dllTest.pb
; Testing dll calls.

; Global variables
Global *ret

; Include files
XIncludeFile "../Includes/LogFile.pb"

; Local Macros
Macro ConvUTF8(Mem, Type = #PB_UTF8)
  PeekS(Mem, -1, Type)
EndMacro

Macro ConvBack(Mem, Type = #PB_Ascii)
  PeekS(Mem, -1, Type)
EndMacro

; Local Procedures
Procedure.s ConvertedInput(st.s)

  *m = AllocateMemory(StringByteLength(st))
  PokeS(*m, st)
  ms.s = ConvUTF8(*m)
  FreeMemory(*m)
  ProcedureReturn ms

EndProcedure

Procedure.i ConvertToUTF8Memory(s.s)
  
  *m = UTF8(s)
  ProcedureReturn *m
  
EndProcedure

; DLL Routines
; Clear the *ret memory allocation
ProcedureCDLL ClearMemory()
  
  If *ret
    FreeMemory(*ret)
  EndIf
  
EndProcedure


; Send an Integer from Program
ProcedureCDLL PutInteger(i.i)
  
  AddLog("PutInteger: " + Str(i), #AL_NEW)
  
EndProcedure

; Send an Integer to Program
ProcedureCDLL.i GetInteger()
  
  i.i = 102456
  AddLog("GetInteger: " + Str(i))
  ProcedureReturn i
  
EndProcedure

; Send a Double from Program
ProcedureCDLL PutDouble(d.d)
  
  AddLog("PutDouble: " + StrD(d))
  
EndProcedure

; Send a Double to Program
ProcedureCDLL.d GetDouble()
  
  d.d = 102456.78901
  AddLog("GetDouble: " + StrD(d))
  ProcedureReturn d
  
EndProcedure

; Send a String from Program
ProcedureCDLL PutString(s.s)
  
  m.s = ConvertedInput(s)
  AddLog("PutString: " + m)
  
EndProcedure

; Send a String to Program
ProcedureCDLL.i GetString()
  
  ClearMemory()
  s.s = "This is a test string." + #CRLF$ + "œ∑´®†¥¨^øππ“‘å∂ƒ©˙∆˚¬…æ«"
  AddLog("GetString: " + s)
  *ret = ConvertToUTF8Memory(s)
  ProcedureReturn *ret
  
EndProcedure

; Send all three
ProcedureCDLL PutAll(i.i, d.d, s.s)
  
  ins.s = ConvertedInput(s)
  ins = UCase(ins)
  AddLog("Allthree: Integer(" + Str(i) + ") Double(" + StrD(d) + ") String(" + ins + ")")
  
EndProcedure

ProcedureCDLL.i GetAll()
  
  ClearMemory()
  i.i = 194536
  d.d = 10324.5673434
  s.s = "This is my test string. Good luck with this..."
  msg.s = Str(i) + "|" + StrD(d) + "|" + s
  *ret = ConvertToUTF8Memory(msg)
  ProcedureReturn *ret
  
EndProcedure


Post by simberr »

Just realised you need the AddLog code too:

Code: Select all

; LogFile.pb
; Automatic logging file.

; Local variables
#AL_NEW = 1
#AL_APPEND = 2

; Procedure
Procedure AddLog(msg.s, mode.i = #AL_APPEND)

  ;   fname.s = GetUserDirectory(#PB_Directory_Documents) + "/Logfile.log"
  dt.s = FormatDate("%yyyy-%mm-%dd %hh:%ii:%ss ", Date())
  fname.s = "/Users/simberr/Downloads/Logfile.log" ; hard-coded file name for debugging purposes
  
  Select mode
    Case #AL_APPEND
      lfile.i = OpenFile(#PB_Any, fName, #PB_File_Append)
    Case #AL_NEW
      DeleteFile(fname)
      lfile.i = OpenFile(#PB_Any, fName, #PB_File_SharedWrite)
  EndSelect
  
  If lfile
    msg = dt + msg + #CRLF$
    WriteString(lfile, msg)
    CloseFile(lfile)
  EndIf

EndProcedure

Re: UTF8 Question [SOLVED]

Post by simberr »

It seems that I have found the issue.

Making the pointer to the string global (Global *ret) and freeing it with FreeMemory() just prior to setting it for return from the DLL procedure has solved the issue.

I had to remove the ClearMemory() function from the DLL. I guess that when my calling program called ClearMemory(), the memory had already been freed, so the error was mine.

Thank you, Wilbert, for your help. You certainly pointed me in the right direction. It seems it is now resolved.

Simon.

Post by wilbert »

I'm not sure I understood your last post, but your initial approach was indeed wrong.

You either need to have a procedure like

Code: Select all

ProcedureCDLL FreeString(*s)
  If *s
    FreeMemory(*s)
  EndIf
EndProcedure
where you pass the pointer for each string your dylib returned, or you handle the memory within the dylib by freeing the previous UTF-8 string just before a new one is created.
In the latter case, the program using the dylib needs to make a copy of the string, so it is no problem if the dylib frees the memory.
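
The second option, where the dylib keeps ownership of the buffer, could look roughly like the sketch below. It assumes a single global pointer and a caller that does not use the dylib from several threads at once; ReturnUTF8() is a hypothetical helper name, not something from the code posted in this thread. Resetting the pointer to 0 after FreeMemory() matters, because otherwise a later call would try to free the same block a second time.

```purebasic
; Sketch only: the dylib keeps ownership of the last returned string.
Global *ret   ; last UTF-8 buffer handed to the caller (0 = none)

; Hypothetical helper: free the previous buffer, then create a new one.
Procedure.i ReturnUTF8(s.s)
  If *ret
    FreeMemory(*ret)
    *ret = 0            ; avoid freeing the same block twice
  EndIf
  *ret = UTF8(s)        ; UTF8() allocates a new null-terminated buffer
  ProcedureReturn *ret
EndProcedure

; Exported function: the returned pointer stays valid until the next call.
ProcedureCDLL.i GetString()
  ProcedureReturn ReturnUTF8("This is a test string.")
EndProcedure
```

With this layout the caller has to copy the string before its next call into the dylib, since that call invalidates the previous pointer.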

Post by simberr »

Wilbert

Thank you, the problem was solved. I now free the memory before allocating a new value to it. I think this was the issue as I now do not have any problems in calling the routines many times and the main app stays stable.

I am beginning to get to grips with the memory allocation issues outside of the closed app and dylib environments. I still think there is some way for me to go yet, though!

Thanks for all your help.
#NULL
Addict
Posts: 1440
Joined: Thu Aug 30, 2007 11:54 pm
Location: right here

Post by #NULL »

Code: Select all

; Local Macros
Macro ConvUTF8(Mem, Type = #PB_UTF8)
  PeekS(Mem, -1, Type)
EndMacro

Macro ConvBack(Mem, Type = #PB_Ascii)
  PeekS(Mem, -1, Type)
EndMacro

; Local Procedures
Procedure.s ConvertedInput(st.s)

  *m = AllocateMemory(StringByteLength(st))
  PokeS(*m, st)
  ms.s = ConvUTF8(*m)
  FreeMemory(*m)
  ProcedureReturn ms

EndProcedure
Are you sure the macros do what you think they do? They read from a UTF-8 or ASCII buffer respectively, and both return a UTF-16 string.
ConvertedInput() gets a UTF-16 string, copies it into new memory, and then reads from that buffer as if it were UTF-8 encoded, creating another UTF-16 string from that and returning it.
But I could be wrong, since I'm not very familiar with encodings in PB.
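
If the calling app passes a null-terminated UTF-8 C string (an assumption, since the thread does not show the caller), one possible way to avoid the intermediate buffer is to take the argument as a pointer and decode it directly. PutStringUTF8 is a hypothetical name used only for this sketch.

```purebasic
; Sketch, assuming *utf8 points to a null-terminated UTF-8 string
; owned by the caller. PutStringUTF8 is a hypothetical name.
ProcedureCDLL PutStringUTF8(*utf8)
  If *utf8
    ; PeekS decodes the UTF-8 bytes into a native PureBasic string
    s.s = PeekS(*utf8, -1, #PB_UTF8)
    Debug "PutString: " + s
  EndIf
EndProcedure
```

The dylib does not free *utf8 here, because the buffer belongs to the caller.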

Post by simberr »

This was the issue that I was having as well, not really understanding what was happening within PB and what encoding was being used.

I don't actually use the ConvBack macro at all, that was a bit of legacy code.

However, your comments are pertinent to the issue that I had. My calling app uses UTF8 and PB uses Unicode (or, as I believe, UTF16). I can tell you, though, that the code posted does work fine and I am getting no errors or crashes at all.

I set up a logging program in both the calling app and the PB dylib. In both I logged the sent string from the app and the received string to the PB dylib. I logged the before and after versions of the PB dylib received string and saw that the after was identical to the string sent from the calling app. So I know that the ConvUTF8 macro works as I expected it to. After manipulating that string within the dylib I use a conversion back to UTF8 and return a UTF8() pointer back to my calling app. My calling app receives that as a C style string that is easily converted to my calling app's UTF8 string representation. All logging to and from the two programs confirms the string data is identical.
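
The outbound half of that round trip can also be checked in isolation with a small sketch like the following. It uses only UTF8(), PeekS(), StringByteLength() and FreeMemory(), and assumes it is run from the IDE with the debugger enabled so that the Debug output is visible.

```purebasic
; Sketch: convert a native string to UTF-8 and back, then compare.
s.s = "This is a test string." + #CRLF$ + "œ∑´®†¥"

*m = UTF8(s)                              ; new buffer, null-terminated
If *m
  back.s = PeekS(*m, -1, #PB_UTF8)        ; decode until the null byte
  Debug StringByteLength(s, #PB_UTF8)     ; UTF-8 byte count, excluding the terminator
  Debug Bool(back = s)                    ; 1 if the round trip preserved the text
  FreeMemory(*m)                          ; whoever called UTF8() owns the buffer
EndIf
```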

It took a long time to work that out, but I can confirm that this is now working well for me.

Thanks for your input. If I get problems again I will revisit this thread!