Unicode PB Functions

Xombie
Addict
Posts: 898
Joined: Thu Jul 01, 2004 2:51 am
Location: Tacoma, WA

Post by Xombie »

I'm moving to wide string functionality on my main project so that I won't have a nightmare trying to convert it later. Unfortunately, this gave me problems since PB doesn't have any native wide string (unicode) functions. So, I made some. Quite a few. I even threw in a little test app so you can try stuff out. The test app in itself is pretty nifty. I'll talk about that first.

So the test app... Similar to other unicode stuff I posted previously this is a way to create a unicode window and a couple of unicode 'gadgets'. However, it completely does away with standard PB functions. Following MS I used W to signify a wide character function. Yes, I know people would hate that but it's just a name and can be changed or ignored. For an example, take a look at this code bit.

Code: Select all

;- Enumeration
Enumeration
   #formMain
EndEnumeration
Enumeration
   #ButtonClose
   #StringTest
   #TextTest
   #ComboTest
EndEnumeration
;- Constants
;- Includes
XIncludeFile "Unicode.pb"
;- Form Code
UnicodeInit("Arial Unicode MS", 8, 0) ; Style uses the #PB_Font_Bold, etc... styles.
; Initialize our unicode functions with our default font.
;- Main program
; HoldString.l = AnsiToUnicode("Test")
HoldString.l = AllocateMemory(6)
PokeW(HoldString, $3070)
PokeW(HoldString + 2, $304B)
PokeW(HoldString + 4, 0)
; Test a Japanese window title.
HandleWindow.l = OpenWindowW(#formMain, 216, 0, 450, 150, #PB_Window_SystemMenu | #PB_Window_SizeGadget | #PB_Window_TitleBar, HoldString, 0)
;
If HandleWindow
   ;
   If CreateGadgetListW(HandleWindow)
      ;
      AdvancedGadgetEventsW(#True)
      ; Turn on advanced events for our unicode button gadget.
      ButtonGadgetW(#ButtonClose, 0, 0, 100, 20, HoldString, 0)
      ;
      TextGadgetW(#TextTest, 0, 21, 200, 20, HoldString, #PB_Text_Border | #PB_Text_Center)
      ;
      StringGadgetW(#StringTest, 0, 41, 200, 20, HoldString, 0)
      ;
   EndIf
   ;
   FreeMemory(HoldString)
   ;
   doQuit.b
   EventID.l
   ;
   Repeat
      ;
      EventID = WaitWindowEventW(HandleWindow)
      ;
      If EventID = #PB_Event_CloseWindow : doQuit = #True : EndIf 
      ;
      If EventID = #PB_Event_Gadget
         ;
         If EventGadgetIDW() = #ButtonClose
            ;
            If EventTypeW() = #PB_EventType_LeftClick
               Debug "Left Click"
            EndIf
            ;
         ElseIf EventGadgetIDW() = #StringTest
            ;
            If EventTypeW() = #PB_EventType_Change
               Debug "Changing"
            EndIf
            ;
         ElseIf EventGadgetIDW() = #TextTest
            ;
            If EventTypeW() = #PB_EventType_LeftClick
               Debug "Text Left Click"
            EndIf
            ;
         EndIf
         ;
      EndIf
      ;
   Until doQuit = #True
   ;
EndIf
;
UnicodeDestroy()
; Destroy our unicode gadgets.
End
Does it look familiar? :D I know Fred plans unicode support later on but I wanted to learn some things so I fiddled with making a 'Unicode PB' of sorts. It was actually kind of fun to do and the Window/Gadget code has a few neat tricks in it if you dig through them. For example, to enumerate the gadgets/windows, I create a custom heap and walk through the heap to locate the required gadget or window. Feel free to browse the unicode gadget/window code and see how all of that works. The test program is 'Unicode-Test.pb' in the included rar file.

However, that's not the main intention of the post. The main intention is just to have unicode functions available for wide format string manipulation. So here's what's included in 'Unicode.pb'.

A wide character string builder. You can start the string builder and quickly append strings for whatever task you need. For adding text it has sbAdd() to add a wide character string, sbAddChar() to add a single character as a word value, and sbAddAnsi() to add ansi text to the string builder as wide characters. It should be pretty quick for appending unicode text. There are a lot of supplemental functions to do various things with the string builder as well, so check out that part of the code.
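
The general idea behind a string builder like this can be sketched in a few lines. This is only an illustration of the technique (a growable UTF-16 buffer that doubles when it fills up), not the code from Unicode.pb. The Sketch... names are invented for this example, and handling of a failed ReAllocateMemory() is left out.

```purebasic
; Illustration only: hypothetical names, not the sbAdd()/sbAddChar() code from Unicode.pb.
Structure SketchBuilder
   *Buffer       ; null-terminated UTF-16 text
   Length.l      ; characters in use, not counting the terminator
   Capacity.l    ; characters that fit, not counting the terminator
EndStructure

Procedure SketchInit(*sb.SketchBuilder, Capacity.l)
   If Capacity < 1 : Capacity = 16 : EndIf
   *sb\Buffer = AllocateMemory((Capacity + 1) * 2)   ; 2 bytes per character plus terminator
   *sb\Length = 0
   *sb\Capacity = Capacity
   If *sb\Buffer : PokeW(*sb\Buffer, 0) : EndIf      ; start as an empty string
EndProcedure

Procedure SketchAddChar(*sb.SketchBuilder, Char.l)
   ; Double the capacity when the buffer is full so repeated appends stay cheap.
   If *sb\Length >= *sb\Capacity
      *sb\Capacity = *sb\Capacity * 2
      *sb\Buffer = ReAllocateMemory(*sb\Buffer, (*sb\Capacity + 1) * 2)
   EndIf
   PokeW(*sb\Buffer + (*sb\Length * 2), Char)
   *sb\Length = *sb\Length + 1
   PokeW(*sb\Buffer + (*sb\Length * 2), 0)           ; keep the string terminated
EndProcedure

; Usage: build the same two characters the test app pokes by hand.
sb.SketchBuilder
SketchInit(@sb, 16)
SketchAddChar(@sb, $3070)
SketchAddChar(@sb, $304B)
; sb\Buffer now points at a wide string that could be handed to a W function.
FreeMemory(sb\Buffer)
```

Tracking the length in the structure means an append never has to rescan the string for its terminator, which is presumably where most of the speed of a builder comes from.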

Unicode text functions. There are several functions to add strings together (ansi or unicode). There are unicode equivalents of the PB functions StringField, CountString, FindString, Trim, Mid, Right, Left, RSet, LSet, Len and Str. There are also some custom functions like ucCopyString() to completely copy one unicode string into another, two functions to test for a null or empty string, a function to remove delimited string duplicates and a few others. I tried to make them somewhat quick but with varying results. I'm no ASM programmer >_< and no doubt made stupid mistakes; fixing them could speed some things up. Also, I originally wrote this code for me so some things are more specific to what I need. An example - ucTrim(inString.l, EraseOriginal.b). The 'EraseOriginal' flag will call FreeMemory() on inString if True. That way I could call ucTrim() with the same variable as inString and not worry about leaking memory.

There are a couple of unicode directory functions - a DirectoryExists() and ExePath() function.

There are quite a few file functions. I tried to duplicate the native PB functions so you'll see stuff like ReadFileW(), EofW(), ReadStringW(), etc... I'm kinda proud of my ReadStringW() function. See, when using ReadFileW() it will try to detect the file encoding (ansi, utf-8, utf-16, etc...) and then that type can be passed to ReadStringW() (or you can use one of the constants if you're sure - #UCFileEncodingUTF16LE for example) and then the procedure will do its best to read the text properly. It should convert an ansi string to wide character if reading from ansi text files. I don't have a lot of different encoded files so I can't test this as much as I'd like. I especially wasn't able to test the UTF-8 type since I don't have proper UTF-8 encoded text files nor do I know their format. Oh, there are also some BOM (byte order mark) functions like SkipBOM() to skip over the BOM bytes at the beginning of a text file or WriteBOM() to write a specific BOM to a new text file.
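
For reference, the byte order marks themselves are short fixed byte sequences: EF BB BF for UTF-8, FF FE for UTF-16 little endian and FE FF for UTF-16 big endian. Below is a minimal sketch of BOM detection on a memory buffer. It is not the detection code from Unicode.pb; the procedure and constant names are made up for this example, and it assumes the caller has already read the first few bytes of the file into memory.

```purebasic
; Hypothetical helper, not part of Unicode.pb: guess a text encoding from its BOM.
Enumeration
   #SketchEncodingUnknown   ; no BOM found (could be ansi or UTF-8 without a BOM)
   #SketchEncodingUTF8      ; EF BB BF
   #SketchEncodingUTF16LE   ; FF FE
   #SketchEncodingUTF16BE   ; FE FF
EndEnumeration

Procedure.l SketchDetectBOM(*Buffer, Size.l)
   ; *Buffer points at the first bytes of the file, Size is how many of them are valid.
   Protected b0.l, b1.l, b2.l
   If Size >= 2
      b0 = PeekB(*Buffer) & $FF        ; PeekB() is signed, so mask to 0..255
      b1 = PeekB(*Buffer + 1) & $FF
      If Size >= 3
         b2 = PeekB(*Buffer + 2) & $FF
         If b0 = $EF And b1 = $BB And b2 = $BF
            ProcedureReturn #SketchEncodingUTF8
         EndIf
      EndIf
      If b0 = $FF And b1 = $FE : ProcedureReturn #SketchEncodingUTF16LE : EndIf
      If b0 = $FE And b1 = $FF : ProcedureReturn #SketchEncodingUTF16BE : EndIf
   EndIf
   ProcedureReturn #SketchEncodingUnknown
EndProcedure
```

A missing BOM does not prove the file is ansi, since UTF-8 files are often written without one, and a UTF-32 little endian BOM (FF FE 00 00) also starts with FF FE. A result of "unknown" or "UTF-16LE" is therefore a hint, not a certainty.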

Lastly, there are some INI file functions. I have functions to deal with Keys, Sections, Delimited items and values. They *should* work well but they were the first ones I worked on and I learned as I went so they may not be as stable as some of the other functions >_>

IMPORTANT

To use these functions you'll need the W functions from MS so I've included custom *.lib files that should be placed in their appropriate directories (PureLibraries\Windows and PureLibraries\Windows\Libraries). Be sure to back up the existing libraries before you copy these over.

Here's an example of one of the string functions so you can get an idea of how they look...

Code: Select all

Procedure.l ucMid(inString.l, StartPosition.l, Length.l, EraseOriginal.b) ; Pass Length = 0 for the rest of the string.
   ; This is the wide character equivalent of the Mid() procedure.
   Protected ReturnString.l
   ; This will be the return string.
   Protected HoldLength.l
   ; This is the length of the passed string.
   HoldLength = lstrlenW_(inString)
   ; Store the length of the passed string (in characters).
   If StartPosition > HoldLength : ProcedureReturn ucReturnEmpty() : EndIf
   ; If our start position is more than the length of the string, exit.
   If StartPosition < 1 : StartPosition = 1 : EndIf
   ; Fix our start position if it's less than 1.
   If Length < 0 : ProcedureReturn ucReturnEmpty() : EndIf
   ; Return an empty string on an invalid length.
   If Length + StartPosition > HoldLength Or Length = 0 : Length = HoldLength - StartPosition + 1 : EndIf
   ; Make sure our length fits within the length of the passed string.  And if the user passed 
   ; zero length, copy to end of string.
   ReturnString = AllocateMemory((Length * 2) + 2)
   ; Allocate space for our new string.
   CopyMemory(inString + ((StartPosition - 1) * 2), ReturnString, Length * 2)
   ; Now copy our string section.
   If EraseOriginal = #True : FreeMemory(inString) : EndIf
   ;
   ProcedureReturn ReturnString
   ; Return the new string.
EndProcedure
And that's about it. Oh, the title of the test unicode app may not show up as Japanese characters if you don't have the proper language support set. I have it set on my home computer and all of it shows fine but at work I don't have Asian character support set so the title of the window shows as blocks. The buttons and such are fine but the window title won't change. Just so you know.
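
Going back to ucMid(): the two ways of calling it with the EraseOriginal flag might look like the sketch below. It assumes Unicode.pb from the download is in the same folder, that the custom libraries are installed, and that the string functions can be used without calling UnicodeInit() first; none of that is confirmed here. The variable names are only for illustration.

```purebasic
XIncludeFile "Unicode.pb"   ; from the download; assumed to be in the same folder

; Build a three character wide string by hand, the same way the test app does.
Source.l = AllocateMemory(8)            ; 3 characters * 2 bytes + 2 byte terminator
PokeW(Source, $3070)
PokeW(Source + 2, $304B)
PokeW(Source + 4, $3070)
PokeW(Source + 6, 0)

; EraseOriginal = #False: the original stays valid, so the caller frees both buffers.
Tail.l = ucMid(Source, 2, 0, #False)    ; Length = 0 copies from character 2 to the end
FreeMemory(Tail)

; EraseOriginal = #True: ucMid() frees the old buffer, so the same variable can be reused.
Source = ucMid(Source, 1, 2, #True)     ; Source now points at a new, shorter string
FreeMemory(Source)
```

One thing to watch in the procedure as posted: the early exits through ucReturnEmpty() happen before the EraseOriginal check, so on those paths inString does not appear to be freed even when the flag is #True.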

I hope that's everything!

Here is where to download the file - http://xombie.soldats.net/Storage/UnicodePB.rar

I hope it runs okay for everyone and is useful ^_^ Be sure to look at the procedures before you use them so you don't make a mistake. And as always - good luck and have fun :D
Justin
Addict
Posts: 948
Joined: Sat Apr 26, 2003 2:49 pm

Post by Justin »

Good example. How do you create the libs? Is it possible to have unicode libs without touching the current ansi windows libs?
Blade
Enthusiast
Posts: 362
Joined: Wed Aug 06, 2003 2:49 pm
Location: Venice - Italy, Japan when possible.

Post by Blade »

Xombie you did a great job, I would replace the existing libs but I'm worried about what will happen with the next PB update?
Anyway this is a great hint for Fred if he doesn't know where to start with unicode :)

Post by Xombie »

I thought nobody was interested in these ^_^

If I recall, the libraries I included don't replace the existing ones. They're extra. For example, the existing one is user32.lib and I include a user32w.lib for the wide functions. Simply back up the purelibraries directory first if you don't believe me :) And then you can examine the rar file to see if there are any files that already exist. There shouldn't be.

Let me know if y'all need any more help.

Incidentally, I've been doing a *lot* more work on the Unicode functions so these are pretty old. Didn't think anybody was interested so I never updated the functions. Working with memory and unicode like this is a pain in the rear end >_<
freak
PureBasic Team
Posts: 5944
Joined: Fri Apr 25, 2003 5:21 pm
Location: Germany

Post by freak »

Blade wrote: Xombie you did a great job, I would replace the existing libs but I'm worried about what will happen with the next PB update?
There is no problem there. A new PB update would simply overwrite them again.
You can also always use SmartUpdate to revert such changes. It will tell
you which files are changed and re-download them. Quite useful.
quidquid Latine dictum sit altum videtur

Post by Justin »

Xombie, can you explain how to create the unicode libs? Using the PB dll importer you have to supply a def file with the dll name, wouldn't this overwrite the existing ones?

about lang internationalization, i've seen you're using ini files. this is the easiest way, but you can also use dlls with string tables as a resource, one dll for each lang, then use LoadStringW_(); this is the standard way. the pro is that strings are referenced by an integer constant in your app, consuming fewer resources, but they are more difficult to update (recompile the dll...). with PB you'll need res hacker. i haven't tried it though

Post by Xombie »

I'll have to look at the LoadString/W() functions. I'm not very familiar with them. I figure there's many ways to store the strings and such so once the base unicode functions are in place, the language strings can be stored and retrieved however you like.

To answer your question about creating the unicode libraries - I went through a few steps. First, I made a file called User32W.pbl. Second, I renamed the original user32.lib in the purelibraries directory to old_user32.lib. I then used the dllimporter.exe to import the user32w.pbl file. It imports it as user32.lib. I rename that new user32.lib to user32w.lib and then rename old_user32.lib to user32.lib. A stupid workaround but it works for me. The user32w.pbl looks something like...
USER32.DLL
SendMessageW 4
GetMessageW 4
DispatchMessageW 1
CallWindowProcW 5
GetClassInfoW 3
CreateWindowExW 12
CharUpperW 1
CharUpperBuffW 2
CharLowerW 1
CharLowerBuffW 2
CharNextW 1
CharPrevW 2
IsCharAlphaW 1
IsCharAlphaNumericW 1
IsCharUpperW 1
IsCharLowerW 1
LoadMenuW 2
LoadMenuIndirectW 1
GetMenuStringW 5
InsertMenuW 5
AppendMenuW 4
ModifyMenuW 5
InsertMenuItemW 4
GetMenuItemInfoW 4
SetMenuItemInfoW 4
DrawTextW 5
DrawTextExW 6
SetWindowTextW 2
GetWindowTextW 3
GetWindowTextLengthW 1
MessageBoxW 4
MessageBoxExW 5
MessageBoxIndirectW 1
GetWindowLongW 2
SetWindowLongW 3
GetClassLongW 2
SetClassLongW 3
GetClassNameW 3
LoadCursorW 2
DefWindowProcW 4
RegisterClassExW 1
Just a simple text file named user32w.pbl

Hope that helps you a little.
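
If renaming files in the purelibraries directory feels too risky, another route is to resolve a W function at run time instead of importing it into a .lib. The sketch below uses PB's own Library commands (OpenLibrary, CallFunction, CloseLibrary) to call MessageBoxW directly from user32.dll. It is an alternative technique, not how the archive does it, and it is Windows only.

```purebasic
; Sketch: call a W function without any custom .lib, by loading it at run time.
; Build the wide string "Hi" by hand, the same way the test app builds its title.
*Text = AllocateMemory(6)
PokeW(*Text, $48)        ; 'H'
PokeW(*Text + 2, $69)    ; 'i'
PokeW(*Text + 4, 0)      ; terminator

If OpenLibrary(0, "user32.dll")
   ; MessageBoxW(hWnd, lpText, lpCaption, uType): 4 parameters, as in the .pbl list above.
   CallFunction(0, "MessageBoxW", 0, *Text, *Text, 0)
   CloseLibrary(0)
EndIf

FreeMemory(*Text)
```

The trade-off is that the function name is looked up by string on each CallFunction(), so for anything called in a tight loop the imported library is likely the better choice.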

Post by Justin »

i'll try it, thanks

if someone wants to try a stringtable, create a pb dll with a dummy proc
ProcedureDLL AttachProcess(hinst.l)
EndProcedure

an example of string table

Code: Select all

#define IDS_TEST1    100
#define IDS_TEST2  101

STRINGTABLE
BEGIN
IDS_TEST1 	"test1"
IDS_TEST2 	"test2"
END
save this with .rc extension, test.rc. then with the rc.exe command line tool of the win SDK use,
rc.exe /r test.rc

will create a test.res file. use res hacker to add it to the pb dll. and LoadString_() with the string id from your code
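
Reading one of those strings back from PB could look like the sketch below. The dll file name is a placeholder for whatever the resource dll ends up being called, and the ID matches IDS_TEST1 from the .rc file above. LoadLibrary_(), LoadString_() and FreeLibrary_() are the plain API calls; whether LoadString_() maps to the ansi or wide variant depends on the PB version and compiler mode.

```purebasic
#IDS_TEST1 = 100                        ; same value as IDS_TEST1 in test.rc

hInst = LoadLibrary_("lang_test.dll")   ; hypothetical name for the resource dll
If hInst
   Buffer.s = Space(256)                ; room for up to 256 characters
   Length = LoadString_(hInst, #IDS_TEST1, @Buffer, 256)
   If Length > 0
      Debug Left(Buffer, Length)        ; the text stored under IDS_TEST1
   EndIf
   FreeLibrary_(hInst)
EndIf
```

Switching languages then comes down to loading a different dll and keeping the same integer IDs in the program.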

Post by Justin »

forgot to say that you can set the language,

Code: Select all

#define IDS_TEST1    100
#define IDS_TEST2  101

LANGUAGE 0x11, 0x01

STRINGTABLE
BEGIN
IDS_TEST1 	"テスト1"
IDS_TEST2 	"テスト2"
END
the 2 lang numbers are the lang and sub lang id, they are in the sdk, LANG_JAPANESE, SUBLANG_DEFAULT. but i don't know in which include you could do your own.
zikitrake
Addict
Posts: 876
Joined: Thu Mar 25, 2004 2:15 pm
Location: Spain

Post by zikitrake »

:? The original link ( http://xombie.soldats.net/Storage/UnicodePB.rar ) doesn't work.

Does any member have this file?

Thank you!
PB 6.21 beta, PureVision User

Post by Justin »

i have the file, if the link isn't fixed i can send by mail

xombie, kernel32 seems to be special. normally i just create the unicode libs like you said and it works, but with kernel32 it doesn't (lstrlenW_), and i noticed you supplied both the ansi and unicode versions to make it work. can you explain? thanks

Post by Xombie »

Yes. I'll post a link to the file and look at the kernel stuff this evening. I'm off to kill myself at the boxing class. Haven't been since April. I'll die >_<

Post by Xombie »

zikitrake - i sent you an email with an updated version. I used the email address from your website.

Justin - I'm not sure why I included the non-w *.lib files. You shouldn't need them. Perhaps I included them as a kind of backup? lstrlenw_() should certainly be within kernel32w.lib / KERNEL32. I've extracted only the W functions from my archive and they work, including lstrlenw_(). So I'm not sure of the problem you're having.

Do you have an email address you want me to send the updated functions to? Lots of bug fixes since the older version. Memory string handling is a pain and can crash easily if the allocated memory is not handled correctly.

Post by zikitrake »

Xombie :D Thank you!