Save in ASCII
Posted: Mon Nov 03, 2008 8:47 pm
by silvercover
Hi,
I need to save results returned from an SQLite database (in UTF-8) to a plain text file using the ASCII or ANSI code page. I tried this:
Code: Select all
WriteString(0, ResultSet$, #PB_Ascii)
But the file is not saved in the ANSI code page; the data ends up in UTF-8 format.
What should I do?
Thanks in advance.
Posted: Mon Nov 03, 2008 8:55 pm
by Trond
If your program is compiled in Unicode mode and that doesn't work, there's a bug somewhere. But most likely it works correctly and something else is amiss.
Re: Save in ASCII
Posted: Mon Nov 03, 2008 8:57 pm
by Demivec
Wouldn't you use this?
Code: Select all
WriteString(0, ResultSet$, #PB_UTF8)
Posted: Mon Nov 03, 2008 9:07 pm
by silvercover
Demivec wrote:Wouldn't you use this?
I need to do this because I made a plug-in for a non-Unicode application. When I pass the returned result from the plug-in to that app, I get some unknown characters. Therefore I decided to save the returned data in a plain text file with the ANSI code page and let the app read the results from there.
I can do this manually with Notepad and everything works OK.
Trond wrote:If your program is compiled in Unicode mode and that doesn't work, there's a bug somewhere. But most likely it works correctly and something else is astray.
I tested it in both Unicode mode and plain text (non-Unicode) mode.
Thank you guys.
Posted: Mon Nov 03, 2008 9:28 pm
by Trond
As I told you, it works in Unicode mode. You have some other problem in your program.
Code: Select all
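ResultSet.s = "ÆØÅ"
If OpenFile(0, "c:\out.txt")
  WriteString(0, ResultSet, #PB_Ascii) ; write the string converted to ASCII/ANSI
  CloseFile(0) ; close the file so the data is flushed to disk
EndIf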
Posted: Mon Nov 03, 2008 9:49 pm
by silvercover
We know SQLite's default encoding is UTF-8, so the result is UTF-8 too. I then use the code below to save the returned result:
Code: Select all
ProcedureCDLL.l SaveResult()
  If OpenFile(0, Filename$)
    WriteString(0, ResultSet$, #PB_Ascii)
    CloseFile(0)
    Saved.l = 1
  Else
    Saved.l = 0
  EndIf
  SetData(Saved)
EndProcedure
Filename$, Saved.l and ResultSet$ are global.
I have no problem when returned results are in Latin characters but when results are in non-Latin chars like Arabic the problem shows up.
Posted: Mon Nov 03, 2008 10:42 pm
by srod
My knowledge of this stuff is a bit shaky, but the way I understand it leads me to believe that your attempt to apply a 'code page' to a text file which is inherently in Ascii encoding isn't really a great idea.
A text file using Ascii encoding is really using a 7-bit encoding, whereas the Ansi code pages are of course 8-bit. Utf-8 is a unicode encoding which only preserves Ascii characters in the range 1 to 127. Beyond this there is no real link between the encoding and the various code pages you might wish to employ because, as I say, it is a unicode encoding and not an 'ansi code page' encoding.
You will either need to work in utf-8 (or unicode) directly as has already been suggested, or write some code which does the translation yourself. All you need do is get the utf-8 text from the SQLite database and store this in a unicode string variable (set the unicode compiler switch). Then use WideCharToMultiByte_() to translate the unicode string to the code page of your choice before writing to the text file. Of course you will need to switch your code page in whatever text viewer you are then using to view the file etc.
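A minimal sketch of that translation step, assuming Windows and an executable compiled with the unicode switch, so that PureBasic strings are UTF-16 in memory. SaveAsCodePage and #Demo_CP_Arabic are made-up names for illustration only. WideCharToMultiByte_() is called twice: first with no output buffer to get the required size in bytes, then again to do the conversion. The bytes go to disk with WriteData() so that no further string conversion happens on the way.

```purebasic
; Hypothetical names throughout; assumes Windows and a Unicode (UTF-16) build.
#Demo_CP_Arabic = 1256  ; Windows code page 1256 (Arabic)

Procedure.i SaveAsCodePage(FileName.s, Text.s, CodePage.i)
  Protected *Buffer
  Protected Bytes.i, File.i, Result.i
  Protected Chars.i = Len(Text)
  If Chars = 0
    ProcedureReturn #False ; this sketch does not write empty files
  EndIf
  ; First call: no output buffer, returns the number of bytes needed
  Bytes = WideCharToMultiByte_(CodePage, 0, @Text, Chars, #Null, 0, #Null, #Null)
  If Bytes > 0
    *Buffer = AllocateMemory(Bytes)
    If *Buffer
      ; Second call: convert into the buffer
      WideCharToMultiByte_(CodePage, 0, @Text, Chars, *Buffer, Bytes, #Null, #Null)
      File = CreateFile(#PB_Any, FileName)
      If File
        WriteData(File, *Buffer, Bytes) ; raw bytes, no further conversion
        CloseFile(File)
        Result = #True
      EndIf
      FreeMemory(*Buffer)
    EndIf
  EndIf
  ProcedureReturn Result
EndProcedure
```

It would be called along the lines of SaveAsCodePage("c:\out.txt", ResultSet$, #Demo_CP_Arabic). Characters that do not exist in the chosen code page are replaced by the system default character (usually "?"), so the code page has to match the text.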
Posted: Tue Nov 04, 2008 12:44 am
by pdwyer
Agree with Srod,
These #PB_Ascii methods have never worked for me when changing to and from codepages to UTF anything.
It always takes me ages to get those WideCharToMultiByte functions working too, you have to call them twice or something, once to init the buffer and then to convert.
Search on them in the forums for examples. I think that they are the way to go on this.
Posted: Tue Nov 04, 2008 8:41 pm
by Trond
silvercover wrote:I have no problem when returned results are in Latin characters but when results are in non-Latin chars like Arabic the problem shows up.
But it doesn't save with ANSI code page
The ANSI character set doesn't have arabic characters.
Posted: Wed Nov 05, 2008 12:25 am
by Demivec
Trond wrote:silvercover wrote:I have no problem when returned results are in Latin characters but when results are in non-Latin chars like Arabic the problem shows up.
But it doesn't save with ANSI code page
The ANSI character set doesn't have arabic characters.
Arabic characters are on the ANSI code page #1256.
Posted: Wed Nov 05, 2008 11:14 am
by Trond
Demivec wrote:Trond wrote:silvercover wrote:I have no problem when returned results are in Latin characters but when results are in non-Latin chars like Arabic the problem shows up.
But it doesn't save with ANSI code page
The ANSI character set doesn't have arabic characters.
Arabic characters are on the ANSI code page #1256.
No, codepage number 1256 is the Arabic code page. ANSI is codepage 1252 and doesn't have Arabic characters.
Codepage 1250 (EE)
Codepage 1251 (Cyrl)
Codepage 1252 (ANSI)
Codepage 1253 (Greek)
Codepage 1254 (Turk)
Codepage 1255 (Hebr)
Codepage 1256 (Arab)
Codepage 1257 (BaltRim)
Codepage 1258 (Viet)
So if you use code page 1256 you can use Arabic characters (as you said), but then it is not the ANSI codepage.
Posted: Wed Nov 05, 2008 3:55 pm
by Demivec
I see we are on the same (code) page.
"ANSI" is sometimes used to refer to any of the code pages. I was just mentioning that in case you had overlooked it. IMHO it doesn't matter what they're called; it just seemed that silvercover was using the term one way and you were using it another way. I am not worried about which is considered correct, or arguing about it.
This kind of use of the term was popularized by Microsoft. Here's a quote from the MSDN to illustrate it.
Windows code pages, commonly called "ANSI code pages", are code pages for which non-ASCII values (values greater than 127) represent international characters. These code pages are used natively in Windows 95/98/Me, and are also available on Windows NT and later.
Note: Originally, Windows code page 1252, the code page commonly used for English and other Western European languages, was based on an American National Standards Institute (ANSI) draft. That draft eventually became ISO 8859-1, but Windows code page 1252 was implemented before the standard became final, and is not exactly the same as ISO 8859-1.
Many Windows API functions have "A" (ANSI) and "W" (wide) versions. The "A" version handles text based on Windows code pages, while the "W" version handles Unicode text. See Windows Data Types for Strings and Conventions for Function Prototypes.
For the record I agree with you Trond but silvercover wasn't wrong either, according to the above quote.
Posted: Wed Nov 05, 2008 4:25 pm
by silvercover
I put the returned results (in Arabic or other languages) in a plain text file and then change its encoding to ANSI manually with my Notepad2 editor or Windows Notepad. That way the application can read the file correctly.
What I'm looking for is a way to do this automatically. I know about ASCII, ANSI and Unicode but I used ANSI in my description based on what I said about the mechanism I need.
Thanks.
Posted: Wed Nov 05, 2008 8:04 pm
by srod
To be honest it sounds like it should now be the other application's job to interpret the text file correctly etc. If it is expecting an Ascii encoded file in which character codes 128-255 are to be interpreted as being from a particular code page then job done! Provided you constructed the text file with the aid of WideCharToMultiByte_() and the relevant code page, what more is there for you to do? If the other application wishes, for example, to render the text file onto a window then it will have to create a font using the appropriate character set / code page.
Posted: Thu Nov 06, 2008 12:44 am
by pdwyer
I might be able to test this for you tonight (although I'm dealing with cp932, the logic is the same). You need to check the content of your strings at each step. Check what comes out of your DB to make sure your UTF8 (source) is clean first, then do your conversion, then check the save.
Seeing as Windows treats utf8 as a codepage too, you may need to convert to utf16 first and then back to your CP, as utf8 is not THE unicode used under the hood in MS OS.
If you can pull the data out of SQLite as UTF16 (and these functions are probably called when you compile the PB SQLite lib in unicode mode) then you should be able to do a one step conversion to your CP with Srod's suggested API.
If I get a chance I'll try to have a play with this tonight. I'm sure I'll need the reference code in future.
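A rough sketch of the two-step route (UTF-8 to UTF-16, then UTF-16 to the target code page), assuming Windows and that the UTF-8 bytes are already sitting in a memory buffer. Utf8ToCodePage and #Demo_CP_UTF8 are invented names for illustration; 65001 is the Windows code page number for UTF-8. Each API is called twice, once to ask for the size and once to convert. Not tested against SQLite output.

```purebasic
; Hypothetical names throughout; assumes Windows.
#Demo_CP_UTF8 = 65001  ; Windows code page number for UTF-8

; Returns a new buffer holding the text in CodePage (0 on failure).
; The byte count is stored in *OutBytes; the caller must FreeMemory() the result.
Procedure.i Utf8ToCodePage(*Utf8, Utf8Bytes.i, CodePage.i, *OutBytes.Integer)
  Protected *Wide, *Out
  Protected WideChars.i, OutLen.i
  *OutBytes\i = 0
  ; Step 1: UTF-8 -> UTF-16 (size query, then conversion)
  WideChars = MultiByteToWideChar_(#Demo_CP_UTF8, 0, *Utf8, Utf8Bytes, #Null, 0)
  If WideChars <= 0 : ProcedureReturn 0 : EndIf
  *Wide = AllocateMemory(WideChars * 2) ; 2 bytes per UTF-16 code unit
  If *Wide = 0 : ProcedureReturn 0 : EndIf
  MultiByteToWideChar_(#Demo_CP_UTF8, 0, *Utf8, Utf8Bytes, *Wide, WideChars)
  ; Step 2: UTF-16 -> target code page (size query, then conversion)
  OutLen = WideCharToMultiByte_(CodePage, 0, *Wide, WideChars, #Null, 0, #Null, #Null)
  If OutLen > 0
    *Out = AllocateMemory(OutLen)
    If *Out
      WideCharToMultiByte_(CodePage, 0, *Wide, WideChars, *Out, OutLen, #Null, #Null)
      *OutBytes\i = OutLen
    EndIf
  EndIf
  FreeMemory(*Wide)
  ProcedureReturn *Out
EndProcedure
```

The returned buffer can be written with WriteData() as-is. If the data already arrives as a PureBasic unicode string, step 1 is unnecessary and only the WideCharToMultiByte_() half is needed.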