Save in ASCII

silvercover · Post by **silvercover** » Mon Nov 03, 2008 8:47 pm

Hi,

I need to save results returned (in UTF-8) from an SQLite database to a plain text file in ASCII or an ANSI code page. I tried to do it with this:

WriteString(0, ResultSet$, #PB_Ascii)

But it doesn't save in the ANSI code page; the data is saved in UTF-8 format instead.
What should I do?

Thanks in advance.

Trond · Post by **Trond** » Mon Nov 03, 2008 8:55 pm

If your program is compiled in Unicode mode and that doesn't work, there's a bug somewhere. But most likely it works correctly and something else is amiss.

Demivec · Post by **Demivec** » Mon Nov 03, 2008 8:57 pm

silvercover wrote: Hi,

I need to save results returned (in UTF-8) from an SQLite database to a plain text file in ASCII or an ANSI code page. I tried to do it with this:

WriteString(0, ResultSet$, #PB_Ascii)
But it doesn't save in the ANSI code page; the data is saved in UTF-8 format instead.
What should I do?

Thanks in advance.

Wouldn't you use this?

WriteString(0, ResultSet$, #PB_UTF8)

silvercover · Post by **silvercover** » Mon Nov 03, 2008 9:07 pm

Demivec wrote: Wouldn't you use this?

I need to do this because I made a plug-in for a non-Unicode application, so when I pass the returned results from the plug-in to that application I get some unknown characters. Therefore I decided to save the returned data in a plain text file using the ANSI code page and let that application read the results.

I can do this manually with Notepad and everything works OK.

Trond wrote: If your program is compiled in Unicode mode and that doesn't work, there's a bug somewhere. But most likely it works correctly and something else is amiss.

I tested it in both Unicode mode and non-Unicode (ASCII) mode.

Thank you guys.

Trond · Post by **Trond** » Mon Nov 03, 2008 9:28 pm

As I told you, it works in Unicode mode. You have some other problem in your program.

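; Compile with the Unicode switch enabled
ResultSet.s = "ÆØÅ"

If CreateFile(0, "c:\out.txt")          ; CreateFile starts from an empty file
  WriteString(0, ResultSet, #PB_Ascii)  ; write single-byte (non-Unicode) characters
  CloseFile(0)                          ; flush and release the file
EndIf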

silvercover · Post by **silvercover** » Mon Nov 03, 2008 9:49 pm

We know SQLite's default encoding is UTF-8, and so are the results. I use the code below to save the returned result:

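; SetData() is part of the plug-in but is not shown in this post
ProcedureCDLL.l SaveResult()
  If CreateFile(0, Filename$)              ; truncates any previous contents
    WriteString(0, ResultSet$, #PB_Ascii)  ; single-byte output
    CloseFile(0)
    Saved = 1
  Else
    Saved = 0
  EndIf

  SetData(Saved)
  ProcedureReturn Saved                    ; the procedure is declared .l, so return the flag
EndProcedure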

Filename$, Saved.l and ResultSet$ are globals.

I have no problem when the returned results are in Latin characters, but when the results contain non-Latin characters such as Arabic, the problem shows up.

srod · Post by **srod** » Mon Nov 03, 2008 10:42 pm

My knowledge of this stuff is a bit shaky, but the way I understand it, your attempt to apply a 'code page' to a text file which is inherently ASCII-encoded isn't really a great idea.

A text file using ASCII encoding really uses a 7-bit encoding, whereas the ANSI code pages are of course 8-bit. UTF-8 is a Unicode encoding which matches ASCII only for characters in the range 0 to 127; beyond this there is no real link between the encoding and the various code pages you might wish to employ because, as I say, it is a Unicode encoding and not an 'ANSI code page' encoding.

You will either need to work in UTF-8 (or Unicode) directly, as has already been suggested, or write some code which does the translation yourself. All you need to do is get the UTF-8 text from the SQLite database and store it in a Unicode string variable (set the Unicode compiler switch). Then use WideCharToMultiByte_() to translate the Unicode string to the code page of your choice before writing it to the text file. Of course, you will then need to switch the code page in whatever text viewer you use to view the file.
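A minimal PureBasic sketch of the WideCharToMultiByte_() route described above, assuming Windows and a program compiled with the Unicode switch (so @Text$ points at UTF-16 text). SaveInCodePage() is a hypothetical helper name, not a PureBasic command. It uses the usual two-call pattern: the first call only asks how many bytes are needed, the second does the conversion.

```purebasic
; Sketch: save a string as single-byte text in a chosen Windows code page.
; Assumes Windows and the Unicode compiler switch, so @Text$ points at UTF-16 text.
; SaveInCodePage() is a hypothetical helper, not a PureBasic built-in.
Procedure.l SaveInCodePage(Filename$, Text$, CodePage.l)
  Protected Length.l = Len(Text$), Size.l, *Buffer, Result.l = 0

  If Length = 0
    ProcedureReturn 0 ; WideCharToMultiByte_() rejects zero-length input
  EndIf

  ; First call: a null output buffer makes the API return the byte count needed
  Size = WideCharToMultiByte_(CodePage, 0, @Text$, Length, #Null, 0, #Null, #Null)
  If Size > 0
    *Buffer = AllocateMemory(Size)
    If *Buffer
      ; Second call: do the real conversion into the buffer
      WideCharToMultiByte_(CodePage, 0, @Text$, Length, *Buffer, Size, #Null, #Null)
      If CreateFile(0, Filename$)
        WriteData(0, *Buffer, Size) ; raw code page bytes, no BOM
        CloseFile(0)
        Result = 1
      EndIf
      FreeMemory(*Buffer)
    EndIf
  EndIf

  ProcedureReturn Result
EndProcedure

; Usage: 1256 is the Arabic Windows code page
If SaveInCodePage("out.txt", "سلام", 1256)
  Debug "Saved in code page 1256"
EndIf
```

Characters that do not exist in the chosen code page are replaced by the API's default character (normally "?"), so the code page has to match the language of the data.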

pdwyer · Post by **pdwyer** » Tue Nov 04, 2008 12:44 am

Agree with Srod,

These #PB_Ascii methods have never worked for me when converting to and from code pages and any UTF encoding.

It always takes me ages to get those WideCharToMultiByte functions working, too: you have to call them twice, once to get the required buffer size and then again to do the conversion.

Search the forums for examples. I think they are the way to go here.

Trond · Post by **Trond** » Tue Nov 04, 2008 8:41 pm

silvercover wrote: I have no problem when returned results are in Latin characters but when results are in non-Latin chars like Arabic the problem shows up.

But it doesn't save with ANSI code page

The ANSI character set doesn't have Arabic characters.

Demivec · Post by **Demivec** » Wed Nov 05, 2008 12:25 am

Trond wrote:
silvercover wrote: I have no problem when returned results are in Latin characters but when results are in non-Latin chars like Arabic the problem shows up.

But it doesn't save with ANSI code page
The ANSI character set doesn't have Arabic characters.

Arabic characters are on the ANSI code page #1256.

Trond · Post by **Trond** » Wed Nov 05, 2008 11:14 am

Demivec wrote:
Trond wrote:
silvercover wrote: I have no problem when returned results are in Latin characters but when results are in non-Latin chars like Arabic the problem shows up.

But it doesn't save with ANSI code page
The ANSI character set doesn't have Arabic characters.
Arabic characters are on the ANSI code page #1256.

No, code page 1256 is the Arabic code page. ANSI is code page 1252 and doesn't have Arabic characters.

Code page 1250 (Central European)
Code page 1251 (Cyrillic)
Code page 1252 (ANSI / Western European)
Code page 1253 (Greek)
Code page 1254 (Turkish)
Code page 1255 (Hebrew)
Code page 1256 (Arabic)
Code page 1257 (Baltic)
Code page 1258 (Vietnamese)

So if you use code page 1256 you can use Arabic characters (as you said), but then it is not the ANSI code page.
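Whether a given text survives a particular code page can be checked before saving. A minimal PureBasic sketch, assuming Windows and the Unicode compiler switch; FitsCodePage() is a hypothetical helper name. It converts the text and reads the API's lpUsedDefaultChar flag, which is set when some character had to be replaced. The flag value $400 is WC_NO_BEST_FIT_CHARS, written as a literal here so the sketch does not depend on which Windows constants are predefined.

```purebasic
; Sketch: report whether Text$ can be stored in CodePage without losing characters.
; Assumes Windows and the Unicode compiler switch; FitsCodePage() is hypothetical.
Procedure.l FitsCodePage(Text$, CodePage.l)
  Protected Length.l = Len(Text$), Size.l, *Buffer, UsedDefault.l = 0, Result.l = 0

  If Length = 0
    ProcedureReturn 1 ; an empty string fits any code page
  EndIf

  ; $400 = WC_NO_BEST_FIT_CHARS: forbid "look-alike" substitutions such as ā -> a
  Size = WideCharToMultiByte_(CodePage, $400, @Text$, Length, #Null, 0, #Null, #Null)
  If Size > 0
    *Buffer = AllocateMemory(Size)
    If *Buffer
      ; Real conversion; UsedDefault becomes nonzero if any character was replaced
      WideCharToMultiByte_(CodePage, $400, @Text$, Length, *Buffer, Size, #Null, @UsedDefault)
      If UsedDefault = 0
        Result = 1
      EndIf
      FreeMemory(*Buffer)
    EndIf
  EndIf

  ProcedureReturn Result
EndProcedure

Debug FitsCodePage("ÆØÅ", 1252)  ; Western European letters in code page 1252
Debug FitsCodePage("سلام", 1252) ; Arabic in code page 1252
Debug FitsCodePage("سلام", 1256) ; Arabic in code page 1256
```

The first and last checks should report 1 and the middle one 0, which is the distinction made above: Arabic text only fits once code page 1256 is chosen instead of 1252.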

Demivec · Post by **Demivec** » Wed Nov 05, 2008 3:55 pm

Trond wrote:
Demivec wrote:
Trond wrote:
silvercover wrote: I have no problem when returned results are in Latin characters but when results are in non-Latin chars like Arabic the problem shows up.

But it doesn't save with ANSI code page
The ANSI character set doesn't have Arabic characters.
Arabic characters are on the ANSI code page #1256.
No, code page 1256 is the Arabic code page. ANSI is code page 1252 and doesn't have Arabic characters.

Code page 1250 (Central European)
Code page 1251 (Cyrillic)
Code page 1252 (ANSI / Western European)
Code page 1253 (Greek)
Code page 1254 (Turkish)
Code page 1255 (Hebrew)
Code page 1256 (Arabic)
Code page 1257 (Baltic)
Code page 1258 (Vietnamese)

So if you use code page 1256 you can use Arabic characters (as you said), but then it is not the ANSI code page.

I see we are on the same (code) page.

"ANSI" is sometimes used to refer to any of these code pages. I was just mentioning that in case you had overlooked it. Honestly, I don't care what they're called; it just seemed that silvercover was using the term one way and you were using it another way. I am not worried about which is considered correct, or arguing about it.

This use of the term was popularized by Microsoft. Here's a quote from MSDN to illustrate it:

Windows code pages, commonly called "ANSI code pages", are code pages for which non-ASCII values (values greater than 127) represent international characters. These code pages are used natively in Windows 95/98/Me, and are also available on Windows NT and later.

Note: Originally, Windows code page 1252, the code page commonly used for English and other Western European languages, was based on an American National Standards Institute (ANSI) draft. That draft eventually became ISO 8859-1, but Windows code page 1252 was implemented before the standard became final, and is not exactly the same as ISO 8859-1.

Many Windows API functions have "A" (ANSI) and "W" (wide) versions. The "A" version handles text based on Windows code pages, while the "W" version handles Unicode text. See Windows Data Types for Strings and Conventions for Function Prototypes.

For the record, I agree with you, Trond, but according to the above quote silvercover wasn't wrong either.

silvercover · Post by **silvercover** » Wed Nov 05, 2008 4:25 pm

I put the returned results (in Arabic or other languages) into a plain text file and then manually change its encoding to ANSI with my Notepad2 editor or Windows Notepad. That way the application can read the file correctly.

What I'm looking for is a way to do this automatically. I know about ASCII, ANSI and Unicode, but I used the term ANSI in my description because that is the conversion the mechanism I described needs.

Thanks.

srod · Post by **srod** » Wed Nov 05, 2008 8:04 pm

To be honest, it sounds like it should now be the other application's job to interpret the text file correctly. If it is expecting an ASCII-encoded file in which character codes 128-255 are to be interpreted as belonging to a particular code page, then job done! Provided you constructed the text file with WideCharToMultiByte_() and the relevant code page, what more is there for you to do? If the other application wishes, for example, to render the text file onto a window, then it will have to create a font using the appropriate character set / code page.

pdwyer · Post by **pdwyer** » Thu Nov 06, 2008 12:44 am

I might be able to test this for you tonight (although I'm dealing with cp932, the logic is the same). You need to check the content of your strings at each step. Check what comes out of your DB to make sure your UTF-8 (source) is clean first, then do your conversion, then check the save.

Since Windows treats UTF-8 as a code page too (65001), you may need to convert to UTF-16 first and then to your code page, because UTF-8 is not THE Unicode encoding used under the hood in Windows.

If you can pull the data out of SQLite as UTF-16 (these functions are probably called when you compile with the PB SQLite lib in Unicode mode), then you should be able to do a one-step conversion to your code page with srod's suggested API.
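The two-step route (UTF-8 to UTF-16, then UTF-16 to the target code page) can be sketched with MultiByteToWideChar_() and WideCharToMultiByte_(). This is a minimal sketch under stated assumptions: Windows only, *Utf8 points to a zero-terminated UTF-8 string (for example a column value read from SQLite), and Utf8ToCodePageFile() is a hypothetical helper name. Because both API calls are explicitly wide/multibyte, this variant does not rely on the Unicode compiler switch for the conversion itself.

```purebasic
; Sketch: UTF-8 buffer -> UTF-16 -> Windows code page -> file.
; Assumes Windows; *Utf8 is a zero-terminated UTF-8 string (e.g. from SQLite).
; Utf8ToCodePageFile() is a hypothetical helper, not a PureBasic built-in.
Procedure.l Utf8ToCodePageFile(*Utf8, Filename$, CodePage.l)
  Protected WideLen.l, *Wide, Size.l, *Ansi, Result.l = 0

  ; Step 1: UTF-8 -> UTF-16. 65001 = CP_UTF8; -1 = input is zero-terminated,
  ; so the returned count includes the terminating null character
  WideLen = MultiByteToWideChar_(65001, 0, *Utf8, -1, #Null, 0)
  If WideLen > 1                               ; 1 would mean an empty string
    *Wide = AllocateMemory(WideLen * 2)        ; 2 bytes per UTF-16 code unit
    If *Wide
      MultiByteToWideChar_(65001, 0, *Utf8, -1, *Wide, WideLen)

      ; Step 2: UTF-16 -> target code page, leaving out the null terminator
      Size = WideCharToMultiByte_(CodePage, 0, *Wide, WideLen - 1, #Null, 0, #Null, #Null)
      If Size > 0
        *Ansi = AllocateMemory(Size)
        If *Ansi
          WideCharToMultiByte_(CodePage, 0, *Wide, WideLen - 1, *Ansi, Size, #Null, #Null)
          If CreateFile(0, Filename$)
            WriteData(0, *Ansi, Size)          ; raw code page bytes, no BOM
            CloseFile(0)
            Result = 1
          EndIf
          FreeMemory(*Ansi)
        EndIf
      EndIf
      FreeMemory(*Wide)
    EndIf
  EndIf

  ProcedureReturn Result
EndProcedure

; Usage (compiled in Unicode mode so the Arabic literal survives):
*Utf8 = AllocateMemory(StringByteLength("سلام", #PB_UTF8) + 1) ; +1 for the terminator
PokeS(*Utf8, "سلام", -1, #PB_UTF8)
Utf8ToCodePageFile(*Utf8, "out.txt", 1256)
FreeMemory(*Utf8)
```

Checking the intermediate buffers in the debugger after each step is one way to follow the "check your strings at each step" advice above.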

If I get a chance I'll try to have a play with this tonight. I'm sure I'll need the reference code in future.

PureBasic Forums - English