Save in ASCII

silvercover
User
Posts: 86
Joined: Sat Aug 04, 2007 6:57 pm


Post by silvercover »

Hi,

I need to save returned results (in UTF-8) from an SQLite database to a plain text file with an ASCII or ANSI code page. I tried to do it with this:


WriteString(0, ResultSet$, #PB_Ascii)
But it doesn't save with the ANSI code page; it saves the data in UTF-8 format.
What should I do?

Thanks in advance.
Last edited by silvercover on Tue Nov 25, 2008 10:49 am, edited 2 times in total.
Trond
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

If your program is compiled in Unicode mode and that doesn't work, there's a bug somewhere. But most likely it works correctly and something else is astray.
Demivec
Addict
Posts: 4265
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA


Post by Demivec »

Wouldn't you use this?


WriteString(0, ResultSet$, #PB_UTF8)

Post by silvercover »

Demivec wrote: Wouldn't you use this?
I need to do this because I made a plug-in for a non-Unicode application. When I pass the returned result from the plug-in to that application, I get some unknown characters. Therefore I decided to save the returned data in a plain text file with an ANSI code page and let the application read the results from there.

I can do this manually with Notepad and everything works OK.
Trond wrote: If your program is compiled in Unicode mode and that doesn't work, there's a bug somewhere. But most likely it works correctly and something else is astray.
I tested it in both Unicode mode and plain (non-Unicode) mode.


Thank you guys.

Post by Trond »

As I told you, it works in Unicode mode. You have some other problem in your program.


ResultSet.s = "ÆØÅ"

OpenFile(0, "c:\out.txt")
WriteString(0, ResultSet, #PB_Ascii)
CloseFile(0)

Post by silvercover »

As we know, SQLite's default encoding is UTF-8, and so is the returned result. I use the code below to save it:


ProcedureCDLL.l SaveResult()
  If OpenFile(0, Filename$)  
     WriteString(0, ResultSet$, #PB_Ascii)
     CloseFile(0)
     Saved.l = 1
  Else
     Saved.l = 0   
  EndIf
  
  SetData(Saved)
EndProcedure
Filename$, Saved.l and ResultSet$ are globals.

I have no problem when the returned results are in Latin characters, but when they are in non-Latin characters such as Arabic, the problem shows up.
srod
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Post by srod »

My knowledge of this stuff is a bit shaky, but as I understand it, applying a 'code page' to a text file that is inherently ASCII-encoded isn't really a great idea.

A text file using ASCII encoding is really using a 7-bit encoding, whereas the ANSI code pages are of course 8-bit. UTF-8 is a Unicode encoding that matches ASCII only for characters 0 to 127; beyond that there is no real link between the encoding and the various code pages you might wish to employ because, as I say, it is a Unicode encoding and not an 'ANSI code page' encoding.
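That point about UTF-8 and ASCII can be checked with a few lines of PureBasic. This is a sketch assuming a Unicode build (the only mode in current PureBasic versions); the sample Arabic word is built with Chr() so the source file's own encoding doesn't matter, and StringByteLength() is only used to count bytes.

```purebasic
; Sketch: compare how many bytes the same kind of text needs in UTF-8.
; Assumes a Unicode build. The sample word is a hypothetical example.
EnableExplicit

Define Latin$  = "abc"
; Arabic "salam": U+0633 U+0644 U+0627 U+0645
Define Arabic$ = Chr($0633) + Chr($0644) + Chr($0627) + Chr($0645)

; Characters below 128 take one byte each in UTF-8, exactly as in ASCII.
Debug "Latin:  " + Str(Len(Latin$)) + " chars, " + Str(StringByteLength(Latin$, #PB_UTF8)) + " bytes in UTF-8"

; Arabic letters take two bytes each in UTF-8.
Debug "Arabic: " + Str(Len(Arabic$)) + " chars, " + Str(StringByteLength(Arabic$, #PB_UTF8)) + " bytes in UTF-8"
```

The Latin text needs one byte per character, just as in ASCII, while each Arabic letter becomes a two-byte sequence that an 8-bit code page would read back as two unrelated characters.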

You will either need to work in UTF-8 (or Unicode) directly, as has already been suggested, or write some code that does the translation yourself. All you need to do is get the UTF-8 text from the SQLite database and store it in a Unicode string variable (set the Unicode compiler switch). Then use WideCharToMultiByte_() to translate the Unicode string to the code page of your choice before writing it to the text file. Of course, you will then need to switch the code page in whatever text viewer you use to view the file.
I may look like a mule, but I'm not a complete ass.
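A minimal sketch of that approach, assuming Windows and a Unicode build (so @Text$ points to UTF-16 text). SaveInCodePage() is a hypothetical helper name, not a PureBasic command. It uses the two-call pattern for WideCharToMultiByte_(): the first call, with no output buffer, returns the required size, and the second does the conversion. Characters the target code page can't represent are replaced by its default character, and the lpUsedDefaultChar flag reports when that happened.

```purebasic
; Sketch (Windows only, Unicode build): convert a string to a Windows code page
; with WideCharToMultiByte_() and write the raw bytes to a file.
; SaveInCodePage() is a hypothetical helper, not a built-in command.
; Not meant for CodePage 65001 (UTF-8): Windows requires the
; lpUsedDefaultChar pointer to be null for that code page.
EnableExplicit

Procedure.i SaveInCodePage(FileName$, Text$, CodePage.l)
  Protected Size.l, Written.l, UsedDefault.l, File.i, *Buffer
  Protected Result.i = #False

  ; First call: no output buffer, so Windows returns the size needed in bytes.
  ; -1 as the length means "convert up to and including the null terminator".
  Size = WideCharToMultiByte_(CodePage, 0, @Text$, -1, #Null, 0, #Null, #Null)
  If Size <= 0
    ProcedureReturn #False
  EndIf

  *Buffer = AllocateMemory(Size)
  If *Buffer
    ; Second call: the actual conversion. Characters missing from the code page
    ; are replaced by its default character and UsedDefault becomes non-zero.
    Written = WideCharToMultiByte_(CodePage, 0, @Text$, -1, *Buffer, Size, #Null, @UsedDefault)
    If UsedDefault
      Debug "Some characters are not in code page " + Str(CodePage) + " and were replaced."
    EndIf

    If Written > 0
      File = CreateFile(#PB_Any, FileName$)
      If File
        WriteData(File, *Buffer, Written - 1) ; leave out the null terminator
        CloseFile(File)
        Result = #True
      EndIf
    EndIf
    FreeMemory(*Buffer)
  EndIf

  ProcedureReturn Result
EndProcedure
```

For the plug-in in this thread, a call such as SaveInCodePage(Filename$, ResultSet$, 1256) could stand in for the WriteString(0, ResultSet$, #PB_Ascii) line. Passing 0 (CP_ACP) as the code page would instead use whichever code page Windows treats as "ANSI" on that machine, which is what Notepad uses when you save as ANSI.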
pdwyer
Addict
Posts: 2813
Joined: Tue May 08, 2007 1:27 pm
Location: Chiba, Japan

Post by pdwyer »

I agree with srod.

These #PB_Ascii methods have never worked for me when converting to and from code pages and any of the UTF encodings.

It always takes me ages to get those WideCharToMultiByte functions working, too: you have to call them twice, once to get the required buffer size and then again to do the conversion.

Search the forums for examples. I think they are the way to go on this.
Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein

Post by Trond »

silvercover wrote: I have no problem when returned results are in Latin characters but when results are in non-Latin chars like Arabic the problem shows up.
silvercover wrote: But it doesn't save with ANSI code page
The ANSI character set doesn't have Arabic characters.

Post by Demivec »

Arabic characters are on the ANSI code page #1256.

Post by Trond »

No, code page 1256 is the Arabic code page. ANSI is code page 1252, and it doesn't have Arabic characters.

Code page 1250 (Central European)
Code page 1251 (Cyrillic)
Code page 1252 (ANSI, Western European)
Code page 1253 (Greek)
Code page 1254 (Turkish)
Code page 1255 (Hebrew)
Code page 1256 (Arabic)
Code page 1257 (Baltic)
Code page 1258 (Vietnamese)

So if you use code page 1256 you can use Arabic characters (as you said), but then it is not the ANSI code page.

Post by Demivec »

I see we are on the same (code) page. ;)

"ANSI" is sometimes used to refer to any of these code pages. I was just mentioning that in case you had overlooked it. IMHO I don't care what they're called; it just seemed that silvercover was using the term one way and you were using it another way. I am not worried about which is considered correct, or arguing about it.

This use of the term was popularized by Microsoft. Here's a quote from MSDN to illustrate it.
Windows code pages, commonly called "ANSI code pages", are code pages for which non-ASCII values (values greater than 127) represent international characters. These code pages are used natively in Windows 95/98/Me, and are also available on Windows NT and later.

Note: Originally, Windows code page 1252, the code page commonly used for English and other Western European languages, was based on an American National Standards Institute (ANSI) draft. That draft eventually became ISO 8859-1, but Windows code page 1252 was implemented before the standard became final, and is not exactly the same as ISO 8859-1.

Many Windows API functions have "A" (ANSI) and "W" (wide) versions. The "A" version handles text based on Windows code pages, while the "W" version handles Unicode text. See Windows Data Types for Strings and Conventions for Function Prototypes.
For the record, I agree with you, Trond, but according to the above quote silvercover wasn't wrong either.
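Which code page "ANSI" means on a given machine can be asked of Windows directly. This is a sketch assuming Windows and that PureBasic's built-in API imports include GetACP_(); that function returns the active ANSI code page, which is the one the non-Unicode ("A") API functions and Notepad's ANSI option use. The two Case labels are just examples taken from Trond's list.

```purebasic
; Sketch (Windows only): ask Windows which code page it treats as "ANSI".
; Assumes GetACP_() is available through PureBasic's Windows API imports.
EnableExplicit

Define AnsiCodePage.l = GetACP_()

Select AnsiCodePage
  Case 1252
    Debug "ANSI on this machine means code page 1252 (Western European)"
  Case 1256
    Debug "ANSI on this machine means code page 1256 (Arabic)"
  Default
    Debug "ANSI on this machine means code page " + Str(AnsiCodePage)
EndSelect
```

This is why the same "ANSI" file can look fine on one PC and garbled on another: a non-Unicode application reads it with whatever code page that machine reports here.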

Post by silvercover »

I put the returned results (in Arabic or other languages) into a plain text file and then manually change its encoding to ANSI with my Notepad2 editor or Windows Notepad. That way the application can read the file correctly.

What I'm looking for is a way to do this automatically. I know about ASCII, ANSI and Unicode, but I used the term ANSI in my description to match the manual process I described above.

Thanks.

Post by srod »

To be honest, it sounds like it should now be the other application's job to interpret the text file correctly. If it is expecting an ASCII-encoded file in which character codes 128-255 are to be interpreted as coming from a particular code page, then job done! Provided you constructed the text file with the aid of WideCharToMultiByte_() and the relevant code page, what more is there for you to do? If the other application wishes, for example, to render the text file onto a window, then it will have to create a font using the appropriate character set / code page.

Post by pdwyer »

I might be able to test this for you tonight (although I'm dealing with cp932, the logic is the same). You need to check the content of your strings at each step: check what comes out of your DB to make sure your UTF-8 (source) is clean first, then do your conversion, then check the save.

Seeing as Windows treats UTF-8 as a code page too (65001), you may need to convert to UTF-16 first and then to your code page, as UTF-8 is not THE Unicode used under the hood in Windows.

If you can pull the data out of SQLite as UTF-16 (and these functions are probably called when you compile the PB SQLite lib in Unicode mode), then you should be able to do a one-step conversion to your code page with srod's suggested API.

If I get a chance I'll try to have a play with this tonight. I'm sure I'll need the reference code in future.
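For the first of those steps, when you are holding raw UTF-8 bytes in memory rather than a PureBasic string, PeekS() with #PB_UTF8 decodes them into a native string (UTF-16 in a Unicode build). The sketch below also dumps each character's code point, which is one way to check that the source text is clean before converting it further. The byte buffer is a hypothetical stand-in for data coming out of SQLite (the Arabic word "salam"), and PokeA() assumes a reasonably recent PureBasic.

```purebasic
; Sketch: decode raw UTF-8 bytes into a native string and inspect each character.
; Assumes a Unicode build. The buffer contents are a hypothetical sample.
EnableExplicit

Define *Utf8 = AllocateMemory(9) ; 8 bytes of text + a null terminator
Define Text$, i.i

; "salam" (U+0633 U+0644 U+0627 U+0645) encoded as UTF-8
PokeA(*Utf8 + 0, $D8) : PokeA(*Utf8 + 1, $B3)
PokeA(*Utf8 + 2, $D9) : PokeA(*Utf8 + 3, $84)
PokeA(*Utf8 + 4, $D8) : PokeA(*Utf8 + 5, $A7)
PokeA(*Utf8 + 6, $D9) : PokeA(*Utf8 + 7, $85)
; AllocateMemory() zero-fills the buffer, so byte 8 is already the terminator.

Text$ = PeekS(*Utf8, -1, #PB_UTF8) ; -1: read up to the null byte

; Print each code point, e.g. to compare against what you expect from the DB.
For i = 1 To Len(Text$)
  Debug "U+" + RSet(Hex(Asc(Mid(Text$, i, 1))), 4, "0")
Next

FreeMemory(*Utf8)
```

If the dump shows the expected Arabic code points, the source is clean and any remaining problem lies in the code page conversion or the save.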