Page 1 of 2

Additional characters in databases PostgreSQL, SQLite

Posted: Mon Nov 18, 2013 10:29 am
by lenski
From PureBasic version 5.10 i it is not possible to use additional characters like "€" in databases PostrgeSQL and SQLite, if the executable is not compiled in unicode.
I think, DatabaseUpdate writes to the databases in unicode, even if the executable is compiled in non-unicode, because i can read this characters later with unicode-executable correctly.

In PureBasic version 5.00 and prior it was possible to write/read additional characters, even if the executable was not compiled in unicode.

Otherwise, additional character, that was written with PureBasic 5.00 non-unicode-executable, can be readed with PureBasic 5.10-5.21beta3 non-unicode-executable correctly.
After writing again with 5.10-5.21beta3, i can read only "?" with non-unicode-executable.

I tested it under Windows 7/8 32/64bit on 2 pc's.

Re: Additional characters in databases PostgreSQL and SQLite

Posted: Mon Nov 25, 2013 11:22 am
by AndrewM
I had a similar situation recently. One way to test if the program can handle unicode is to use the CHR() function to create a unicode character. You will then be able to see if you can store unicode in a database. In my case I was using ms-access which can't handle unicode. I had to read the unicode string in as x and then run the following code.

Code: Select all

L = Len(x)   
For i = 1 To L
    a = AscW(Mid(x, i, 1))
    y = y & "," & Str(a)
    If a > 0 Then o = o & ChrW(a)
Next
The AscW and ChrW functions just ignore characters that are outside the range of normal ASCII or ANSI. It is not hard to use ASC in place of ASCW, just add an expression that rejects values that are outside the ANSI code range.

I suppose what I am trying to say is that the unicode in my case was present, I just could not see it as the ms-access/windows controls hide their contents when they encounter a character that they could not display. This maybe what you have encountered.

Re: Additional characters in databases PostgreSQL and SQLite

Posted: Mon Nov 25, 2013 3:08 pm
by lenski
The problem is, that PB 5.10 and greater is not able to read carachters like € from databases, thas it itself
had written with unicode-off-executable. It reads only an ?.
With PB 5.00 or prior it was possible to write € and to read €.

Try this code with PB 5.00 unicode-off and with PB 5.10-5.21 unicode-off, then you will sea the problem.

Code: Select all

UseSQLiteDatabase()

OpenWindow(0, 0, 0, 300, 300, "", #PB_Window_SystemMenu)
StringGadget(1, 10, 10, 200, 20, "")
EditorGadget(2, 10, 40, 200, 100)

Define String.s
Define DatabaseFile.s = GetTemporaryDirectory()+"Database.txt"
CreateFile(0, DatabaseFile) : CloseFile(0)
OpenDatabase(0, DatabaseFile, "", "")

DatabaseUpdate(0, "CREATE TABLE test (name TEXT)")

DatabaseUpdate(0, "INSERT INTO test (name) VALUES ('€')")

If DatabaseQuery(0, "SELECT * FROM test")
   While NextDatabaseRow(0)
      String = GetDatabaseString(0, 0)
      Debug String
      SetGadgetText(1, String)
      SetGadgetText(2, String)
   Wend
   FinishDatabaseQuery(0)
EndIf

CloseDatabase(0)

Repeat : Until WaitWindowEvent() = #PB_Event_CloseWindow

Re: Additional characters in databases PostgreSQL and SQLite

Posted: Mon Nov 25, 2013 6:27 pm
by rsts
I'm not sure exactly when it changed. There have been a few discussions about it and it's not likely to change back.

I circumvented it by writing the ascii text strings as type blob. Using this, they do not go through any pb conversion and you maintain the one character designations for chars over 127.

cheers

Re: Additional characters in databases PostgreSQL and SQLite

Posted: Mon Nov 25, 2013 6:47 pm
by ts-soft
SQLite uses always UTF-8. The result should always the same if you use Compiler-Option ASCII or Unicode.
There is no problem in older Version, so this should a new Bug.

Greetings - Thomas

Re: Additional characters in databases PostgreSQL and SQLite

Posted: Mon Nov 25, 2013 7:27 pm
by rsts
In older versions of pb, ansii characters between 128 and 256 were passed as one byte and when read back were shown as their ansii equivalent. Since 5.0 they are passed as the utf-8 equivalent, 2 bytes and thus do not display properly in ansii.

It is a pb related issue. You can have a character which displays correctly in a list - send it to a sql database and see that it's been "converted" to 2 byte utf-8. Other sqlite editors I know, of do not exhibit this behavior. The data will be stored in sqlite as a one-byte character representation between 128 and 256. Only pb chooses to make this utf-8 "conversion" for you.

There was a discussion with freak sometime back.

Re: Additional characters in databases PostgreSQL and SQLite

Posted: Mon Nov 25, 2013 7:48 pm
by skywalk
I am confused. If you compile as Ascii, and then work with Unicode or UTF8 strings, why is this a bug? Why not compile as Unicode?

Re: Additional characters in databases PostgreSQL and SQLite

Posted: Mon Nov 25, 2013 8:49 pm
by ts-soft
skywalk wrote:I am confused. If you compile as Ascii, and then work with Unicode or UTF8 strings, why is this a bug? Why not compile as Unicode?
If i store "100€" in a database, i will become back "100€" from database, all the same whether the program which reads my text from the database was compiled in Unicode or ASCII.

Store Text in UTF-8 is the only right way for this and all SQLite-Manager, i have ever used, read strings as UTF-8 from SQLiteDB.

Re: Additional characters in databases PostgreSQL and SQLite

Posted: Tue Nov 26, 2013 9:47 am
by Danilo

Code: Select all

UseSQLiteDatabase()
OpenDatabase(0, ":memory:", "", "", #PB_Database_SQLite)
DatabaseUpdate(0, "CREATE TABLE Test (Name VARChAR)")

string.s = "100€"
Debug "Char 4: "+PeekC(@string+3*SizeOf(Character))


DatabaseUpdate(0, "INSERT INTO Test (NAME ) VALUES ('"+string+"')" )
  
query$="SELECT * FROM Test "
If DatabaseQuery(0, query$)  
  While NextDatabaseRow(0)
      x$ = GetDatabaseString(0, 0)
      Debug "Len = "+Len(x$)
      Debug x$
  Wend      
  FinishDatabaseQuery(0)
EndIf

CloseDatabase(0)

Debug "Char 4: "+PeekC(@x$+3*SizeOf(Character))
MessageRequester("INFO",x$)
Output without Unicode:

Code: Select all

Char 4: 63
Len = 4
100?
Char 4: 63
MessageRequester without Unicode shows: 100?

With Unicode:

Code: Select all

Char 4: 8364
Len = 4
100
Char 4: 8364
MessageRequester with Unicode shows: 100€


The problem might be:
In the PB source I write '€', but PeekC() gives result 63 in Ascii mode, which is '?'. PB compiler converts the '€' sign to '?' within the string - even before adding it into the database.

WITH Unicode, debug output does not show the '€' sign here. I think debugger does not use
same unicode font as the system does. MessageRequester in Unicode mode shows the '€' sign,
so even when compiling in unicode mode, the display font is important. Not all fonts provide
all unicode characters.
Maybe the problem is that PB IDE, and included debugger(?), are compiled without Unicode?

I tested it on Mac OS X, but I think it is the same behavior lenski is talking about on Windows.

Best thing is to compile Unicode only nowadays, it should prevent such issues with character encoding conversions.
AFAIK the Euro sign '€' is not part of all ASCII character encodings, and there are some hundred or so on Windows.

Re: Additional characters in databases PostgreSQL and SQLite

Posted: Tue Nov 26, 2013 10:38 am
by lenski
As I have written in my first posting, you can read characters like € with unicode executable correctly, even if they
were written with unicode-off executable. PB unicode-off gives the wright character to database (not only "?"), but in unicode, because you read "€" with unicode-on, not "?". The "?" you read only if the "€" was written with PB 5.00 or prior inicode-off executable.
With PB 5.00 or prior all characters, wich were written unicode-off, are readable unicode-off, unicode-on not.
With PB 5.10-5.21 most characters > Chr(128), wich were written unicode-off, are not readable unicode-off, but they are readable unicode-on.

Re: Additional characters in databases PostgreSQL and SQLite

Posted: Tue Nov 26, 2013 1:22 pm
by lenski
I think, the best solution for the problem is to change DatabaseUpdate, so that it works under unicode-off like
GetDatabaseString works - in unicode-off. This way all peaple, who are using PB 5.00 or prior unicode-off-executable,
can be updated to PB 5.21 unicode-off-executable without loosing some data.

The second best way is to Change GetDatabaseString, so that it works like DatabaseUpdate works - in unicode.
All peaple, who are using PB 5.00 or prior unicode-off-executable, will loos most af characters greater then Chr(128)
updating to PB 5.21. But they would be able to read new written data correctly.

The solution today is good for nothing: yeas, you can read data, that was written with PB 5.00 and older correctly,
but the first writing with PB 5.21 gives most characters > Chr(128) to databases in one way, that GetDatabaseString
can't read. It reads "?" and, without changing this manually, the next writing would give "?" to DB. So the original character is lost vorever.

Re: Additional characters in databases PostgreSQL and SQLite

Posted: Tue Nov 26, 2013 5:12 pm
by Fred
GetDatabaseString() will be changed to handle utf8 in ascii as well.

Re: Additional characters in databases PostgreSQL and SQLite

Posted: Fri May 29, 2015 4:38 pm
by juror
Fred wrote:GetDatabaseString() will be changed to handle utf8 in ascii as well.
Apparently not done yet? Or possibly done using another PB internal conversion of ascii chars < 127? Because what's in the DB for ascii chars >127 is not what you get back with GetDataseString

Whoops - apologies - not marked DONE :oops:

P.S. the "blob" circumvention mentioned by rtst above appears to work.

Re: Additional characters in databases PostgreSQL and SQLite

Posted: Mon Aug 03, 2015 2:49 pm
by Fred
GetDatabaseString() correctly support UTF8, the euro sign isn't part of ASCII table as Danilo said, that's why it fails in ASCII mode. If you want to be on the safe side regarding non-ascii literal characters, you need to compile in unicode mode.

Re: Additional characters in databases PostgreSQL and SQLite

Posted: Thu Aug 13, 2015 1:26 pm
by lenski
Fred wrote:GetDatabaseString() correctly support UTF8, the euro sign isn't part of ASCII table as Danilo said, that's why it fails in ASCII mode. If you want to be on the safe side regarding non-ascii literal characters, you need to compile in unicode mode.
But why do you convert the characters chr(128)-chr(160) to unicode in DatabaseUpdate giving them to the database even if the executable is compiled in ASCII - or why you don't convert
them back to ASCII in GetDatabaseString reading them from the database?
This way it is not possible to use this characters within an ASCII-executable.
With PureBasic 5.0 or prior there were no Problems using them.

Greetings - Leo