Removing 'ASCII' switch from PureBasic

Post by **Fred** » Tue Aug 05, 2014 11:31 am

Hi there,

Since PB 5.30, the minimum Windows OS to run PureBasic is Windows XP. That means that every supported OS (Windows XP+, OS X 10.5+, Linux) now supports Unicode natively, so we discussed with Timo the opportunity for us to remove ASCII support from PB and provide a Unicode-only compiler. Still supporting ASCII is a lot of work for us, as we need to provide duplicate functions when dealing with strings, which can lead to more bugs. Also, ASCII is an old technology and is condemned to disappear sooner or later, as Unicode can handle it as well.

What would change for you:

- Basically, if your software runs with the "unicode" switch ON, nothing will change and you can skip the following text. If not, then you can enable it and test it.
- All strings in PB will be handled as UCS2 (16-bit) strings internally. So if you used "@String$" somewhere in your code, chances are high it won't work anymore (if dealing with an API, for example)
- We plan to provide 2 new functions, to ease things a bit:
*AsciiBuffer = ToAscii(String$)
*UTF8Buffer = ToUTF8(String$)

What would change for us:

- Faster building time, less code in our source tree, makefiles much shorter
- Fewer bugs because of code reduction
- No more unicode switch, so it's easier when sharing source code on the forum, or when developing a user lib (everybody is unicode)
- Makes PB definitely more modern.

We would like to do it for the 5.40 version. What are your thoughts about it ? Is it a deal breaker for you ?

edit: before freaking out, we are just talking about removing the "unicode switch", not all ascii related operations !

The Fantaisie Software Team

Post by **Ocean** » Tue Aug 05, 2014 11:42 am

Hi Fred,

serial communications (gps receivers, embedded boards, sensors, etc.) still largely communicate with ASCII-based character sets, meaning that we need to push single byte values to those devices and receive single byte values from them... With no ASCII character set left in PureBasic, a programmer would need to do the encoding him-/herself...

cheers
Ocean

Post by **Fred** » Tue Aug 05, 2014 11:48 am

What about using ToAscii()/ToUTF8() or prototypes with pseudotypes (p-ascii, p-utf8) when importing these functions ? We are talking about PB internals, not about whether PB will be able to handle ASCII stuff (you could indeed still read ASCII files etc.). For example, the serial lib WriteSerialString() does support the #PB_Ascii flag already (http://www.purebasic.com/documentation/ ... tring.html).
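As a rough illustration of the #PB_Ascii flag mentioned above, a serial write from a Unicode build might look like the sketch below. The port name "COM1", the baud rate, the buffer sizes and the "PING" payload are assumptions made up for the example; only WriteSerialString() with #PB_Ascii comes from the post itself.

```purebasic
; Sketch only: port name, speed, buffer sizes and payload are made-up values.
#Port = 0

If OpenSerialPort(#Port, "COM1", 9600, #PB_SerialPort_NoParity, 8, 1, #PB_SerialPort_NoHandshake, 1024, 1024)
  ; The string lives in memory as 16-bit characters, but #PB_Ascii
  ; makes the library send it as single-byte characters.
  WriteSerialString(#Port, "PING" + #CRLF$, #PB_Ascii)
  CloseSerialPort(#Port)
Else
  Debug "Could not open the serial port"
EndIf
```

With this approach the device never sees the internal string format, since the conversion happens inside the library call.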

Post by **Danilo** » Tue Aug 05, 2014 11:52 am

Will functions like PokeS() and PeekS() still be supported? To read and write Ascii/UTF8/Unicode strings from/to memory buffers.
Same for ReadString() and WriteString() file functions, is it still supported to read/write ASCII and UTF files?
What about p-ascii and p-utf8 pseudo types for calling external library functions/APIs?

I always use UNICODE compiler mode, and by using the above functions, I am still able to interact
with functions that work with ASCII/UTF8 data.
For serial communication (mentioned by Ocean), we still have the Byte and Ascii types (.b .Byte .a .Ascii), we can use
with memory buffers.

If it is only about removing the ASCII compiler mode, and conversion functions (see above) are still
present, I see no problem so far.

Post by **Fred** » Tue Aug 05, 2014 11:54 am

Yes, all this will indeed be supported, we are just talking about the "Unicode switch" found in the "compiler options" window. Maybe I wasn't explicit enough in the first post. I will edit it.

Post by **wilbert** » Tue Aug 05, 2014 12:04 pm

If pseudo types would remain I think the idea itself is a good idea. The timing however looks bad to me.
It would have been much better if a decision like this would have been made prior to the 5.2x LTS release.
For years to come you still will have to post code to the forum supporting both ascii and unicode mode if you want it to be LTS compatible.

Post by **STARGÅTE** » Tue Aug 05, 2014 12:05 pm

This is a strong change, but for me it is ok.

If we need a "special" string format, we already have functions like PokeS() and PeekS() with StringByteLength() to write a string in Ascii/UTF8.
@Ocean: If we communicate with a device, we send some data (memory and not a string). So you can read the ASCII with PeekS().

But: Many users need to be careful when they use memory functions like MD5 or Base64 as string functions!
I often see code like this: MD5Fingerprint(@String) or Base64Decoder(@String).
I have written and posted functions to use these functions with strings in Ascii, UTF8 and unicode.
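To make the pitfall concrete: hashing @String hashes whatever bytes the compiler mode happens to store, so the result differs between ASCII and unicode builds. A minimal sketch of an encoding-explicit wrapper follows; the name MD5OfStringUTF8 is hypothetical, and it assumes the MD5Fingerprint(*Buffer, Size) call from the Cipher library as available in PB 5.3x.

```purebasic
; Hypothetical helper: MD5 of the UTF-8 bytes of a string,
; independent of how the compiler stores strings internally.
Procedure.s MD5OfStringUTF8(String.s)
  Protected Result.s
  Protected Size = StringByteLength(String, #PB_UTF8)  ; bytes without the null
  Protected *Buffer = AllocateMemory(Size + 1)         ; +1 for the null PokeS() writes

  If *Buffer
    PokeS(*Buffer, String, -1, #PB_UTF8)
    Result = MD5Fingerprint(*Buffer, Size)             ; hash the text bytes only
    FreeMemory(*Buffer)
  EndIf

  ProcedureReturn Result
EndProcedure

Debug MD5OfStringUTF8("Hello")
```

Picking one encoding (UTF-8 here, as an assumption) and always hashing that is what keeps the fingerprint stable across compiler modes.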

*AsciiBuffer = ToAscii(String$) is a nice feature, but do we have to free the memory ourselves?

Code: Select all

Procedure.i ToAscii(String.s)
  Protected *Buffer = AllocateMemory(StringByteLength(String, #PB_Ascii))
  PokeS(*Buffer, String, -1, #PB_Ascii)
  ProcedureReturn *Buffer
EndProcedure

Post by **Fred** » Tue Aug 05, 2014 12:09 pm

STARGÅTE wrote: *AsciiBuffer = ToAscii(String$) is a nice feature, but do we have to free the memory ourselves?
Code: Select all
Procedure.i ToAscii(String.s)
  Protected *Buffer = AllocateMemory(StringByteLength(String, #PB_Ascii))
  PokeS(*Buffer, String, -1, #PB_Ascii)
  ProcedureReturn *Buffer
EndProcedure

It's not decided for now. BTW, you need to add one byte to the AllocateMemory() size, for the terminating null byte.
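A sketch of the helper with that extra byte could look like this. It is named ToAsciiBuffer here (a hypothetical name) so it is not mistaken for the planned built-in, and it assumes the caller releases the buffer with FreeMemory(), which is exactly the point that is still undecided for the real function.

```purebasic
; Hypothetical helper, not the planned built-in ToAscii().
Procedure.i ToAsciiBuffer(String.s)
  ; StringByteLength() excludes the terminating null, so add one byte for it
  Protected *Buffer = AllocateMemory(StringByteLength(String, #PB_Ascii) + 1)

  If *Buffer
    PokeS(*Buffer, String, -1, #PB_Ascii)  ; writes the characters plus the null
  EndIf

  ProcedureReturn *Buffer                  ; 0 if the allocation failed
EndProcedure

; Usage: the caller owns the buffer and must free it (an assumption of this sketch)
*Ascii = ToAsciiBuffer("Hello")
If *Ascii
  Debug PeekS(*Ascii, -1, #PB_Ascii)
  FreeMemory(*Ascii)
EndIf
```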

Post by **Fred** » Tue Aug 05, 2014 12:10 pm

wilbert wrote: If pseudo types would remain I think the idea itself is a good idea. The timing however looks bad to me.
It would have been much better if a decision like this would have been made prior to the 5.2x LTS release.
For years to come you still will have to post code to the forum supporting both ascii and unicode mode if you want it to be LTS compatible.

New LTS will be out in about 1 year, so by the time the 5.40 is out, it will be less than one year, which seems OK to me.

Post by **Ocean** » Tue Aug 05, 2014 12:14 pm

If streaming ASCII is still going to be supported by the various libraries, I think your proposal is a good thing.

cheers
Ocean

Post by **User_Russian** » Tue Aug 05, 2014 12:15 pm

Fred wrote:We would like to do it for the 5.40 version. What are your thoughts about it ? Is it a deal breaker for you ?

This is a very very bad idea!

Some projects may be ASCII only! For example, a DLL called from other programs, including ones not created in PureBasic. This greatly complicates the programming! We would have to abandon strings and use memory buffers!
In addition, the conversion from Unicode to ASCII, and from UTF-8 to ASCII, does not always work correctly, and some time ago I published examples of incorrect conversion!

Post by **Fred** » Tue Aug 05, 2014 12:21 pm

Could you be more specific about your DLL example ? About the bugs, could you point to the topics ? We use a standard function (WideCharToMultiByte_()) to do it, so it should be OK.

Post by **Little John** » Tue Aug 05, 2014 12:26 pm

I've also got a question regarding DLLs:
When I create a DLL which is a plug-in for a 3rd party program, and that 3rd party program expects ASCII strings ... will that still be possible?

Post by **luis** » Tue Aug 05, 2014 12:58 pm

If the only thing changing is the unicode switch being permanently on (so to speak), all the ascii flags supported by the various functions are kept intact (so it's still possible to both create and read ascii buffers), and the pseudotypes are also kept as they are, then it's acceptable for me.

Negatives:

When unicode is not needed ascii only strings may reduce the size of the final executable a lot and maybe for someone this may be important.

When you must communicate with something using ascii only, as long as you still have the Byte and Ascii data types all is well for single chars. But when you have to send strings you may have to do some operations on them in advance, so you will be forced to keep them in unicode if you want to use the string library functions, since they only understand unicode. Then you may have to do some kind of conversion on them (using pseudotypes when possible and manually when it isn't).

EDIT: see UTF8() and Ascii() added in 5.50 to do just so -> http://www.purebasic.fr/english/viewtop ... 14&t=65868

With an ascii mode, everything would be more straightforward, as it is now.

In the end when the other half of the software layer you are communicating with expects ascii, you will sooner or later need to waste some cpu time in conversions, memory space in allocating temporary buffers, write some additional code to operate on those transitional buffers and so on.

So it will make things a little more complicated for PB users and a little more bug-prone for some.

Having the option, I would keep the ascii mode obviously. Having ascii/unicode/x86/x64 was one of the many strong points of PB. If you remove ascii now and maybe then x86 later on... it will certainly lose some appeal for many.

Post by **Danilo** » Tue Aug 05, 2014 1:00 pm

@Little John:

Code: Select all

ProcedureDLL.i DLLfunc(*input.Ascii)
    If *input
        theString.s = PeekS(*input,-1,#PB_Ascii)
    EndIf
    ProcedureReturn toAscii("returnString")
EndProcedure

It is all already possible. If you enable UNICODE compiler mode, you can still use ASCII functions/libs/DLLs.

The major problem will be for people that still work with ASCII compiler mode.
Either they convert their projects to work with Unicode strings or they stay with 5.3x for that project and
PB 5.4x for new projects only.

It requires the same effort as for people who still always use 32bit PB and use the .l data type everywhere.
When they move to 64bit, many things don't work anymore. The day will come when all OS are
64bit or higher, and PB 32bit is not required anymore...
