Removing 'ASCII' switch from PureBasic

Shield · Post by **Shield** » Tue Aug 05, 2014 1:05 pm

Pseudotypes most likely won't be changed.

I don't see any problems with dropping ASCII support for most of us. The vast majority of all applications
nowadays use or need Unicode. However, I also understand User_Russian's situation. He seems to work
quite a bit with low-level stuff. For users like him, dropping ASCII support would mean either converting
to and from Unicode constantly (which introduces overhead) or doing manual C-like string manipulation
in memory, which can lead to the same bugs the string system was meant to solve.

Personally, I think it's a good idea to get rid of ASCII and free up time for other things. Maybe you guys could
think about providing a new library instead that offers memory string functions for ASCII, e.g.

Code: Select all

*first = CreateAsciiString("foobar")
*second = CreateAsciiString("onetwo")

Debug AsciiLen(*first)

*third = ConcatenateAsciiString(*first, *second)

FreeAsciiString(*first)
FreeAsciiString(*second)
FreeAsciiString(*third)
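
For what it's worth, a library like this could probably be prototyped in user code today. Below is a minimal sketch, assuming the four function names from the example above (they are hypothetical, not built-in PureBasic commands) and relying only on the existing memory commands. The concatenation stays in raw bytes, so no round trip through the native string type is needed there.

```purebasic
; Hypothetical helpers: none of these names are built-in PureBasic commands.

Procedure CreateAsciiString(Text.s)
  ; One byte per character plus one byte for the terminating 0
  Protected *Buffer = AllocateMemory(StringByteLength(Text, #PB_Ascii) + 1)
  If *Buffer
    PokeS(*Buffer, Text, -1, #PB_Ascii)
  EndIf
  ProcedureReturn *Buffer
EndProcedure

Procedure AsciiLen(*String)
  ; Length in characters of a null-terminated ASCII string in memory
  ProcedureReturn MemoryStringLength(*String, #PB_Ascii)
EndProcedure

Procedure ConcatenateAsciiString(*First, *Second)
  Protected LenFirst = AsciiLen(*First)
  Protected LenSecond = AsciiLen(*Second)
  ; AllocateMemory() zero-fills by default, so the terminating 0 is already in place
  Protected *Buffer = AllocateMemory(LenFirst + LenSecond + 1)
  If *Buffer
    If LenFirst > 0
      CopyMemory(*First, *Buffer, LenFirst)
    EndIf
    If LenSecond > 0
      CopyMemory(*Second, *Buffer + LenFirst, LenSecond)
    EndIf
  EndIf
  ProcedureReturn *Buffer
EndProcedure

Procedure FreeAsciiString(*String)
  If *String
    FreeMemory(*String)
  EndIf
EndProcedure
```

The caller still owns every returned pointer and has to free it, which is exactly the kind of manual bookkeeping a native implementation could hide.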

luis · Post by **luis** » Tue Aug 05, 2014 1:12 pm

One thing I would like, if you decide to go this route, is for you to let us know officially as soon as possible.

This is to avoid writing code in the meantime to support ASCII mode compilation when that mode will not exist anymore.
That is exactly what I'm doing right now, and if my lib will never be compilable in ASCII mode anymore, I would like to avoid writing further useless code and to modify or remove some of what I have already done.

User_Russian · Post by **User_Russian** » Tue Aug 05, 2014 1:13 pm

Consider a DLL that is called from another programming language, for example Delphi 7.
This code, compiled in ASCII mode, works fine.

Code: Select all

ProcedureDLL.s Funct(String.s)
  Static Result.s
  
  Result = Mid(String, 2)
  
  ; Other commands
  
  ProcedureReturn Result
EndProcedure

But when you compile it in Unicode mode, it does not work! The code has to be modified. The input parameters are not difficult to handle, but the result has to be returned through manually managed memory!

Code: Select all

ProcedureDLL Funct(*String)
  Protected String.s, Result.s, *x
  Static *Result
  
  String = PeekS(*String, -1, #PB_Ascii)
  Result = Mid(String, 2)
  
  ; Other commands
  
  If *Result=0
    *Result=AllocateMemory(StringByteLength(Result, #PB_Ascii)+2)
    If *Result=0
      ProcedureReturn 0
    EndIf
  Else
    *x=ReAllocateMemory(*Result, StringByteLength(Result, #PB_Ascii)+2)
    If *x
      *Result=*x
    Else
      ProcedureReturn 0
    EndIf
  EndIf
  
  PokeS(*Result, Result, -1, #PB_Ascii)
  
  ProcedureReturn *Result
EndProcedure

Fred wrote:About the bugs, could you point the topics ?

Here is the topic: http://www.purebasic.fr/english/viewtop ... 28&t=46807
The problem is still not solved.

Fred wrote:so we discussed with Timo the opportunity for us to remove ASCII support from PB and provide an unicode only compiler.

I hope the next step will not be to remove the non-thread-safe subsystem?

People are asking you to add support for working with ASCII, UTF-8 and Unicode strings, and you are removing it!

The problem of very slow string handling is also not solved: http://www.purebasic.fr/english/viewtop ... =3&t=58892
Using Unicode increases the time by 100%! Working with strings will become even slower!

marroh · Post by **marroh** » Tue Aug 05, 2014 2:43 pm

I develop apps for lots of low-budget PCs in countries that do not have the money for modern PCs. Unicode needs a lot more resources. Dropping ASCII support (no more ASCII-only executables) would mean I have to stop using PB for development.

Fred wrote: - No more unicode switch, so it's easier when sharing code source on the forum, or when developping an user lib (everybody is unicode)
- Makes PB definitely more modern.

I don't think that's true; a modern compiler always supports both Unicode and ASCII! And your other reasons, like "... or when developping an user lib (everybody is unicode)", seem void to me, because user libs in PB have not been a topic for years. That is not because of the Unicode / ASCII thing, but because of the Win / Mac / Linux thing. I think removing ASCII support makes PB definitely NOT modern.

Little John · Post by **Little John** » Tue Aug 05, 2014 2:58 pm

Danilo, thanks for the information!

juror · Post by **juror** » Tue Aug 05, 2014 3:14 pm

Personally, I'd prefer ascii support rather than SpiderBasic, but then, there may be less money in ascii support.

heartbone · Post by **heartbone** » Tue Aug 05, 2014 3:27 pm

If it ain't broke...

Didelphodon · Post by **Didelphodon** » Tue Aug 05, 2014 3:40 pm

I think such a change would lead to a lot of work for me. I often load text-files directly into memory using ReadData to afterwards use portions of it using functions like PeekS. This leads to garbage when switching to unicode - just tried it.

skywalk · Post by **skywalk** » Tue Aug 05, 2014 3:42 pm

For hardware reasons, I still develop mostly in Ascii. With enough code I can adapt to memory based transfers but that should only happen because of some speed advantage or a limitation brought about by native Ascii strings. I don't see that today.
Actually I prefer this request from a while ago: Enable simultaneous Ascii($) and Unicode($$) strings.

IdeasVacuum · Post by **IdeasVacuum** » Tue Aug 05, 2014 4:15 pm

Didelphodon wrote:I often load text-files directly into memory using ReadData to afterwards use portions of it using functions like PeekS. This leads to garbage when switching to unicode - just tried it.

Try this:

Code: Select all

sText.s = PeekS(*MemoryBuffer, -1, #PB_UTF8)

That will work for UTF8 files (and ASCII files where the file is strictly ASCII (0-127 chars)).
And this:

Code: Select all

sText.s = PeekS(*MemoryBuffer, -1, #PB_ASCII)

will work for ASCII.

You can determine whether a file is one format or another using ReadStringFormat() (Byte Order Mark) or by pre-testing char numbers in files that do not have a BOM.
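
Put together, the BOM check and the two PeekS variants could look roughly like the sketch below. LoadTextFile() is a hypothetical helper name, and the code assumes the whole file fits in memory. Note that a UTF-8 file without a BOM is reported as #PB_Ascii, so that case still needs the char-number pre-test mentioned above.

```purebasic
; LoadTextFile() is a hypothetical helper, not a built-in command.
Procedure.s LoadTextFile(FileName.s)
  Protected Text.s, Format, Size, *Buffer
  Protected File = ReadFile(#PB_Any, FileName)

  If File
    ; Consumes the BOM if there is one; returns #PB_Ascii when no BOM is found
    Format = ReadStringFormat(File)
    Size = Lof(File) - Loc(File)          ; bytes left after the BOM

    If Size > 0
      ; Two extra zero bytes keep the data null-terminated, even for UTF-16
      *Buffer = AllocateMemory(Size + 2)
      If *Buffer
        ReadData(File, *Buffer, Size)
        Select Format
          Case #PB_Ascii, #PB_UTF8, #PB_Unicode
            Text = PeekS(*Buffer, -1, Format)
          ; other formats (UTF-16 BE, UTF-32) are left unhandled in this sketch
        EndSelect
        FreeMemory(*Buffer)
      EndIf
    EndIf

    CloseFile(File)
  EndIf

  ProcedureReturn Text
EndProcedure
```

Because PeekS is called with a length of -1, reading stops at the first null character, so this only suits plain text files.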

Didelphodon · Post by **Didelphodon** » Tue Aug 05, 2014 4:55 pm

IdeasVacuum wrote:
I often load text-files directly into memory using ReadData to afterwards use portions of it using functions like PeekS. This leads to garbage when switching to unicode - just tried it.
Try this:
Code: Select all
sText.s = PeekS(*MemoryBuffer, -1, #PB_UTF8)
That will work for UTF8 files (and ASCII files where the file is strictly ASCII (0-127 chars)).
And This
Code: Select all
sText.s = PeekS(*MemoryBuffer, -1, #PB_ASCII)
will work for ASCII.

You can determine whether a file is one format or another using ReadStringFormat() (Byte Order Mark) or by pre-testing char numbers in files that do not have a BOM.

Thx for your tips. I am aware of these switches.
The UTF8 switch has a massive performance impact compared to the Ascii one.
However, even though it might look like not needing much work, if you have a *lot* of those PeekS, PokeS, and stuff it definitely takes a huge amount of time to "upgrade" as I'd have to check the usage of any of them and test them afterwards to be sure that everything works as before.

Imho, this is something I guess a lot of us were keeping in mind and on their lo-prio schedule since the unicode-switch was announced. It's a bit like IPv4 and IPv6

Cheers,
Didel.

Tenaja · Post by **Tenaja** » Tue Aug 05, 2014 5:04 pm

I like the idea of the utf8 conversion routine.

However, almost everything I do is ascii. I would be stuck in an old LTS if you eliminate it.

ts-soft · Post by **ts-soft** » Tue Aug 05, 2014 5:38 pm

For some years now I have done everything in Unicode, so changing all internal strings to Unicode is not a problem,
as long as we can convert them for other purposes.

DoubleDutch · Post by **DoubleDutch** » Tue Aug 05, 2014 6:51 pm

No problem here - as long as (like you said) there is a way to do conversion somehow.

djes · Post by **djes** » Tue Aug 05, 2014 7:53 pm

I've always thought that this double compilation mode was a source of problems. It's computer history, and such a transition is not easy to handle. Anyway, I think that strings should be handled not at a global compiler level, but at the type level. I know that a BASIC should be easy, so I would say that Unicode should be the default type for strings, like 64 or 32 bits is for .i integers. All specialised string functions (where speed is important) should be doubled, as I think all functions are doubled for ints, and for functions needing strings, conversion to the internal string format would occur. But definitely, the string format of variables and data should be the programmer's choice; the more options, the better.
