Removing 'ASCII' switch from PureBasic

flaith
Enthusiast
Posts: 704
Joined: Mon Apr 25, 2005 9:28 pm
Location: $300:20 58 FC 60 - Rennes

Post by flaith »

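This is how I had to change an old program to make it work with Unicode enabled.
The program reads the line character by character to build each token.

That cannot work unchanged in Unicode mode, because each character then takes 2 bytes instead of 1, so I added this: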

Code:

Global.i FORMAT_BYTE = StringByteLength("a", #PB_Ascii)
CompilerIf #PB_Compiler_Unicode
  FORMAT_BYTE = StringByteLength("a", #PB_Unicode)
CompilerEndIf
The GetToken() function:

Code:

;-Tokenizer
Procedure.s GetCurrentChar()
  Protected car.s
  
  ;car = Chr(PeekC(@LINE+CurrentPos)) ;ORIGINAL
  car = Chr(PeekA(@LINE+CurrentPos)) ;VERSION 1
  ;car = PeekS(@LINE+CurrentPos,1,#PB_Ascii) ;VERSION 2
  ProcedureReturn car
EndProcedure

Procedure.i SkipSpace()
  Protected nbspace.i = 0
  
  While GetCurrentChar() = " " Or GetCurrentChar() = #TAB$
    CurrentPos + FORMAT_BYTE
    nbspace + 1
  Wend
  
  ProcedureReturn nbspace  
EndProcedure

Procedure.s GetToken()
  Protected sTok.s = "", c.s
  Repeat
    c = GetCurrentChar()
    ; Car = '"' and Not inside a string ?     //String definition
    If c = #DBL_QUOTE And _QUOTE = #False
      sTok+c
      CurrentPos + FORMAT_BYTE
      _QUOTE = #True
      c = GetCurrentChar()
    EndIf
    ; Car = '"' and inside a string ?         //String definition
    If c = #DBL_QUOTE And _QUOTE = #True
      _QUOTE = #False
    EndIf
    ; Car = TAB and Not inside a string ?     //Tabulation
    If c = #CHAR_TAB And _QUOTE = #False
      CurrentPos + FORMAT_BYTE
      Break
    EndIf
    ; Car = ';' or Car = '*' in the beginning 
    ; of the line and Not inside a string ?   //Remark
    If c = ";" And _QUOTE = #False
      CurrentPos = LenLine
      Break
    EndIf
    If c = "*" And _QUOTE = #False And CurrentPos = 0
      CurrentPos = LenLine : sTok = ""
    EndIf
    ; Current car position >= current Line length ?
    If CurrentPos >= LenLine
      Break
    EndIf
    ; if it's a space outside a quoted string
    If c = " " And _QUOTE = #False
      CurrentPos + FORMAT_BYTE
      Break
    EndIf
    ; Make the Token
    CurrentPos + FORMAT_BYTE
    sTok + c
  ForEver  
  ProcedureReturn sTok
EndProcedure
And the Init section:

Code:

Global.s LINE = "   BAS2H	EQU $2B" ; TAB inside
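; Assumption (declarations not shown in the post): the procedures above use
; these as globals, so in a full program they must be declared before them.
Global.i CurrentPos = 0, PosTok = 1, LenLine, _QUOTE = #False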

;*** IMPORTANT TO MULTIPLY HERE ***
LenLine = Len(LINE)*FORMAT_BYTE           ;For ASCII/UNICODE

Debug "Line to tokenize: "+LINE+" - (len:"+Str(LenLine)+")"

While CurrentPos < LenLine
  nbspace = SkipSpace()
  If nbspace > 0                          ;going to the next token
    PosTok + 1
  EndIf
  a$=GetToken()
  If a$ <> ""
    Debug a$+" ["+Str(PosTok)+"]"
    PosTok + 1
  EndIf
  SkipSpace()
Wend
You can see that each time I need to go to the next char, I have to add 1 (ASCII) or 2 (Unicode) to the position, and multiply the length of the line by 1 or 2.

I hope you can find a way to handle that more easily than my messy way :wink:
“Fear is a reaction. Courage is a decision.” - WC
wilbert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Post by wilbert »

flaith wrote:I hope you can find a way to handle that more easily than my messy way :wink:
I hope my answer doesn't pollute this thread too much but maybe an approach like this is less messy.
It works both in ASCII and Unicode mode.

Code:

#CharSize = SizeOf(Character)

Structure CharStructure
  StructureUnion
    c.c
    s.s{1}
  EndStructureUnion
EndStructure

S.s = "String"

*CharPtr.CharStructure = @S

While *CharPtr\c
  
  Debug *CharPtr\s + " value : " + *CharPtr\c
  
  *CharPtr + #CharSize
  
Wend
or (a bit slower compared to the code above)

Code:

Structure CharStructure
  StructureUnion
    c.c
    s.s{1}
  EndStructureUnion
EndStructure

Structure CharArray
  p.CharStructure[0]
EndStructure

S.s = "String"

*CharArray.CharArray = @S

CurrentPos = 0

While *CharArray\p[CurrentPos]\c
  
  Debug *CharArray\p[CurrentPos]\s + " value : " + *CharArray\p[CurrentPos]\c
  
  CurrentPos + 1
  
Wend
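
Another sketch, not taken from the posts above: if raw speed is not critical, a tokenizer can walk the line by character index with Mid() and Len(), so no byte size is needed at all and the same code runs in ASCII and Unicode builds. NextToken() and the variable names are hypothetical, and quoted strings and remarks are ignored to keep it short.

```purebasic
; Minimal sketch: split a line on spaces and tabs using character indexes.
; NextToken() is a hypothetical helper, not part of the original program.
Procedure.s NextToken(Src.s, *Pos.Integer)
  Protected Length.i = Len(Src)
  Protected Token.s = "", c.s

  ; Skip leading spaces and tabs
  While *Pos\i <= Length
    c = Mid(Src, *Pos\i, 1)
    If c <> " " And c <> #TAB$
      Break
    EndIf
    *Pos\i + 1
  Wend

  ; Collect characters up to the next space or tab
  While *Pos\i <= Length
    c = Mid(Src, *Pos\i, 1)
    If c = " " Or c = #TAB$
      Break
    EndIf
    Token + c
    *Pos\i + 1
  Wend

  ProcedureReturn Token
EndProcedure

Define Src.s = "   BAS2H" + #TAB$ + "EQU $2B"
Define Pos.i = 1            ; Mid() positions start at 1
Define Tok.s

Repeat
  Tok = NextToken(Src, @Pos)
  If Tok <> ""
    Debug Tok
  EndIf
Until Tok = ""
```

The position is counted in characters, so it is advanced by 1 in either mode. The trade-off is that Mid() builds a one-character string on every step, which the pointer versions above avoid.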
Windows (x64)
Raspberry Pi OS (Arm64)
chris319
Enthusiast
Posts: 782
Joined: Mon Oct 24, 2005 1:05 pm

Post by chris319 »

"What are your thoughts about it? Is it a deal breaker for you?"
It seems you've already made the decision so why are you soliciting input from the user base after the fact?

It's going to muck up something I'm working on which requires me to pass ASCII strings to and from an API.
"ASCII is an old tech and is condemned to disappear sooner or later, as unicode can handle it as well."
So you're legislating obsolescence. All legacy technology is going to be dropped from PureBasic like a hot potato and tough luck to anybody who still uses it, is that the idea?

Is it feasible to ask that mystring$ or mystring.s be a legacy ASCII string and mystring.x* a unicode string, or is the die cast and it's too late to ask for this?

I'm glad to hear it will make things easier for the PB team, to say nothing of the code rewriting the user base will have to do when their code has been broken.

Yes, I'll be looking around for something which isn't built on ever-shifting sands.

*The letter "x" is an arbitrary choice and could be any unused character or symbol deemed appropriate.
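
For the ASCII API case, one possible sketch that keeps working in a Unicode build is to convert at the boundary with PokeS() and PeekS() and the #PB_Ascii flag. MakeAsciiBuffer() and the sample text are hypothetical, and the actual API call is only marked by a comment.

```purebasic
; Minimal sketch, assuming the API expects a zero-terminated ASCII buffer.
; MakeAsciiBuffer() is a hypothetical helper name.
Procedure.i MakeAsciiBuffer(Text.s)
  ; One byte per character, plus one byte for the terminating zero
  Protected *Buffer = AllocateMemory(StringByteLength(Text, #PB_Ascii) + 1)
  If *Buffer
    PokeS(*Buffer, Text, -1, #PB_Ascii)   ; write the string as ASCII
  EndIf
  ProcedureReturn *Buffer
EndProcedure

Define *Buf = MakeAsciiBuffer("Hello API")
If *Buf
  ; ... pass *Buf to the API here ...
  ; Read an ASCII result back into a normal PureBasic string
  Debug PeekS(*Buf, -1, #PB_Ascii)
  FreeMemory(*Buf)
EndIf
```

Characters outside the ASCII range cannot survive this round trip, so it only suits APIs that really are ASCII-only.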
wilbert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Post by wilbert »

chris319 wrote:I'm glad to hear it will make things easier for the PB team, to say nothing of the code rewriting the user base will have to do when their code has been broken.
Yes, I'll be looking around for something which isn't built on ever-shifting sands.
PureBasic has always evolved this way: adding things and removing things.
You always know that the next version isn't guaranteed to be fully backwards compatible. Of course most of it is, but there's always the chance you have to change existing code to make it work in a new version, or keep using an older compiler next to a newer version.
I'm not saying this is good or bad, but to me it is understandable to do it this way if you have so few people working on it and want to keep things manageable.
Windows (x64)
Raspberry Pi OS (Arm64)
flaith
Enthusiast
Posts: 704
Joined: Mon Apr 25, 2005 9:28 pm
Location: $300:20 58 FC 60 - Rennes

Post by flaith »

Thanks Wilbert :D
“Fear is a reaction. Courage is a decision.” - WC
davido
Addict
Posts: 1890
Joined: Fri Nov 09, 2012 11:04 pm
Location: Uttoxeter, UK

Post by davido »

I wonder; will this mean the demise of the variable-type .a ?
DE AA EB
Shield
Addict
Posts: 1021
Joined: Fri Jan 21, 2011 8:25 am
Location: 'stralia!

Post by Shield »

davido wrote:I wonder; will this mean the demise of the variable-type .a ?
No. :wink:
Blog: Why Does It Suck? (http://whydoesitsuck.com/)
"You can disagree with me as much as you want, but during this talk, by definition, anybody who disagrees is stupid and ugly."
- Linus Torvalds
NikitaOdnorob98
User
Posts: 74
Joined: Fri Jun 29, 2012 4:50 pm

Post by NikitaOdnorob98 »

Fred, please make a poll. It will be better.

P.S. I think that it's a bad idea
Danilo
Addict
Posts: 3037
Joined: Sat Apr 26, 2003 8:26 am
Location: Planet Earth

Post by Danilo »

NikitaOdnorob98 wrote:P.S. I think that it's a bad idea
Don't you think it could make your life easier, once you get used to it?

You work with the English/Latin and Cyrillic alphabets every day (Cyrillic in strings, comments and user interfaces; English/Latin for PB keywords).
That's what Unicode is about, supporting Cyrillic, Chinese, Latin, Thai, Korean, ... character sets... all at the same time.
You can have your apps in English, Russian, Chinese, ... just by loading/using a different catalog/database with the strings. No more codepage conversions.

Especially for the Russian guys here I had expected they would welcome the change. Now I see it's the opposite, and it makes me wonder.

The following problem, mentioned by User_Russian, exists only when an application is NOT fully Unicode:
User_Russian wrote:

Code:

DataSection
  IncludeBinary "C:\Программы\Prog.exe"
EndDataSection
Description of error.
[COMPILER] Line 2: Included file Not found: C:\?????????\Prog.exe.
The same error with other Include-commands (IncludeFile, XIncludeFile and IncludePath).
luis
Addict
Posts: 3876
Joined: Wed Aug 31, 2005 11:09 pm
Location: Italy

Post by luis »

Danilo wrote: Especially for the Russian guys here I had expected they would welcome the change. Now I see it's the opposite, and it makes me wonder.
Welcome the change ?

They can use Unicode already. The proposal is not to add Unicode for 5.40; it is to remove the ability to make ASCII builds.
"Have you tried turning it off and on again ?"
A little PureBasic review
Danilo
Addict
Posts: 3037
Joined: Sat Apr 26, 2003 8:26 am
Location: Planet Earth

Post by Danilo »

luis wrote: Welcome the change ?
Let me rephrase: Especially for them, Ascii is quite useless, in my opinion. As you can see with the PB compiler,
Ascii applications only create extra trouble (see example "C:\Программы\"). With full Unicode support, you don't
have these problems, and that's what makes Ascii applications pretty obsolete. It is a shame many 3rd party DLLs/libs
are still compiled in Ascii mode, especially in our globalized world.
useful
Enthusiast
Posts: 369
Joined: Fri Jul 19, 2013 7:36 am

Post by useful »

For those who value PB's cross-platform support, the answer is obvious. In Linux GUIs there is de facto no alternative to standard UTF. It is hard to imagine anyone writing system software in PB. Unfortunately, few people write anything for Linux with PB, and the reason is that PB is proprietary. So for Cyrillic developers it is not so obvious. Personally, I am in favour of dropping ASCII support if that makes things easier for the team.

However, I want consistency, i.e. full Unicode support in the compiler for the names of variables, procedures, and so on.
Last edited by useful on Sat Aug 09, 2014 11:23 am, edited 1 time in total.
Dawn will come inevitably.
Little John
Addict
Posts: 4527
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Post by Little John »

Shield wrote:
davido wrote:I wonder; will this mean the demise of the variable-type .a ?
No. :wink:
To give some explanation: ;-)
A variable of type .a can hold one whole number in the range from 0 to +255. So the correct name of this variable type is "unsigned byte".
This has nothing to do with strings in the first place. This variable type was just misnamed in the PB documentation by calling it "ASCII".
The same goes for the so-called "Unicode" data type, the correct name of which is "unsigned word".
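
A small sketch to check this, with arbitrary variable names. It only assumes the built-in Ascii and Unicode structures that correspond to the .a and .u types; the upper limit of 65535 for .u follows from it being an unsigned 16-bit word.

```purebasic
; Minimal sketch: .a and .u are plain unsigned integer types, not string types.
Define a.a = 255        ; largest value an unsigned byte can hold
Define u.u = 65535      ; largest value an unsigned word can hold

Debug a
Debug u
Debug SizeOf(Ascii)     ; size in bytes of the .a type
Debug SizeOf(Unicode)   ; size in bytes of the .u type
```

Neither type depends on whether the program is compiled in ASCII or Unicode mode; only the .c (character) type changes size with that switch.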
davido
Addict
Posts: 1890
Joined: Fri Nov 09, 2012 11:04 pm
Location: Uttoxeter, UK

Post by davido »

@Shield,
Thank you.

@Little John,
Thank you for the detailed explanation.
I didn't realise that the variable type was misnamed. :oops: That is why I asked the question.
:D
DE AA EB
Little John
Addict
Posts: 4527
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Post by Little John »

Danilo wrote:[...] in our globalized world.
Danilo, I absolutely agree with what you wrote.
However, many people still do not think globally, and this is no surprise to me anymore.
We can see this "phenomenon" even on this forum, which is supposed to be international.