PureBasic Forums - English

Posted: **Tue Aug 12, 2014 2:17 pm**

BorisTheOld wrote:After reading all the above comments I can't see what all the fuss is about.

In the 50+ years that I've been programming there have always been conflicts between internal and external data representation, and programmers have always had to work around them.

Second-generation systems, right up to the early 1970s, typically used 6-bit bytes. For example, the IBM 1401 and 1620 systems used four data bits, a word mark bit, and a check bit. Text required 2 bytes per character and numeric data used 1 byte per decimal digit.

When the IBM 360 arrived in 1964, not only were internal data formats completely different, every single program had to be re-written. Characters now used 1 byte and internal data formats could be binary or BCD. Programmers like myself wrote programs to simulate the 2nd generation hardware, in order to allow the old executable files to be used while programs were re-written. Some of these simulators were in use for over 10 years.

Then when Windows, VB, and Unicode arrived, systems had to be converted from DOS to Windows. This required the new programs to correctly read and write Ascii file data, so programmers devised methods for handling the I/O. And it wasn't rocket science.

The latest changes to PB are just part of the evolutionary process that has been happening to software since computers were invented. And having to adapt to such changes is just part of being a programmer. It's not the end of the world. There are simple strategies for making the transition to Unicode, while at the same time handling Ascii data from external sources.

It's not a big issue.

I agree that it is not a big issue like this one is.
However, I think the fuss here is that no one wants to change things unnecessarily.
For some, the changes may be substantial, and that generates fuss.
That is part of human nature, and I know that you already know it, Boris, and were merely making a rhetorical comment.
That can sometimes be sure-fire debate bait, but not this time.
My main point is that, since you mentioned IBM,
I am truly surprised that you made no reference to EBCDIC in your wise remarks on this subject.

Posted: **Tue Aug 12, 2014 2:21 pm**

bbanelli wrote:I am having an issue with my PCI-Z software.

After activating Unicode mode and switching all PeekS calls to the proper #PB_Ascii parameter, the software works fine when compiled in Debug mode.

Can you show the prototypes or import blocks that pass strings to and from the DLL?

Posted: **Tue Aug 12, 2014 4:28 pm**

Little John wrote:Here is a simple module for working with ASCII strings, even when the program is compiled in Unicode mode.
Maybe it is useful for someone.
I wrote this in a short time; it can be done in a more sophisticated way, of course.

Code: Select all

; PB 5.22 LTS

DeclareModule ASCIIZ
   ; -- Null-terminated ASCII strings, even when the program is compiled in Unicode mode
   
   Macro Get (_addr_)
      PeekS(_addr_, -1, #PB_Ascii)
   EndMacro
   
   Declare   Put (*var.Integer, s$)
   Declare.i Length (addr.i)
   Declare   Free (addr.i)
   Declare   FreeAll ()
EndDeclareModule


Module ASCIIZ
   EnableExplicit
   
   Structure ASCIIZ
      *Var.Integer
      Len.i
   EndStructure
   
   Global NewMap String.ASCIIZ()
   
   
   Procedure Put (*var.Integer, s$)
      Protected.i length, new
      
      length = Len(s$)
      If *var\i = #Null Or MemorySize(*var\i) <= length
         new = ReAllocateMemory(*var\i, length+1)
         If new
            DeleteMapElement(String(), Str(*var\i))
            *var\i = new
            String(Str(*var\i))\Var = *var
            String()\Len = length
            PokeS(*var\i, s$, -1, #PB_Ascii)
         EndIf
      Else
         String(Str(*var\i))\Len = length
         PokeS(*var\i, s$, -1, #PB_Ascii)
      EndIf   
   EndProcedure
   
   
   Procedure.i Length (addr.i)
      ProcedureReturn String(Str(addr))\Len 
   EndProcedure
   
   
   Procedure Free (addr.i)
      FreeMemory(addr)
      String(Str(addr))\Var\i = 0
      DeleteMapElement(String())
   EndProcedure
   
   
   Procedure FreeAll ()
      ForEach String()
         FreeMemory(Val(MapKey(String())))
         String()\Var\i = 0
      Next
      ClearMap(String())
   EndProcedure
EndModule


CompilerIf #PB_Compiler_IsMainFile
   ;-- Module demo
   
   EnableExplicit
   
   Define.i a, b, c
   
   ASCIIZ::Put(@a, "Hello World")
   Debug "'" + ASCIIZ::Get(a) + "'"
   Debug "Length = " + ASCIIZ::Length(a)
   Debug ""
   
   ASCIIZ::Put(@a, "Hello")
   Debug "'" + ASCIIZ::Get(a) + "'"
   Debug "Length = " + ASCIIZ::Length(a)
   Debug ""
   
   ASCIIZ::Put(@a, "A somewhat longer text, used for testing.")
   Debug "'" + ASCIIZ::Get(a) + "'"
   Debug "Length = " + ASCIIZ::Length(a)
   Debug ""
   
   ASCIIZ::Put(@b, "This is another string.")
   Debug "'" + ASCIIZ::Get(b) + "'"
   Debug "Length = " + ASCIIZ::Length(b)
   Debug ""
   
   ASCIIZ::Put(@c, ASCIIZ::Get(a) + " " + ASCIIZ::Get(b))
   Debug "'" + ASCIIZ::Get(c) + "'"
   Debug "Length = " + ASCIIZ::Length(c)
   Debug ""
   
   Debug a
   ASCIIZ::Free(a)
   Debug a
   Debug ""
   
   Debug b
   Debug c
   ASCIIZ::FreeAll()
   Debug b
   Debug c
CompilerEndIf

Thanks for sharing this!
I ran it in 5.3, and on the third PUT, I got an error on the realloc line, "Trying to free or to reallocate a non-allocated memory block". I do not understand why; there seems to be a valid pointer in *var\i.

Posted: **Tue Aug 12, 2014 4:38 pm**

Tenaja, you are welcome!

Tenaja wrote:I ran it in 5.3, and on the third PUT, I got an error on the realloc line, "Trying to free or to reallocate a non-allocated memory block". I do not understand why; there seems to be a valid pointer in *var\i.

Please try with the Purifier switched off.

There is a Purifier bug with ReAllocateMemory() in PB 5.30, which raises exactly this error message.

Posted: **Tue Aug 12, 2014 6:25 pm**

Little John wrote:Please try with the Purifier switched off.

Thanks...I have only used the Purifier once, and it was so long ago I forgot it was on. All is good now. Thanks again!

Posted: **Tue Aug 12, 2014 8:02 pm**

Code: Select all

Structure AsciiArr : a.a[0] : EndStructure

Length = 1024

*Buffer.Ascii = AllocateMemory(Length)

    PokeS(*Buffer, "Hello World", -1, #PB_Ascii)
   
    Debug PeekS(*Buffer, -1, #PB_Ascii)

Do you really think all these calls to AllocateMemory(), PokeS() and PeekS() and maintaining pointers make for more elegant code? Seriously? It's a butt-ugly hack IMO.

What's wrong with:

string.s and string$ are ASCII

string.n is unicode?

I've floated this idea a couple of times now but it seems to have gone over everyone's head.

Posted: **Tue Aug 12, 2014 8:16 pm**

chris319 wrote: What's wrong with:

string.s and string$ are ASCII

string.n is unicode?

This would break a lot of code!
string.s and string$ are Unicode in most cases on my PC.

Posted: **Tue Aug 12, 2014 9:14 pm**

chris319 wrote:
Code: Select all
Structure AsciiArr : a.a[0] : EndStructure

Length = 1024

*Buffer.Ascii = AllocateMemory(Length)

    PokeS(*Buffer, "Hello World", -1, #PB_Ascii)
   
    Debug PeekS(*Buffer, -1, #PB_Ascii)
Do you really think all these calls to AllocateMemory(), PokeS() and PeekS() and maintaining pointers make for more elegant code? Seriously? It's a butt-ugly hack IMO.

It just shows that you can still work with ASCII data, accessing ASCII characters in memory.
That's nothing new; you have been able to do this for years.

I have used UNICODE exclusively for some years, and I don't need such stuff.
I don't know what your specific problems are, but my code compiles in both ASCII
and UNICODE mode without changing a thing, most of the time.
Even when working with low-level pointers, it is not a problem to access the data
when using things like "sizeof(Character)" etc.

I don't have problems working with ASCII and UNICODE strings, because I am used to it.
It is you who seems to have problems with UNICODE strings, because you think a character
is represented by 7 or 8 bits.
In real life, the adaptation is not a big problem, as guys like BorisTheOld already said.
After all, it is your own fault if you are still stuck in the Win95 era.

Within the UNICODE world you don't need such conversions and low-level data access.
Things like p-ascii and PokeS/PeekS are mostly required for working with 3rd-party libs/APIs that are still
not available for UNICODE. Libs from the last century.

chris319 wrote:I've floated this idea a couple of times now but it seems to have gone over everyone's head.

You are one of very few people that still don't get it: The decision has been made already, and it has been officially announced.

Adapt to it or do whatever you like, but you will not change anything anymore. Your ideas simply ignore that most people
in our globalized world already use UNICODE. Millions of developers use UNICODE exclusively, every day.
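Editorial illustration of the "sizeof(Character)" remark in the post above: code that walks a buffer using the character size as its stride reads ASCII and 2-byte Unicode data with the same loop. This is a minimal sketch in Python, not PureBasic; `read_chars` and `char_size` are hypothetical names, and it assumes little-endian 2-byte characters without surrogate pairs.

```python
def read_chars(buffer: bytes, char_size: int) -> str:
    """Read a null-terminated string, stepping char_size bytes at a time.

    char_size plays the role of SizeOf(Character): 1 for ASCII,
    2 for 2-byte little-endian Unicode (assumption: no surrogate pairs).
    """
    chars = []
    for offset in range(0, len(buffer), char_size):
        unit = buffer[offset:offset + char_size]
        code = int.from_bytes(unit, "little")
        if code == 0:          # terminator: one zero character
            break
        chars.append(chr(code))
    return "".join(chars)


ascii_buf = "Hello".encode("ascii") + b"\x00"
wide_buf = "Hello".encode("utf-16-le") + b"\x00\x00"

print(read_chars(ascii_buf, 1))  # Hello
print(read_chars(wide_buf, 2))   # Hello
```

The loop body never mentions the encoding; only the stride changes, which is the point of writing pointer code against the character size instead of a fixed 1.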

Posted: **Tue Aug 12, 2014 9:34 pm**

chris319 wrote:Do you really think all these calls to AllocateMemory(), PokeS() and PeekS() and maintaining pointers make for more elegant code? Seriously? It's a butt-ugly hack IMO.

What's wrong with:

string.s and string$ are ASCII

string.n is unicode?

I've floated this idea a couple of times now but it seems to have gone over everyone's head.

That is a fine idea, and I would support it. Unfortunately, Fred has decided he will no longer support ascii strings due to the maintenance demands. As a compromise, Little John's module gives people a way to deal with ascii strings after they are amputated from PB. So, no, it is not pretty compared to a "native" library, but it is a functional way to deal with it.

Within a PB application, unicode is no more difficult than ascii. I converted an 80k+ line project in a relatively short amount of time. (A utf-8 dll requires no conversion for ascii, but it does for unicode--and that is where all the time was spent.) The real issues come up when you are dealing with ASCII-based hardware (e.g. serial ports connected to devices that send/receive only ascii), files that are written and/or read by other programs that can only handle ascii, and other libraries that do not handle unicode (i.e. the aforementioned dll).
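Editorial illustration of the boundary problem described above: the program keeps Unicode strings internally and converts only where it talks to an ASCII-only device or file. This is a rough sketch in Python rather than PureBasic; `to_wire` and `from_wire` are hypothetical helper names, and the "PING" message is made up for the example.

```python
def to_wire(text: str) -> bytes:
    """Encode text for an ASCII-only peer.

    Raises UnicodeEncodeError if the text contains a character the
    peer cannot represent, instead of silently sending garbage.
    """
    return text.encode("ascii")


def from_wire(data: bytes) -> str:
    """Decode bytes received from an ASCII-only peer."""
    return data.decode("ascii")


print(to_wire("PING\r\n"))       # b'PING\r\n'
print(from_wire(b"OK\r\n") == "OK\r\n")  # True

# A lossy alternative, if the peer must receive *something*:
print("Grüße".encode("ascii", errors="replace"))  # b'Gr??e'
```

Whether to fail loudly or substitute a placeholder character is a design decision that depends on the device; the strict version is shown first because it makes bad data visible.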

Posted: **Tue Aug 12, 2014 10:38 pm**

ts-soft wrote: This would break a lot of code!
string.s and string$ are Unicode in most cases on my PC.

In mine $ is both, and yes this would break it -> http://www.purebasic.fr/english/viewtop ... 75#p450675

Posted: **Thu Aug 14, 2014 10:40 am**

You are one of very few people that still don't get it: The decision has been made already, and it has been officially announced.

The decision is to eliminate the unicode compiler switch. That's wonderful and it works better for the PB "team". I'm proposing a way to continue to support users who need ASCII to interface with something external to PB that isn't an ugly hack as you and Fred and others have proposed. Those users need ASCII because something else needs it; the choice of ASCII is out of their hands. That's the concept that has gone WAY over your and everyone else's heads. People are so focused on "I", "me", "my" and "my application" and "I use unicode" that they're not thinking outside of their own little worlds.

Fred has decided he will no longer support ascii strings due to the maintenance demands.

Again, they have decided to eliminate the unicode compiler switch. Fred has proposed some ugly-looking and ambiguous functions to convert between unicode and ASCII. The functionality will still be there, another fact that is way over people's heads. If Fred and company want to make PB uglier, it's their prerogative.

This would break a lot of code!
string.s and string$ are Unicode in most cases on my PC.

"I", "me", "my" and "my PC". You're not the only user of PB, dear.

OK then have string.s and string$ be unicode and string.x be ASCII. Simple. Happy now?

No matter what they do, elegant or butt-ugly, something will have to be rewritten if you are a user who needs ASCII, no two ways about it.

That is a fine idea, and I would support it. Unfortunately, Fred has decided

The "team" has made a unilateral decision without first obtaining input from the users, and without a) considering that some users still need ASCII to interface with something external to PB, or b) thinking of better solutions.

Look at the output of this silly program, both with the unicode switch on and off:

Code: Select all

myString$ = ""
myString$ + StrU(65,#PB_Ascii)
myString$ + StrU(83,#PB_Ascii)
myString$ + StrU(67,#PB_Ascii)
myString$ + StrU(73,#PB_Ascii)
myString$ + StrU(73,#PB_Ascii)
Debug myString$

Posted: **Thu Aug 14, 2014 11:45 am**

chris319 wrote:Look at the output of this silly program, both with the unicode switch on and off:

It outputs the same in both modes, as expected. What's your point with this example?

Creating libraries that support both Ascii and Unicode is a lot of work, so I understand the decision that has been made.
What might be convenient is if pseudotypes like p-ascii and p-utf8 could be used in more situations than are currently supported, so that PB would do an automatic conversion to and from unicode.

Posted: **Thu Aug 14, 2014 3:31 pm**

chris319 wrote:The "team" has made a unilateral decision without first obtaining input from the users

That is plain wrong. Get your facts first, please.

Posted: **Sat Aug 16, 2014 8:56 am**

A slightly late reply.

However, I am asking if anyone knows whether there is any Unicode-compatible "hexeditor",

i.e. one that can load an exe and modify its non-ASCII strings.

It would be necessary for me, at least.

Posted: **Sat Aug 16, 2014 10:25 am**

plouf wrote: there is any "hexeditor" unicode compatible
i.e. load exe and modify its non-ascii strings

Any hex editor can modify unicode or ascii strings; they are just bytes (in unicode you simply modify one byte out of every two).

The problem is the search function, so the editor must support searching for both ascii and unicode strings.

I use http://mh-nexus.de/en/hxd/ and it does, but I think any decent hex editor does that.

Just google "hex editor unicode strings", try some of them and make your choice.
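Editorial illustration of the "one byte every two" point above: the same text stored as ASCII and as 2-byte little-endian Unicode, and why a search has to try both forms. This is a small Python sketch, not a hex editor; `find_text`, `patch_utf16` and the `image` bytes are made up for the example.

```python
def find_text(data: bytes, text: str) -> dict:
    """Return the offset of text in each encoding, or -1 if absent."""
    return {enc: data.find(text.encode(enc)) for enc in ("ascii", "utf-16-le")}


def patch_utf16(data: bytes, old: str, new: str) -> bytes:
    """Overwrite a UTF-16LE string in place; sizes must match so offsets stay valid."""
    old_b, new_b = old.encode("utf-16-le"), new.encode("utf-16-le")
    if len(old_b) != len(new_b):
        raise ValueError("replacement must have the same byte length")
    pos = data.find(old_b)
    if pos < 0:
        raise ValueError("string not found")
    return data[:pos] + new_b + data[pos + len(old_b):]


print("Hello".encode("ascii").hex(" "))      # 48 65 6c 6c 6f
print("Hello".encode("utf-16-le").hex(" "))  # 48 00 65 00 6c 00 6c 00 6f 00

# Hypothetical fragment of an executable holding the string as UTF-16LE.
image = b"\x90\x90" + "Hello".encode("utf-16-le") + b"\x00\x00\xc3"
print(find_text(image, "Hello"))             # {'ascii': -1, 'utf-16-le': 2}

patched = patch_utf16(image, "Hello", "Jello")
print(find_text(patched, "Jello"))           # {'ascii': -1, 'utf-16-le': 2}
```

The ASCII search misses the string because of the interleaved zero bytes, which is why the editor's search function needs explicit Unicode support.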

Removing 'ASCII' switch from PureBasic
