Removing 'ASCII' switch from PureBasic

c4s · Post by **c4s** » Sun Aug 10, 2014 12:57 pm

chris319 wrote: The mathematical concepts of Log() and Sqr() are centuries old and are thus old technology. I hope the PureBasic team doesn't abandon those.
c4s wrote: Wow, I have no words. Are you serious? Unfortunately the "arguments" in this thread are getting worse and worse...
chris319 wrote: It's a joke.

Don't worry, I got that.

My point was that comparing the removal of Ascii-only compilation with fundamental mathematical concepts is wrong on so many levels that it doesn't make any sense (even as a joke).

chris319 · Post by **chris319** » Mon Aug 11, 2014 12:28 am

We all seem to agree on two things:

1. ASCII strings must not be abandoned entirely because some applications require them.

2. Users need control over whether their strings are ASCII or unicode.

Here is a proposed solution:

We have a somewhat similar* situation with character variables where, without a compiler switch, myChar.c will always be unicode. In order to have it be ASCII one would use myChar.a. The programmer still has control over what kind of variable it is.

Why not do something similar with strings?

myString.s and myString$ are always ASCII strings.

myString.n is a unicode string (second letter in uNicode -- ".u" is not available)

It's not ugly like:
*AsciiBuffer = ToAscii(String$)
*UTF8Buffer = ToUTF8(String$)

The above example is ambiguous. String$ could be either ASCII or unicode, "depending". My idea is unambiguous. myString.n is always unicode. myString$ and myString.s are always ASCII. There is no ambiguity.

Without a unicode switch, myChar.c will always be unicode with no ambiguity. That's an advancement. Any code relying on it being compiled as ASCII will have to be rewritten with myChar.a. Any string written as myString$ or myString.s and made unicode at compile time would have to be rewritten as myString.n

*In reality, .a and .c are not character variables; they are numeric variables. You can't do:

myChar.a = "x", or myChar.b = "y" or myChar.c = "z"

But you can do:

myChar.c = 123, assigning a numeric value to a character. Presently, if myChar.c is compiled as unicode you can do myChar.c = 65000. If it is not compiled as unicode, you cannot.
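
The 65000 example comes down to storage width. The sketch below is in Python rather than PureBasic, purely as an illustration. It assumes that .a is an unsigned 8-bit value and that .c is 16 bits wide in a unicode build and 8 bits wide otherwise; the post does not spell out those widths, and the helper name is made up.

```python
def fits_unsigned(value: int, bits: int) -> bool:
    """Return True if value is representable as an unsigned integer of the given width."""
    return 0 <= value < (1 << bits)


value = 65000

# Assumed width of .c in a unicode build: 16 bits (0..65535).
fits_in_16_bits = fits_unsigned(value, 16)

# Assumed width of .a, and of .c in an ASCII build: 8 bits (0..255).
fits_in_8_bits = fits_unsigned(value, 8)

print(fits_in_16_bits, fits_in_8_bits)
```

Under those assumptions the value fits the 16-bit case only, which matches the behaviour described above.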

BorisTheOld · Post by **BorisTheOld** » Mon Aug 11, 2014 1:05 am

After reading all the above comments I can't see what all the fuss is about.

In the 50+ years that I've been programming there have always been conflicts between internal and external data representation, and programmers have always had to work around them.

Second generation systems, right up to the early 1970s, typically used 6-bit bytes. For example, the IBM 1401 and 1620 systems used four data bits, a word mark bit, and a check bit. Text required 2 bytes per character and numeric data used 1 byte per decimal digit.

When the IBM 360 arrived in 1964, not only were internal data formats completely different, every single program had to be rewritten. Characters now used 1 byte and internal data formats could be binary or BCD. Programmers like myself wrote programs to simulate the 2nd generation hardware, in order to allow the old executable files to be used while programs were rewritten. Some of these simulators were in use for over 10 years.

Then when Windows, VB, and Unicode arrived, systems had to be converted from DOS to Windows. This required the new programs to correctly read and write Ascii file data, so programmers devised methods for handling the I/O. And it wasn't rocket science.

The latest changes to PB are just part of the evolutionary process that has been happening to software since computers were invented. And having to adapt to such changes is just part of being a programmer. It's not the end of the world. There are simple strategies for making the transition to Unicode, while at the same time handling Ascii data from external sources.

It's not a big issue.

Didelphodon · Post by **Didelphodon** » Mon Aug 11, 2014 2:40 am

Just for the sake of completeness...
If you want to set the font of a Scintilla gadget (via styles) and your program is Unicode-based, you still have to pass the font name as a zero-terminated ASCII string.
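
As a rough illustration of what "zero-terminated ASCII" means at the byte level, here is a small Python sketch (not PureBasic, and not the Scintilla API itself). The font name is only an example value.

```python
# Example font name; any name made of ASCII characters works the same way.
font_name = "Courier New"

# One byte per character, followed by a single terminating zero byte.
font_buffer = font_name.encode("ascii") + b"\x00"

print(font_buffer)
print(len(font_buffer))
```

The buffer is one byte longer than the name because of the terminator, so that extra byte has to be accounted for when allocating memory for it.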

marc_256 · Post by **marc_256** » Mon Aug 11, 2014 10:53 am

Fred wrote: edit: before freaking out, we are just talking about removing the "unicode switch", not all ascii related operations !

The Fantaisie Software Team

I'm not sure I understood this. I need all ASCII controls for my RS232-controlled machinery.
So, if I understand correctly, there is no problem with that?
Otherwise, will PB 5.22 and 5.23 LTS still use ASCII?

thanks,
Marc

IdeasVacuum · Post by **IdeasVacuum** » Mon Aug 11, 2014 12:30 pm

Hi Marc

RS-232 is gradually disappearing, even from industrial machines. Apparently RS-232 does not in any case define the character encoding to be used, so that is defined by the device (no doubt you have a handbook for the device in-out).

So, given the age of the device design, sounds like you only need the standard 128 characters? In which case you don't need any conversion functions. Wiki: UTF-8 uses one byte for any ASCII character, all of which have the same code values in both UTF-8 and ASCII encoding.
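
The claim about the standard 128 characters is easy to check. The following is a minimal Python sketch (used here only for illustration, not PureBasic); the command string is a made-up example, not taken from any real device handbook.

```python
# Hypothetical device command limited to the 128 standard ASCII characters.
command = "MOVE X10 Y20\r\n"

ascii_bytes = command.encode("ascii")
utf8_bytes = command.encode("utf-8")

# For pure ASCII text both encodings give identical bytes, one per character.
same_bytes = ascii_bytes == utf8_bytes
one_byte_per_char = len(utf8_bytes) == len(command)

print(same_bytes, one_byte_per_char)
```

This only holds while every character stays within the 128-character range; anything outside it takes more than one byte in UTF-8.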

infratec · Post by **infratec** » Mon Aug 11, 2014 2:10 pm

@IdeasVacuum
You are right, but...
Fred is talking about Unicode and not UTF-8, so he needs conversions.

@Marc
You will have to use the #PB_ASCII flag very often.
But it will work.
You can test this already, simply enable the 'Unicode executable' flag in compiler options.

Bernd
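
To show why a conversion is needed, here is a short Python sketch (not PureBasic). It assumes that a Unicode executable keeps strings in memory as 16-bit UTF-16 little-endian values, which the thread does not state explicitly.

```python
text = "OK"

# Assumed in-memory form in a Unicode executable: 2 bytes per character here.
utf16_bytes = text.encode("utf-16-le")

# What an 8-bit device on a serial line expects: 1 byte per character.
ascii_bytes = text.encode("ascii")

# The conversion step: decode the 16-bit form, then re-encode as ASCII.
converted = utf16_bytes.decode("utf-16-le").encode("ascii")

print(utf16_bytes)
print(ascii_bytes)
print(converted == ascii_bytes)
```

Sending the 16-bit form unchanged would put a zero byte after each character on the wire, which is why the format flag matters for device I/O.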

Danilo · Post by **Danilo** » Mon Aug 11, 2014 4:10 pm

The ability to work with data buffers is still there, for input and output. Memory buffers can contain
any data you want, including Byte data and Ascii characters.

Code:

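; Structure that lets the buffer be indexed byte by byte
Structure AsciiArr : a.a[0] : EndStructure

Length = 1024
*Buffer.Ascii = AllocateMemory(Length)

*pointer.AsciiArr = *buffer
If *pointer
    ; Write single bytes directly into the buffer
    *pointer\a[0] = 'A'
    *pointer\a[1] = 'B'
    *pointer\a[2] = 'C'

    ; Read the buffer back as an ASCII string
    Debug PeekS(*pointer, -1, #PB_Ascii)

    ; Write a string into the buffer in ASCII format
    PokeS(*pointer, "Hello World", -1, #PB_Ascii)

    ; Show the first 13 bytes as decimal, hex and character
    For i = 0 To 12
        Debug " Dec: " + *pointer\a[i] +
              " Hex: " + Hex( *pointer\a[i] ) +
              " Chr: " + Chr( *pointer\a[i] )
    Next
EndIf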

CompilerIf 0

;--------------------------------------------------------------------------------------------------------------
;  Data functions                                      ;  String functions
;--------------------------------------------------------------------------------------------------------------
;                                                      ;
;                                                      ;
; Serial Port                                          ;
;                                                      ;
WriteSerialPortData(#SerialPort, *Buffer, Length)      ; WriteSerialPortString(#SerialPort, String$ [, Format])
ReadSerialPortData (#SerialPort, *Buffer, Length)      ;
;--------------------------------------------------------------------------------------------------------------
;                                                      ;
;                                                      ;
; Process                                              ;
;                                                      ;
WriteProgramData(Program, *Buffer, Length)             ; PB 5.2x: WriteProgramString (Program, String$)   --  optional [, Format] missing (use data functions)
                                                       ;          WriteProgramStringN(Program, String$)   --  optional [, Format] missing (use data functions)
                                                       ; PB 5.3+: WriteProgramString (Program, String$ [, Flags])
                                                       ;          WriteProgramStringN(Program, String$ [, Flags])
                                                       ;                                                       
ReadProgramData (Program, *Buffer, Length)             ; PB 5.2x: ReadProgramString(Program)             --  optional [, Format] missing (use data functions)
                                                       ; PB 5.3+: ReadProgramString(Program [, Flags])
;--------------------------------------------------------------------------------------------------------------
;                                                      ;
; Network                                              ;
;                                                      ;
SendNetworkData   (Connection, *Buffer, Length)        ; SendNetworkString(Connection, String$ [, Format])
ReceiveNetworkData(Connection, *Buffer, Length)        ;
;--------------------------------------------------------------------------------------------------------------
;                                                      ;
; File                                                 ;
;                                                      ;
WriteData(#File, *Buffer, Length)                      ; WriteString (#File, Text$ [, Format])
                                                       ; WriteStringN(#File, Text$ [, Format])
ReadData (#File, *Buffer, Length)                      ; ReadString  (#File [, Flags [, Length]])
;--------------------------------------------------------------------------------------------------------------
;                                                      ;
; Console                                              ;
;                                                      ;
WriteConsoleData(*Buffer, Length)                      ; Print (Text$)                          --  optional [, Format] missing (use data functions)
                                                       ; PrintN(Text$)                          --  optional [, Format] missing (use data functions)
ReadConsoleData (*Buffer, Length)                      ; String$ = Input()                      --  optional [, Format] missing (use data functions)
;--------------------------------------------------------------------------------------------------------------

CompilerEndIf

luciano · Post by **luciano** » Mon Aug 11, 2014 4:56 pm

Danilo is right,
I use PB to write programs to control machines using RS232, RS485, or RS422 and I compile them as "unicode executable" since I have interfaces in different languages.
I also save log files for microcontrollers in 8 bit; I can confirm that they all work fine if you use the correct parameters.

Post by **Fred** » Mon Aug 11, 2014 5:02 pm

@Danilo: since 5.30, Read/WriteProgramString() now have the format parameters as well

Little John · Post by **Little John** » Mon Aug 11, 2014 6:08 pm

BorisTheOld wrote:It's not a big issue.

Generally speaking, I agree with what you wrote.
However, there are still some bugs when compiling in Unicode mode.
These bugs should be fixed before ASCII mode is dropped, otherwise PB users will not be amused.

Danilo · Post by **Danilo** » Mon Aug 11, 2014 6:42 pm

Fred wrote:@Danilo: since 5.30, Read/WriteProgramString() now have the format parameters as well

Thanks, changed it in the table. (I'm still using LTS, of course with Unicode.)

Little John · Post by **Little John** » Mon Aug 11, 2014 11:51 pm

//edit 2016-06-08:
Transmogrified the code, and moved it to the "Tricks 'n' Tips" section
http://www.purebasic.fr/english/viewtop ... 12&t=65905

bbanelli · Post by **bbanelli** » Tue Aug 12, 2014 9:45 am

BorisTheOld wrote: It's not a big issue.
Little John wrote: Generally speaking, I agree with what you wrote.
However, there are still some bugs when compiling in Unicode mode.

Speaking of which, I am having an issue with my PCI-Z software.

After activating Unicode mode and changing all PeekS calls to use the proper #PB_Ascii parameter, the software works fine when run in Debug mode.

However, when I compile the executable and run it, I get this:

Code:

Problem signature:
  Problem Event Name:	APPCRASH
  Application Name:	PCI-Z.exe
  Application Version:	1.3.0.1
  Application Timestamp:	53e9d317
  Fault Module Name:	MSVCRT.dll
  Fault Module Version:	7.0.7601.17744
  Fault Module Timestamp:	4eeaf722
  Exception Code:	c0000005
  Exception Offset:	0001d33d
  OS Version:	6.1.7601.2.1.0.256.48
  Locale ID:	1050
  Additional Information 1:	0a9e
  Additional Information 2:	0a9e372d3b4ad19135b953a78882e789
  Additional Information 3:	0a9e
  Additional Information 4:	0a9e372d3b4ad19135b953a78882e789

Read our privacy statement online:
  http://go.microsoft.com/fwlink/?linkid=104288&clcid=0x0409

If the online privacy statement is not available, please read our privacy statement offline:
  C:\Windows\system32\en-US\erofflps.txt

As I have both CLI and GUI mode in this software, I am sure that the bug isn't in the part that accesses PCI devices or gets the proper IDs from the compressed database, since CLI output works properly, but rather somewhere in the GUI part itself.

I am aware that this may belong in the "Problem" section of the forum, but since this problem seems to be directly related to Unicode, perhaps someone has an insight? Windows 7 and PB 5.30 x86, but it also happens with the x64 version.

jassing · Post by **jassing** » Tue Aug 12, 2014 2:07 pm

What about things like existing databases? (SQLite, etc)
