Removing 'ASCII' switch from PureBasic

luis · Post by **luis** » Sat Aug 16, 2014 10:37 am

wilbert wrote:
chris319 wrote:Look at the output of this silly program, both with the unicode switch on and off:
It outputs both the same as expected. What's your point with this example ?

The suspense is killing me. Still no answer.

Demivec · Post by **Demivec** » Sat Aug 16, 2014 2:39 pm

luis wrote:
wilbert wrote:
chris319 wrote:Look at the output of this silly program, both with the unicode switch on and off:
It outputs both the same as expected. What's your point with this example ?
The suspense is killing me. Still no answer.

I am guessing chris319 expected that, when compiled in Unicode, the string would be ASCII because each character added to the string was produced by StrU() using the #PB_Ascii parameter.

The string looks the same when it is output through the debugger in either ASCII or Unicode, but it differs in the bytes that make up its contents.

That is all very logical and should be common knowledge to all but the few who have never compiled using the Unicode switch. chris319 seems to be one of the 'few'. There are also simple and rudimentary ways of producing ASCII strings when compiling with the Unicode switch. If someone needs to know they can ask.
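To make the byte difference concrete, here is a minimal sketch (not code from this thread). It assumes a little-endian x86/x64 target and uses PokeS() with an explicit format flag, which is one simple way to get ASCII bytes out of a Unicode build. DumpBytes() and the variable names are made up for illustration.

```purebasic
; Sketch: write the same text into memory as ASCII and as Unicode,
; then dump the bytes. DumpBytes() is an illustrative helper, not a PB built-in.

Procedure.s DumpBytes(*mem, byteCount)
  Protected i, out$ = ""
  For i = 0 To byteCount - 1
    out$ + RSet(Hex(PeekA(*mem + i)), 2, "0") + " "
  Next
  ProcedureReturn out$
EndProcedure

Define text$ = "ABC"
Define asciiLen   = StringByteLength(text$, #PB_Ascii)    ; 3 bytes, terminator not counted
Define unicodeLen = StringByteLength(text$, #PB_Unicode)  ; 6 bytes, terminator not counted

Define *ascii   = AllocateMemory(asciiLen + 1)    ; +1 for the null terminator
Define *unicode = AllocateMemory(unicodeLen + 2)  ; +2 for the 16-bit terminator

If *ascii And *unicode
  PokeS(*ascii, text$, -1, #PB_Ascii)       ; one byte per character
  PokeS(*unicode, text$, -1, #PB_Unicode)   ; two bytes per character

  Debug DumpBytes(*ascii, asciiLen)         ; 41 42 43
  Debug DumpBytes(*unicode, unicodeLen)     ; 41 00 42 00 43 00

  FreeMemory(*ascii)
  FreeMemory(*unicode)
EndIf
```

Both buffers hold the same text and read back identically through PeekS() with the matching format flag, which is why the debugger output looks the same in either mode.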

juror · Post by **juror** » Sat Aug 16, 2014 4:20 pm

Demivec wrote:There are also simple and rudimentary ways of producing ASCII strings when compiling with the Unicode switch. If someone needs to know they can ask.

I had asked if maybe a section or special post could be devoted to the "tips & tricks" associated with converting from ascii to unicode. I don't mind starting such a post if others would use it to post such hints, etc, but would prefer it be a little more formal, if only a sticky somewhere whilst we traverse this conversion.

With bits and pieces being added to multiple posts which also contain a great deal of non-related material/discussion it becomes difficult to find the relevant material.

ideas/suggestions?

Demivec · Post by **Demivec** » Sat Aug 16, 2014 4:53 pm

juror wrote:
Demivec wrote:There are also simple and rudimentary ways of producing ASCII strings when compiling with the Unicode switch. If someone needs to know they can ask.
I had asked if maybe a section or special post could be devoted to the "tips & tricks" associated with converting from ascii to unicode. I don't mind starting such a post if others would use it to post such hints, etc, but would prefer it be a little more formal, if only a sticky somewhere whilst we traverse this conversion.

With bits and pieces being added to multiple posts which also contain a great deal of non-related material/discussion it becomes difficult to find the relevant material.

ideas/suggestions?

A thread can be started where the first posting describes the issues that are dealt with in the thread and a link to where the code appears (later in separate posts). The code to deal with those issues can be spread throughout the thread. I'm not sure which issues would need to be addressed first. A separate thread can be posted to ask that question and get the things that people are needing answered.

luis · Post by **luis** » Sat Aug 16, 2014 5:15 pm

juror wrote:I had asked if maybe a section or special post could be devoted to the "tips & tricks" associated with converting from ascii to unicode.

I like the forum sections as they are (if not too many already) and I don't see the need of a specialized section for migrating from ascii to unicode.
As I don't see the need for a section dedicated to the migration from 32 to 64 bits, or to migrate from PB 3.90 to 4.00 either.
Coding questions and tips & tricks are there for this if one just uses the subject field properly.

Creating a dedicated section can be useful for something like ASM programming, a niche with a low volume of posts safely tucked all together (and the isolation doesn't really work that well anyway).

If one wants to create something not really needed, a thread like "Migrating from ascii to unicode: questions and tips." should be more than enough.

Usually I'm one of the few opposed to this kind of idea, since I find them totally unnecessary if one properly uses what's already here, and I'm pretty confident this post will turn out to be totally irrelevant.
So if other people think it can be useful, then by all means do it.

But what is there to tell about migrating from ascii to unicode ?

After you talked about the difference in encoding, about PokeS() and PeekS(), about .c, .a, .u, .Character, .Ascii and .Unicode, about pseudotypes and mentioned why to use SizeOf() ... is there more ?

Enough for one juicy post maybe, even if 75% of that is already in the manual, another 20% can be verified by yourself using ShowMemoryViewer() on a memory buffer, and if all this is not enough, for the remaining 5% one can still ask questions.
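As a rough illustration of the SizeOf() and ShowMemoryViewer() points, the sketch below sizes a buffer in characters rather than bytes and then opens it in the debugger's memory viewer. The variable names are invented for the example, and it is only meaningful when run with the debugger enabled.

```purebasic
; Sketch: size a string buffer with SizeOf(Character) instead of assuming
; one byte per character. Variable names are illustrative.

Define text$ = "Hello"
Define chars = Len(text$)

; Room for the text plus the null terminator, in the program's native encoding.
Define bytes = (chars + 1) * SizeOf(Character)
Define *buf = AllocateMemory(bytes)

If *buf
  PokeS(*buf, text$)              ; native format: ASCII or Unicode, per the compiler switch
  Debug SizeOf(Character)         ; 1 in an ASCII build, 2 in a Unicode build
  Debug SizeOf(Ascii)             ; 1 in either build (the .a type)
  Debug SizeOf(Unicode)           ; 2 in either build (the .u type)
  Debug PeekS(*buf)               ; reads the native string back
  ShowMemoryViewer(*buf, bytes)   ; inspect the raw bytes in the debugger
  CallDebugger                    ; halt here so the buffer is still alive while you look
  FreeMemory(*buf)
EndIf
```

Compiling this once with the Unicode switch off and once with it on shows the buffer size and byte layout change while the source stays the same.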

BorisTheOld · Post by **BorisTheOld** » Sat Aug 16, 2014 10:22 pm

luis wrote:After you talked about the difference in encoding, about PokeS() and PeekS(), about .c, .a, .u, .Character, .Ascii and .Unicode, about pseudotypes and mentioned why to use SizeOf() ... is there more ?

Enough for one juicy post maybe, even if 75% of that is already in the manual, another 20% can be verified by yourself using ShowMemoryViewer() on a memory buffer, and if all this is not enough, for the remaining 5% one can still ask questions.

I agree.

juror · Post by **juror** » Sun Aug 17, 2014 1:49 am

BorisTheOld wrote:
luis wrote:After you talked about the difference in encoding, about PokeS() and PeekS(), about .c, .a, .u, .Character, .Ascii and .Unicode, about pseudotypes and mentioned why to use SizeOf() ... is there more ?

Enough for one juicy post maybe, even if 75% of that is already in the manual, another 20% can be verified by yourself using ShowMemoryViewer() on a memory buffer, and if all this is not enough, for the remaining 5% one can still ask questions.
I agree.

The gods have spoken. What was I thinking?

luis · Post by **luis** » Sun Aug 17, 2014 10:49 am

juror wrote: The gods have spoken. What was I thinking?

Probably not much. Maybe "I don't want to hear opinions not frantically embracing my idea" if I have to judge by this.

There is no need to be hostile, you are free to do as you please.

plouf · Post by **plouf** » Sun Aug 17, 2014 3:44 pm

luis wrote:
plouf wrote: is there any "hexeditor" that is unicode compatible,
i.e. one that can load an exe and modify its non-ascii strings?
Any hex editor can modify unicode or ascii strings, they are just bytes (in unicode you simply modify every second byte).

The problem is in the search function, so the editor must support the search of both ascii and unicode strings.

I use http://mh-nexus.de/en/hxd/ and it does, but I think any decent hex editor does that.

just google "hex editor unicode strings", try some of them and make your choice.

they are bytes but it does NOT display them as readable unicode bytes !

here is an example of a unicode program i made with PB. you can see the "text" is dot separated, and if i modify a unicode text a mess happens (i have to find the numeric equivalent of the unicode character and type it in number form !?)

http://i59.tinypic.com/2rgdxqo.jpg

furthermore (sorry if it has been posted, but i was a little "out" these last years)

in the pb editor, if you switch from utf to ascii, national characters get messed up ! (if you don't, and you have typed your program in ascii and compile it in unicode, messed-up strings appear)

http://i60.tinypic.com/kf1ms3.jpg

luis · Post by **luis** » Sun Aug 17, 2014 4:16 pm

plouf wrote: they are bytes but it does NOT display them as readable unicode bytes !

You can search (and replace) unicode strings with it, but yes it doesn't decode and show them.

You can decode the bytes to viewable strings by selecting, copying and pasting them into an external tool written in PB, and do the same in the opposite direction to update the binary file with a modified string (HxD can also insert and remove bytes, changing the file length).
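A very small sketch of what the decoding half of such a tool could look like, assuming the bytes copied from the hex editor are UTF-16 little-endian and have been pasted into a DataSection by hand. The RawBytes label and the sample bytes are made up for the example.

```purebasic
; Sketch: turn bytes copied from a hex editor back into a readable string.
; The byte values below are sample data for the word "Text".

DataSection
  RawBytes:
  Data.a $54, $00, $65, $00, $78, $00, $74, $00   ; T e x t, two bytes each
  Data.a $00, $00                                 ; 16-bit null terminator
EndDataSection

; PeekS() with #PB_Unicode decodes the buffer regardless of the compiler switch.
Define decoded$ = PeekS(?RawBytes, -1, #PB_Unicode)
Debug decoded$                                    ; Text
```

The dot-separated look in a hex editor's text pane comes from those $00 bytes: each ASCII-range character is followed by a zero byte that the editor shows as a dot.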

Anyway if you are looking for something already integrating that, you can find a comparative list of many hex editors here -> http://en.wikipedia.org/wiki/Comparison_of_hex_editors (some include unicode support but it's not clear what kind)

Also you can try google-ing for the ones claiming to offer unicode editing support:

http://www.heaventools.com/flexhex-features.htm

With FlexHEX you can inspect, modify, insert, search, or replace binary, ASCII, or UNICODE data

http://www.morpheussoftware.net/sue/

http://www.funduc.com/fshexedit.htm

etc.

plouf · Post by **plouf** » Sun Aug 17, 2014 5:37 pm

firstly thanks for your time luis

then to answer myself on this: i have tried flexhex, fshxedit, hexedit.js, HxD edit, BE.HexEdit, hhd software free hex edit (more ?)
up to now ONLY Super Unicode Editor from Morpheus seems to work like this !

heartbone · Post by **heartbone** » Sun Aug 17, 2014 7:33 pm

luis wrote:
juror wrote: The gods have spoken. What was I thinking?
Probably not much. Maybe "I don't want to hear opinions not frantically embracing my idea" if I have to judge by this.

There is no need to be hostile, you are free to do as you please.

I do love your extremely witty and spot on response… if it's warranted.
He may not have been using sarcasm, although you probably know a lot better than I his intent.
I was thinking that he actually holds you in high regard, and changed his stance because of your wise post.

infratec · Post by **infratec** » Fri Aug 22, 2014 12:05 pm

Hi,

I just tested how I have to write my programs in the future:

http://www.purebasic.fr/english/viewtop ... 15#p451615

1. It is a lot more work
2. You have to use PeekS() everywhere
3. You have to modify all structures from a .s{x} to a .a[x],
so you can no longer see whether there is text behind it or 'only' bytes
4. Comparisons are really ugly.
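For readers who have not hit this yet, points 2 to 4 look roughly like the sketch below in a Unicode build. It is not the code from the linked post; RecordHeader and its fields are hypothetical.

```purebasic
; Sketch: a fixed-length ASCII field inside a structure, handled from a
; Unicode build. RecordHeader and its fields are hypothetical.

Structure RecordHeader
  name.a[8]     ; 8 raw bytes; in an ASCII-only build this could be name.s{8}
  version.a
EndStructure

Define hdr.RecordHeader   ; structure variables start out zero-filled

; Write "GUZZI" as ASCII bytes (5 characters + null terminator fit in 8 bytes).
PokeS(@hdr\name[0], "GUZZI", -1, #PB_Ascii)

; The field is no longer a string, so it has to be read back with PeekS()
; before it can be compared with a normal string.
Define name$ = PeekS(@hdr\name[0], 8, #PB_Ascii)

If name$ = "GUZZI"
  Debug "header matches"
EndIf
```

The byte array keeps the on-disk layout identical in both compiler modes, at the cost of an explicit conversion at every read, write and comparison.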

As a result:

I don't like it.

Bernd

Danilo · Post by **Danilo** » Fri Aug 22, 2014 1:38 pm

infratec wrote:1. It is a lot more work

Not much after you got used to it. Actually, I'm quite shocked that an old-timer like "infratec"
is still using ASCII mode. OK, maybe 'old' is the buzzword. Stuck in 20th century.
Many PB users already use UNICODE mode exclusively and got used to the style that is required
when using old ASCII-only interfaces.

infratec wrote:2. You have to use PeekS() everywhere
3. You have to modify all structures from a .s{x} to a .a[x],
so you can no longer see whether there is text behind it or 'only' bytes
4. Comparisons are really ugly.

Not if you connect to modern libraries and APIs that support UNICODE directly.
The conversion is only required if you connect to old-style, last-century libs,
that don't support UNICODE.

The big advantage is, you can now write programs for the whole world (7+ billion people),
and you are not limited to your small local world anymore.

Seriously, infratec:
I set UNICODE mode as Default years ago, and it is really not a big problem to get used to it.
Of course I made some mistakes at the beginning of the transition, but it really wasn't a big problem.
Writing PB libs/includes for customers, I check that everything is working with ASCII and UNICODE mode,
and it just works most of the time... because I got used to it already years ago.
To be honest, if you code ASCII and 32bit only, you are late already. Today everything is about ASCII vs. UNICODE,
and 32bit vs. 64bit. If you are still stuck in 32bit ASCII world, it is your own fault. Seriously, UNICODE is as important
as supporting 64bit mode. That's called progress.

infratec · Post by **infratec** » Fri Aug 22, 2014 2:02 pm

But as you can see in that example:

ASCII is used in file formats which are still common and in use.
And if you have to write a program which has to deal with this, you have no other choice.

I'm not a windows programmer.
I write programs which normally have to do with microcontrollers
and communication stuff.

Have you ever seen a web page which contains unicode ?
I only know html which uses UTF-8 as coding (like this page too)

For me it makes no sense to use unicode.
The programs which I'm writing are in German or English.
I have only one (private) program which uses different languages (a freeware called GuzziDiag)
and since this program is crossplatform, I also use UTF-8 for the language file.

Have I already mentioned that I don't like windows ?

But I follow the decision of Fred and the team, even when I don't like it

It was only to show how ugly it is if you have to handle ASCII stuff without native support for it.

Bernd

PureBasic Forums - English
