strange beginning of Unicode String..

Post by c4s »

First of all: I don't know too much about unicode ;).

Anyway, I noticed that Unicode strings normally begin with $FFFE. But now I have one that begins with $FF00FE, and I don't understand what that means.

Any ideas?
If any of you native English speakers have any suggestions for the above text, please let me know (via PM). Thanks!

Post by Edwin Knoppert »

It is a file header for Unicode or other encodings.
For example, you can open the file in Notepad and use Save As to check its current encoding.

The 2 or 3 bytes have a specific order and value that indicate the specific type.
Old-fashioned ANSI files don't have this header.

Post by pdwyer »

not seeing yours

http://en.wikipedia.org/wiki/Byte-order_mark

Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein

Post by Trond »

Unicode strings don't have any special beginning.

$FFFE is at the start of some unicode files. It means the file uses UCS2 encoding (the same as PB uses for unicode strings).

$FF00FE is something different entirely. It probably isn't a unicode file, or it's a unicode file without a byte order mark. Or you simply read the value wrong, and it's really $FF FE 00 00 or $00 00 FE FF, which means UTF-32.
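
The byte order marks mentioned above can be checked mechanically. The following is a minimal Python sketch (not PureBasic) and not a definitive detector: the `detect_bom` name and the table layout are illustrative assumptions, and a buffer that starts with FF FE 00 00 is reported as UTF-32 even though it could also be UTF-16 text that begins with a NUL character.

```python
import codecs

# Known byte order marks, longest first so that UTF-32 LE (FF FE 00 00)
# is not mistaken for UTF-16 LE (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no known BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

With this table, the bytes FF 00 FE match none of the entries, which agrees with the observation that $FF00FE is not a standard byte order mark.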

Post by c4s »

pdwyer wrote:
not seeing yours

http://en.wikipedia.org/wiki/Byte-order_mark

That's why I'm confused.

Trond wrote:
Unicode strings don't have any special beginning.

$FFFE is at the start of some unicode files. It means the file uses UCS2 encoding (the same as PB uses for unicode strings).

$FF00FE is something different entirely. It probably isn't a unicode file, or it's a unicode file without a byte order mark. Or you simply read the value wrong, and it's really $FF FE 00 00 or $00 00 FE FF, which means UTF-32.

To be more precise: the string I'm reading is an ID3 tag. The tag has a Unicode flag, so the whole string must be in Unicode as far as I know.
Normally the Unicode tags I found began with $FFFE, but now I have found an MP3 file where every tag has $FF00FE at the beginning.

I really don't know what this means.
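
One possible explanation, which nobody in this thread confirms: the ID3v2 specification defines an "unsynchronisation" scheme that inserts a $00 byte after every $FF that would otherwise be followed by a byte with its top three bits set. Under that scheme the UTF-16 byte order mark $FF $FE is stored as $FF $00 $FE. Whether this particular file uses the scheme depends on the unsynchronisation flag in the tag header (ID3v2.4 also has a per-frame flag), which would have to be checked first. A minimal Python sketch of the reversal, with a hypothetical function name:

```python
def remove_unsynchronisation(data: bytes) -> bytes:
    """Undo ID3v2 unsynchronisation: each $FF $00 pair becomes a single $FF.

    Only call this when the tag (or frame) says unsynchronisation was applied;
    on ordinary data it would wrongly drop genuine $00 bytes after $FF.
    """
    return data.replace(b"\xff\x00", b"\xff")


# Example: a text field that starts with FF 00 FE and then holds "Hi" as UTF-16 LE.
raw = b"\xff\x00\xfeH\x00i\x00"
restored = remove_unsynchronisation(raw)  # starts with the usual FF FE again
text = restored.decode("utf-16")          # the "utf-16" codec consumes the BOM
```

If the flag is not set in the file, this is not the cause and the bytes should be left alone.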

Post by DarkDragon »

Well, ID3 is messed up by a lot of editors (including Windows Media Player: it doesn't save the tag size of APIC as a syncsafe integer). Maybe it's not your fault, but the fault of the editor that was used on the file.
bye,
Daniel
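
For readers unfamiliar with the term: a syncsafe integer in ID3v2 stores 7 significant bits per byte, with the top bit of every byte cleared. The tag header size is stored this way; frame sizes are syncsafe in ID3v2.4 but plain 32-bit integers in ID3v2.3, which is an easy place for editors to go wrong. A minimal Python sketch, with hypothetical helper names:

```python
def decode_syncsafe(raw: bytes) -> int:
    """Decode a 4-byte syncsafe integer (7 bits per byte, big-endian)."""
    if len(raw) != 4 or any(b & 0x80 for b in raw):
        raise ValueError("not a 4-byte syncsafe integer")
    value = 0
    for b in raw:
        value = (value << 7) | b
    return value


def encode_syncsafe(value: int) -> bytes:
    """Encode an integer below 2**28 as 4 syncsafe bytes."""
    if not 0 <= value < (1 << 28):
        raise ValueError("value does not fit in 28 bits")
    return bytes((value >> shift) & 0x7F for shift in (21, 14, 7, 0))
```

For example, 257 encodes to the bytes 00 00 02 01, whereas a plain 32-bit big-endian integer would be 00 00 01 01.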

Post by akj »

I'm working from memory, but I'm fairly sure I've seen this used to determine whether the file uses little-endian or big-endian byte ordering.
Anthony Jordan

Post by c4s »

Hm, I just found out that the following will handle the string as I need it:

Content.s = PeekS(*MemID, Size, #PB_UTF8)
...because it gives the "best" result whether or not the executable is compiled as Unicode and whether or not the string is Unicode, right?
At least it can handle $FF00FE and stuff.
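
Decoding every tag as UTF-8 may display acceptably for some files, but it does not follow the encoding the tag itself declares. As an alternative, here is a minimal Python sketch. It assumes the buffer is the body of an ID3v2 text frame, whose first byte is the encoding marker (presumably the "unicode flag" mentioned earlier), and that any unsynchronisation has already been undone. The `decode_text_frame` name is hypothetical.

```python
# Encoding marker byte at the start of an ID3v2 text frame body.
ID3_TEXT_ENCODINGS = {
    0x00: "latin-1",    # ISO-8859-1
    0x01: "utf-16",     # UTF-16 with a byte order mark
    0x02: "utf-16-be",  # UTF-16 big-endian without BOM (ID3v2.4 only)
    0x03: "utf-8",      # UTF-8 (ID3v2.4 only)
}


def decode_text_frame(body: bytes) -> str:
    """Decode a text frame body according to its leading encoding byte."""
    if not body:
        return ""
    encoding = ID3_TEXT_ENCODINGS.get(body[0])
    if encoding is None:
        raise ValueError(f"unknown text encoding byte: {body[0]:#04x}")
    # Strip the optional trailing NUL terminator after decoding.
    return body[1:].decode(encoding).rstrip("\x00")
```

In PureBasic terms this corresponds to choosing the PeekS format from the encoding byte instead of always passing #PB_UTF8.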