strange beginning of Unicode String..

Posted: Thu Sep 24, 2009 9:31 pm
by c4s
First of all: I don't know too much about unicode ;).

Anyway, I noticed that Unicode strings normally begin with $FFFE. But now I have one that has $FF00FE at the beginning, and I don't understand what this means.

Any ideas?

Re: strange beginning of Unicode String..

Posted: Thu Sep 24, 2009 10:12 pm
by Edwin Knoppert
It is a file header for Unicode or other encodings.
For example, you can open the file in Notepad and use Save As to check its current encoding.

The 2 or 3 bytes have a specific order and value to indicate the encoding.
Old-fashioned ANSI files don't have this header.

Re: strange beginning of Unicode String..

Posted: Fri Sep 25, 2009 8:13 am
by pdwyer
not seeing yours

http://en.wikipedia.org/wiki/Byte-order_mark

Re: strange beginning of Unicode String..

Posted: Fri Sep 25, 2009 9:50 am
by Trond
Unicode strings don't have any special beginning.

$FF $FE is at the start of some Unicode files. It is a byte order mark meaning the file uses little-endian UCS-2 (UTF-16) encoding, the same as PB uses for Unicode strings.

$FF00FE is something different entirely. It probably isn't a unicode file, or it's a unicode file without a byte order mark. Or you simply read the value wrong, and it's really $FF FE 00 00 or $00 00 FE FF, which means UTF-32.
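The byte order marks mentioned above can be checked with a few byte comparisons. The following is only a sketch: DetectBOM is a hypothetical helper name, not something from this thread or from PB itself, and the UTF-8 mark ($EF $BB $BF) is added for completeness. Note that $FF $FE $00 $00 is ambiguous in principle (it could also be UTF-16 LE followed by a null character); the sketch simply prefers UTF-32.

```purebasic
; Hypothetical helper (an assumption, not from this thread): names the
; encoding suggested by a byte order mark at the start of *Buffer.
Procedure.s DetectBOM(*Buffer, Size.i)
  Protected b0.a, b1.a, b2.a, b3.a

  ; Read up to four leading bytes; bytes past Size stay 0
  If Size >= 1 : b0 = PeekA(*Buffer)     : EndIf
  If Size >= 2 : b1 = PeekA(*Buffer + 1) : EndIf
  If Size >= 3 : b2 = PeekA(*Buffer + 2) : EndIf
  If Size >= 4 : b3 = PeekA(*Buffer + 3) : EndIf

  ; Test the 4-byte UTF-32 marks first: FF FE 00 00 also starts with FF FE
  If Size >= 4 And b0 = $FF And b1 = $FE And b2 = $00 And b3 = $00
    ProcedureReturn "UTF-32 LE"
  ElseIf Size >= 4 And b0 = $00 And b1 = $00 And b2 = $FE And b3 = $FF
    ProcedureReturn "UTF-32 BE"
  ElseIf Size >= 3 And b0 = $EF And b1 = $BB And b2 = $BF
    ProcedureReturn "UTF-8"
  ElseIf Size >= 2 And b0 = $FF And b1 = $FE
    ProcedureReturn "UTF-16 LE"
  ElseIf Size >= 2 And b0 = $FE And b1 = $FF
    ProcedureReturn "UTF-16 BE"
  EndIf

  ProcedureReturn "" ; no byte order mark recognised
EndProcedure

; Usage: the three bytes FF 00 FE do not match any mark
*Mem = AllocateMemory(4)
If *Mem
  PokeA(*Mem, $FF) : PokeA(*Mem + 1, $00) : PokeA(*Mem + 2, $FE)
  Debug DetectBOM(*Mem, 3)
  FreeMemory(*Mem)
EndIf
```

The returned names are plain labels chosen for this sketch; they are not PB constants.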

Re: strange beginning of Unicode String..

Posted: Fri Sep 25, 2009 10:10 am
by c4s
pdwyer wrote:not seeing yours

http://en.wikipedia.org/wiki/Byte-order_mark
That's why I'm confused.
Trond wrote:Unicode strings don't have any special beginning.

$FF $FE is at the start of some Unicode files. It is a byte order mark meaning the file uses little-endian UCS-2 (UTF-16) encoding, the same as PB uses for Unicode strings.

$FF00FE is something different entirely. It probably isn't a unicode file, or it's a unicode file without a byte order mark. Or you simply read the value wrong, and it's really $FF FE 00 00 or $00 00 FE FF, which means UTF-32.
To be more precise: the string I'm reading is an ID3 tag. The tag has a Unicode flag, so the whole string must be Unicode as far as I know.
Normally the Unicode tags I found began with $FFFE, but now I have found an MP3 file where every tag has $FF00FE at the beginning.

I really don't know what this means.
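One possible explanation, given here as an assumption and not confirmed for this particular file: ID3v2 defines an optional "unsynchronisation" scheme in which a $00 byte is inserted after every $FF that is followed by a byte of $E0 or higher (or by $00), so that tag data never looks like an MPEG sync signal. Under that scheme the byte order mark $FF $FE is stored as $FF $00 $FE. The sketch below reverses it by turning every $FF $00 pair back into $FF. RemoveUnsync is a hypothetical name, and it should only be applied when the tag header (ID3v2.3) or the frame flags (ID3v2.4) say that unsynchronisation is in use.

```purebasic
; Hypothetical helper (an assumption, not from this thread): removes ID3v2
; unsynchronisation in place. Every $FF $00 pair becomes a single $FF.
; Returns the new length in bytes.
Procedure.i RemoveUnsync(*Buffer, Size.i)
  Protected ReadPos.i, WritePos.i, Value.a

  While ReadPos < Size
    Value = PeekA(*Buffer + ReadPos)
    PokeA(*Buffer + WritePos, Value)
    WritePos + 1
    ReadPos + 1

    ; Skip the stuffed $00 that directly follows an $FF
    If Value = $FF And ReadPos < Size
      If PeekA(*Buffer + ReadPos) = $00
        ReadPos + 1
      EndIf
    EndIf
  Wend

  ProcedureReturn WritePos
EndProcedure

; Usage: FF 00 FE 41 00 shrinks to FF FE 41 00 (BOM, then "A" in UTF-16 LE)
*Mem = AllocateMemory(5)
If *Mem
  PokeA(*Mem, $FF) : PokeA(*Mem + 1, $00) : PokeA(*Mem + 2, $FE)
  PokeA(*Mem + 3, $41) : PokeA(*Mem + 4, $00)
  Debug RemoveUnsync(*Mem, 5)
  FreeMemory(*Mem)
EndIf
```

After this step the data would start with the ordinary $FF $FE mark again, so it could be read as a normal Unicode string.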

Re: strange beginning of Unicode String..

Posted: Fri Sep 25, 2009 10:18 am
by DarkDragon
Well, ID3 gets messed up by a lot of editors (including Windows Media Player, which doesn't save the tag size of APIC as a syncsafe integer). Maybe it's not your fault, but the fault of the editor that was used on the file.

Re: strange beginning of Unicode String..

Posted: Fri Sep 25, 2009 5:41 pm
by akj
I'm working from memory, but I'm fairly sure I've seen this used to determine whether the file uses little-endian or big-endian byte ordering.

Re: strange beginning of Unicode String..

Posted: Fri Sep 25, 2009 5:43 pm
by c4s
Hm, I just found out that the following will handle the string as I need it:

Content.s = PeekS(*MemID, Size, #PB_UTF8)
...because it gives the "best" result whether or not the executable is compiled as Unicode and whether or not the string is Unicode, right?
At least it can handle $FF00FE and stuff.