Concatenate ASCII letters into a long

Posted: Sun May 17, 2009 1:04 am
by Mistrel
I'm using this function to convert ISO 639-3 three-letter language codes into a single long. It may be useful for other things too. A full list of ISO 639-3 language codes can be found here:

http://www.sil.org/iso639-3/default.asp

Code:

Structure CharField
  c.c[0]
EndStructure

Procedure ConcatenateUAlphaLong(Long.s)
  Protected i
  Protected DigitAscii.l
  Protected Concatenate.l
  Protected StringLength.l=Len(Long.s)
  Protected *CharacterString.CharField
  
  ;/ A signed long holds at most 2147483647 (10 digits). Reserving the leading digit leaves 9 digits
  ;/ that can take any value, and each upper-case letter (65-90) needs two, so at most (10-1)/2 = 4 letters fit
  If Not StringLength Or StringLength>4
    ProcedureReturn -1
  EndIf
  
  ;/ Convert to upper-case so every character code is exactly two decimal digits (a-z would be 97-122)
  Long.s=UCase(Long.s)
  ;/ Take the address only after UCase(): reassigning the string can move it in memory
  *CharacterString=@Long.s
  
  ;/ Append each ASCII code to the end of the integer
  For i=0 To StringLength-1
    DigitAscii=*CharacterString\c[i]
    ;/ Only characters A-Z are supported
    If DigitAscii<65 Or DigitAscii>90
      ProcedureReturn -2
    EndIf
    Concatenate*100          ; shift the digits collected so far two places left
    Concatenate+DigitAscii   ; and append the new two-digit code
  Next i
  
  ProcedureReturn Concatenate
EndProcedure

String.s="abcd"
Result=ConcatenateUAlphaLong(String.s)

For i=1 To Len(String.s)
  StringAscii.s+Str(Asc(UCase(Mid(String.s,i,1))))+" "
Next i

If Result>0
  Debug StringAscii.s
  Debug Result
Else
  If Result=-1
    Debug "Illegal number of digits."
  ElseIf Result=-2
    Debug "Illegal character in string. Only alphabetical characters are supported."
  EndIf
EndIf
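The post only shows the encoding direction. A minimal sketch of the reverse step, assuming the value came from ConcatenateUAlphaLong() (so it is positive and every pair of decimal digits is a code from 65 to 90). SplitUAlphaLong() is a hypothetical name, not part of the original code:

```purebasic
; SplitUAlphaLong() is a hypothetical helper, not from the original post.
; It reverses ConcatenateUAlphaLong() by peeling off two decimal digits at a time.
Procedure.s SplitUAlphaLong(Value.l)
  Protected Result.s = ""
  Protected Code.l
  
  If Value <= 0
    ProcedureReturn ""            ; 0, -1 and -2 are not valid encodings
  EndIf
  
  While Value > 0
    Code = Value % 100            ; the last two digits are one character code
    If Code < 65 Or Code > 90
      ProcedureReturn ""          ; not something ConcatenateUAlphaLong() produces
    EndIf
    Result = Chr(Code) + Result   ; prepend, since we work from the last letter back
    Value = Value / 100           ; integer division drops the two digits just read
  Wend
  
  ProcedureReturn Result
EndProcedure

Debug SplitUAlphaLong(65666768)   ; ABCD
```

Because each letter occupies exactly two decimal digits, the value can be split with % 100 and / 100 alone; no separator or length field is needed.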

Posted: Sun May 17, 2009 8:04 am
by DoubleDutch
I presume you're doing this to get a faster lookup?

Is this not easier?

Code:

Procedure Alpha3ToLong(string$,ucase=#True)
	result=0
	string$=Left(string$,3)
	If ucase
		string$=UCase(string$)
	EndIf
	PokeS(@result,string$,3,#PB_Ascii)
	ProcedureReturn result
EndProcedure


Procedure.s LongToAlpha3(number)
	ProcedureReturn PeekS(@number,3,#PB_Ascii)
EndProcedure

test2=Alpha3ToLong("abc")

Debug(LongToAlpha3(test2))

Posted: Sun May 17, 2009 9:32 am
by Mistrel
PokeS() is for poking a string of characters. Why are you poking it into the address of an integer?

Posted: Sun May 17, 2009 12:23 pm
by DoubleDutch
Because you said what you wanted to do is:
I'm using this function to convert ISO 639-3 three-letter language codes into a single long.
My routine does the same thing with less code, and it's faster.

(There is no reason why you can't poke a three-character ASCII string, plus its null terminator, into the address of a long.)

Here is a routine that does the same job but packs into a word, just in case you want to save some space...

Code:

Procedure.w PackAlpha3(string$)
	Static table$="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	string$=UCase(string$)
	ProcedureReturn (FindString(table$,Left(string$,1),1)<<10)+(FindString(table$,Mid(string$,2,1),1)<<5)+FindString(table$,Mid(string$,3,1),1)
EndProcedure

Procedure.s UnPackAlpha3(number.w)
	Static table$="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	ProcedureReturn Mid(table$,(number>>10)&$1f,1)+Mid(table$,(number>>5)&$1f,1)+Mid(table$,number&$1f,1)
EndProcedure

test.w=PackAlpha3("xyz")
Debug(UnPackAlpha3(test))

Posted: Sun May 17, 2009 9:41 pm
by Mistrel
I see your logic. Three ASCII characters plus a null are four bytes, so they fit in a 4-byte long. Your idea works, but it only holds three characters (mine holds four) and it's not returning the expected result. Drop the null, though, and yours will hold four as well.

However, I'm starting to think your method may be the better solution, since retrieving the values would probably be faster.

Code:

Structure CharField
  c.c[0]
EndStructure 

Procedure Alpha3ToLong(string$,ucase=#True)
   result=0
   string$=Left(string$,3)
   If ucase
      string$=UCase(string$)
   EndIf
   PokeS(@result,string$,3,#PB_Ascii)
   ProcedureReturn result
EndProcedure 

Debug Str(Asc("A"))+" "+Str(Asc("B"))+" "+Str(Asc("C"))
This=Alpha3ToLong("ABC")
Debug Str(This)+" <- should be 656667"

*CharField.CharField=@This

Debug Str(*CharField\c[0])+" "+Str(*CharField\c[1])+" "+Str(*CharField\c[2])

Posted: Sun May 17, 2009 9:56 pm
by DoubleDutch
You would have to poke them in as bytes to drop the null; it looks like PokeS() always writes one at the end.
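Poking the letters as single bytes could look like this. A minimal sketch, assuming plain A-Z/a-z input; Alpha4ToLong() and LongToAlpha4() are hypothetical names, not procedures from the thread. Each letter is written with PokeB(), so no terminator is stored and all four bytes of the long carry letters:

```purebasic
; Hypothetical helpers: store up to four letters in a long without a null
; terminator by writing each character as a single byte.
Procedure.l Alpha4ToLong(String$)
  Protected Result.l = 0
  Protected i
  String$ = UCase(Left(String$, 4))
  For i = 1 To Len(String$)
    ; byte 0 gets the first character, byte 1 the second, and so on
    PokeB(@Result + (i - 1), Asc(Mid(String$, i, 1)))
  Next
  ProcedureReturn Result
EndProcedure

Procedure.s LongToAlpha4(Value.l)
  ; PeekS() stops at a zero byte, so codes shorter than four letters round-trip too
  ProcedureReturn PeekS(@Value, 4, #PB_Ascii)
EndProcedure

Debug LongToAlpha4(Alpha4ToLong("abcd"))   ; ABCD
Debug Hex(Alpha4ToLong("ABCD"))            ; 44434241 on a little-endian (x86) CPU
```

The unused bytes stay zero, which doubles as the terminator when a code has fewer than four letters.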

I thought the spec was for three characters, not four? If the data was saved on disk, I'd use the second method to get down to a word (three letters in two bytes) rather than a long. You could always expand them to a long on loading for speed during execution.

Posted: Sun May 17, 2009 10:02 pm
by Mistrel
How can you fit three characters in two bytes?

Posted: Sun May 17, 2009 10:03 pm
by DoubleDutch
Magic! lol...

Try my second example. ;)

Edit: You could make the 2-byte method a lot faster, but I used a lookup table so you can add a few more characters to it (5 bits leave room for up to 31).
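The "magic" is that 26 letters need only 5 bits each (2^5 = 32), so three letters take 15 bits and fit in a 16-bit word with one bit to spare. A small walkthrough of the bit layout for "XYZ", using alphabet positions 1-26 (the same values FindString() returns for the table in PackAlpha3()); the variable names are illustrative only:

```purebasic
; Alphabet positions: Asc("X") = 88 = $58, and $58 - $40 = 24
Define Letter1 = Asc("X") - $40   ; 24
Define Letter2 = Asc("Y") - $40   ; 25
Define Letter3 = Asc("Z") - $40   ; 26

; First letter in bits 10-14, second in bits 5-9, third in bits 0-4
Define Packed.w = (Letter1 << 10) | (Letter2 << 5) | Letter3

Debug Packed                          ; 25402 = 24*1024 + 25*32 + 26
Debug RSet(Bin(Packed), 16, "0")      ; 0110001100111010 = 0 11000 11001 11010
Debug ((Packed >> 10) & $1F)          ; 24
Debug ((Packed >> 5) & $1F)           ; 25
Debug (Packed & $1F)                  ; 26
```

The top bit stays 0, so the packed value is always positive even though .w is a signed type.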

Posted: Mon May 18, 2009 9:16 am
by Demivec
Mistrel wrote: I see your logic. Three ASCII characters plus a null are four bytes, so they fit in a 4-byte long. Your idea works, but it only holds three characters (mine holds four) and it's not returning the expected result. Drop the null, though, and yours will hold four as well.
You need to change this part of the code to see the expected results:

Code:

Debug Hex(Asc("A"))+" "+Hex(Asc("B"))+" "+Hex(Asc("C"))
This=Alpha3ToLong("ABC")
Debug Hex(This)+" <- values are reversed when memory is interpreted as a Long"

*CharField.CharField=@This

Debug Hex(*CharField\C[0])+" "+Hex(*CharField\C[1])+" "+Hex(*CharField\C[2])
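The values look reversed because x86 is little-endian: the byte at the lowest address (the first character) becomes the least significant byte of the long. A sketch that builds the same value arithmetically and checks it against PokeS(), assuming a little-endian CPU; Built and Poked are illustrative names:

```purebasic
; First character -> lowest byte, second -> next byte, and so on
Define Built.l = Asc("A") | (Asc("B") << 8) | (Asc("C") << 16)

Define Poked.l = 0
PokeS(@Poked, "ABC", 3, #PB_Ascii)   ; writes the bytes 41 42 43 00

Debug Hex(Built)   ; 434241, i.e. "C" "B" "A" when read left to right
Debug Built        ; 4407873
If Poked = Built
  Debug "PokeS() and the shifts give the same long"
EndIf
```

So Mistrel's "656667" was the decimal concatenation from his own method, while the poked long is $434241 (4407873).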
Here are two variations that handle four-character strings:

Code:

Structure CharField
  C.c[0]
EndStructure

Procedure Alpha3ToLong(string$,ucase=#True)
   Dim result.b(4)                        ; 5 bytes: 4 characters + the null PokeS() adds
   If ucase
      string$=UCase(string$)
   EndIf
   PokeS(@result(),string$,4,#PB_Ascii)   ; @result() is the address of the first element
   ProcedureReturn PeekL(@result())
EndProcedure

;Procedure Alpha3ToLong(string$,ucase=#True)
;   result.q
;   If ucase
;      string$=UCase(string$)
;   EndIf
;   PokeS(@result + 3,string$,4,#PB_Ascii)
;   ProcedureReturn PeekL(@result + 3)
;EndProcedure

Debug Hex(Asc("A"))+" "+Hex(Asc("B"))+" "+Hex(Asc("C")) + " " + Hex(Asc("D"))
This=Alpha3ToLong("ABCD")
Debug Hex(This)+" <- values are reversed when memory is interpreted as a Long"

*CharField.CharField=@This

Debug Hex(*CharField\C[0])+" "+Hex(*CharField\C[1])+" "+Hex(*CharField\C[2])+" "+Hex(*CharField\C[3])
And here's a shorter variation of DoubleDutch's method that packs three letters into two bytes:

Code:

Procedure.w PackAlpha3(string$)
   string$=UCase(string$)
   ProcedureReturn (Asc(Right(string$,3)) & $1F) << 10 + (Asc(Right(string$,2)) & $1F) << 5 + (Asc(Right(string$,1)) & $1F)
EndProcedure

Procedure.s UnPackAlpha3(number.w)
   ProcedureReturn Chr((number >> 10) & $1F + $40) + Chr((number >> 5) & $1F + $40) + Chr((number & $1F) + $40)
EndProcedure

Posted: Mon May 18, 2009 10:24 am
by DoubleDutch
Demivec: I said it could be made faster! ;)

Here is an improvement to your variation...

Code:

Procedure.w PackAlpha3(string$) 
   ProcedureReturn (Asc(Right(string$,3)) & $1F) << 10 + (Asc(Right(string$,2)) & $1F) << 5 + (Asc(Right(string$,1)) & $1F) 
EndProcedure 

Procedure.s UnPackAlpha3(number.w) 
   ProcedureReturn Chr((number >> 10) & $1F + $40) + Chr((number >> 5) & $1F + $40) + Chr((number & $1F) + $40) 
EndProcedure

test.w=PackAlpha3("xyz") 
Debug(UnPackAlpha3(test))
Because Asc("a") & $1F and Asc("A") & $1F are the same number (1), you don't need the UCase().

Could be faster though if made into a macro?
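A macro version might look like the sketch below; PackAlpha3M is a hypothetical name. The expression is pasted in at each call site, so there is no procedure call, but whether that is measurably faster has not been benchmarked here. Because macro arguments are substituted textually, the argument is evaluated three times, so pass a plain variable or literal rather than an expensive expression:

```purebasic
; Hypothetical macro form of PackAlpha3(): same expression, no procedure call
Macro PackAlpha3M(Text)
  ((Asc(Right(Text, 3)) & $1F) << 10 + (Asc(Right(Text, 2)) & $1F) << 5 + (Asc(Right(Text, 1)) & $1F))
EndMacro

Define Packed.w = PackAlpha3M("xyz")
Debug Packed   ; 25402, the same value PackAlpha3("xyz") returns
```

The outer parentheses matter: without them, an expansion like PackAlpha3M(a$) * 2 would bind the multiplication to the last term only.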

Twelve letters plus a small number might be useful for a quad? (12 × 5 = 60 bits, leaving 4 bits for a number from 0 to 15.)
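One possible layout for that idea, as a sketch only: twelve 5-bit letter fields in the top 60 bits and a 4-bit number in the lowest bits. PackAlpha12() and UnpackAlpha12() are hypothetical names, and the input is assumed to be letters only (anything else is masked to a meaningless 5-bit value):

```purebasic
; Hypothetical layout, not from the thread: 12 letters x 5 bits = 60 bits,
; plus a 4-bit number (0-15) in the lowest bits, 64 bits in total.
Procedure.q PackAlpha12(Text$, Number)
  Protected Result.q = 0
  Protected i
  For i = 1 To 12
    Result = Result << 5                                 ; make room for the next letter
    If i <= Len(Text$)
      Result = Result | (Asc(Mid(Text$, i, 1)) & $1F)   ; "a" and "A" both give 1
    EndIf                                                ; missing letters stay 0
  Next
  ProcedureReturn (Result << 4) | (Number & $F)
EndProcedure

Procedure.s UnpackAlpha12(Packed.q)
  Protected Result.s = ""
  Protected Code, i
  For i = 11 To 0 Step -1                                ; i = 11 is the first letter
    Code = (Packed >> (4 + i * 5)) & $1F                 ; letter fields start above the number
    If Code
      Result = Result + Chr(Code + $40)                  ; 1 -> "A", 26 -> "Z"
    EndIf
  Next
  ProcedureReturn Result
EndProcedure

Define Packed.q = PackAlpha12("abc", 7)
Debug UnpackAlpha12(Packed)   ; ABC
Debug (Packed & $F)           ; 7
```

The masks make the sign bit harmless: a first letter of "P" or later sets bit 63 and makes the quad negative, but each field is still read back correctly.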

Posted: Mon May 18, 2009 10:26 am
by Trond
Or simply:

Code:

Long3Chars = PeekL(@String3Chars.s)
Alternatively:

Code:

Structure SCountryCode
  StructureUnion
    AsString.s
    *AsLongPtr.Long ; Read-only!
  EndStructureUnion
EndStructure

MyCountryCode.SCountryCode\AsString = "abc"
Debug MyCountryCode\AsLongPtr\l

Posted: Mon May 18, 2009 10:29 am
by DoubleDutch
Trond: You're right. That's the fastest if it's kept as a long.

Posted: Mon May 18, 2009 12:29 pm
by Demivec
DoubleDutch wrote: Because Asc("a") & $1F and Asc("A") & $1F are the same number (1), you don't need the UCase().
Thanks for catching that. That is what I intended, but I had forgotten to take it out. :wink: