Concatenate ASCII letters into a long

Mistrel
Addict
Posts: 3415
Joined: Sat Jun 30, 2007 8:04 pm

Post by Mistrel »

I'm using this function to convert ISO 639-3 three-letter language codes into a single long. It may be useful for other things too. A full list of ISO 639-3 language codes can be found here:

http://www.sil.org/iso639-3/default.asp

Code:

Structure CharField
  c.c[0]
EndStructure

Procedure ConcatenateUAlphaLong(Long.s)
  Protected DigitAscii.l
  Protected Concatenate.l
  Protected StringLength.l=Len(Long.s)
  Protected *CharacterString.CharField
  Protected i
  
  ;/ At most 4 letters: each letter takes two decimal digits and a long holds at most 2147483647,
  ;/ so a 5th letter (e.g. 6565656565) would overflow
  If Not StringLength Or StringLength>4
    ProcedureReturn -1
  EndIf
  
  ;/ Convert string to upper-case so every letter is a two-digit ASCII code (65-90)
  Long.s=UCase(Long.s)
  
  ;/ Take the address only after UCase(): assigning a new string may move it in memory
  *CharacterString=@Long.s
  
  ;/ Append each ascii result to the end of the integer
  For i=0 To StringLength-1
    DigitAscii=*CharacterString\c[i]
    ;/ Only characters A-Z are supported
    If DigitAscii<65 Or DigitAscii>90
      ProcedureReturn -2
    EndIf
    Concatenate=Concatenate*100+DigitAscii
  Next i
  
  ProcedureReturn Concatenate
EndProcedure

String.s="abcd"
Result=ConcatenateUAlphaLong(String.s)

For i=1 To Len(String.s)
  StringAscii.s+Str(Asc(UCase(Mid(String.s,i,1))))+" "
Next i

If Result>0
  Debug StringAscii.s
  Debug Result
Else
  If Result=-1
    Debug "Illegal number of digits."
  ElseIf Result=-2
    Debug "Illegal character in string. Only alphabetical characters are supported."
  EndIf
EndIf
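
Because every letter occupies exactly two decimal digits, the value can be decoded again by peeling off two digits at a time from the right. A minimal sketch, assuming the input came from ConcatenateUAlphaLong(); the procedure name UAlphaLongToString() is hypothetical:

```purebasic
; Hypothetical inverse of ConcatenateUAlphaLong(): each letter A-Z was stored as
; its two-digit ASCII code (65-90), so the last two decimal digits are the last letter.
Procedure.s UAlphaLongToString(Value.l)
  Protected Code.l
  Protected Letters.s = ""
  If Value <= 0
    ProcedureReturn ""              ; 0 and the -1/-2 error codes carry no letters
  EndIf
  While Value > 0
    Code = Value % 100              ; last two decimal digits
    If Code < 65 Or Code > 90
      ProcedureReturn ""            ; not a value produced by the packing scheme
    EndIf
    Letters = Chr(Code) + Letters   ; prepend, since decoding runs right to left
    Value = Value / 100
  Wend
  ProcedureReturn Letters
EndProcedure

Debug UAlphaLongToString(65666768)  ; ABCD
```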
DoubleDutch
Addict
Posts: 3220
Joined: Thu Aug 07, 2003 7:01 pm
Location: United Kingdom

Post by DoubleDutch »

I presume you're doing this to make a faster lookup?

Is this not easier?

Code:

Procedure Alpha3ToLong(string$,ucase=#True)
	result=0
	string$=Left(string$,3)
	If ucase
		string$=UCase(string$)
	EndIf
	PokeS(@result,string$,3,#PB_Ascii)
	ProcedureReturn result
EndProcedure


Procedure.s LongToAlpha3(number)
	ProcedureReturn PeekS(@number,3,#PB_Ascii)
EndProcedure

test2=Alpha3ToLong("abc")

Debug(LongToAlpha3(test2))
https://deluxepixel.com <- My Business website
https://reportcomplete.com <- School end of term reports system
Mistrel

Post by Mistrel »

PokeS() is for poking a string of characters. Why are you poking it into the address of an integer?
DoubleDutch

Post by DoubleDutch »

Because you said what you wanted to do is:
I'm using this function to convert ISO 639-3 three-letter language codes into a single long.
My routine does this shorter and faster.

(There is no reason why you can't poke a 3-character ASCII string into the address of a long.)
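
What happens in memory: PokeS() with #PB_Ascii writes one byte per letter plus a terminating null, and those four bytes are then read back as one long. On a little-endian CPU (x86/x64, which is an assumption here) the first letter lands in the lowest byte. A minimal sketch showing that the same value can be built with shifts; Alpha3ToLongShift() is a hypothetical name:

```purebasic
; Hypothetical helper: builds the long arithmetically, letter 1 in the lowest byte,
; to match the memory layout PokeS() produces on a little-endian CPU.
Procedure.l Alpha3ToLongShift(s.s)
  s = UCase(s)
  ProcedureReturn Asc(Mid(s, 1, 1)) | (Asc(Mid(s, 2, 1)) << 8) | (Asc(Mid(s, 3, 1)) << 16)
EndProcedure

Define Poked.l = 0
PokeS(@Poked, "ABC", 3, #PB_Ascii)   ; memory: $41 $42 $43 $00

Debug Hex(Poked)                     ; 434241
Debug Hex(Alpha3ToLongShift("abc"))  ; 434241
```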

Here is a routine that does the same job, but packs to a word - just in case you want to save some space...

Code:

Procedure.w PackAlpha3(string$)
	Static table$="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	string$=UCase(string$)
	ProcedureReturn (FindString(table$,Left(string$,1),1)<<10)+(FindString(table$,Mid(string$,2,1),1)<<5)+FindString(table$,Mid(string$,3,1),1)
EndProcedure

Procedure.s UnPackAlpha3(number.w)
	Static table$="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	ProcedureReturn Mid(table$,(number>>10)&$1f,1)+Mid(table$,(number>>5)&$1f,1)+Mid(table$,number&$1f,1)
EndProcedure

test.w=PackAlpha3("xyz")
Debug(UnPackAlpha3(test))
Mistrel

Post by Mistrel »

I see your logic. Three ASCII characters and a null is four bytes and can therefore fit in a 4-byte long. Your theory works but it only holds three characters (mine holds four) and it's not returning the expected result. Although, drop the null and yours will hold four as well.

However, I'm considering your method as the optimal solution, since retrieving the values would probably be faster.

Code:

Structure CharField
  c.c[0]
EndStructure 

Procedure Alpha3ToLong(string$,ucase=#True)
   result=0
   string$=Left(string$,3)
   If ucase
      string$=UCase(string$)
   EndIf
   PokeS(@result,string$,3,#PB_Ascii)
   ProcedureReturn result
EndProcedure 

Debug Str(Asc("A"))+" "+Str(Asc("B"))+" "+Str(Asc("C"))
This=Alpha3ToLong("ABC")
Debug Str(This)+" <- should be 656667"

*CharField.CharField=@This

Debug Str(*CharField\c[0])+" "+Str(*CharField\c[1])+" "+Str(*CharField\c[2])
DoubleDutch

Post by DoubleDutch »

You would have to poke them in as bytes to drop the null; it looks like it's always written at the end by PokeS(...).
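
Poking byte by byte avoids the terminator because PokeB() writes exactly one byte. A minimal sketch that fills all four bytes of a long with letters; Alpha4ToLongBytes() is a hypothetical name, and the byte order in the comment assumes a little-endian CPU:

```purebasic
; Hypothetical helper: PokeB() writes exactly one byte, so no null terminator is
; added and all four bytes of the long can hold letters.
Procedure.l Alpha4ToLongBytes(s.s)
  Protected result.l = 0
  Protected i
  s = UCase(Left(s, 4))
  For i = 1 To Len(s)
    PokeB(@result + i - 1, Asc(Mid(s, i, 1)))   ; letter i goes into byte i-1
  Next
  ProcedureReturn result
EndProcedure

Debug Hex(Alpha4ToLongBytes("abcd"))   ; 44434241 on a little-endian CPU
```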

I thought the spec was for 3 characters - not 4? If the data was saved on disk, I'd use the second method to get to a word (3 alphas into 2 bytes) rather than a long. You could always expand them to a long on loading for speed during execution.
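
Expanding on load could look like the following minimal sketch. It takes a word packed 5 bits per letter (first letter in the highest bits, as in PackAlpha3() above) and rebuilds the long that PokeS(@long, code$, 3, #PB_Ascii) would give on a little-endian CPU. ExpandAlpha3Word() is a hypothetical name:

```purebasic
; Hypothetical helper: unpack three 5-bit letter numbers (A=1 ... Z=26), add $40
; to get back the ASCII codes, then place letter 1 in the lowest byte.
Procedure.l ExpandAlpha3Word(packed.w)
  Protected first, second, third
  first  = ((packed >> 10) & $1F) + $40
  second = ((packed >> 5) & $1F) + $40
  third  = (packed & $1F) + $40
  ProcedureReturn first | (second << 8) | (third << 16)
EndProcedure

Debug Hex(ExpandAlpha3Word(5575))   ; 474E45, i.e. "ENG" (5575 is "ENG" packed into a word)
```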
Mistrel

Post by Mistrel »

How can you fit three characters in two bytes?
DoubleDutch

Post by DoubleDutch »

Magic! lol...

Try my second example. ;)

Edit: You could make the 2 byte method a lot faster, but I used a table so you can pick a few more characters.
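
The arithmetic behind the trick: 26 letters need only 5 bits each (2^5 = 32 is at least 26), so three letters take 15 bits, which fits in a 16-bit word. With A = 1, ..., Z = 26, as PackAlpha3() numbers them, packing "ENG" and the largest possible code "ZZZ" works out as:

```latex
\begin{aligned}
\text{pack}(c_1 c_2 c_3) &= c_1 \cdot 2^{10} + c_2 \cdot 2^{5} + c_3, \qquad 1 \le c_i \le 26 \\
\text{pack}(\text{ENG}) &= 5 \cdot 1024 + 14 \cdot 32 + 7 = 5120 + 448 + 7 = 5575 \\
\text{pack}(\text{ZZZ}) &= 26 \cdot 1024 + 26 \cdot 32 + 26 = 27482 \le 32767
\end{aligned}
```

Because even "ZZZ" stays below 32767, the result also fits in PureBasic's signed .w type without turning negative.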
Demivec
Addict
Posts: 4270
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Post by Demivec »

Mistrel wrote: I see your logic. Three ASCII characters and a null is four bytes and can therefore fit in a 4-byte long. Your theory works but it only holds three characters (mine holds four) and it's not returning the expected result. Although, drop the null and yours will hold four as well.
You need to change this part of the code to see the expected results:

Code:

Debug Hex(Asc("A"))+" "+Hex(Asc("B"))+" "+Hex(Asc("C"))
This=Alpha3ToLong("ABC")
Debug Hex(This)+" <- values are reversed when memory is interpreted as a Long"

*CharField.CharField=@This

Debug Hex(*CharField\C[0])+" "+Hex(*CharField\C[1])+" "+Hex(*CharField\C[2])
Here are two variations that handle 4-character strings:

Code:

Structure CharField
  C.c[0]
EndStructure

Procedure Alpha3ToLong(string$,ucase=#True)
   Dim result.b(4)                        ; 5 bytes: 4 letters + the null PokeS() appends
   If ucase
      string$=UCase(string$)
   EndIf
   PokeS(@result(),string$,4,#PB_Ascii)   ; @result() is the address of the first element
   ProcedureReturn PeekL(@result())
EndProcedure

;Procedure Alpha3ToLong(string$,ucase=#True)
;   result.q
;   If ucase
;      string$=UCase(string$)
;   EndIf
;   PokeS(@result + 3,string$,4,#PB_Ascii)
;   ProcedureReturn PeekL(@result + 3)
;EndProcedure

Debug Hex(Asc("A"))+" "+Hex(Asc("B"))+" "+Hex(Asc("C")) + " " + Hex(Asc("D"))
This=Alpha3ToLong("ABCD")
Debug Hex(This)+" <- values are reversed when memory is interpreted as a Long"

*CharField.CharField=@This

Debug Hex(*CharField\C[0])+" "+Hex(*CharField\C[1])+" "+Hex(*CharField\C[2])+" "+Hex(*CharField\C[3])
And here's a shorter variation for DoubleDutch's method that packs 3 letters into 2 bytes:

Code:

Procedure.w PackAlpha3(string$)
   string$=UCase(string$)
   ProcedureReturn (Asc(Right(string$,3)) & $1F) << 10 + (Asc(Right(string$,2)) & $1F) << 5 + (Asc(Right(string$,1)) & $1F)
EndProcedure

Procedure.s UnPackAlpha3(number.w)
   ProcedureReturn Chr((number >> 10) & $1F + $40) + Chr((number >> 5) & $1F + $40) + Chr((number & $1F) + $40)
EndProcedure
DoubleDutch

Post by DoubleDutch »

Demivec: I said it could be made faster! ;)

Here is an improvement to your variation...

Code:

Procedure.w PackAlpha3(string$) 
   ProcedureReturn (Asc(Right(string$,3)) & $1F) << 10 + (Asc(Right(string$,2)) & $1F) << 5 + (Asc(Right(string$,1)) & $1F) 
EndProcedure 

Procedure.s UnPackAlpha3(number.w) 
   ProcedureReturn Chr((number >> 10) & $1F + $40) + Chr((number >> 5) & $1F + $40) + Chr((number & $1F) + $40) 
EndProcedure

test.w=PackAlpha3("xyz") 
Debug(UnPackAlpha3(test))
Because "a"&$1F and "A"&$1F are the same number - you don't need the ucase.
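
The flip side of dropping UCase() is that & $1F accepts any character, not just letters: '1' ($31) becomes 17, the same as 'Q'. If the input can't be trusted, a minimal sketch that validates first might look like this; PackAlpha3Checked() and its -1 error value are hypothetical:

```purebasic
; Hypothetical variant: same 5-bit packing as PackAlpha3(), but rejects anything
; other than exactly three letters A-Z / a-z by returning -1.
Procedure.w PackAlpha3Checked(string$)
  Protected i, c, packed
  If Len(string$) <> 3
    ProcedureReturn -1
  EndIf
  For i = 1 To 3
    c = Asc(Mid(string$, i, 1))
    If (c < 'A' Or c > 'Z') And (c < 'a' Or c > 'z')
      ProcedureReturn -1                ; not a letter
    EndIf
    packed = (packed << 5) | (c & $1F)  ; 'a' and 'A' both give 1
  Next
  ProcedureReturn packed
EndProcedure

Debug PackAlpha3Checked("eng")  ; 5575
Debug PackAlpha3Checked("e1g")  ; -1
```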

Could be faster though if made into a macro?
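
A minimal sketch of the macro idea, using the same expression as the procedure above. A PureBasic macro is expanded in place at compile time, so there is no procedure call; it may be faster, though the gain would need measuring. The argument text is pasted in three times, so pass a variable or literal rather than an expensive expression. The name PackAlpha3M is hypothetical:

```purebasic
; Hypothetical macro version of PackAlpha3(): the body replaces each use at compile time.
Macro PackAlpha3M(s)
  ((Asc(Right(s, 3)) & $1F) << 10 + (Asc(Right(s, 2)) & $1F) << 5 + (Asc(Right(s, 1)) & $1F))
EndMacro

code$ = "xyz"
test.w = PackAlpha3M(code$)
Debug test   ; 25402
```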

A twelve alpha + number may be useful for a quad?
Last edited by DoubleDutch on Mon May 18, 2009 10:27 am, edited 1 time in total.
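
The numbers do work out: 12 letters at 5 bits each is 60 bits, which leaves the top 4 bits of a 64-bit quad for a small number from 0 to 15. A minimal sketch under those assumptions (letters A-Z/a-z only, unused slots left as 0, first letter in the highest letter slot); PackAlpha12() and UnPackAlpha12() are hypothetical names:

```purebasic
; Hypothetical sketch: 4-bit number in bits 60-63, then 12 letters of 5 bits each.
Procedure.q PackAlpha12(string$, number)
  Protected i
  Protected packed.q = number & $F                 ; number ends up in the top 4 bits
  For i = 1 To 12
    ; Asc("") is 0, so strings shorter than 12 letters leave empty slots
    packed = (packed << 5) | (Asc(Mid(string$, i, 1)) & $1F)
  Next
  ProcedureReturn packed
EndProcedure

Procedure.s UnPackAlpha12(packed.q, *number.Integer)
  Protected i, code
  Protected result$ = ""
  For i = 11 To 0 Step -1
    code = (packed >> (i * 5)) & $1F
    If code                                        ; 0 marks an unused slot
      result$ = result$ + Chr(code + $40)          ; 1-26 -> "A"-"Z"
    EndIf
  Next
  *number\i = (packed >> 60) & $F                  ; mask drops any sign extension
  ProcedureReturn result$
EndProcedure

Define n
Debug UnPackAlpha12(PackAlpha12("engfradeuspa", 9), @n)   ; ENGFRADEUSPA
Debug n                                                   ; 9
```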
Trond
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

Or simply:

Code:

Long3Chars = PeekL(@String3Chars.s)
Alternatively:

Code:

Structure SCountryCode
  StructureUnion
    AsString.s
    *AsLongPtr.Long ; Read-only!
  EndStructureUnion
EndStructure

MyCountryCode.SCountryCode\AsString = "abc"
Debug MyCountryCode\AsLongPtr\l
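
Both snippets read the string's bytes directly, which gives the letters-plus-null layout only when strings are stored as ASCII. With the Unicode compiler option, and always since PureBasic 5.50, strings are UTF-16 (two bytes per character), so PeekL(@String3Chars.s) would pick up only the first two characters. A minimal sketch that converts to ASCII explicitly instead; Alpha4ToLongAscii() is a hypothetical name:

```purebasic
; Hypothetical helper: copy up to 4 characters into a scratch quad as 1-byte ASCII,
; then read the first 4 bytes back as a long. The quad leaves room for the null
; terminator that PokeS() appends.
Procedure.l Alpha4ToLongAscii(s.s)
  Protected buffer.q = 0
  PokeS(@buffer, Left(s, 4), 4, #PB_Ascii)
  ProcedureReturn PeekL(@buffer)
EndProcedure

Debug Hex(Alpha4ToLongAscii("abc"))   ; 636261 on a little-endian CPU
```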
DoubleDutch

Post by DoubleDutch »

Trond: You're right. That's the fastest if kept as a long.
Demivec

Post by Demivec »

DoubleDutch wrote: Because "a"&$1F and "A"&$1F are the same number - you don't need the ucase.
Thanks for catching that. That is what I intended but had forgotten to take it out. :wink: