More Sophisticated Lower Case function

Just starting out? Need help? Post your questions and find answers here.
CalamityJames
User
User
Posts: 81
Joined: Sat Mar 13, 2010 4:50 pm

More Sophisticated Lower Case function

Post by CalamityJames »

I have a long list of words which I want to search for a matching non-case-sensitive string. The very fast Pure basic Lcase() function can't be used because I want É to be found if I search for "e" etc. Furthermore I want to find ae (two letters) if I search for Æ etc and ss if search for ß. To this end I wrote the code below and it is reasonably satisfactory, but I wonder if anyone can improve on it. There is a very fast lower case routine here: http://www.purebasic.fr/german/viewtopi ... 32#p248832 which can do everything except substitute two letters for one. I have no idea if it could be adapted to do this.
Note: You should turn off the debugger when running this code or it will take a very long time!

Code: Select all

EnableExplicit
Global Inc.l, FirstTick, Word.s, EventId.l, TicksTaken.l
Global LcaseAsciiFixed.s{256}

Procedure CreateLowerCaseCharString()
Protected Inc.l, LCaseAsciiStr.s  
For Inc = 1 To 44
  LCaseAsciiStr = LCaseAsciiStr + Chr(Inc)
Next
LCaseAsciiStr = LCaseAsciiStr + " " ; replaces hyphen by space
For Inc = 46 To 64
  LCaseAsciiStr = LCaseAsciiStr + Chr(Inc)
Next
LCaseAsciiStr = LCaseAsciiStr + "abcdefghijklmnopqrstuvwxyz"
For Inc = 91 To 137
  LCaseAsciiStr = LCaseAsciiStr + Chr(Inc)
Next  
LCaseAsciiStr = LCaseAsciiStr + "s‹œ" + Chr(141) + "z"
For Inc = 143 To 144
  LCaseAsciiStr = LCaseAsciiStr + Chr(Inc)
Next
LCaseAsciiStr = LCaseAsciiStr + "''" + Chr(34) + Chr(34)
For Inc = 149 To 153
  LCaseAsciiStr = LCaseAsciiStr + Chr(Inc)
Next
LCaseAsciiStr = LCaseAsciiStr + "s"
For Inc = 155 To 157
  LCaseAsciiStr = LCaseAsciiStr + Chr(Inc)
Next
LCaseAsciiStr = LCaseAsciiStr + "zy ¡¢£¤y" ; sticky space to space
For Inc = 166 To 191
  LCaseAsciiStr = LCaseAsciiStr + Chr(Inc)
Next
LCaseAsciiStr = LCaseAsciiStr + "aaaaaaæceeeeiiiidnooooo×ouuuuyÞßaaaaaaæceeeeiiiiðnooooo÷ouuuuyþy"
LcaseAsciiFixed = " " + LcaseAsciiStr
EndProcedure

Procedure.s MakeLowerCase(theStr.s)
Protected AscChar.a, StrEnd.l, Inc.l
StrEnd = Len(theStr) - 1
For Inc = 0 To StrEnd
  AscChar = PeekA(@LcaseAsciiFixed + PeekA(@theStr + Inc))
  Select AscChar
    Case 1 To 155, 157 To 222, 224 To 229, 231 To 255
      PokeA(@theStr + Inc, AscChar)
    Case 156 ; œ  
      PokeA(@theStr + Inc, 111)
      StrEnd + 1: Inc + 1
      theStr = InsertString(theStr, "e", Inc + 1) 
    Case 223 ; ß 
      PokeA(@theStr + Inc, 115)
      StrEnd + 1: Inc + 1
      theStr = InsertString(theStr, "s", Inc + 1) 
    Case 230 ; æ
      PokeA(@theStr + Inc, 97)
      StrEnd + 1: Inc + 1
      theStr = InsertString(theStr, "e", Inc + 1) 
  EndSelect
Next
ProcedureReturn theStr
EndProcedure

CreateLowerCaseCharString()
Word = "ÆXAMPLE ßTRING"

If OpenWindow(0, 20, 20, 400, 300, "Lower Case") 
  TextGadget(0, 20, 20, 250, 18, "Original String: " + Word)
  TextGadget(1, 20, 40, 250, 18, "Lower case String: " + MakeLowerCase(Word))
  TextGadget(2, 20, 60, 250, 18, "")
  FirstTick = GetTickCount_()
  For Inc = 1 To 1000000
    MakeLowerCase(Word)  
  Next 
  TicksTaken = GetTickCount_() - FirstTick
  SetGadgetText(2, "Time Taken: " + Str(TicksTaken))
  Repeat
    EventId = WaitWindowEvent()
  Until EventId = #PB_Event_CloseWindow
  CloseWindow(0)  
EndIf
Little John
Addict
Addict
Posts: 4791
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: More Sophisticated Lower Case function

Post by Little John »

Here is another approach. Please tell me whether it is faster or slower than your solution. :-)

//edit 2013-05-10, 15:20
small speed improvement

Code: Select all

; tested with PB 5.11 x86 in ASCII and Unicode mode

EnableExplicit

Procedure.i CreateTranslationTable (Map trans$())
   ; * adapt this procedure to your needs *
   ; out: trans$()    : translation table
   ;      return value: maximum number of characters that a value in trans$() has
   Protected i.i
   
   ; normal ASCII uppercase characters -> lowercase
   For i = 'A' To 'Z'
      trans$(Chr(i)) = Chr(i+32)
   Next
   ; special characters
   trans$("É") = "e"
   trans$("Æ") = "ae"
   trans$("ß") = "ss"
   
   ProcedureReturn 2
EndProcedure


Procedure.s Translate (source$, Map trans$(), maxLen.i)
   ; -- main function: replace all desired characters in source$
   ; in : source$, trans$(): Each character in source$ that is a map key in trans$()
   ;                         will be replaced with the corresponding map value;
   ;                         all other characters in source$ remain unchanged.
   ;                         All map keys in trans$() must consist only of 1 character.
   ;      maxLen           : maximum number of characters that a value in trans$() has
   ; out: return value: source$ "translated"
   Protected Dim buffer.c(maxLen*Len(source$)+1)
   Protected *found.String, *in.Character=@source$, *out.Character=@buffer()
   
   While *in\c
      *found = FindMapElement(trans$(), Chr(*in\c))  ; replace the existing character
      If *found
         *out = CopyMemoryString(@*found\s, @*out)
      Else                                           ; just copy the existing character
         *out\c = *in\c
         *out + SizeOf(Character)
      EndIf
      *in + SizeOf(Character)
   Wend
   
   ProcedureReturn PeekS(@buffer())
EndProcedure


;-- Demo
Define maxLen.i, source$
NewMap trans$()

maxLen = CreateTranslationTable(trans$())
source$ = "A small ÆXAMPLÉ ßTRING"
Debug "*" + Translate(source$, trans$(), maxLen) + "*"
Last edited by Little John on Fri May 10, 2013 2:23 pm, edited 2 times in total.
CalamityJames
User
User
Posts: 81
Joined: Sat Mar 13, 2010 4:50 pm

Re: More Sophisticated Lower Case function

Post by CalamityJames »

Thanks for your reply, Little John. I've put your suggestion to the test and it turns out it's a little slower. The example I posted takes about 1900 ticks and yours was about 2600 on my laptop. However, as I'm unfamilar with maps I'm now looking at your code closely to see what's going on.
Little John
Addict
Addict
Posts: 4791
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: More Sophisticated Lower Case function

Post by Little John »

Hi,

now I've taken a closer look at your code and done some tests, too. I can confirm that your code is faster than mine. However, your code does not work in Unicode mode. With my above code, it's no problem to replace e.g. the character with Unicode code point say 10000. The cost of this flexibility is a loss of speed.

When the range of characters which can occur in the strings is small (this is always the case in ASCII mode, and might also be the case in Unicode mode), then I'd use an Array instead of a Map as translation table for the sake of speed, like in the following code. It is as fast as your code, but in contrast to it also runs in Unicode mode. This code is also more flexible, since there are no hard coded "Select / Case" statements.

Code: Select all

; tested with PB 5.11 x86 in ASCII and Unicode mode

EnableExplicit

Procedure.i CreateTranslationTable (Array trans$(1))
   ; * adapt this procedure to your needs *
   ; out: trans$()    : translation table
   ;      return value: maximum number of characters that a value in trans$() has
   Protected i.i, last.i=ArraySize(trans$())
   
   ; initialize
   For i = 1 To last
      trans$(i) = Chr(i)
   Next
   
   ; normal ASCII uppercase characters -> lowercase
   For i = 'A' To 'Z'
      trans$(i) = Chr(i+32)
   Next
   ; special characters
   trans$(Asc("É")) = "e"
   trans$(Asc("Æ")) = "ae"
   trans$(Asc("ß")) = "ss"
   
   ProcedureReturn 2
EndProcedure


Procedure.s Translate (source$, Array trans$(1), maxLen.i)
   ; -- main function: replace all desired characters in source$
   ; in : source$ : Each character in source$ will be replaced with the corresponding
   ;                value of trans$().
   ;      trans$(): must contain entries for the whole range of characters that can
   ;                occur in source$
   ;      maxLen  : maximum number of characters that a value in trans$() has
   ; out: return value: source$ "translated"
   Protected numChars.i, totalChars.i=0
   Protected ret$=Space(maxLen*Len(source$))
   Protected *s.Character=@source$, *r.Character=@ret$
   
   While *s\c
      numChars = Len(trans$(*s\c))
      PokeS(*r, trans$(*s\c), numChars)
      *r + numChars*SizeOf(Character)
      *s + SizeOf(Character)
      totalChars + numChars
   Wend
   
   ProcedureReturn Left(ret$, totalChars)
EndProcedure


;-- Demo
Define maxLen.i, source$
Dim trans$(255)

maxLen = CreateTranslationTable(trans$())
source$ = "A small ÆXAMPLÉ ßTRING"
Debug "*" + Translate(source$, trans$(), maxLen) + "*"
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: More Sophisticated Lower Case function

Post by wilbert »

A little modified version; should be a bit faster.
If you would know in advance the maximum size of a source string, using a static buffer is much faster.
And if you know already how you are going to compile your application (x86 or x64, ascii or unicode) you can get it even faster by using a bit of assembler but I believe it already is pretty fast.

Code: Select all

; tested with PB 5.11 x86 in ASCII and Unicode mode

EnableExplicit

Procedure.i CreateTranslationTable (Array trans$(1))
   ; * adapt this procedure to your needs *
   ; out: trans$()    : translation table
   ;      return value: maximum number of characters that a value in trans$() has
   Protected i.i, last.i=ArraySize(trans$())
   
   ; initialize
   For i = 1 To last
      trans$(i) = Chr(i)
   Next
   
   ; normal ASCII uppercase characters -> lowercase
   For i = 'A' To 'Z'
      trans$(i) = Chr(i+32)
   Next
   ; special characters
   trans$(Asc("É")) = "e"
   trans$(Asc("Æ")) = "ae"
   trans$(Asc("ß")) = "ss"
   
   ProcedureReturn 2
EndProcedure


Procedure.s Translate (source$, Array trans$(1), maxLen.i)
   ; -- main function: replace all desired characters in source$
   ; in : source$ : Each character in source$ will be replaced with the corresponding
   ;                value of trans$().
   ;      trans$(): must contain entries for the whole range of characters that can
   ;                occur in source$
   ;      maxLen  : maximum number of characters that a value in trans$() has
   ; out: return value: source$ "translated"
   Dim buffer.c(maxLen * Len(source$))
   Protected *in.Character = @source$, *out = @buffer()
   CopyMemoryString(*out, @*out)
   While *in\c
     CopyMemoryString(@trans$(*in\c))
     *in + SizeOf(Character)
   Wend
   ProcedureReturn PeekS(@buffer())
EndProcedure


;-- Demo
Define maxLen.i, source$
Dim trans$(255)

maxLen = CreateTranslationTable(trans$())
source$ = "A small ÆXAMPLÉ ßTRING"
Debug "*" + Translate(source$, trans$(), maxLen) + "*"
Windows (x64)
Raspberry Pi OS (Arm64)
CalamityJames
User
User
Posts: 81
Joined: Sat Mar 13, 2010 4:50 pm

Re: More Sophisticated Lower Case function

Post by CalamityJames »

Thanks to Little John for the his latest post. I've done a comparison of my original and this version, and with my example string "ÆXAMPLE ßTRING" I found that his was slightly (about 6% - 10%), but consistently, faster. However, it depends on the string. His is faster substituting the "special" characters and slower substituting the standard characters so in some situations it will be slower. I've done a bit more work on my version and I'll be comparing that with Wilbert's contribution as soon as I can.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: More Sophisticated Lower Case function

Post by wilbert »

Here's also an ASM version you can try

Code: Select all

EnableExplicit
Global Inc.l, FirstTick, Word.s, EventId.l, TicksTaken.l

Procedure.s MakeLowerCase(*String)
  Dim buffer.c(MemoryStringLength(*String) << 1 + 1)
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov ecx, [p.p_String]
    !mov edx, [p.a_buffer]
    !lcase_translation_loop:
    CompilerIf #PB_Compiler_Unicode
      !movzx eax, word [ecx]
      !add ecx, 2
      !test ah, ah
      !jnz lcase_translation_cont1
      !mov ax, [lcase_translation_table + eax * 2]
      !test ah, ah
      !jz lcase_translation_cont2
      !mov [edx], al
      !mov [edx + 2], ah
      !add edx, 4
      !jmp lcase_translation_loop
      !lcase_translation_cont1:
      !mov [edx], ax
      !add edx, 2
      !jmp lcase_translation_loop
      !lcase_translation_cont2:
      !mov [edx], al
      !add edx, 2
    CompilerElse
      !movzx eax, byte [ecx]
      !inc ecx
      !mov ax, [lcase_translation_table + eax * 2] 
      !mov [edx], ax
      !neg ah
      !adc edx, 1
    CompilerEndIf
  CompilerElse
    !lea r8, [lcase_translation_table]
    !mov rcx, [p.p_String]
    !mov rdx, [p.a_buffer]
    !lcase_translation_loop:
    CompilerIf #PB_Compiler_Unicode
      !movzx eax, word [rcx]
      !add rcx, 2
      !test ah, ah
      !jnz lcase_translation_cont1
      !mov ax, [r8 + rax * 2]
      !test ah, ah
      !jz lcase_translation_cont2
      !mov [rdx], al
      !mov [rdx + 2], ah
      !add rdx, 4
      !jmp lcase_translation_loop
      !lcase_translation_cont1:
      !mov [rdx], ax
      !add rdx, 2
      !jmp lcase_translation_loop
      !lcase_translation_cont2:
      !mov [rdx], al
      !add rdx, 2
    CompilerElse
      !movzx eax, byte [rcx]
      !inc rcx
      !mov ax, [r8 + rax * 2] 
      !mov [rdx], ax
      !neg ah
      !adc rdx, 1
    CompilerEndIf
  CompilerEndIf
  !test al, al
  !jnz lcase_translation_loop
  ProcedureReturn PeekS(@buffer())
  !align 16
  !lcase_translation_table:
  !dw 0x00,0x01,0x02,0x03,0x04,0x05,0x06  ,0x07,0x08,0x09,0x0A,0x0B,0x0C  ,0x0D,0x0E,0x0F
  !dw 0x10,0x11,0x12,0x13,0x14,0x15,0x16  ,0x17,0x18,0x19,0x1A,0x1B,0x1C  ,0x1D,0x1E,0x1F
  !dw 0x20,0x21,0x22,0x23,0x24,0x25,0x26  ,0x27,0x28,0x29,0x2A,0x2B,0x2C  ,0x20,0x2E,0x2F
  !dw 0x30,0x31,0x32,0x33,0x34,0x35,0x36  ,0x37,0x38,0x39,0x3A,0x3B,0x3C  ,0x3D,0x3E,0x3F
  !dw 0x40,0x61,0x62,0x63,0x64,0x65,0x66  ,0x67,0x68,0x69,0x6A,0x6B,0x6C  ,0x6D,0x6E,0x6F
  !dw 0x70,0x71,0x72,0x73,0x74,0x75,0x76  ,0x77,0x78,0x79,0x7A,0x5B,0x5C  ,0x5D,0x5E,0x5F
  !dw 0x60,0x61,0x62,0x63,0x64,0x65,0x66  ,0x67,0x68,0x69,0x6A,0x6B,0x6C  ,0x6D,0x6E,0x6F
  !dw 0x70,0x71,0x72,0x73,0x74,0x75,0x76  ,0x77,0x78,0x79,0x7A,0x7B,0x7C  ,0x7D,0x7E,0x7F
  !dw 0x80,0x81,0x82,0x83,0x84,0x85,0x86  ,0x87,0x88,0x89,0x73,0x8B,0x656F,0x8D,0x7A,0x8F
  !dw 0x90,0x27,0x27,0x22,0x22,0x95,0x96  ,0x97,0x98,0x99,0x73,0x9B,0x656F,0x9D,0x7A,0x79
  !dw 0x20,0xA1,0xA2,0xA3,0xA4,0x79,0xA6  ,0xA7,0xA8,0xA9,0xAA,0xAB,0xAC  ,0xAD,0xAE,0xAF
  !dw 0xB0,0xB1,0xB2,0xB3,0xB4,0xB5,0xB6  ,0xB7,0xB8,0xB9,0xBA,0xBB,0xBC  ,0xBD,0xBE,0xBF
  !dw 0x61,0x61,0x61,0x61,0x61,0x61,0x6561,0x63,0x65,0x65,0x65,0x65,0x69  ,0x69,0x69,0x69
  !dw 0x64,0x6E,0x6F,0x6F,0x6F,0x6F,0x6F  ,0xD7,0x6F,0x75,0x75,0x75,0x75  ,0x79,0xDE,0x7373
  !dw 0x61,0x61,0x61,0x61,0x61,0x61,0x6561,0x63,0x65,0x65,0x65,0x65,0x69  ,0x69,0x69,0x69
  !dw 0xF0,0x6E,0x6F,0x6F,0x6F,0x6F,0x6F  ,0xF7,0x6F,0x75,0x75,0x75,0x75  ,0x79,0xFE,0x79
EndProcedure

Word = "ÆXAMPLE ßTRING"

If OpenWindow(0, 20, 20, 400, 300, "Lower Case") 
  TextGadget(0, 20, 20, 250, 18, "Original String: " + Word)
  TextGadget(1, 20, 40, 250, 18, "Lower case String: " + MakeLowerCase(@Word))
  TextGadget(2, 20, 60, 250, 18, "")
  FirstTick = GetTickCount_()
  For Inc = 1 To 1000000
    MakeLowerCase(@Word)  
  Next 
  TicksTaken = GetTickCount_() - FirstTick
  SetGadgetText(2, "Time Taken: " + Str(TicksTaken))
  Repeat
    EventId = WaitWindowEvent()
  Until EventId = #PB_Event_CloseWindow
  CloseWindow(0)  
EndIf
Here's a version with a static buffer that can be used if the source string has less than 16384 characters.

Code: Select all

EnableExplicit
Global Inc.l, FirstTick, Word.s, EventId.l, TicksTaken.l

Procedure.s MakeLowerCase_S(*String)
  Static Dim buffer.c(16384)
  Protected *buffer = @buffer()
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov ecx, [p.p_String]
    !mov edx, [p.p_buffer]
    !lcase_s_translation_loop:
    CompilerIf #PB_Compiler_Unicode
      !movzx eax, word [ecx]
      !add ecx, 2
      !test ah, ah
      !jnz lcase_s_translation_cont1
      !mov ax, [lcase_s_translation_table + eax * 2]
      !test ah, ah
      !jz lcase_s_translation_cont2
      !shl eax, 8
      !shr ax, 8
      !mov [edx], eax
      !add edx, 4
      !jmp lcase_s_translation_loop
      !lcase_s_translation_cont1:
      !mov [edx], ax
      !add edx, 2
      !jmp lcase_s_translation_loop
      !lcase_s_translation_cont2:
      !mov [edx], ax
      !add edx, 2
    CompilerElse
      !movzx eax, byte [ecx]
      !inc ecx
      !mov ax, [lcase_s_translation_table + eax * 2] 
      !mov [edx], ax
      !neg ah
      !adc edx, 1
    CompilerEndIf
  CompilerElse
    !lea r8, [lcase_s_translation_table]
    !mov rcx, [p.p_String]
    !mov rdx, [p.p_buffer]
    !lcase_s_translation_loop:
    CompilerIf #PB_Compiler_Unicode
      !movzx eax, word [rcx]
      !add rcx, 2
      !test ah, ah
      !jnz lcase_s_translation_cont1
      !mov ax, [r8 + rax * 2]
      !test ah, ah
      !jz lcase_s_translation_cont2
      !shl eax, 8
      !shr ax, 8
      !mov [rdx], eax
      !add rdx, 4
      !jmp lcase_s_translation_loop
      !lcase_s_translation_cont1:
      !mov [rdx], ax
      !add rdx, 2
      !jmp lcase_s_translation_loop
      !lcase_s_translation_cont2:
      !mov [rdx], ax
      !add rdx, 2
    CompilerElse
      !movzx eax, byte [rcx]
      !inc rcx
      !mov ax, [r8 + rax * 2] 
      !mov [rdx], ax
      !neg ah
      !adc rdx, 1
    CompilerEndIf
  CompilerEndIf
  !test al, al
  !jnz lcase_s_translation_loop
  ProcedureReturn PeekS(@buffer())
  !align 16
  !lcase_s_translation_table:
  !dw 0x00,0x01,0x02,0x03,0x04,0x05,0x06  ,0x07,0x08,0x09,0x0A,0x0B,0x0C  ,0x0D,0x0E,0x0F
  !dw 0x10,0x11,0x12,0x13,0x14,0x15,0x16  ,0x17,0x18,0x19,0x1A,0x1B,0x1C  ,0x1D,0x1E,0x1F
  !dw 0x20,0x21,0x22,0x23,0x24,0x25,0x26  ,0x27,0x28,0x29,0x2A,0x2B,0x2C  ,0x20,0x2E,0x2F
  !dw 0x30,0x31,0x32,0x33,0x34,0x35,0x36  ,0x37,0x38,0x39,0x3A,0x3B,0x3C  ,0x3D,0x3E,0x3F
  !dw 0x40,0x61,0x62,0x63,0x64,0x65,0x66  ,0x67,0x68,0x69,0x6A,0x6B,0x6C  ,0x6D,0x6E,0x6F
  !dw 0x70,0x71,0x72,0x73,0x74,0x75,0x76  ,0x77,0x78,0x79,0x7A,0x5B,0x5C  ,0x5D,0x5E,0x5F
  !dw 0x60,0x61,0x62,0x63,0x64,0x65,0x66  ,0x67,0x68,0x69,0x6A,0x6B,0x6C  ,0x6D,0x6E,0x6F
  !dw 0x70,0x71,0x72,0x73,0x74,0x75,0x76  ,0x77,0x78,0x79,0x7A,0x7B,0x7C  ,0x7D,0x7E,0x7F
  !dw 0x80,0x81,0x82,0x83,0x84,0x85,0x86  ,0x87,0x88,0x89,0x73,0x8B,0x656F,0x8D,0x7A,0x8F
  !dw 0x90,0x27,0x27,0x22,0x22,0x95,0x96  ,0x97,0x98,0x99,0x73,0x9B,0x656F,0x9D,0x7A,0x79
  !dw 0x20,0xA1,0xA2,0xA3,0xA4,0x79,0xA6  ,0xA7,0xA8,0xA9,0xAA,0xAB,0xAC  ,0xAD,0xAE,0xAF
  !dw 0xB0,0xB1,0xB2,0xB3,0xB4,0xB5,0xB6  ,0xB7,0xB8,0xB9,0xBA,0xBB,0xBC  ,0xBD,0xBE,0xBF
  !dw 0x61,0x61,0x61,0x61,0x61,0x61,0x6561,0x63,0x65,0x65,0x65,0x65,0x69  ,0x69,0x69,0x69
  !dw 0x64,0x6E,0x6F,0x6F,0x6F,0x6F,0x6F  ,0xD7,0x6F,0x75,0x75,0x75,0x75  ,0x79,0xDE,0x7373
  !dw 0x61,0x61,0x61,0x61,0x61,0x61,0x6561,0x63,0x65,0x65,0x65,0x65,0x69  ,0x69,0x69,0x69
  !dw 0xF0,0x6E,0x6F,0x6F,0x6F,0x6F,0x6F  ,0xF7,0x6F,0x75,0x75,0x75,0x75  ,0x79,0xFE,0x79
EndProcedure

Word = "ÆXAMPLE ßTRING"

If OpenWindow(0, 20, 20, 400, 300, "Lower Case") 
  TextGadget(0, 20, 20, 250, 18, "Original String: " + Word)
  TextGadget(1, 20, 40, 250, 18, "Lower case String: " + MakeLowerCase_S(@Word))
  TextGadget(2, 20, 60, 250, 18, "")
  FirstTick = GetTickCount_()
  For Inc = 1 To 1000000
    MakeLowerCase_S(@Word)  
  Next 
  TicksTaken = GetTickCount_() - FirstTick
  SetGadgetText(2, "Time Taken: " + Str(TicksTaken))
  Repeat
    EventId = WaitWindowEvent()
  Until EventId = #PB_Event_CloseWindow
  CloseWindow(0)  
EndIf
Last edited by wilbert on Fri May 10, 2013 10:34 am, edited 1 time in total.
Windows (x64)
Raspberry Pi OS (Arm64)
Little John
Addict
Addict
Posts: 4791
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: More Sophisticated Lower Case function

Post by Little John »

wilbert wrote:A little modified version; should be a bit faster.
On my PC, this version is 1.5 times as fast as my version. Cool, thank you!
CalamityJames
User
User
Posts: 81
Joined: Sat Mar 13, 2010 4:50 pm

Re: More Sophisticated Lower Case function

Post by CalamityJames »

Here is my best to date. It uses ascii characters 1 to 3 as flags (so they can't be made lower case, which is not really a problem since I'm working with real words all the time) and fixed arrays rather that strings for the working. It passes a pointer rather than a string to the lower case function. Compiled on my laptop (a 32 bit machine) the program takes around 450 milliseconds to run. If a variable is used to capture the result (something I need to do and which was an oversight in my original posted program) then the time taken goes out to 750 milliseconds. Rather to my surprise this is faster than Little John's where the comparable figures are 750 and 1000. I'd be interested to know if these figures are replicated by anyone. The Lcase() PureBasic function takes 750 milliseconds.

Code: Select all

EnableExplicit
Global Inc.l, FirstTick, Word.s, EventId.l, TicksTaken.l, lcaseWord.s
Global Dim LcaseRefArray.a(255)
Global Dim LcArray.a(128)

Procedure CreateLowerCaseCharString()
Protected Inc.l, LCaseAsciiStr.s  
For Inc = 1 To 44
  LCaseAsciiStr = LCaseAsciiStr + Chr(Inc)
Next
LCaseAsciiStr = LCaseAsciiStr + " " ; replaces hyphen by space
For Inc = 46 To 64
  LCaseAsciiStr = LCaseAsciiStr + Chr(Inc)
Next
LCaseAsciiStr = LCaseAsciiStr + "abcdefghijklmnopqrstuvwxyz"
For Inc = 91 To 137
  LCaseAsciiStr = LCaseAsciiStr + Chr(Inc)
Next  
LCaseAsciiStr = LCaseAsciiStr + "s‹œ" + Chr(141) + "z"
For Inc = 143 To 144
  LCaseAsciiStr = LCaseAsciiStr + Chr(Inc)
Next
LCaseAsciiStr = LCaseAsciiStr + "''" + Chr(34) + Chr(34)
For Inc = 149 To 153
  LCaseAsciiStr = LCaseAsciiStr + Chr(Inc)
Next
LCaseAsciiStr = LCaseAsciiStr + "s"
For Inc = 155 To 157
  LCaseAsciiStr = LCaseAsciiStr + Chr(Inc)
Next
LCaseAsciiStr = LCaseAsciiStr + "zy ¡¢£¤y" ; sticky space to space
For Inc = 166 To 191
  LCaseAsciiStr = LCaseAsciiStr + Chr(Inc)
Next
LCaseAsciiStr = " " +  LCaseAsciiStr + "aaaaaaæceeeeiiiidnooooo×ouuuuyÞßaaaaaaæceeeeiiiiðnooooo÷ouuuuyþy"
MoveMemory(@LCaseAsciiStr, @LcaseRefArray(0), 256)
LcaseRefArray(0) = 0 ; Chr(0)
LcaseRefArray(140) = 1 ; Œ 
LcaseRefArray(156) = 1 ; Œ 
LcaseRefArray(198) = 2 ; Æ 
LcaseRefArray(230) = 2 ; æ 
LcaseRefArray(223) = 3 ; ß 
EndProcedure

Procedure.s MakeLowerCase(StrPtr.l)
Protected OutCount.l, InCount.l
Repeat
  Select LcaseRefArray(PeekA(StrPtr + InCount))  
    Case 4 To 255  
      LcArray(OutCount) = LcaseRefArray(PeekA(StrPtr + InCount))
      OutCount + 1  
    Case 1 ; œ, Œ   
      LcArray(OutCount) = 111
      OutCount + 1  
      LcArray(OutCount) = 101
      OutCount + 1
    Case 2 ; æ, Æ
      LcArray(OutCount) = 97
      OutCount + 1  
      LcArray(OutCount) = 101
      OutCount + 1  
    Case 3 ; ß 
      LcArray(OutCount) = 115
      OutCount + 1  
      LcArray(OutCount) = 115
      OutCount + 1
  EndSelect
  InCount + 1  
Until LcaseRefArray(PeekA(StrPtr + InCount)) = 0
ProcedureReturn PeekS(@LcArray(0))
EndProcedure


CreateLowerCaseCharString()
Word = "ÆXAMPLE ßTRING"

If OpenWindow(0, 20, 20, 400, 300, "Lower Case") 
  TextGadget(0, 20, 20, 250, 18, "Original String: " + Word)
  TextGadget(1, 20, 40, 250, 18, "Lower case String: " + MakeLowerCase(@Word))
  TextGadget(2, 20, 60, 250, 18, "")
  FirstTick = GetTickCount_()
  For Inc = 1 To 1000000
    MakeLowerCase(@Word)
    ;LcaseWord = MakeLowerCase(@Word)
  Next 
  TicksTaken = GetTickCount_() - FirstTick
  SetGadgetText(2, "Time Taken: " + Str(TicksTaken))
  Repeat
    EventId = WaitWindowEvent()
  Until EventId = #PB_Event_CloseWindow
  CloseWindow(0)  
EndIf
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: More Sophisticated Lower Case function

Post by wilbert »

CalamityJames wrote:Rather to my surprise this is faster than Little John's where the comparable figures are 750 and 1000
It's not a big surprise.
Like I already mentioned, using a static buffer makes a huge difference.
I updated my post above with my ASM routine and added a second version with a static buffer.
When I compare it with your latest routine (all compiled as x86 / ascii), my ASM version with a dynamic buffer takes 328 on my computer, your latest routine takes 202, my ASM version with a static buffer takes 47.

When you have a test routine that is called very often with a small test string like your current example, even things like the amount of arguments for the procedure have a big impact.
With a larger string and less calls to the pocedure, the results can be very different.
The best thing you can do to test these kind of routines is use a test string that is about the same size as the strings you are going to use in your program.
Windows (x64)
Raspberry Pi OS (Arm64)
Little John
Addict
Addict
Posts: 4791
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: More Sophisticated Lower Case function

Post by Little John »

Hello wilbert,

the speed of your ASM version with a static buffer is impressive!
For more flexibility, I tried to separate the table data from the actual code, but to no avail.
I think it's possible, and I just made a stupid mistake, no?
Here is my not working code:

Code: Select all

EnableExplicit

Procedure CreateTranslationTable (Array trans.w(1))
   ; * adapt this procedure to your needs *
   ; out: trans(): translation table
   Protected i.i, last.i=ArraySize(trans())
   
   ; initialize
   For i = 1 To last
      trans(i) = i
   Next
   
   ; normal ASCII uppercase characters -> lowercase
   For i = 'A' To 'Z'
      trans(i) = i+32
   Next
   ; special characters
   trans(Asc("É")) = Asc("e")
   trans(Asc("Æ")) = Asc("a") + Asc("e")*256
   trans(Asc("ß")) = Asc("s") + Asc("s")*256
EndProcedure


Procedure.s MakeLowerCase_S (*String, *table)
   Static Dim buffer.c(16384)
   Protected *buffer = @buffer()
   
   CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
      !mov ecx, [p.p_String]
      !mov edx, [p.p_buffer]
      !lcase_s_translation_loop:
      CompilerIf #PB_Compiler_Unicode
         !movzx eax, word [ecx]
         !add ecx, 2
         !test ah, ah
         !jnz lcase_s_translation_cont1
         !mov ax, [p.p_table + eax * 2]
         !test ah, ah
         !jz lcase_s_translation_cont2
         !shl eax, 8
         !shr ax, 8
         !mov [edx], eax
         !add edx, 4
         !jmp lcase_s_translation_loop
         !lcase_s_translation_cont1:
         !mov [edx], ax
         !add edx, 2
         !jmp lcase_s_translation_loop
         !lcase_s_translation_cont2:
         !mov [edx], ax
         !add edx, 2
      CompilerElse
         !movzx eax, byte [ecx]
         !inc ecx
         !mov ax, [p.p_table + eax * 2]
         !mov [edx], ax
         !neg ah
         !adc edx, 1
      CompilerEndIf
   CompilerElse
      !lea r8, [p.p_table]
      !mov rcx, [p.p_String]
      !mov rdx, [p.p_buffer]
      !lcase_s_translation_loop:
      CompilerIf #PB_Compiler_Unicode
         !movzx eax, word [rcx]
         !add rcx, 2
         !test ah, ah
         !jnz lcase_s_translation_cont1
         !mov ax, [r8 + rax * 2]
         !test ah, ah
         !jz lcase_s_translation_cont2
         !shl eax, 8
         !shr ax, 8
         !mov [rdx], eax
         !add rdx, 4
         !jmp lcase_s_translation_loop
         !lcase_s_translation_cont1:
         !mov [rdx], ax
         !add rdx, 2
         !jmp lcase_s_translation_loop
         !lcase_s_translation_cont2:
         !mov [rdx], ax
         !add rdx, 2
      CompilerElse
         !movzx eax, byte [rcx]
         !inc rcx
         !mov ax, [r8 + rax * 2]
         !mov [rdx], ax
         !neg ah
         !adc rdx, 1
      CompilerEndIf
   CompilerEndIf
   !test al, al
   !jnz lcase_s_translation_loop
   
   ProcedureReturn PeekS(@buffer())
EndProcedure


Define old$, new$, msg$
Define t.i, i.i, n.i=500000
Dim table.w(255)

old$ = "A small ÆXAMPLÉ ßTRING"

CreateTranslationTable(table())

t = ElapsedMilliseconds()
For i = 1 To n
   new$ = MakeLowerCase_S(@old$, @table())
Next
t = ElapsedMilliseconds() - t

msg$ = old$ + #LF$
msg$ + "*" + new$ + "* : " + Str(t) + " ms"
MessageRequester("Info", msg$)
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: More Sophisticated Lower Case function

Post by wilbert »

@Little John,
Here's the fixed version of your attempt.
You were using the address of *table instead of the value itself ;)
Just a small mistake. With your adapted version, you can indeed create your own table.
It is more flexible. This way you can have multiple tables like also one to convert to uppercase instead. :)
Be aware that the procedure in its current form always expects an array of 256 items.

Code: Select all

EnableExplicit

Procedure CreateTranslationTable (Array trans.w(1))
  ; * adapt this procedure to your needs *
  ; out: trans(): translation table
  Protected i.i, last.i=ArraySize(trans())
  
  ; initialize
  For i = 1 To last
    trans(i) = i
  Next
  
  ; normal ASCII uppercase characters -> lowercase
  For i = 'A' To 'Z'
    trans(i) = i+32
  Next
  ; special characters
  trans(Asc("É")) = Asc("e")
  trans(Asc("Æ")) = Asc("a") + Asc("e")*256
  trans(Asc("ß")) = Asc("s") + Asc("s")*256
EndProcedure


Procedure.s MakeLowerCase_S (*String, *table)
  Static Dim buffer.c(16384)
  Protected *buffer = @buffer()
  
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov ecx, [p.p_String]
    !mov edx, [p.p_buffer]
    !mov eax, [p.p_table]
    !push ebx
    !mov ebx, eax
    !lcase_s_translation_loop:
    CompilerIf #PB_Compiler_Unicode
      !movzx eax, word [ecx]
      !add ecx, 2
      !test ah, ah
      !jnz lcase_s_translation_cont1
      !mov ax, [ebx + eax * 2]
      !test ah, ah
      !jz lcase_s_translation_cont2
      !shl eax, 8
      !shr ax, 8
      !mov [edx], eax
      !add edx, 4
      !jmp lcase_s_translation_loop
      !lcase_s_translation_cont1:
      !mov [edx], ax
      !add edx, 2
      !jmp lcase_s_translation_loop
      !lcase_s_translation_cont2:
      !mov [edx], ax
      !add edx, 2
    CompilerElse
      !movzx eax, byte [ecx]
      !inc ecx
      !mov ax, [ebx + eax * 2]
      !mov [edx], ax
      !neg ah
      !adc edx, 1
    CompilerEndIf
    !test al, al
    !jnz lcase_s_translation_loop
    !pop ebx
  CompilerElse
    !mov r8, [p.p_table]
    !mov rcx, [p.p_String]
    !mov rdx, [p.p_buffer]
    !lcase_s_translation_loop:
    CompilerIf #PB_Compiler_Unicode
      !movzx eax, word [rcx]
      !add rcx, 2
      !test ah, ah
      !jnz lcase_s_translation_cont1
      !mov ax, [r8 + rax * 2]
      !test ah, ah
      !jz lcase_s_translation_cont2
      !shl eax, 8
      !shr ax, 8
      !mov [rdx], eax
      !add rdx, 4
      !jmp lcase_s_translation_loop
      !lcase_s_translation_cont1:
      !mov [rdx], ax
      !add rdx, 2
      !jmp lcase_s_translation_loop
      !lcase_s_translation_cont2:
      !mov [rdx], ax
      !add rdx, 2
    CompilerElse
      !movzx eax, byte [rcx]
      !inc rcx
      !mov ax, [r8 + rax * 2]
      !mov [rdx], ax
      !neg ah
      !adc rdx, 1
    CompilerEndIf
    !test al, al
    !jnz lcase_s_translation_loop
  CompilerEndIf
  
  ProcedureReturn PeekS(@buffer())
EndProcedure


Define old$, new$, msg$
Define t.i, i.i, n.i=500000
Dim table.w(255)

old$ = "A small ÆXAMPLÉ ßTRING"

CreateTranslationTable(table())

t = ElapsedMilliseconds()
For i = 1 To n
  new$ = MakeLowerCase_S(@old$, @table())
Next
t = ElapsedMilliseconds() - t

msg$ = old$ + #LF$
msg$ + "*" + new$ + "* : " + Str(t) + " ms"
MessageRequester("Info", msg$)
Last edited by wilbert on Fri May 10, 2013 3:06 pm, edited 1 time in total.
Windows (x64)
Raspberry Pi OS (Arm64)
Little John
Addict
Addict
Posts: 4791
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: More Sophisticated Lower Case function

Post by Little John »

Great!
Thank you very much, wilbert!
CalamityJames
User
User
Posts: 81
Joined: Sat Mar 13, 2010 4:50 pm

Re: More Sophisticated Lower Case function

Post by CalamityJames »

Well I've got what I wanted! So thanks especially to Wilbert, whose static buffer routine is perfect for my needs, since I am searching for a string in a large series phrases of less than 100 characters. His routine is 20 times faster than the one I posted originally. Together with Little John's adaptation it seems to me that there is now perfect Case altering function choice. I do hope other people will find the functions useful.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: More Sophisticated Lower Case function

Post by wilbert »

Here's also a full unicode version in case someone needs it.
It's a little bit slower in ascii mode but a bit faster in unicode mode and is capable of substituting any unicode character with one or two other unicode characters.

Code: Select all

EnableExplicit

Procedure CreateTranslationTable (Array trans.l(1))
  ; * adapt this procedure to your needs *
  ; out: trans(): translation table
  Protected i.i, last.i=ArraySize(trans())
  
  ; initialize
  For i = 1 To last
    trans(i) = i
  Next
  
  ; normal ASCII uppercase characters -> lowercase
  For i = 'A' To 'Z'
    trans(i) = i+32
  Next
  ; special characters
  trans(Asc("É")) = Asc("e")
  trans(Asc("Æ")) = Asc("a") + Asc("e") << 16
  trans(Asc("ß")) = Asc("s") + Asc("s") << 16
EndProcedure


Procedure.s TranslateCharacters (*String, *table)
  Static Dim buffer.c(16384)
  Protected *buffer = @buffer()
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov ecx, [p.p_String]
    !mov edx, [p.p_buffer]
    !mov eax, [p.p_table]
    !push ebx
    !mov ebx, eax
    !translate_characters_loop:
    CompilerIf #PB_Compiler_Unicode
      !movzx eax, word [ecx]    
      !add ecx, 2
      !mov eax, [ebx + eax * 4]
      !test eax, 0xffff0000
      !jnz translate_characters_cont
      !mov [edx], ax
      !add edx, 2
      !test eax, eax
      !jnz translate_characters_loop
      !jmp translate_characters_exit
      !translate_characters_cont:
      !mov [edx], eax
      !add edx, 4
      !jmp translate_characters_loop
    CompilerElse
      !movzx eax, byte [ecx]
      !add ecx, 1
      !mov eax, [ebx + eax * 4]
      !test eax, 0xffff0000
      !jnz translate_characters_cont
      !mov [edx], al
      !add edx, 1
      !test eax, eax
      !jnz translate_characters_loop
      !jmp translate_characters_exit
      !translate_characters_cont:
      !shl ax, 8
      !shr eax, 8
      !mov [edx], ax
      !add edx, 2
      !jmp translate_characters_loop
    CompilerEndIf
    !translate_characters_exit:
    !pop ebx
  CompilerElse
    !mov r8, [p.p_table]
    !mov rcx, [p.p_String]
    !mov rdx, [p.p_buffer]
    !translate_characters_loop:
    CompilerIf #PB_Compiler_Unicode
      !movzx eax, word [rcx]    
      !add rcx, 2
      !mov eax, [r8 + rax * 4]
      !test eax, 0xffff0000
      !jnz translate_characters_cont
      !mov [rdx], ax
      !add rdx, 2
      !test eax, eax
      !jnz translate_characters_loop
      !jmp translate_characters_exit
      !translate_characters_cont:
      !mov [rdx], eax
      !add rdx, 4
      !jmp translate_characters_loop
    CompilerElse
      !movzx eax, byte [rcx]
      !add rcx, 1
      !mov eax, [r8 + rax * 4]
      !test eax, 0xffff0000
      !jnz translate_characters_cont
      !mov [rdx], al
      !add rdx, 1
      !test eax, eax
      !jnz translate_characters_loop
      !jmp translate_characters_exit
      !translate_characters_cont:
      !shl ax, 8
      !shr eax, 8
      !mov [rdx], ax
      !add rdx, 2
      !jmp translate_characters_loop
    CompilerEndIf
    !translate_characters_exit:
  CompilerEndIf
  ProcedureReturn PeekS(@buffer())
EndProcedure


Define old$, new$, msg$
Define t.i, i.i, n.i=500000
Dim table.l(65535)

old$ = "A small ÆXAMPLÉ ßTRING"

CreateTranslationTable(table())

t = ElapsedMilliseconds()
For i = 1 To n
  new$ = TranslateCharacters(@old$, @table())
Next
t = ElapsedMilliseconds() - t

msg$ = old$ + #LF$
msg$ + "*" + new$ + "* : " + Str(t) + " ms"
MessageRequester("Info", msg$)
Last edited by wilbert on Sat May 11, 2013 7:45 am, edited 2 times in total.
Windows (x64)
Raspberry Pi OS (Arm64)
Post Reply