Page 1 of 1

Some small tricks for sorting the letters of a string.

Posted: Tue Jun 09, 2009 10:38 am
by Demivec
Code updated for 5.20+

I have two variations for sorting the characters of a string. They turn a string like "amazing" into "aagimnz". This kind of operation is frequently made to determine if two strings have the same elements irrespective of order (i.e. anagrams) and is usually performed after converting the string to all upper or lower case.

I am posting these small snippets because I think the methods used in them are interesting and unique. The methods might find some use in other areas.

One variation returns the sorted string, the other sorts the string in place. Both versions work with Ascii or Unicode and sort all characters, not just the letters. Both are faster than individually looping through the string's characters to sort them.

The subroutines take a pointer to a string and copy the string's content into the memory for a newly declared array of characters. The array is then sorted using PureBasic's SortArray(). The string is then read as a whole from the array's memory location and either returned or copied over the original strings contents. One important note is that the array's last element (an extra element) is set aside for the strings Null and is not sorted.

Code: Select all

;Author:Demivec  (Jared Johnson)
;Using: PureBasic 4.31

Procedure.s sortLetters(*word.Character, wordLength) ;returns a string with the letters of a word sorted
  Protected Dim letters.c(wordLength)
  Protected *letAdr = @letters()
  
  CopyMemoryString(*word, @*letAdr)
  SortArray(letters(), #PB_Sort_Ascending, 0, wordLength - 1)
  ProcedureReturn PeekS(@letters(), wordLength)
EndProcedure

;example
Define Before$,After$

Before$ = "demonstration"
After$ = sortLetters(@Before$,Len(Before$))
Debug Before$ + "  ==>  " + After$

Code: Select all

;Author:Demivec  (Jared Johnson)
;Using: PureBasic v4.31

Procedure sortLettersInPlace(*word.Character, wordLength) ;sorts the letters of a word without creating a new string
  Protected Dim letters.c(wordLength)
  Protected *letAdr = @letters()
  
  CopyMemoryString(*word, @*letAdr)
  SortArray(letters(), #PB_Sort_Ascending, 0, wordLength - 1)
  CopyMemoryString(@letters(), @*word)
EndProcedure

;example
Define Text$

Text$ = "demonstration"
Debug Text$
sortLettersInPlace(@Text$,Len(Text$))
Debug Text$

Posted: Tue Jun 09, 2009 1:31 pm
by Trond
Code updated for 5.20+

I want to add mine here, if it's ok. It too sorts the string in place. I find it's fast at short strings, but loses to yours on longer strings.
(This is a much improved version of the sort used for the state problem.)

Code: Select all

Structure Charray
  C.c[0]
EndStructure

Procedure QuickSortStringAux(*String.Charray, *Start, *Nd.Character)
  QuickSortStringAuxStart:
  If *Start < *Nd
    Protected Pivot.c = *Nd\c
    Protected *I.Character = *Start
    Protected *J.Character = *Nd
    While *I <> *J
      If *I\C < Pivot
        *I + SizeOf(Character)
      Else
        *J\c = *I\c
        *J - SizeOf(Character)
        *I\c = *J\c
      EndIf
    Wend
    *J\c = Pivot
    QuickSortStringAux(*String, *Start, *J-SizeOf(Character))
    ;QuickSortStringAux(*String, J+1, Nd)
    *Start = *J + SizeOf(Character)
    Goto QuickSortStringAuxStart
  EndIf
EndProcedure

Procedure QuickSortString(*String.Character)
  QuickSortStringAux(*String, *String, *String+(MemoryStringLength(*String)-1)*SizeOf(Character))
EndProcedure

Posted: Tue Jun 09, 2009 2:15 pm
by Demivec
@Trond: No problem. I think it is good thing to collect a variety of working methods together in one place, as long as it's in keeping with the thread topic. :wink:

Posted: Tue Jun 09, 2009 2:30 pm
by AND51
Demivec, wow, how elegant your solution is!
Thank your very much for this! :shock:
I never would have had the idea to lay a string over an array... Respect!

Posted: Wed Jun 10, 2009 8:11 pm
by Xombie
I was kind of curious about this so I tried a different approach and put together a speed test with Demivec's code and mine. Mine is just called SortInPlace() and seems to be faster on average although I think I might lose out in shorter words.

EDIT: Made a change and now I think it's faster on both long and short words.

EDIT2: Even shorter version of my function.

Code: Select all

;
Global HoldClip.s
Global Dim gTest1.s(50000)
Global Dim gTest2.s(50000)
For i.i = 0 To 50000
   r.i = Random(200) + 1
   a.s = Space(r)
   *b.Character = @a
   For j.i = 1 To r
      If Random(1)
         *b\c = 97 + Random(25)
      Else
         *b\c = 65 + Random(25)
      EndIf
      *b + SizeOf(Character)
   Next j
   gTest1(i) = a
   gTest2(i) = a
Next i
HoldClip = gTest1(0)
HoldClip = HoldClip + Chr(13) + Chr(10) + gTest1(1)
HoldClip = HoldClip + Chr(13) + Chr(10) + gTest1(2)
;-
Procedure sortLettersInPlace(*word.Character, wordLength) ;sorts the letters of a word without creating a new string
  Protected Dim letters.c(wordLength)
  Protected *letAdr = @letters()
 
  CopyMemoryString(*word, @*letAdr)
  SortArray(letters(), #PB_Sort_Ascending, 0, wordLength - 1)
  CopyMemoryString(@letters(), @*word)
EndProcedure
;/
Delay(500)
q1.i = GetTickCount_()
For i.i = 0 To 50000
   sortLettersInPlace(@gTest1(i), Len(gTest1(i)))
Next i
r1.i = GetTickCount_()
;/
;-
Procedure SortInPlace(*word.Character)
   ;
   Protected iLoop.i
   ;
   Protected jLoop.i
   ;
   Protected Dim HoldCharacter.i(126)
   ;
   Protected *Position.Character = *word
   ;
   While *Position\c
      ;
      HoldCharacter(*Position\c) + 1
      ;
      *Position + SizeOf(Character)
      ;
   Wend
   ;
   *Position = *word
   ;
   For iLoop = 0 To 126
      ;
      If HoldCharacter(iLoop)
         ;
         For jLoop = 1 To HoldCharacter(iLoop)
            ;
            *Position\c = iLoop
            ;
            *Position + SizeOf(Character)
            ;
         Next jLoop
         ;
      EndIf
      ;
   Next iLoop
   ;
EndProcedure
;/
Delay(500)
q2.i = GetTickCount_()
For i.i = 0 To 50000
   SortInPlace(@gTest2(i))
Next i
r2.i = GetTickCount_()
;/
HoldClip = HoldClip + Chr(13) + Chr(10) + " .... Test 1 ...."
HoldClip = HoldClip + Chr(13) + Chr(10) + gTest1(0)
HoldClip = HoldClip + Chr(13) + Chr(10) + gTest1(1)
HoldClip = HoldClip + Chr(13) + Chr(10) + gTest1(2)
HoldClip = HoldClip + Chr(13) + Chr(10) + " .... Test 2 ...."
HoldClip = HoldClip + Chr(13) + Chr(10) + gTest2(0)
HoldClip = HoldClip + Chr(13) + Chr(10) + gTest2(1)
HoldClip = HoldClip + Chr(13) + Chr(10) + gTest2(2)
;
MessageRequester("Test Results", Str(r1-q1)+" : "+Str(r2-q2))
;
SetClipboardText(HoldClip)
;

Posted: Thu Jun 11, 2009 4:16 am
by Demivec
Xombie wrote:I was kind of curious about this so I tried a different approach and put together a speed test with Demivec's code and mine. Mine is just called SortInPlace() and seems to be faster on average although I think I might lose out in shorter words.
@Xombie: I like your "counting" sort. It actually avoids the sort altogether.

It has one major advantage and one minor limitation. The advantage is it works wonderfully on Ascii (2x + faster :wink:). The limitation is it doesn't work well on Unicode. Your test purposely limited the content of the string to only Ascii upper and lower case letters. It would not be able to handle even characters in the extended code page(s). If Unicode character values are between 0 -> 1000 yours becomes even with mine, or worse.

I have only limited knowledge of the pratical use of Unicode, though I am trying to make my code frequently able to process it. I think Latin, Greek and Cyrillic characters are found as high as the range I mentioned. In the end I am not sure how serious those limitations (or practical) those limitations would be.

Thanks for the posting.