Some small tricks for sorting the letters of a string.

Share your advanced PureBasic knowledge/code with the community.
User avatar
Demivec
Addict
Addict
Posts: 4270
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Some small tricks for sorting the letters of a string.

Post by Demivec »

Code updated for 5.20+

I have two variations for sorting the characters of a string. They turn a string like "amazing" into "aagimnz". This kind of operation is frequently made to determine if two strings have the same elements irrespective of order (i.e. anagrams) and is usually performed after converting the string to all upper or lower case.

I am posting these small snippets because I think the methods used in them are interesting and unique. The methods might find some use in other areas.

One variation returns the sorted string, the other sorts the string in place. Both versions work with Ascii or Unicode and sort all characters, not just the letters. Both are faster than individually looping through the string's characters to sort them.

The subroutines take a pointer to a string and copy the string's content into the memory for a newly declared array of characters. The array is then sorted using PureBasic's SortArray(). The string is then read as a whole from the array's memory location and either returned or copied over the original strings contents. One important note is that the array's last element (an extra element) is set aside for the strings Null and is not sorted.

Code: Select all

;Author:Demivec  (Jared Johnson)
;Using: PureBasic 4.31

Procedure.s sortLetters(*word.Character, wordLength) ;returns a string with the letters of a word sorted
  Protected Dim letters.c(wordLength)
  Protected *letAdr = @letters()
  
  CopyMemoryString(*word, @*letAdr)
  SortArray(letters(), #PB_Sort_Ascending, 0, wordLength - 1)
  ProcedureReturn PeekS(@letters(), wordLength)
EndProcedure

;example
Define Before$,After$

Before$ = "demonstration"
After$ = sortLetters(@Before$,Len(Before$))
Debug Before$ + "  ==>  " + After$

Code: Select all

;Author:Demivec  (Jared Johnson)
;Using: PureBasic v4.31

Procedure sortLettersInPlace(*word.Character, wordLength) ;sorts the letters of a word without creating a new string
  Protected Dim letters.c(wordLength)
  Protected *letAdr = @letters()
  
  CopyMemoryString(*word, @*letAdr)
  SortArray(letters(), #PB_Sort_Ascending, 0, wordLength - 1)
  CopyMemoryString(@letters(), @*word)
EndProcedure

;example
Define Text$

Text$ = "demonstration"
Debug Text$
sortLettersInPlace(@Text$,Len(Text$))
Debug Text$
Last edited by Demivec on Tue Jun 09, 2009 9:29 pm, edited 1 time in total.
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

Code updated for 5.20+

I want to add mine here, if it's ok. It too sorts the string in place. I find it's fast at short strings, but loses to yours on longer strings.
(This is a much improved version of the sort used for the state problem.)

Code: Select all

Structure Charray
  C.c[0]
EndStructure

Procedure QuickSortStringAux(*String.Charray, *Start, *Nd.Character)
  QuickSortStringAuxStart:
  If *Start < *Nd
    Protected Pivot.c = *Nd\c
    Protected *I.Character = *Start
    Protected *J.Character = *Nd
    While *I <> *J
      If *I\C < Pivot
        *I + SizeOf(Character)
      Else
        *J\c = *I\c
        *J - SizeOf(Character)
        *I\c = *J\c
      EndIf
    Wend
    *J\c = Pivot
    QuickSortStringAux(*String, *Start, *J-SizeOf(Character))
    ;QuickSortStringAux(*String, J+1, Nd)
    *Start = *J + SizeOf(Character)
    Goto QuickSortStringAuxStart
  EndIf
EndProcedure

Procedure QuickSortString(*String.Character)
  QuickSortStringAux(*String, *String, *String+(MemoryStringLength(*String)-1)*SizeOf(Character))
EndProcedure
User avatar
Demivec
Addict
Addict
Posts: 4270
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Post by Demivec »

@Trond: No problem. I think it is good thing to collect a variety of working methods together in one place, as long as it's in keeping with the thread topic. :wink:
AND51
Addict
Addict
Posts: 1040
Joined: Sun Oct 15, 2006 8:56 pm
Location: Germany
Contact:

Post by AND51 »

Demivec, wow, how elegant your solution is!
Thank your very much for this! :shock:
I never would have had the idea to lay a string over an array... Respect!
PB 4.30

Code: Select all

onErrorGoto(?Fred)
Xombie
Addict
Addict
Posts: 898
Joined: Thu Jul 01, 2004 2:51 am
Location: Tacoma, WA
Contact:

Post by Xombie »

I was kind of curious about this so I tried a different approach and put together a speed test with Demivec's code and mine. Mine is just called SortInPlace() and seems to be faster on average although I think I might lose out in shorter words.

EDIT: Made a change and now I think it's faster on both long and short words.

EDIT2: Even shorter version of my function.

Code: Select all

;
Global HoldClip.s
Global Dim gTest1.s(50000)
Global Dim gTest2.s(50000)
For i.i = 0 To 50000
   r.i = Random(200) + 1
   a.s = Space(r)
   *b.Character = @a
   For j.i = 1 To r
      If Random(1)
         *b\c = 97 + Random(25)
      Else
         *b\c = 65 + Random(25)
      EndIf
      *b + SizeOf(Character)
   Next j
   gTest1(i) = a
   gTest2(i) = a
Next i
HoldClip = gTest1(0)
HoldClip = HoldClip + Chr(13) + Chr(10) + gTest1(1)
HoldClip = HoldClip + Chr(13) + Chr(10) + gTest1(2)
;-
Procedure sortLettersInPlace(*word.Character, wordLength) ;sorts the letters of a word without creating a new string
  Protected Dim letters.c(wordLength)
  Protected *letAdr = @letters()
 
  CopyMemoryString(*word, @*letAdr)
  SortArray(letters(), #PB_Sort_Ascending, 0, wordLength - 1)
  CopyMemoryString(@letters(), @*word)
EndProcedure
;/
Delay(500)
q1.i = GetTickCount_()
For i.i = 0 To 50000
   sortLettersInPlace(@gTest1(i), Len(gTest1(i)))
Next i
r1.i = GetTickCount_()
;/
;-
Procedure SortInPlace(*word.Character)
   ;
   Protected iLoop.i
   ;
   Protected jLoop.i
   ;
   Protected Dim HoldCharacter.i(126)
   ;
   Protected *Position.Character = *word
   ;
   While *Position\c
      ;
      HoldCharacter(*Position\c) + 1
      ;
      *Position + SizeOf(Character)
      ;
   Wend
   ;
   *Position = *word
   ;
   For iLoop = 0 To 126
      ;
      If HoldCharacter(iLoop)
         ;
         For jLoop = 1 To HoldCharacter(iLoop)
            ;
            *Position\c = iLoop
            ;
            *Position + SizeOf(Character)
            ;
         Next jLoop
         ;
      EndIf
      ;
   Next iLoop
   ;
EndProcedure
;/
Delay(500)
q2.i = GetTickCount_()
For i.i = 0 To 50000
   SortInPlace(@gTest2(i))
Next i
r2.i = GetTickCount_()
;/
HoldClip = HoldClip + Chr(13) + Chr(10) + " .... Test 1 ...."
HoldClip = HoldClip + Chr(13) + Chr(10) + gTest1(0)
HoldClip = HoldClip + Chr(13) + Chr(10) + gTest1(1)
HoldClip = HoldClip + Chr(13) + Chr(10) + gTest1(2)
HoldClip = HoldClip + Chr(13) + Chr(10) + " .... Test 2 ...."
HoldClip = HoldClip + Chr(13) + Chr(10) + gTest2(0)
HoldClip = HoldClip + Chr(13) + Chr(10) + gTest2(1)
HoldClip = HoldClip + Chr(13) + Chr(10) + gTest2(2)
;
MessageRequester("Test Results", Str(r1-q1)+" : "+Str(r2-q2))
;
SetClipboardText(HoldClip)
;
User avatar
Demivec
Addict
Addict
Posts: 4270
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Post by Demivec »

Xombie wrote:I was kind of curious about this so I tried a different approach and put together a speed test with Demivec's code and mine. Mine is just called SortInPlace() and seems to be faster on average although I think I might lose out in shorter words.
@Xombie: I like your "counting" sort. It actually avoids the sort altogether.

It has one major advantage and one minor limitation. The advantage is it works wonderfully on Ascii (2x + faster :wink:). The limitation is it doesn't work well on Unicode. Your test purposely limited the content of the string to only Ascii upper and lower case letters. It would not be able to handle even characters in the extended code page(s). If Unicode character values are between 0 -> 1000 yours becomes even with mine, or worse.

I have only limited knowledge of the pratical use of Unicode, though I am trying to make my code frequently able to process it. I think Latin, Greek and Cyrillic characters are found as high as the range I mentioned. In the end I am not sure how serious those limitations (or practical) those limitations would be.

Thanks for the posting.
Post Reply