Joining string lists into strings
Posted: Mon Jun 15, 2020 9:11 pm
This board has a few solutions for string-building already, but I thought I'd add my own variation.
I'm coming to PureBasic after spending several years using Python. One thing I miss from Python is the join() method, which joins a list of strings with a delimiter between each pair of strings, like this:
Code:
strings = ["ab", "cd", "ef"]
text = "/".join(strings)
# Result should be "ab/cd/ef"
My Join() procedure is a PureBasic version of Python's join(). To use it, create a list with NewList, then call AddElement() and assign each of your small strings to the new current element. When the list is complete, pass it, along with an optional delimiter string, to Join().
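A minimal usage sketch in PureBasic, mirroring the Python example. It assumes the Strings module from this post has been saved as "Strings.pbi" (a hypothetical file name) and pulled in with XIncludeFile. While #INCLUDE_TESTS is 1, the module's own tests also run when the file is included.

```purebasic
; Sketch: assumes the Strings module from this post was saved as
; "Strings.pbi" (hypothetical file name) next to this source file.
XIncludeFile "Strings.pbi"

EnableExplicit

Define joined$
NewList words.s()

; Build the list one element at a time.
AddElement(words()) : words() = "ab"
AddElement(words()) : words() = "cd"
AddElement(words()) : words() = "ef"

; PureBasic counterpart of Python's "/".join(strings)
joined$ = Strings::Join(words(), "/")
MessageRequester("Join", joined$)   ; shows "ab/cd/ef"

; With no delimiter, the strings are simply concatenated.
joined$ = Strings::Join(words())
MessageRequester("Join", joined$)   ; shows "abcdef"
```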
The code includes a quick unit test and a performance test. They use MessageRequester() rather than Debug() to display their results, so you can turn off the debugger before running them.
I suspect my procedure is a little slower than it would be if it took an array of strings instead of a linked list, but that's OK: with a list, you can keep adding strings without having to ReDim an array whenever it fills up.
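For comparison, here is a sketch of what an array-based version might look like. JoinArray() is hypothetical and not part of the original module; it uses the same two-pass approach (measure, then copy with CopyMemoryString()). Whether it is actually faster than the list version is untested. The usage lines show the trade-off mentioned above: the array's size has to be set with Dim (or grown with ReDim) before elements can be assigned.

```purebasic
EnableExplicit

; Hypothetical array-based counterpart to Strings::Join().
; Not part of the original post; same measure-then-copy approach.
Procedure.s JoinArray(Array parts.s(1), delimiter.s = "")
  Protected last, i, size, *buffer, *position, result$
  last = ArraySize(parts())          ; index of the last element, -1 if none
  If last < 0
    ProcedureReturn ""
  EndIf
  ; First pass: bytes for every element, the delimiters between them,
  ; and one Character for the terminating null.
  For i = 0 To last
    size + StringByteLength(parts(i))
  Next
  size + StringByteLength(delimiter) * last + SizeOf(Character)
  *buffer = AllocateMemory(size)
  If Not *buffer
    ProcedureReturn ""               ; allocation failed
  EndIf
  ; Second pass: the first call sets the destination via @*position;
  ; later calls continue where the previous one stopped.
  *position = *buffer
  CopyMemoryString(parts(0), @*position)
  For i = 1 To last
    CopyMemoryString(delimiter)
    CopyMemoryString(parts(i))
  Next
  result$ = PeekS(*buffer)
  FreeMemory(*buffer)
  ProcedureReturn result$
EndProcedure

; Usage: the array must be sized before elements are assigned.
Dim words.s(2)
words(0) = "ab" : words(1) = "cd" : words(2) = "ef"
MessageRequester("JoinArray", JoinArray(words(), "/"))   ; shows "ab/cd/ef"
```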

(Sample run on my machine: Join() takes 0.019 seconds to join 32,000 strings; the + operator takes 24.426 seconds to do the same thing.)
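A rough, order-of-magnitude way to see why the gap in that sample run is so large. This assumes each `result$ + strings()` step copies the whole accumulated string into fresh memory; PureBasic's internal string handling isn't described in this post, so treat that as an assumption. With n = 32,000 elements of b = 88 bytes each (the 44-character test sentence at 2 bytes per character in Unicode mode), Join() copies each byte a small, constant number of times, while repeated + copies an ever-growing prefix:

```latex
\begin{aligned}
\text{Join():}\quad & n\,b = 32{,}000 \times 88 = 2{,}816{,}000 \text{ bytes} \\
\text{repeated +:}\quad & \sum_{k=1}^{n} k\,b = b\,\frac{n(n+1)}{2} = 88 \times 512{,}016{,}000 \approx 4.5 \times 10^{10} \text{ bytes}
\end{aligned}
```

Linear versus quadratic growth is consistent with Join() finishing in a fraction of a second while the + loop takes tens of seconds; actual timings also depend on allocation and caching, so this doesn't predict the exact ratio.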
Code:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; The "Strings" module
;; --------------------
;;
;; This module contains procedures useful for working with strings.
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
DeclareModule Strings
#INCLUDE_TESTS = 1
Declare.s Join(List stringlist.s(), delimiter.s = "")
CompilerIf #INCLUDE_TESTS
Declare UnitTest_Join()
Declare PerformanceTest_Join()
CompilerEndIf
EndDeclareModule
Module Strings
EnableExplicit
;{ ### `Strings::Join(List stringlist.s(), delimiter.s = "")` ###############
;;
;; Return a string consisting of each of the strings in `stringlist()`,
;; separated by the `delimiter`. Example: If `stringlist()` contains "one",
;; "two", and "three", and the `delimiter` is "::", then return the string
;; "one::two::three".
;;
;; The default delimiter is an empty string, so this procedure is useful if
;; you just want to join strings: If `stringlist()` contains "111", "222",
;; and "333", `Strings::Join(stringlist())` returns "111222333".
;}
Procedure.s Join(List stringlist.s(), delimiter.s = "")
Define list_size, delimiter_size, null_size, size
Define *buffer, *position, result$
; If the list is empty, return a blank string. This makes it
; safe for the rest of the procedure to assume there's always
; something in the string list.
list_size = ListSize(stringlist())
If Not list_size
ProcedureReturn ""
EndIf
; First pass: Add up string lengths so we know how much
; memory to allocate for the buffer.
delimiter_size = StringByteLength(delimiter)
; Size of the terminating null: one Character (2 bytes in Unicode mode).
null_size = SizeOf(Character)
size = 0
ForEach stringlist()
size + StringByteLength(stringlist()) + delimiter_size
Next
size - delimiter_size ; No delimiter after last string in list!
size + null_size ; Make room for the final null character.
*buffer = AllocateMemory(size)
If Not *buffer
ProcedureReturn "" ; Allocation failed.
EndIf
*position = *buffer
; Second pass: Copy the strings into the buffer. The first
; CopyMemoryString() call sets the destination through @*position;
; each later call without a destination continues where the
; previous one stopped.
FirstElement(stringlist())
CopyMemoryString(stringlist(), @*position)
While NextElement(stringlist())
CopyMemoryString(delimiter)
CopyMemoryString(stringlist())
Wend
result$ = PeekS(*buffer)
FreeMemory(*buffer)
ProcedureReturn result$
EndProcedure
CompilerIf #INCLUDE_TESTS
Procedure.s Assert(expected$, actual$)
Define result$
If expected$ <> actual$
result$ = ("FAIL: Expected " + Chr(34) + expected$ + Chr(34) +
", got " + Chr(34) + actual$ + Chr(34) + #CRLF$)
EndIf
ProcedureReturn result$
EndProcedure
Procedure UnitTest_Join()
Define verdict$, result$, delimiter$
NewList strings.s()
result$ = Join(strings())
verdict$ + Assert("", result$)
delimiter$ = "//"
result$ = Join(strings(), delimiter$)
verdict$ + Assert("", result$)
AddElement(strings()) : strings() = "blue"
result$ = Join(strings())
verdict$ + Assert("blue", result$)
result$ = Join(strings(), delimiter$)
verdict$ + Assert("blue", result$)
AddElement(strings()) : strings() = "fox"
result$ = Join(strings())
verdict$ + Assert("bluefox", result$)
result$ = Join(strings(), delimiter$)
verdict$ + Assert("blue//fox", result$)
If Len(verdict$)
verdict$ = "UNIT TEST FAILED:" + #CRLF$ + #CRLF$ + verdict$
Else
verdict$ = "UNIT TEST PASSED."
EndIf
MessageRequester("Unit test result", verdict$)
EndProcedure
Procedure PerformanceTest_Join()
Define count, element$, delimiter$, size, i, sec1.d, sec2.d
Define result$, message$, flags, answer
NewList strings.s()
count = 32000
element$ = "The quick brown fox jumps over the lazy dog."
delimiter$ = ""
size = (StringByteLength(element$) * count) +
(StringByteLength(delimiter$) * (count - 1)) +
(StringByteLength(Chr(0)))
For i = 1 To count
AddElement(strings()) : strings() = element$
Next
message$ = "Starting the performance test comparing Join() " +
"against the normal string operator (+). This " +
"could take a while. Proceed?"
flags = #PB_MessageRequester_YesNo
answer = MessageRequester("Performance test", message$, flags)
If answer = #PB_MessageRequester_Yes
sec1 = ElapsedMilliseconds()
result$ = Join(strings())
sec1 = (ElapsedMilliseconds() - sec1) / 1000
sec2 = ElapsedMilliseconds()
result$ = ""
i = 0
ResetList(strings())
While NextElement(strings())
result$ + strings()
i + 1
If i % 1000 = 0
Debug i
EndIf
Wend
sec2 = (ElapsedMilliseconds() - sec2) / 1000
message$ = "Join() took " +
FormatNumber(sec1, 3) + " seconds to join " +
FormatNumber(count, 0) + " strings into a string " +
FormatNumber(size, 0) + " bytes long." +
#CRLF$ + #CRLF$ +
"The string concatenation operator (+) took " +
FormatNumber(sec2, 3) + " seconds to join " +
FormatNumber(count, 0) + " strings into a string " +
FormatNumber(size, 0) + " bytes long."
MessageRequester("Performance test result", message$)
EndIf
EndProcedure
CompilerEndIf
EndModule
CompilerIf Strings::#INCLUDE_TESTS
; Turn off "Debugger|Use Debugger" in the IDE menu
; to run the tests at full speed.
Strings::UnitTest_Join()
Strings::PerformanceTest_Join()
CompilerEndIf