Joining string lists into strings

Posted: Mon Jun 15, 2020 9:11 pm
by Karig1965
This board has a few solutions for string-building already, but I thought I'd add my own variation.

I'm coming to PureBasic after several years of using Python. One thing I miss from Python is the join() method, which joins a list of strings with a delimiter between each pair of strings, like this:

Code:

strings = ["ab", "cd", "ef"]
text = "/".join(strings)
# Result should be "ab/cd/ef"

My Join() procedure is a PureBasic version of Python's join(). You create a list with NewList, then call AddElement() and assign each of your small strings to the current element of the list. When the list is complete, you pass it and an optional delimiter string to Join().
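
Here is a minimal sketch of what a call might look like. It assumes the Strings module from this post has already been pasted or included above it; the names parts() and text$ are just placeholders for illustration.

```purebasic
; Sketch only: assumes the Strings module from this post is defined
; above this code (set #INCLUDE_TESTS to 0 there to skip the tests).
NewList parts.s()                        ; placeholder list name
AddElement(parts()) : parts() = "ab"
AddElement(parts()) : parts() = "cd"
AddElement(parts()) : parts() = "ef"

Define text$                             ; placeholder variable name
text$ = Strings::Join(parts(), "/")
MessageRequester("Join() example", text$) ; shows ab/cd/ef
```

Leaving out the second argument uses the default empty delimiter, so the same list would come back as "abcdef".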

The code includes a quick unit test and a performance test. The tests display their results with MessageRequester() rather than Debug(), so you can turn the debugger off before running them.

I suspect that my procedure is a little slower than it would be if it took an array of strings instead of a linked list, but that's OK; if you add strings to a list, you don't have to worry about having to ReDim an array whenever it gets full. :D
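
For comparison, here is a rough sketch of what an array-based variant might look like. JoinArray() is a hypothetical name, not part of the module in this post, and it has not been benchmarked, so whether it is actually faster remains an open question. It uses the same two-pass idea: measure everything first, allocate once, then copy.

```purebasic
EnableExplicit

; Hypothetical array-based counterpart to the list-based Join().
; Assumes a one-dimensional string array; not benchmarked.
Procedure.s JoinArray(Array strings.s(1), delimiter.s = "")
	Protected last = ArraySize(strings()) ; highest index, -1 if not dimensioned
	Protected i, size, *buffer, *position, result$
	
	If last < 0
		ProcedureReturn ""
	EndIf
	
	; First pass: total bytes for the strings, the delimiters between
	; them, and the final null character.
	For i = 0 To last
		size = size + StringByteLength(strings(i))
	Next
	size = size + (StringByteLength(delimiter) * last)
	size = size + SizeOf(Character)
	
	*buffer = AllocateMemory(size)
	If *buffer = 0
		ProcedureReturn "" ; allocation failed
	EndIf
	*position = *buffer
	
	; Second pass: copy the pieces into the buffer one after another.
	CopyMemoryString(strings(0), @*position)
	For i = 1 To last
		CopyMemoryString(delimiter)
		CopyMemoryString(strings(i))
	Next
	
	result$ = PeekS(*buffer)
	FreeMemory(*buffer)
	ProcedureReturn result$
EndProcedure

Dim words.s(2) ; three elements: indexes 0 to 2
words(0) = "ab" : words(1) = "cd" : words(2) = "ef"
MessageRequester("JoinArray() example", JoinArray(words(), "/")) ; shows ab/cd/ef
```

The trade-off is the one described above: the array has to be sized (or ReDim'd) by the caller, while a list simply grows with each AddElement().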

(Sample run on my machine: Join() takes 0.019 seconds to join 32,000 strings; the + operator takes 24.426 seconds to do the same thing.)

Code:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; The "Strings" module
;; --------------------
;;
;; This module contains procedures useful for working with strings.
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

DeclareModule Strings
	#INCLUDE_TESTS = 1

	Declare.s Join(List stringlist.s(), delimiter.s = "")
	
	CompilerIf #INCLUDE_TESTS
		Declare UnitTest_Join()
		Declare PerformanceTest_Join()
	CompilerEndIf
EndDeclareModule

Module Strings
	EnableExplicit
	
	;{ ### `Strings::Join(List stringlist.s(), delimiter.s = "")` ##############
	;;
	;; Return a string consisting of each of the strings in `stringlist()`,
	;; separated by the `delimiter`. Example: If `stringlist()` contains "one",
	;; "two", and "three", and the `delimiter` is "::", then return the string
	;; "one::two::three".
	;;
	;; The default delimiter is an empty string, so this procedure is useful if
	;; you just want to join strings: If `stringlist()` contains "111", "222",
	;; and "333", `Strings::Join(stringlist())` returns "111222333".
	;}
	Procedure.s Join(List stringlist.s(), delimiter.s = "")
		Define list_size, delimiter_size, null_size, size
		Define *buffer, *position, index, result$
		
		; If the list is empty, return a blank string. This makes it
		; safe for the rest of the procedure to assume there's always
		; something in the string list.
		
		list_size = ListSize(stringlist())
		If Not list_size
			ProcedureReturn ""
		EndIf
		
		; First pass: Add up string lengths so we know how much
		; memory to allocate for the buffer.
		
		delimiter_size = StringByteLength(delimiter)
		null_size      = SizeOf(Character) ; Chr(0) is an empty string, so measuring it gives 0.
		
		size = 0
		ForEach stringlist()
			size + StringByteLength(stringlist()) + delimiter_size
		Next
		size - delimiter_size ; No delimiter after last string in list!
		size + null_size      ; Make room for the final null character.
		
		*buffer = AllocateMemory(size)
		If *buffer = 0
			ProcedureReturn "" ; Allocation failed, so there is no buffer to fill.
		EndIf
		*position = *buffer
		
		; Second pass: Copy the strings into the buffer.
		
		index = 0
		ResetList(stringlist())
		NextElement(stringlist())
		CopyMemoryString(stringlist(), @*position)
		
		While NextElement(stringlist())
			CopyMemoryString(delimiter)
			CopyMemoryString(stringlist())
		Wend
		
		result$ = PeekS(*buffer)
		FreeMemory(*buffer)
		ProcedureReturn result$
	EndProcedure
	
	CompilerIf #INCLUDE_TESTS
		Procedure.s Assert(expected$, actual$)
			Define result$
			If expected$ <> actual$
				result$ = ("FAIL: Expected " + Chr(34) + expected$ + Chr(34) +
				           ", got " + Chr(34) + actual$ + Chr(34) + #CRLF$)
			EndIf
			ProcedureReturn result$
		EndProcedure
		
		Procedure UnitTest_Join()
			Define verdict$, result$, delimiter$
			NewList strings.s()
			
			result$ = Join(strings())
			verdict$ + Assert("", result$)
			
			delimiter$ = "//"
			result$ = Join(strings(), delimiter$)
			verdict$ + Assert("", result$)
			
			AddElement(strings()) : strings() = "blue"
			result$ = Join(strings())
			verdict$ + Assert("blue", result$)
			
			result$ = Join(strings(), delimiter$)
			verdict$ + Assert("blue", result$)
			
			AddElement(strings()) : strings() = "fox"
			result$ = Join(strings())
			verdict$ + Assert("bluefox", result$)
			
			result$ = Join(strings(), delimiter$)
			verdict$ + Assert("blue//fox", result$)
			
			If Len(verdict$)
				verdict$ = "UNIT TEST FAILED:" + #CRLF$ + #CRLF$ + verdict$
			Else
				verdict$ = "UNIT TEST PASSED."
			EndIf
			MessageRequester("Unit test result", verdict$)
		EndProcedure
		
		Procedure PerformanceTest_Join()
			Define count, element$, delimiter$, size, i, sec1.d, sec2.d
			Define result$, message$, flags, answer
			NewList strings.s()
			
			count = 32000
			element$ = "The quick brown fox jumps over the lazy dog."
			delimiter$ = ""
			size = (StringByteLength(element$) * count) +
			       (StringByteLength(delimiter$) * (count - 1)) +
			       SizeOf(Character) ; Room for the final null character.
			For i = 1 To count
				AddElement(strings()) : strings() = element$
			Next
			
			message$ = "Starting the performance test comparing Join() " +
			           "against the normal string operator (+). This " +
			           "could take a while. Proceed?"
			flags = #PB_MessageRequester_YesNo
			answer = MessageRequester("Performance test", message$, flags)
			If answer = #PB_MessageRequester_Yes
				sec1 = ElapsedMilliseconds()
				result$ = Join(strings())
				sec1 = (ElapsedMilliseconds() - sec1) / 1000
				
				sec2 = ElapsedMilliseconds()
				result$ = ""
				i = 0
				ResetList(strings())
				While NextElement(strings())
					result$ + strings()
					i + 1
					If i % 1000 = 0
						Debug i
					EndIf
				Wend
				sec2 = (ElapsedMilliseconds() - sec2) / 1000
				
				message$ = "Join() took " +
				           FormatNumber(sec1, 3) + " seconds to join " +
				           FormatNumber(count, 0) + " strings into a string " +
				           FormatNumber(size, 0) + " bytes long." +
				           #CRLF$ + #CRLF$ +
				           "The string concatenation operator (+) took " +
				           FormatNumber(sec2, 3) + " seconds to join " +
				           FormatNumber(count, 0) + " strings into a string " +
				           FormatNumber(size, 0) + " bytes long."
				MessageRequester("Performance test result", message$)
			EndIf
		EndProcedure
	CompilerEndIf
EndModule

CompilerIf Strings::#INCLUDE_TESTS
	; Turn off "Debugger|Use Debugger" in the IDE menu
	; to run the tests at full speed.
	Strings::UnitTest_Join()
	Strings::PerformanceTest_Join()
CompilerEndIf

Re: Joining string lists into strings

Posted: Mon Jun 15, 2020 9:35 pm
by skywalk

Re: Joining string lists into strings

Posted: Mon Jun 15, 2020 9:56 pm
by Karig1965
Yes, that looks like it works in the same way, only it expects the strings to be in an array. I thought it might be a little more convenient to use a list for joining strings. Unless I'm missing something (which is possible, since I'm a PB n00b...).