
Slow string merge in a loop

Posted: Tue Feb 25, 2025 2:10 am
by AZJIO
There is a well-known problem with concatenating strings in a loop. I had to wait 11 minutes for 100,000 lines. When writing to memory with CopyMemoryString() instead, it is almost instantaneous. The problem is the repeated memory reallocation. I've read that a bit more memory than necessary is allocated, so appending 2 characters doesn't trigger a reallocation.
Is there any way to pre-allocate memory for a variable so that reallocation does not occur?

Current method:

Code:

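ForEach StrList()
	Len + Len(StrList()) ; first pass: total length in characters
Next

*Result\s = Space(Len) ; allocate the final string once (*Result is assumed to be a *Result.String pointer)
*Point = @*Result\s
ForEach StrList()
	CopyMemoryString(StrList(), @*Point) ; copy each piece and advance *Point
Next

Proposed new way (suggested syntax; Option() is not an existing PureBasic command):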

Code:

ForEach StrList()
	Len + Len(StrList())
Next
Option(#String, Result$, Len) ; forcibly set the length of the variable and prevent the variable length from decreasing
ForEach StrList()
	Result$ + StrList()
Next
Option(#String, Result$, 0) ; reset the forced variable length

Re: Slow string merge in a loop

Posted: Tue Feb 25, 2025 2:56 am
by idle
Yes, this is a case where the compiler could be a bit smarter and push the strings onto the stack.
When you concatenate strings in one line, like str$ = "a" + "b" + "c", it's actually fast,
but when you do it in a loop it becomes
str$ = str$ + "a"
str$ = str$ + "b"
str$ = str$ + "c"

See here, where I show how it could be fixed:
https://www.purebasic.fr/english/viewto ... 16#p595816

C backend:

Code:

Global s1.s  
Global s2.s   

s1 = "hello" 
s2 = "world" 

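st = ElapsedMilliseconds() 
For a =0 To 10000 
   s1 + s2 ; naive: every append builds a new, longer string
Next   
et = ElapsedMilliseconds() 

st1 = ElapsedMilliseconds()
; Inline C (C backend only), using PureBasic's internal, undocumented string-buffer functions:
; append every copy of s2 to the temporary string buffer, then store the result in s1 with a single allocation
!SYS_PushStringBasePosition();
For a = 0 To 10000 
  !SYS_CopyString(g_s2);
Next 
!SYS_AllocateString4(&g_s1,SYS_PopStringBasePosition());
et1 = ElapsedMilliseconds() 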

out.s = Str(et-st) + " " + Str(et1-st1) 
MessageRequester("test",out) 
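For comparison, here is a sketch of the same benchmark without inline C, so it should not depend on the C backend. It uses the preallocation approach from the first post rather than idle's string-buffer fix: compute the final length once, allocate it with Space(), then write each piece in place with CopyMemoryString(). The variable names naive and fast are illustrative only, and the timings will vary by machine.

```purebasic
EnableExplicit

Define s1.s = "hello"
Define s2.s = "world"
Define a, st, et, st1, et1, total, *p
Define naive.s, fast.s

; Naive: each "+" builds a new, longer string and copies the old contents
st = ElapsedMilliseconds()
naive = s1
For a = 0 To 10000
  naive + s2
Next
et = ElapsedMilliseconds()

; Preallocated: allocate the final length once, then copy every piece in place
st1 = ElapsedMilliseconds()
total = Len(s1) + 10001 * Len(s2) ; s1 plus 10001 copies of s2 (the loop runs 0..10000)
fast = Space(total)
*p = @fast
CopyMemoryString(s1, @*p)         ; writes s1 and advances *p to its end
For a = 0 To 10000
  CopyMemoryString(s2, @*p)
Next
et1 = ElapsedMilliseconds()

Debug Str(et - st) + " ms / " + Str(et1 - st1) + " ms, equal: " + Str(Bool(naive = fast))
```

Both strings should end up identical; only the amount of copying differs.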



Re: Slow string merge in a loop

Posted: Tue Feb 25, 2025 6:08 am
by RASHAD
Maybe

Code:

count = ListSize(StrList()) - 1 ; highest array index needed
Dim mText.s(count)

ForEach StrList()
	mText(i) = StrList()
	i+1
Next

For t = 0 To count
   text.s = text.s+mText(t)
Next

ReDim mtext.s(0)

Re: Slow string merge in a loop

Posted: Tue Feb 25, 2025 7:33 am
by AZJIO
Example for testing:

Code:

EnableExplicit

Define Result$
Define *m
Define i, StartTime
Define NewList ListStr.s()

#StrSize = 70 ; string length
#Count = 5000 ; number of lines
*m = AllocateMemory(#StrSize * 2 + 4)
If Not *m
	End
EndIf

RandomSeed(123456789)

Procedure Filling(*c.Character, List ListStr.s())
	Protected i, j, *c0
	*c0 = *c
	For i = 1 To #Count
		*c = *c0
		For j = 1 To #StrSize
			*c\c = Random(122, 65)
			*c + 2
		Next
		AddElement(ListStr())
		ListStr() = PeekS(*c0)
	Next
EndProcedure

Filling(*m, ListStr())

FreeMemory(*m)

; Output 5 strings showing that the strings exist
ResetList(ListStr())
For i = 1 To 5
	NextElement(ListStr())
	Debug ListStr()
Next

DisableDebugger
StartTime = ElapsedMilliseconds()
ForEach ListStr()
	Result$ + ListStr()
Next
StartTime = (ElapsedMilliseconds() - StartTime)
EnableDebugger
Debug FormatNumber(StartTime / 1000, 3, ".", "") ; seconds

Re: Slow string merge in a loop

Posted: Tue Feb 25, 2025 10:23 am
by Piero
I remember you could use this trick in other languages:

Code:

a$ = (a$ = "") + a$ + "b"
I wonder if it can be done "directly" in PB in some way…
Edit:
Actually, it was with lists

Code:

myList = (myList=[]) + myList + new_item;
but maybe it can be applied to PB strings in some way…

Re: Slow string merge in a loop

Posted: Tue Feb 25, 2025 10:43 am
by SMaag
Regarding "Example for the test":
I tested:
your example: 0.367 s
JoinList: 0.001 s (uses CopyMemoryString)

I switched the output to the console to be sure it is not a debugger problem. Same result!
Can you confirm this result?

Code:

EnableExplicit

Define Result$
Define *m
Define i, StartTime
Define NewList ListStr.s()

#StrSize = 70 ; string length
#Count = 5000 ; number of lines
*m = AllocateMemory(#StrSize * 2 + 4)
If Not *m
	End
EndIf

RandomSeed(123456789)

Procedure Filling(*c.Character, List ListStr.s())
	Protected i, j, *c0
	*c0 = *c
	For i = 1 To #Count
		*c = *c0
		For j = 1 To #StrSize
			*c\c = Random(122, 65)
			*c + 2
		Next
		AddElement(ListStr())
		ListStr() = PeekS(*c0)
	Next
EndProcedure

Procedure.s JoinList(List lst.s(), Separator$, *IOutLen.Integer=0)
  ; ============================================================================
  ; NAME: JoinList
  ; DESC: Join all ListElements to a single String
  ; VAR(lst.s()) : The String List
  ; VAR(Separator$) : A separator String
  ; VAR(*IOutLen)   : Pointer to an Integer variable that optionally receives the string length
  ; RET.s: the String
  ; ============================================================================
    Protected ret$
    Protected I, L, N, lenSep
    Protected *ptr
    
    ;lenSep = MemoryStringLength(@Separator$)
    lenSep = Len(Separator$)
    
    N = ListSize(lst())
    Debug "ListLength = " + N
    
    If N
      ; ----------------------------------------
      ;  Total length, including separators
      ; ----------------------------------------
      ForEach lst()
        L = L + Len(lst()) 
      Next
      L = L + (N-1) * lenSep
      ret$ = Space(L)
      *ptr = @ret$
            
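      If lenSep > 0 
        ; ----------------------------------------
        ;  With Separator
        ; ----------------------------------------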
        ForEach lst()
          If lst()<>#Null$
            CopyMemoryString(lst(), @*ptr)
          EndIf
          
          I + 1
          If I < N
            CopyMemoryString(Separator$, @*ptr)
          EndIf
        Next
        
      Else          
      ; ----------------------------------------
      ;  Without Separator
      ; ----------------------------------------
        
        ForEach lst()
           If lst()<>#Null$
            CopyMemoryString(lst(), @*ptr)
          EndIf
        Next
    
      EndIf
      
    EndIf
    
    If *IOutLen
      *IOutLen\i = L
    EndIf
    
    ProcedureReturn ret$
  EndProcedure


Filling(*m, ListStr())

FreeMemory(*m)

OpenConsole()

; Output 5 strings showing that the strings exist
ResetList(ListStr())
For i = 1 To 5
	NextElement(ListStr())
	PrintN(ListStr())
Next

DisableDebugger
StartTime = ElapsedMilliseconds()
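ForEach ListStr()
  Result$ + ListStr()
Next
; To time JoinList instead, replace the loop above with a single call
; (not inside ForEach, which would join the whole list once per element):
; Result$ = JoinList(ListStr(), "")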
StartTime = (ElapsedMilliseconds() - StartTime)
EnableDebugger
; Debug FormatNumber(StartTime / 1000, 3, ".", "") ; seconds
PrintN(FormatNumber(StartTime / 1000, 3, ".", "")) ; seconds
PrintN("")
PrintN("press any key!")
Input()

Re: Slow string merge in a loop

Posted: Tue Feb 25, 2025 2:08 pm
by AZJIO
SMaag
Does OpenConsole() change anything? Disabling the debugger is enough, and it is only turned off around the part being measured.
It's not about the speed of any particular function. This topic has come up a hundred times, and many options have been suggested. See my first post, where I suggested a simpler way without preparing pointers and structures. I'm using the fast way, but I wish the code looked simpler (set the size, then reset the size).

1. I added one of the modules (from mk-soft, link)
2. I even have my own function - ListTostring
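When the total length is not known in advance, a common alternative in other languages is a "string builder": a buffer whose capacity doubles whenever it runs out, so N appends cause only about log2(total size) reallocations instead of one per append. Below is a minimal PureBasic sketch of that technique. It is not the proposed Option() command and not mk-soft's module or ListTostring; the StrBuilder structure and the SB_Init, SB_Append, SB_ToString and SB_Free procedures are hypothetical names made up for this example.

```purebasic
EnableExplicit

; Hypothetical helper (not a PureBasic built-in): a growable character buffer.
; Capacity doubles when it runs out, so total copying stays proportional to the final size.
Structure StrBuilder
  *buf      ; memory block holding the characters
  cap.i     ; capacity in bytes
  used.i    ; bytes written so far, excluding the terminating null
EndStructure

Procedure SB_Init(*sb.StrBuilder, initialCap = 1024)
  If initialCap < 16 : initialCap = 16 : EndIf
  *sb\cap = initialCap
  *sb\used = 0
  *sb\buf = AllocateMemory(*sb\cap) ; zero-filled, so an empty builder reads as ""
  ProcedureReturn Bool(*sb\buf <> 0)
EndProcedure

Procedure SB_Append(*sb.StrBuilder, s.s)
  Protected need = StringByteLength(s)
  Protected newCap, *newBuf
  ; make sure there is room for the new text plus a null terminator
  If *sb\used + need + SizeOf(Character) > *sb\cap
    newCap = *sb\cap
    While *sb\used + need + SizeOf(Character) > newCap
      newCap = newCap * 2
    Wend
    *newBuf = ReAllocateMemory(*sb\buf, newCap)
    If *newBuf = 0 : ProcedureReturn #False : EndIf ; the old block is still valid
    *sb\buf = *newBuf
    *sb\cap = newCap
  EndIf
  If need > 0
    CopyMemory(@s, *sb\buf + *sb\used, need)
    *sb\used = *sb\used + need
  EndIf
  PokeC(*sb\buf + *sb\used, 0) ; keep the buffer null-terminated
  ProcedureReturn #True
EndProcedure

Procedure.s SB_ToString(*sb.StrBuilder)
  ProcedureReturn PeekS(*sb\buf) ; reads up to the null terminator
EndProcedure

Procedure SB_Free(*sb.StrBuilder)
  If *sb\buf : FreeMemory(*sb\buf) : EndIf
  *sb\buf = 0
EndProcedure

; Usage with a list similar to the ones in the test examples above
NewList StrList.s()
Define i
Define Result$
Define sb.StrBuilder

For i = 1 To 5000
  AddElement(StrList())
  StrList() = "line " + Str(i) + #CRLF$
Next

If SB_Init(@sb)
  ForEach StrList()
    SB_Append(@sb, StrList())
  Next
  Result$ = SB_ToString(@sb)
  SB_Free(@sb)
EndIf

Debug Len(Result$)
```

The trade-off versus the two-pass Space() + CopyMemoryString() method is some unused capacity at the end and a final copy in SB_ToString(), in exchange for not needing to know the total length up front.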

Re: Slow string merge in a loop

Posted: Tue Feb 25, 2025 3:39 pm
by SMaag
Thanks for the information!
I read it again, and yes, it was a misunderstanding on my side!