How to speed up array creation

Post by simkot »

I have a script that reads a text file into an array so that I can get a line by its index. It is meant for large files, 300 MB and up, but building the array takes a very long time: 10-20 times longer than in AutoIt. Is it possible to speed it up?

Code: Select all

If ReadFile(0, "test.txt")
  bom = ReadStringFormat(0)
  Size.q = Lof(0)
  If Size > 0
    *p = AllocateMemory(Size + 2)
    If *p
      ReadData(0, *p, Size)
      FreeMemory(*p)
    EndIf
  EndIf
  CloseFile(0)
EndIf

lineCount = CountString(content$, #LF$) + 1
count_time = ElapsedMilliseconds()

Dim lines.s(lineCount)
*ptr.Character = @content$
line$ = ""
index = 0
While *ptr\c
  If *ptr\c = #LF
    lines(index) = line$
    index + 1
    line$ = ""
  ElseIf *ptr\c <> #CR
    line$ + Chr(*ptr\c)
  EndIf
  *ptr + SizeOf(Character)
Wend

If line$ <> ""
  lines(index) = line$
EndIf

If index >= 30000
  Debug("line = " + lines(3))
EndIf

Re: How to speed up array creation

Post by Demivec »

Your code isn't quite complete or runnable.

Here's one approach (untested):

Code: Select all

If ReadFile(0, "test.txt")
  bom = ReadStringFormat(0) ; code as written assumes the string format is #PB_Unicode
  Size.q = Lof(0) - Loc(0)
  If Size > 0
    *p = AllocateMemory(Size + 2)
    If *p
      ReadData(0, *p, Size)
    EndIf
  EndIf
  CloseFile(0)
EndIf

If *p
  count_time = ElapsedMilliseconds()
  
  NewList lines.s()
  *ptr.Character = *p
  *lineStartPtr = *ptr
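  While *ptr\c
    If *ptr\c = #LF
      AddElement(lines())
      ; PeekS() takes a length in characters, so convert the byte distance; the LF itself is excluded
      lines() = PeekS(*lineStartPtr, (*ptr - *lineStartPtr) / SizeOf(Character))
      *lineStartPtr = *ptr + SizeOf(Character)
    EndIf
    *ptr + SizeOf(Character)
  Wend
  
  ; last line when the file does not end with LF
  If *ptr > *lineStartPtr
    AddElement(lines())
    lines() = PeekS(*lineStartPtr, (*ptr - *lineStartPtr) / SizeOf(Character))
  EndIf
  FreeMemory(*p) ; the buffer is no longer needed once the lines are copied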
  
  lineCount = ListSize(lines())
  count_time = ElapsedMilliseconds() - count_time
  output$ = "Lines: " + lineCount + ", Time: " + count_time + "ms"
  If lineCount >= 30000
    SelectElement(lines(), 3)
    output$ + #LF$ + "line = " + lines()
  EndIf
  
  MessageRequester("Read Stats", output$)
EndIf 

If the lines are needed in an array they can be copied from the list to an array when everything is complete.

@Edit: Corrected a few errors present in the original source code
Last edited by Demivec on Sun Mar 23, 2025 9:44 am, edited 4 times in total.

Re: How to speed up array creation

Post by pjay »

As Demivec said, there are a couple of logic errors in your code, so it shouldn't really be running at all.

You're reading the file into memory (*p), then immediately releasing that memory, so it cannot be used further down the line.
The content$ variable is not populated, so the pointer operations on it will cause a crash.

Maybe you'd be better off just keeping it simple & reading your file line by line:

Code: Select all

;/ read text file line-by-line into list, then move to array if needed:
If ReadFile(0, YourFile.s) : bom = ReadStringFormat(0)
  NewList strList.s() : count_time = ElapsedMilliseconds()

  While Not Eof(0) : AddElement(strList()) : strList() = ReadString(0) : Wend : CloseFile(0)

  ;/ transfer list to array (if you have to...)
  linecount = ListSize(strList()) : Dim lines.s(linecount)
  ForEach strList() : lines(ListIndex(strList())) = strList() : Next : FreeList(strList())
  
  count_time = ElapsedMilliseconds() - count_time
  MessageRequester("Info", "Lines read: " + Str(linecount) + #LF$ + "File read in " + Str(count_time) + "ms")
Else
  MessageRequester("Error", "Unable to read file")
EndIf

Re: How to speed up array creation

Post by simkot »

Thank you. I will test your answers and let you know. Yes, the first code was not correct. Here is the code I am working on.

Code: Select all

start_time = ElapsedMilliseconds()

If ReadFile(0, "test.txt")
  bom = ReadStringFormat(0)
  Size.q = Lof(0)
  If Size > 0
    *p = AllocateMemory(Size + 2)
    If *p
      ReadData(0, *p, Size)
      content$ = PeekS(*p, -1, bom)  
      FreeMemory(*p)
    EndIf
  EndIf
  CloseFile(0)
Else
  Debug "Error."
  End
EndIf
read_time = ElapsedMilliseconds()

lineCount = CountString(content$, #LF$) + 1
count_time = ElapsedMilliseconds()

Dim lines.s(lineCount)

*ptr.Character = @content$

line$ = ""

index = 0

While *ptr\c
  If *ptr\c = #LF 
    lines(index) = line$
    index + 1
    line$ = ""
  ElseIf *ptr\c <> #CR 
    line$ + Chr(*ptr\c)
  EndIf
  *ptr + SizeOf(Character)
Wend

If line$ <> ""
  lines(index) = line$
EndIf

array_time = ElapsedMilliseconds()

If index >= 3
  Debug("line = " + lines(3))
Else
  Debug("Error.")
EndIf

Debug("Read  " + Str(read_time - start_time))
Debug("count  " + Str(count_time - read_time))
Debug("Array  " + Str(array_time - count_time))

Re: How to speed up array creation

Post by mk-soft »

PureBasic has never been the fastest at handling large strings, but it works well.
So for text files, always use ReadString and a linked list. That way the text file format is not a problem.

Re: How to speed up array creation

Post by NicTheQuick »

I would do it like this. Just double the size of the array until it is large enough to hold all the lines. No need for other data structures like LinkedLists or stuff like that.

Code: Select all

Procedure FileToArray(filePath.s, Array arr.s(1))
	Protected arraySize.i = ArraySize(arr()) + 1
	Protected hFile.i, index.i = 0, capacity.i = 1
	
	hFile = ReadFile(#PB_Any, filePath)
	If Not hFile
		ProcedureReturn -1
	EndIf
	
	While Not Eof(hFile)
		If index = capacity
			capacity * 2
		EndIf
		If arraySize < capacity
			arraySize = capacity
			ReDim arr(arraySize - 1)
		EndIf
		arr(index) = ReadString(hFile)
		index + 1
	Wend
	
	CloseFile(hFile)
	If index > 0
		ReDim arr(index - 1)
	Else
		ReDim arr(0)
	EndIf
	
	ProcedureReturn index
EndProcedure

Dim arr.s(0)

start_time = ElapsedMilliseconds()

Define i.i, lines.i = FileToArray("/home/nicolas/Text.txt", arr())

read_time = ElapsedMilliseconds()

Debug "Read time: " + Str(read_time - start_time) + " ms"

For i = 0 To lines - 1
	Debug arr(i)
Next

Re: How to speed up array creation

Post by mk-soft »

@NicTheQuick ;)

With an array, I enlarge it in steps of, for example, 100 elements and trim it to the right size at the end.

This is faster than always calling ReDim.
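
A minimal sketch of that fixed-step idea, for comparison with the doubling version above. The procedure name `FileToArrayFixedStep` and the constant `#GrowStep` are made up for this illustration, and the code is untested. It assumes the file can be read with ReadString()'s default format once any byte order mark has been skipped.

```purebasic
EnableExplicit

#GrowStep = 100 ; assumed step size, taken from the "100 elements" example

; Hypothetical helper: reads a text file into arr() and returns the
; number of lines read, or -1 if the file could not be opened.
Procedure.i FileToArrayFixedStep(filePath.s, Array arr.s(1))
  Protected hFile.i, count.i = 0
  Protected capacity.i = ArraySize(arr()) + 1

  hFile = ReadFile(#PB_Any, filePath)
  If Not hFile
    ProcedureReturn -1
  EndIf
  ReadStringFormat(hFile) ; skips a byte order mark if one is present

  While Not Eof(hFile)
    If count = capacity
      ; array is full: enlarge it by a fixed number of elements
      capacity + #GrowStep
      ReDim arr(capacity - 1)
    EndIf
    arr(count) = ReadString(hFile)
    count + 1
  Wend
  CloseFile(hFile)

  ; trim the array to the number of lines actually read
  If count > 0
    ReDim arr(count - 1)
  EndIf
  ProcedureReturn count
EndProcedure
```

The only difference from a doubling strategy is the line `capacity + #GrowStep`; which of the two is faster for a given file is best checked by timing both.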

Re: How to speed up array creation

Post by NicTheQuick »

mk-soft wrote: Sun Mar 23, 2025 12:30 pm @NicTheQuick ;)

With an array, I enlarge it in steps of, for example, 100 elements and trim it to the right size at the end.

This is faster than always calling ReDim.

I don't quite understand what you mean. How do you enlarge an array without ReDim?

My point is: This technique is also used by the `std::vector` class in C++ and by ArrayList in Java. The amortized time complexity per insertion is still O(1).

Here's the conclusion from ChatGPT:

ChatGPT wrote: The doubling strategy is the most efficient way to handle dynamic arrays with unknown length, as it achieves O(1) amortized insertion time while keeping memory usage reasonable. Most modern programming languages and libraries use this technique in their built-in dynamic array implementations because it provides the best balance between performance and memory efficiency.

And here are some alternative strategies, also from ChatGPT:

ChatGPT wrote:
  • Doubling (*2) → Most common, minimizes resizes while keeping O(1) amortized insertions.
  • 1.5x Growth (*1.5) → Saves memory but increases the number of resizes slightly. Still O(1) amortized.
  • Fixed-size increments (+10) → Inefficient; results in O(n) per insertion in the worst case.

Disclaimer: I mostly used ChatGPT for translating into English.
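
The difference between the growth strategies can be made concrete with a small counting experiment. The sketch below does not time ReDim itself. It uses a simplified cost model in which every resize copies all elements stored so far, which is an assumption about the worst case and not a statement about how PureBasic implements ReDim. The procedure name `CountCopies` is made up for this illustration.

```purebasic
EnableExplicit

; Hypothetical helper: counts the element copies needed to append itemCount
; items, assuming each resize copies everything stored so far.
; stepSize > 0 grows the capacity by that many elements, stepSize = 0 doubles it.
Procedure.q CountCopies(itemCount.q, stepSize.q)
  Protected capacity.q = 1, used.q = 0, copies.q = 0

  While used < itemCount
    If used = capacity
      copies + used          ; the resize moves every stored element
      If stepSize > 0
        capacity + stepSize  ; fixed-size increment
      Else
        capacity * 2         ; doubling
      EndIf
    EndIf
    used + 1
  Wend

  ProcedureReturn copies
EndProcedure

Debug "Doubling:  " + Str(CountCopies(1000000, 0))
Debug "+100 step: " + Str(CountCopies(1000000, 100))
```

Under this model, doubling needs about one copy per appended item for a million items, while growing by 100 needs several thousand copies per item. That is the gap between O(1) and O(n) amortized cost per insertion.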

Re: How to speed up array creation

Post by AZJIO »

Code: Select all

EnableExplicit

Define i, ArrSz

Define NewList MyList()
For i = 0 To 1000
	AddElement(MyList())
Next


ArrSz = 100
Define Dim Arr(ArrSz)

i = 0
ForEach MyList()
	i + 1
	If i > ArrSz
		ArrSz * 2
		ReDim Arr(ArrSz)
	EndIf
	Arr(i) = i
Next
ReDim Arr(i)


For i = 0 To 1000
	Debug Arr(i)
Next

This is the part that works slowly:

Code: Select all

line$ + Chr(*ptr\c)
You need to remember the position after each LF, read from that position up to the next LF, then remember the new position, and so on to the end.
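
A rough sketch of that position-remembering approach, applied to a string that is already in memory. The procedure name `SplitLines` is made up for this illustration and the code is untested. It copies each line with a single PeekS() call instead of appending one character at a time, drops a CR that directly precedes an LF, and grows the result array by doubling.

```purebasic
EnableExplicit

; Hypothetical helper: splits text$ at LF and stores the lines in result().
; Returns the number of lines; result() may end up larger than that.
Procedure.i SplitLines(text$, Array result.s(1))
  Protected *ptr.Character, *start
  Protected count.i = 0, length.i
  Protected capacity.i = ArraySize(result()) + 1

  If text$ = ""
    ProcedureReturn 0
  EndIf
  *ptr = @text$
  *start = *ptr ; remembered position where the current line begins

  Repeat
    If *ptr\c = #LF Or *ptr\c = 0
      ; PeekS() expects a length in characters, not bytes
      length = (*ptr - *start) / SizeOf(Character)
      If length > 0
        If PeekC(*ptr - SizeOf(Character)) = #CR
          length - 1 ; drop the CR of a CRLF line ending
        EndIf
      EndIf
      ; store the line, but skip an empty remainder after a final LF
      If *ptr\c = #LF Or *ptr > *start
        If count = capacity
          capacity * 2
          ReDim result(capacity - 1)
        EndIf
        result(count) = PeekS(*start, length)
        count + 1
      EndIf
      If *ptr\c = 0
        Break
      EndIf
      *start = *ptr + SizeOf(Character) ; next line starts after the LF
    EndIf
    *ptr + SizeOf(Character)
  ForEver

  ProcedureReturn count
EndProcedure

Dim lines.s(0)
Define n.i = SplitLines("alpha" + #CRLF$ + "beta" + #LF$ + "gamma", lines())
Debug n
Debug lines(1)
```

After the call, the array can be trimmed with `ReDim lines(n - 1)` if the exact size matters.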

Re: How to speed up array creation

Post by simkot »

pjay wrote: Sun Mar 23, 2025 9:22 am Maybe you'd be better off just keeping it simple & reading your file line by line:
Thank you! Your code works great!