How to speed up array creation
Posted: Sun Mar 23, 2025 7:36 am
by simkot
I have a script that reads the value of a line from an array. It is designed to work with large files of 300 MB and up, but creating the array takes a very long time, 10-20 times longer than in AutoIt. Is it possible to speed it up?
Code:
If ReadFile(0, "test.txt")
  bom = ReadStringFormat(0)
  Size.q = Lof(0)
  If Size > 0
    *p = AllocateMemory(Size + 2)
    If *p
      ReadData(0, *p, Size)
      FreeMemory(*p)
    EndIf
  EndIf
  CloseFile(0)
EndIf
lineCount = CountString(content$, #LF$) + 1
count_time = ElapsedMilliseconds()
Dim lines.s(lineCount)
*ptr.Character = @content$
line$ = ""
index = 0
While *ptr\c
  If *ptr\c = #LF
    lines(index) = line$
    index + 1
    line$ = ""
  ElseIf *ptr\c <> #CR
    line$ + Chr(*ptr\c)
  EndIf
  *ptr + SizeOf(Character)
Wend
If line$ <> ""
  lines(index) = line$
EndIf
If index >= 30000
  Debug("line = " + lines(3))
EndIf
Re: How to speed up array creation
Posted: Sun Mar 23, 2025 9:15 am
by Demivec
Your code isn't quite complete or runnable.
Here's one approach (untested):
Code:
If ReadFile(0, "test.txt")
  bom = ReadStringFormat(0) ; code as written assumes the string format is #PB_Unicode
  Size.q = Lof(0) - Loc(0)  ; bytes remaining after the BOM
  If Size > 0
    *p = AllocateMemory(Size + 2) ; zero-filled, so the 2 extra bytes act as the null terminator
    If *p
      ReadData(0, *p, Size)
    EndIf
  EndIf
  CloseFile(0)
EndIf
If *p
  count_time = ElapsedMilliseconds()
  NewList lines.s()
  *ptr.Character = *p
  *lineStartPtr = *ptr
  While *ptr\c
    If *ptr\c = #LF
      AddElement(lines())
      ; PeekS expects a length in characters, so convert the byte distance.
      ; The LF is excluded; a CR from CRLF line endings is kept.
      lines() = PeekS(*lineStartPtr, (*ptr - *lineStartPtr) / SizeOf(Character))
      *lineStartPtr = *ptr + SizeOf(Character)
    EndIf
    *ptr + SizeOf(Character)
  Wend
  If *ptr > *lineStartPtr ; last line without a trailing LF
    AddElement(lines())
    lines() = PeekS(*lineStartPtr, (*ptr - *lineStartPtr) / SizeOf(Character))
  EndIf
  FreeMemory(*p)
  lineCount = ListSize(lines())
  count_time = ElapsedMilliseconds() - count_time
  output$ = "Lines: " + lineCount + ", Time: " + count_time + "ms"
  If lineCount >= 30000
    SelectElement(lines(), 3)
    output$ + #LF$ + "line = " + lines()
  EndIf
  MessageRequester("Read Stats", output$)
EndIf
If the lines are needed in an array they can be copied from the list to an array when everything is complete.
@Edit: Corrected a few errors present in the original source code
Re: How to speed up array creation
Posted: Sun Mar 23, 2025 9:22 am
by pjay
As Demivec said, there are a couple of logic errors in your code, so it shouldn't really be running at all.
You're reading the file into memory (*p), then immediately releasing that memory, so it cannot be used further down the line.
The content$ variable is not populated, so the pointer operations on it will cause a crash.
Maybe you'd be better off just keeping it simple & reading your file line by line:
Code:
;/ read text file line-by-line into list, then move to array if needed:
If ReadFile(0, YourFile.s) : bom = ReadStringFormat(0)
  NewList strList.s() : count_time = ElapsedMilliseconds()
  While Not Eof(0) : AddElement(strList()) : strList() = ReadString(0) : Wend : CloseFile(0)
  ;/ transfer list to array (if you have to...)
  linecount = ListSize(strList()) : Dim lines.s(linecount)
  ForEach strList() : lines(ListIndex(strList())) = strList() : Next : FreeList(strList())
  count_time = ElapsedMilliseconds() - count_time
  MessageRequester("Info", "Lines read: " + Str(linecount) + #LF$ + "File read in " + Str(count_time) + "ms")
Else
  MessageRequester("Error", "Unable to read file")
EndIf
Re: How to speed up array creation
Posted: Sun Mar 23, 2025 11:37 am
by simkot
Thank you. I will test your answers and let you know. Yes, the first code was not correct. Here is the code I am working on.
Code:
start_time = ElapsedMilliseconds()
If ReadFile(0, "test.txt")
  bom = ReadStringFormat(0)
  Size.q = Lof(0)
  If Size > 0
    *p = AllocateMemory(Size + 2)
    If *p
      ReadData(0, *p, Size)
      content$ = PeekS(*p, -1, bom)
      FreeMemory(*p)
    EndIf
  EndIf
  CloseFile(0)
Else
  Debug "Error."
  End
EndIf
read_time = ElapsedMilliseconds()
lineCount = CountString(content$, #LF$) + 1
count_time = ElapsedMilliseconds()
Dim lines.s(lineCount)
*ptr.Character = @content$
line$ = ""
index = 0
While *ptr\c
  If *ptr\c = #LF
    lines(index) = line$
    index + 1
    line$ = ""
  ElseIf *ptr\c <> #CR
    line$ + Chr(*ptr\c)
  EndIf
  *ptr + SizeOf(Character)
Wend
If line$ <> ""
  lines(index) = line$
EndIf
array_time = ElapsedMilliseconds()
If index >= 3
  Debug("line = " + lines(3))
Else
  Debug("Error.")
EndIf
Debug("Read " + Str(read_time - start_time))
Debug("count " + Str(count_time - read_time))
Debug("Array " + Str(array_time - count_time)) ; measured from the end of the count step
Re: How to speed up array creation
Posted: Sun Mar 23, 2025 12:21 pm
by mk-soft
PureBasic has not always been the fastest with large strings, but it works great.
So for text files, always use ReadString and a linked list. That way there is no problem with the text file format.
Re: How to speed up array creation
Posted: Sun Mar 23, 2025 12:22 pm
by NicTheQuick
I would do it like this. Just double the size of the array until it is large enough to hold all the lines. No need for other data structures like LinkedLists or stuff like that.
Code:
Procedure FileToArray(filePath.s, Array arr.s(1))
  Protected arraySize.i = ArraySize(arr()) + 1
  Protected hFile.i, index.i = 0, capacity.i = 1
  hFile = ReadFile(#PB_Any, filePath)
  If Not hFile
    ProcedureReturn -1
  EndIf
  While Not Eof(hFile)
    If index = capacity
      capacity * 2
    EndIf
    If arraySize < capacity
      arraySize = capacity
      ReDim arr(arraySize - 1)
    EndIf
    arr(index) = ReadString(hFile)
    index + 1
  Wend
  CloseFile(hFile)
  If index > 0
    ReDim arr(index - 1)
  Else
    ReDim arr(0)
  EndIf
  ProcedureReturn index
EndProcedure
Dim arr.s(0)
start_time = ElapsedMilliseconds()
Define i.i, lines.i = FileToArray("/home/nicolas/Text.txt", arr())
read_time = ElapsedMilliseconds()
Debug "Read time: " + Str(read_time - start_time) + " ms"
For i = 0 To lines - 1
  Debug arr(i)
Next
Re: How to speed up array creation
Posted: Sun Mar 23, 2025 12:30 pm
by mk-soft
@NicTheQuick
With an array, I enlarge it by, for example, 100 elements at a time and trim it to the right size at the end.
This is faster than calling ReDim for every element.
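A minimal sketch of what this fixed-step growth might look like, for comparison with the doubling version above. The procedure name FileToArrayChunked and the constant #Chunk are made up for illustration, and "test.txt" is just the file name used earlier in the thread; this is an interpretation of the description, not mk-soft's actual code.

```purebasic
#Chunk = 100 ; hypothetical growth step (the "for example" value from the post)

; Reads a text file line by line into arr(), growing the array by #Chunk
; elements whenever it is full, then trims it to the number of lines read.
; Returns the number of lines, or -1 if the file could not be opened.
Procedure.i FileToArrayChunked(filePath.s, Array arr.s(1))
  Protected hFile.i, index.i = 0
  Protected capacity.i = ArraySize(arr()) + 1
  hFile = ReadFile(#PB_Any, filePath)
  If Not hFile
    ProcedureReturn -1
  EndIf
  While Not Eof(hFile)
    If index = capacity
      capacity + #Chunk          ; grow by a fixed step instead of doubling
      ReDim arr(capacity - 1)
    EndIf
    arr(index) = ReadString(hFile)
    index + 1
  Wend
  CloseFile(hFile)
  If index > 0
    ReDim arr(index - 1)         ; trim to the final size
  EndIf
  ProcedureReturn index
EndProcedure

Dim arr.s(#Chunk - 1)
Define lines.i = FileToArrayChunked("test.txt", arr())
Debug "Lines: " + Str(lines)
```

With a step of 100, ReDim runs once per 100 lines rather than once per line, but the number of ReDim calls still grows linearly with the file size.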
Re: How to speed up array creation
Posted: Sun Mar 23, 2025 12:56 pm
by NicTheQuick
mk-soft wrote: Sun Mar 23, 2025 12:30 pm
@NicTheQuick
With an array, I enlarge it by, for example, 100 elements at a time and trim it to the right size at the end.
This is faster than calling ReDim for every element.
I don't quite understand what you mean. How do you enlarge an array without ReDim?
My point is: this kind of geometric growth is also used by `std::vector` in C++ and by ArrayList in Java. The amortized time complexity per insertion is still O(1).
Here's the conclusion from ChatGPT
ChatGPT wrote: The doubling strategy is the most efficient way to handle dynamic arrays with unknown length, as it achieves O(1) amortized insertion time while keeping memory usage reasonable. Most modern programming languages and libraries use this technique in their built-in dynamic array implementations because it provides the best balance between performance and memory efficiency.
And here are some alternative strategies, also from ChatGPT
ChatGPT wrote:
- Doubling (*2) → Most common, minimizes resizes while keeping O(1) amortized insertions.
- 1.5x Growth (*1.5) → Saves memory but increases the number of resizes slightly. Still O(1) amortized.
- Fixed-size increments (+10) → Inefficient; results in O(n) per insertion in the worst case.
Disclaimer: I mostly used ChatGPT for translating into English.
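A rough count of element copies shows where the difference comes from. It assumes every resize copies all elements currently in the array; that is a worst-case assumption for illustration, not a statement about how ReDim is implemented internally. Here n is the final number of lines, and c is the fixed growth step.

```latex
% Doubling from capacity 1, with 2^{k-1} < n \le 2^{k}:
C_{\text{double}}(n) = \sum_{j=0}^{k-1} 2^{j} = 2^{k} - 1 < 2n

% Fixed step c, with n = m c:
C_{\text{step}}(n) = \sum_{j=1}^{m-1} j\,c = \frac{c\,m(m-1)}{2} = \frac{n(n-c)}{2c}
```

For n = 1,000,000 lines and c = 100, the fixed-step count is about 5 × 10^9 copies, while doubling stays below 2 × 10^6. Dividing by n gives O(1) amortized copies per insertion for doubling and O(n) for fixed steps.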
Re: How to speed up array creation
Posted: Sun Mar 23, 2025 3:43 pm
by AZJIO
Code:
EnableExplicit
Define i, ArrSz
Define NewList MyList()
For i = 0 To 1000
  AddElement(MyList())
Next
ArrSz = 100
Define Dim Arr(ArrSz)
i = 0
ForEach MyList()
  i + 1
  If i > ArrSz
    ArrSz * 2
    ReDim Arr(ArrSz)
  EndIf
  Arr(i) = i
Next
ReDim Arr(i)
For i = 0 To 1000
  Debug Arr(i)
Next
It works slowly.
You need to remember the position after each LF, read from that position up to the next LF, then remember the new position, and so on to the end.
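A minimal sketch of that idea, replacing the character-by-character `line$ + Chr(...)` concatenation with one PeekS per line. The sample text stands in for the content$ loaded from the file, and the names *lineStart and length are made up for illustration.

```purebasic
; Hypothetical sample text; in the thread this would be content$ read from the file.
content$ = "alpha" + #CRLF$ + "beta" + #LF$ + "gamma"

lineCount = CountString(content$, #LF$) + 1
Dim lines.s(lineCount - 1)

*ptr.Character = @content$
*lineStart = *ptr              ; remembered position just after the previous LF
index = 0

While *ptr\c
  If *ptr\c = #LF
    ; Pointer differences are in bytes, PeekS wants characters.
    length = (*ptr - *lineStart) / SizeOf(Character)
    If length > 0
      ; Drop the CR of a CRLF line ending.
      If PeekC(*ptr - SizeOf(Character)) = #CR : length - 1 : EndIf
    EndIf
    If length > 0
      lines(index) = PeekS(*lineStart, length) ; copy the whole line at once
    EndIf
    index + 1
    *lineStart = *ptr + SizeOf(Character)      ; remember the start of the next line
  EndIf
  *ptr + SizeOf(Character)
Wend

; Last line without a trailing LF.
If *ptr > *lineStart
  lines(index) = PeekS(*lineStart, (*ptr - *lineStart) / SizeOf(Character))
EndIf

For i = 0 To lineCount - 1
  Debug lines(i)
Next
```

Each line is copied once, so the work per line is proportional to its length instead of growing with every appended character.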
Re: How to speed up array creation
Posted: Sun Mar 23, 2025 4:32 pm
by simkot
pjay wrote: Sun Mar 23, 2025 9:22 am
Maybe you'd be better off just keeping it simple & reading your file line by line:
Thank you! Your code works great!