Large files , NewList vs Array ?
-
- User
- Posts: 23
- Joined: Thu Apr 18, 2024 9:12 pm
Large files , NewList vs Array ?
Thanks for your help.
I have a 'NewList WordsList.s()'
into which I am reading a .txt file.
So far my largest file is 300 KB, 4,540 lines.
At a button click, WordsList.s() can be re-sorted
into ascending or descending order.
The program can be quite slow, sometimes getting a "Not Responding" message.
So I am wondering whether an Array might be faster?
I see this example in the Help:
Dim MyArray(41)
MyArray(0) = 1
MyArray(1) = 2
What would I have to code for 300,000 lines?
Thanks for your help...
Last edited by millie78526 on Sun May 05, 2024 9:31 pm, edited 1 time in total.
Re: Large files , NewList vs Array ?
No code no help.
But when I look in my crystal ball, I see that you are blocking the main event loop.
One solution would be a thread.
But as written above: no code, no help possible, because my crystal ball is restricted.
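To illustrate the thread idea mentioned above, here is a minimal sketch, not the poster's program. It assumes the slow part is the sort, uses dummy words in place of the real file, and introduces a hypothetical custom event `#Event_SortDone` plus hypothetical gadget names. Because strings are used inside the thread, the "Create threadsafe executable" compiler option should be enabled.

```purebasic
; Sketch only: run a slow job in a worker thread so the event loop keeps running.
; Enable "Create threadsafe executable" in the compiler options (strings in a thread).
EnableExplicit

#Event_SortDone = #PB_Event_FirstCustomValue   ; hypothetical custom event

Enumeration
  #Win
  #Btn
EndEnumeration

Global NewList WordsList.s()

Procedure SortWorker(*unused)
  ; The heavy work happens here, off the GUI thread.
  SortList(WordsList(), #PB_Sort_Ascending | #PB_Sort_NoCase)
  PostEvent(#Event_SortDone)                   ; tell the main loop we are finished
EndProcedure

Define i
For i = 1 To 300000                            ; dummy data standing in for the real words
  AddElement(WordsList())
  WordsList() = "word" + Str(Random(99999))
Next

If OpenWindow(#Win, 0, 0, 240, 60, "Thread sketch", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
  ButtonGadget(#Btn, 10, 10, 220, 40, "Sort")
  Repeat
    Select WaitWindowEvent()
      Case #PB_Event_CloseWindow
        Break
      Case #PB_Event_Gadget
        If EventGadget() = #Btn
          DisableGadget(#Btn, #True)           ; avoid starting two sorts at once
          SetGadgetText(#Btn, "Working...")
          CreateThread(@SortWorker(), 0)
        EndIf
      Case #Event_SortDone
        ; Gadgets are only touched from the main thread.
        SetGadgetText(#Btn, "Sort")
        DisableGadget(#Btn, #False)
    EndSelect
  ForEver
EndIf
```

The main loop keeps handling events while the worker sorts, so the window should not show "Not Responding". The list must not be modified from the main thread while the worker is running.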
Re: Large files , NewList vs Array ?
Faster at what? Sorting? Adding all the words?
millie78526 wrote: Sun May 05, 2024 9:00 pm
I see this Example in the Help .
Dim MyArray(41)
MyArray(0) = 1
MyArray(1) = 2
What would I have to code for 300,000 lines ?
Code: Select all
Dim MyArray(299999)
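If the number of lines is not known in advance, the array can also grow while the file is read. The following is a minimal sketch under assumptions: "words.txt" is a placeholder file name, and the starting capacity of 1024 is an arbitrary choice.

```purebasic
; Sketch: load a text file into a string array that grows on demand.
; "words.txt" is a placeholder file name.
EnableExplicit

Define capacity = 1024          ; initial number of slots (arbitrary choice)
Define count = 0                ; number of lines actually stored
Dim Lines.s(capacity - 1)

If ReadFile(0, "words.txt")
  While Not Eof(0)
    If count = capacity
      capacity = capacity * 2   ; double the capacity when the array is full
      ReDim Lines(capacity - 1) ; ReDim keeps the existing elements
    EndIf
    Lines(count) = ReadString(0)
    count = count + 1
  Wend
  CloseFile(0)

  If count > 0
    ; Sort only the filled part of the array (indexes 0 to count - 1).
    SortArray(Lines(), #PB_Sort_Ascending | #PB_Sort_NoCase, 0, count - 1)
  EndIf
  Debug "Lines read: " + Str(count)
Else
  Debug "Could not open file"
EndIf
```

Doubling the capacity keeps the number of ReDim calls small even for several hundred thousand lines.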
-
- User
- Posts: 23
- Joined: Thu Apr 18, 2024 9:12 pm
Re: Large files , NewList vs Array ?
Sorry, I was speaking generically: is an Array faster than a NewList?
Program starts here:
Code: Select all
Procedure StartProgram()
  Debug "Procedure StartProgram()"
  SetGadgetText(StartButton , "Working....")
  Delay(500)
  MessageRequester("Still TO DO" , "1) Stop any 'Clear Address_String' " +
                                   "2) Delete any numeric only in Words_List" )
  Open_Different_File()
  Create_Words_List()
  Sort_Words_List()
  Remove_Words_List_Duplicates()
  SetGadgetText(StartButton , "Start")
EndProcedure ; StartProgram()
Then Procedure Open_Different_File()
reads the file into the Input_Editor EditorGadget.
When I get all the parts working properly,
I plan to read the file into an Array|NewList.
Code: Select all
Procedure Open_Different_File()
  Debug "Procedure Open_Different_File()"
  NumberOfLines = 0
  SeqNumber = 1
  SetGadgetText(NumberOfLinez , " " )
  ClearOutputFile()
  ClearGadgetItems(Index_List)
  fileName$ = OpenFileRequester("Select a file to open", CurrentDIR$ , "All files (*.*)", 0)
  If fileName$ <> " " And FileName$ <> ""
    ; Check if the file can be opened
    If OpenFile(0, fileName$)
      ; Clear the EditorGadget before loading new content
      ClearGadgetItems(Input_Editor)
      SetGadgetText(StartButton , "Working....")
      Delay(500)
      While Not Eof(0)
        FileLine$ = ReadString(0)
        ; Append each line to the EditorGadget, add a newline character for each new line
        ; SetGadgetText(0, GetGadgetText(0) + FileLine$ ) ; + Chr(13) + Chr(10))
        AddGadgetItem(Input_Editor,-1, Str(SeqNumber) + " " + FileLine$ ) ; "
        SeqNumber = SeqNumber + 1
        NumberOfLines = NumberOfLines + 1
      Wend
      ; SetGadgetText(NumberOfLinez , Str(SeqNumber - 1) )
      CloseFile(0)
    Else
      MessageRequester("Error", "Failed to open file: " + fileName$ , #PB_MessageRequester_Error)
    EndIf
  EndIf
  SetGadgetText(StartButton , "Start")
  SetGadgetText(Address_String , "")
  SetGadgetText(Address_String , fileName$ )
  NumberOfLines$ = Str(NumberOfLines)
  SetGadgetText(NumberOfLinez , "Number of Input Lines = " + NumberOfLines$ )
  ; Create_Words_List()
  ; Sort_Words_List()
  ; Remove_Words_List_Duplicates()
EndProcedure ; Open_Different_File()
Here is the source of my question, 'Array|NewList?':
Code: Select all
Procedure Create_Words_List()
  Debug "Procedure Create_Words_List()"
  Global EventID ,
         WordsListCOUNT = 0 , Number_Times_Thru = 0 ,
         aLine$ , POS = 0 , StartPOS= 0 , LastPOD = 0 ,
         Editor_Current_Line_Number = 0
  ; ============================================================
  ; Extract lines from Input_Editor and Populate NewList WordsList.s() line-by-line
  ; ============================================================
  ; If Number_Times_Thru > 0 : Goto ReSort_WordList : EndIf
  Input_Editor_Line_Count = CountGadgetItems(Input_Editor)
  ; Number_Times_Thru = Number_Times_Thru + 1
  ClearList(WordsList())
  If Input_Editor_Line_Count > 0
    Global aLine$ , aLineLEN = 0 , BlankPOS = 0 , StartPOS= 1 , LastPOD = 0 ,
           StartMid = 0 , BlankPOS = 0 , MidWord$ , Input_Editor_Line_Count = 0 ,
           LastWord$ , LastWordLEN = 0 , RepeatCount = 0 ,
           Input_Editor_Lines_Count = 0
    RepeatCount = 0 : Input_Editor_Lines_Count = CountGadgetItems(Input_Editor)
    Repeat ; for each line
      BlankPOS = 0 : StartPOS= 1
      aLine$ = GetGadgetItemText(Input_Editor, RepeatCount)
      RepeatCount = RepeatCount + 1
      aLineLEN = Len(aLine$)
      ;============================================================================
      ; CreateRegularExpression(0, "^[0-9 .]+$") ; First Try "^[0-9]+$"
      If MatchRegularExpression(0, aLine$)
        ; Debug "RegularExpressionError() = " + RegularExpressionError()
        Debug "SkipThisLine = " + aLine$
        Goto SkipThisLine
      EndIf
      ;============================================================================
      ; Debug "This Line Not Numeric = " + aLine$
      ; MatchRegularExpression()
      ; parse/collect Words_List
      Repeat ; for each word
        BlankPOS = FindString(aLine$, " " , StartPOS , #PB_String_NoCase)
        If BlankPOS > 0
          MidWord$ = Mid(aLine$, StartPOS , BlankPOS - StartPOS + 1 )
          Trimmed_MidWords$ = RTrim(MidWord$ , " ")
          AddElement(WordsList())
          WordsList() = Trimmed_MidWords$
          WordsListCOUNT = WordsListCOUNT + 1
          StartPOS = BlankPOS + 1
        EndIf
      Until BlankPOS < 1 ;
      ; ============================================================
      If BlankPOS < 1 ; There are no more Blanks/Spaces EndOfLine
        LastWordLEN = aLineLEN - StartPOS + 1
        LastWord$ = Mid(aLine$, StartPOS , LastWordLEN + 1)
        AddElement(WordsList())
        WordsList() = LastWord$
      EndIf ; If BlankPOS < 1
      SkipThisLine:
    Until RepeatCount = Input_Editor_Lines_Count ; Repeat ; for each line
  EndIf
EndProcedure ; Create_Words_List()
;
Procedure Sort_Words_list()
  Debug "Procedure Sort_Words_list()"
  ; ============================================================
  Descend_State = GetGadgetState(Descend_CheckBox) ; #PB_Checkbox_Checked
  If Descend_State = #PB_Checkbox_Checked
    SortList(WordsList() , #PB_Sort_Descending | #PB_Sort_NoCase)
    SortInfo$ = "Sorted in Descending order:"
  EndIf
  ; ============================================================
  Ascend_State = GetGadgetState(Ascend_CheckBox) ; #PB_Checkbox_Checked
  If Ascend_State = #PB_Checkbox_Checked
    SortList(WordsList() , #PB_Sort_Ascending | #PB_Sort_NoCase)
    SortInfo$ = "Sorted in Ascending order:"
  EndIf
  ; ============================================================
  ClearGadgetItems(Index_List)
  ForEach WordsList()
    AddGadgetItem(Index_List , -1 , WordsList() )
  Next
EndProcedure ; Sort_Words_list()
Here is the program screenshot:
https://vmars.us/ShowMe/Study-Up-Screen-Shot.png
Re: Large files , NewList vs Array ?
The sorting will be relatively fast, it's adding the word list to the ListView that's slowing you down here.
If you're on Windows & you're looking to add 300,000+ items to a gadget, then you should look at alternative methods, such as a VirtualListIcon (search the forum) - it's a little more complex with a huge performance payoff.
Edit - Added example (Windows only):
Code: Select all
; virtual list icon example with simulated word data - PJ 2024.
EnableExplicit

Enumeration ; windows
  #myWin
EndEnumeration

Enumeration ; gadgets
  #myGad_SortCombo
  #myGad_ListView
  #myGad_List_Update
  #myGad_Messages
EndEnumeration

#Wordcount = 300000
Global Dim wordArray.s(#wordcount - 1)

Macro messageAdd(txt) : AddGadgetItem(#myGad_Messages,-1,txt) : EndMacro

Procedure WinCallback(hwnd, msg, wParam, lParam)
  Protected result = #PB_ProcessPureBasicEvents
  Select msg
    Case #WM_NOTIFY
      Define *pnmh.NMHDR = lParam
      Select *pnmh\code
        Case #LVN_FIRST - 13 : result = 0
        Case #LVN_GETDISPINFO
          Define *pnmlvdi.NMLVDISPINFO = lParam
          If *pnmlvdi\item\mask & #LVIF_TEXT
            *pnmlvdi\item\pszText = @wordArray(*pnmlvdi\item\iItem)
          EndIf
      EndSelect
  EndSelect
  ProcedureReturn result
EndProcedure

Procedure WordArray_Create()
  Protected myLoop, wordLen, word.s, onChar, *s.ascii : RandomSeed(0)
  For myLoop = 0 To #Wordcount - 1
    wordLen = Random(7)+3 : wordArray(myLoop) = Space(wordLen) : *s = @wordArray(myLoop)
    For onChar = 1 To wordLen : *s\a = Random(25)+97 : *s + SizeOf(character) : Next
  Next
  messageAdd("Words in array: "+Str(#Wordcount))
EndProcedure

Procedure Populate_ListView_Array(gadget,Array wordArr.s(1), Sort = 2)
  Protected count = ArraySize(wordArr(),1), myLoop, timePop, timeSort
  If IsGadget(gadget)
    If GadgetType(gadget) = #PB_GadgetType_ListIcon
      ; sort the array?
      timeSort = ElapsedMilliseconds()
      Select Sort
        Case #PB_Sort_Ascending : SortArray(wordArr(),#PB_Sort_Ascending)
        Case #PB_Sort_Descending : SortArray(wordArr(),#PB_Sort_Descending)
      EndSelect
      timeSort = ElapsedMilliseconds() - timeSort
      ; populate the gadget with word array
      timePop = ElapsedMilliseconds()
      SendMessage_(GadgetID(gadget), #LVM_SETITEMCOUNT, 0, 1)
      SendMessage_(GadgetID(gadget), #LVM_SETITEMCOUNT, Count, 1)
      timePop = ElapsedMilliseconds() - timePop
      messageAdd("Time to sort: "+Str(timeSort)+"ms, Time to populate: "+Str(timePop)+"ms")
    EndIf
  EndIf
EndProcedure

Procedure Init_Main()
  OpenWindow(#myWin,0,0,520,480,"List sort",#PB_Window_ScreenCentered|#PB_Window_SystemMenu)
  SetWindowCallback(@WinCallback())
  LoadFont(0,"Segoe UI",12) : SetGadgetFont(#PB_Default,FontID(0))
  ComboBoxGadget(#myGad_SortCombo,2,2,166,30)
  AddGadgetItem(#myGad_SortCombo,0,"Sort: Ascending")
  AddGadgetItem(#myGad_SortCombo,1,"Sort: Descending")
  AddGadgetItem(#myGad_SortCombo,2,"Sort: None") : SetGadgetState(#myGad_SortCombo,2)
  ButtonGadget(#myGad_List_Update,2,40,166,30,"Populate now...")
  EditorGadget(#myGad_Messages,170,2,348,476)
  ListIconGadget(#myGad_ListView,2,80,166,398,"",150,#LVS_OWNERDATA|#LVS_NOCOLUMNHEADER)
  WordArray_Create()
EndProcedure

Init_Main()

Repeat
  Select WaitWindowEvent()
    Case #PB_Event_CloseWindow : End
    Case #PB_Event_Gadget
      Select EventGadget()
        Case #myGad_List_Update
          Populate_ListView_Array(#myGad_ListView, wordArray(),GetGadgetState(#myGad_SortCombo))
      EndSelect
  EndSelect
ForEver
Re: Large files , NewList vs Array ?
Add
Code: Select all
SendMessage_(GadgetID(Index_List),#WM_SETREDRAW,0,0)
before filling the list and
Code: Select all
SendMessage_(GadgetID(Index_List),#WM_SETREDRAW,1,0)
after it and see if that's fast enough.
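Put together, the two calls wrap the fill loop roughly as in the sketch below. This is Windows only and is not taken from the original program: the gadget name `#List`, the dummy items and the item count are assumptions for illustration, and the `InvalidateRect_()` call is an extra step added here to request one repaint at the end.

```purebasic
; Windows only (uses the Win32 API through SendMessage_).
EnableExplicit

Enumeration
  #Win
  #List
EndEnumeration

Define i, t

If OpenWindow(#Win, 0, 0, 300, 400, "WM_SETREDRAW sketch", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
  ListViewGadget(#List, 10, 10, 280, 380)

  t = ElapsedMilliseconds()
  SendMessage_(GadgetID(#List), #WM_SETREDRAW, 0, 0)   ; stop redrawing while filling
  For i = 1 To 20000                                    ; dummy items
    AddGadgetItem(#List, -1, "Item " + Str(i))
  Next
  SendMessage_(GadgetID(#List), #WM_SETREDRAW, 1, 0)   ; allow redrawing again
  InvalidateRect_(GadgetID(#List), 0, #True)           ; ask for one repaint
  Debug "Fill time (ms): " + Str(ElapsedMilliseconds() - t)

  Repeat : Until WaitWindowEvent() = #PB_Event_CloseWindow
EndIf
```

Commenting out the two SendMessage_ lines and comparing the reported fill time is a quick way to see how much the redraw suppression helps on a given machine.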
- NicTheQuick
- Addict
- Posts: 1504
- Joined: Sun Jun 22, 2003 7:43 pm
- Location: Germany, Saarbrücken
- Contact:
Re: Large files , NewList vs Array ?
You should also consider removing the 'Delay(500)'.
-
- User
- Posts: 23
- Joined: Thu Apr 18, 2024 9:12 pm
Re: Large files , NewList vs Array ?
Hi NicTheQuick,
Yes, processing a large file can take a while.
I do that to give the user notice that something is happening.
If I do the MessageRequester without a Delay, it doesn't show right away.
-
- User
- Posts: 23
- Joined: Thu Apr 18, 2024 9:12 pm
Re: Large files , NewList vs Array ?
Thanks guys ,I appreciate your Help...
https://vmars.us/freeware/Study-Up/Study-Up-HELP.html
Thanks Grampa for letting me Post it on your sites

- DeanH
- Enthusiast
- Posts: 274
- Joined: Wed May 07, 2008 4:57 am
- Location: Adelaide, South Australia
- Contact:
Re: Large files , NewList vs Array ?
My 2c here. I've worked a lot with very large lists. They're a big part of my commercial software. Author authority lists can potentially have several hundred thousand entries loaded from a database table into a ListView. In my experience, filling arrays is faster than lists. Lists are more flexible and I don't have to worry about a size limit. I use lists when speed isn't critical, and arrays when it is. That's me. Others may disagree. It may vary a great deal with user hardware, networking, etc.
Filling ListView controls is slow. For a large list, I resort to disabling the control before filling, and even hiding it, then enabling and showing after. Maybe have a progressbar so the user knows something is happening during the fill. If the list is auto-sorted, disable that when loading. I've found simply hiding the control can give a reasonable speed up. A virtual list is a nice trick but it is complex. I've seen other commercial systems resort to trickery and only show a small part of the list at a time instead of filling the whole control with all the data.
Jacdelad's suggestion helps, too.
WindowEvent() within a processing loop can show a progressbar quickly instead of later.
This has been my experience working with arrays, lists, and the ListView control.
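A rough sketch of the hide-while-filling and WindowEvent() ideas from this post, under assumptions: the gadget names, the dummy items and the update interval (every 1% of the items) are illustrative choices, not the poster's code.

```purebasic
; Sketch: hide the list while filling it and keep a progress bar alive.
EnableExplicit

Enumeration
  #Win
  #List
  #Progress
EndEnumeration

#ItemCount = 20000                        ; dummy item count

Define i
Define stepSize = #ItemCount / 100        ; number of items per progress step

If OpenWindow(#Win, 0, 0, 300, 430, "Fill sketch", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
  ProgressBarGadget(#Progress, 10, 10, 280, 20, 0, 100)
  ListViewGadget(#List, 10, 40, 280, 380)

  HideGadget(#List, #True)                ; a hidden gadget is not redrawn per item
  For i = 1 To #ItemCount
    AddGadgetItem(#List, -1, "Item " + Str(i))
    If i % stepSize = 0
      SetGadgetState(#Progress, i / stepSize)
      ; Drain pending events so the progress bar repaints now.
      ; Note: this simple loop also discards events such as a close request.
      While WindowEvent() : Wend
    EndIf
  Next
  HideGadget(#List, #False)

  Repeat : Until WaitWindowEvent() = #PB_Event_CloseWindow
EndIf
```

In a real program the events returned by WindowEvent() inside the loop would need to be handled rather than thrown away, for example to let the user cancel.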
Re: Large files , NewList vs Array ?
millie78526 wrote: Mon May 06, 2024 7:44 pm
I do that to give the user notice that something in happening .
Use another notification method that is more economical and does not force the user to wait several times longer. Use a progress bar: divide the number of elements by 100 to find out how many elements make up 1%, then add 1 to the progress each time that many elements have been added.
The list works faster here because it uses ForEach, where each element carries a pointer to the next element. The array is intended for quick access by index. While filling, you do not need random access to an arbitrary element, since the elements are added sequentially. But overall the difference is not that big; you will hardly notice it.
Using WM_SETREDRAW will give you a noticeable difference, since the OS will not redraw the ListIconGadget surface after each element is added. The speed is several times faster, for example 1 second versus 10 seconds, or 1 minute versus 10 minutes. I hope you understand how long you are making the user wait. You can put a time counter in your code and evaluate the speed of each code section.
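The time counter suggestion could look like the sketch below, which times filling and sorting a list and an array of the same size. The word data is random dummy text and the count of 300,000 is taken from the original question; the numbers it prints depend on the machine and are inflated while the debugger is running.

```purebasic
; Sketch: time each code section with ElapsedMilliseconds().
EnableExplicit

#Count = 300000

Define i, t
NewList WordsList.s()
Dim WordsArray.s(#Count - 1)

t = ElapsedMilliseconds()
For i = 0 To #Count - 1
  AddElement(WordsList())
  WordsList() = "word" + Str(Random(99999))
Next
Debug "Fill list (ms):  " + Str(ElapsedMilliseconds() - t)

t = ElapsedMilliseconds()
For i = 0 To #Count - 1
  WordsArray(i) = "word" + Str(Random(99999))
Next
Debug "Fill array (ms): " + Str(ElapsedMilliseconds() - t)

t = ElapsedMilliseconds()
SortList(WordsList(), #PB_Sort_Ascending | #PB_Sort_NoCase)
Debug "Sort list (ms):  " + Str(ElapsedMilliseconds() - t)

t = ElapsedMilliseconds()
SortArray(WordsArray(), #PB_Sort_Ascending | #PB_Sort_NoCase)
Debug "Sort array (ms): " + Str(ElapsedMilliseconds() - t)
```

The same pattern can be placed around the gadget fill loop in the original program to confirm which section actually takes the time.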