Large files , NewList vs Array ?

millie78526
User
Posts: 23
Joined: Thu Apr 18, 2024 9:12 pm

Post by millie78526 »

Thanks for your help.

I have a 'NewList WordsList.s()'
into which I am reading a .txt file.
So far my largest file is 300 KB, 4,540 lines.
At a button click, WordsList.s() can be re-sorted
into ascending or descending order.
The program can be quite slow, sometimes getting a "Not Responding" message.
So I am wondering whether an Array might be faster?

I see this example in the Help:
Dim MyArray(41)
MyArray(0) = 1
MyArray(1) = 2

What would I have to code for 300,000 lines?

Thanks for your help...
Last edited by millie78526 on Sun May 05, 2024 9:31 pm, edited 1 time in total.
infratec
Always Here
Posts: 7577
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: Large files , NewList vs Array ?

Post by infratec »

No code, no help.

But when I look into my crystal ball, I see that you are blocking the main event loop.
One solution would be a thread.

But as written above: no code, no help possible, because my crystal ball is limited.
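
A minimal sketch of the thread idea, not taken from the original program: the slow work runs in a worker thread, which posts a custom event when it is done, so the event loop keeps responding. The names Worker() and #Event_WorkDone are made up for illustration, the counting loop only stands in for the real file processing, and gadgets are touched only from the main thread.

```purebasic
; Sketch: do the slow work in a thread so the event loop keeps running.
; Enable "Create threadsafe executable" in the compiler options when threads use strings.
EnableExplicit

Enumeration #PB_Event_FirstCustomValue
  #Event_WorkDone                      ; hypothetical custom event, posted by the worker
EndEnumeration

Procedure Worker(Dummy)
  Protected i, Sum.q
  For i = 1 To 50000000                ; stand-in for reading, parsing and sorting
    Sum + i
  Next
  PostEvent(#Event_WorkDone)           ; tell the main loop that the work is finished
EndProcedure

If OpenWindow(0, 0, 0, 300, 50, "Thread sketch", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
  ButtonGadget(0, 10, 10, 280, 30, "Start")
  Repeat
    Select WaitWindowEvent()
      Case #PB_Event_CloseWindow
        Break
      Case #PB_Event_Gadget
        If EventGadget() = 0
          DisableGadget(0, #True)      ; prevent a second click while working
          SetGadgetText(0, "Working....")
          CreateThread(@Worker(), 0)
        EndIf
      Case #Event_WorkDone
        SetGadgetText(0, "Start")
        DisableGadget(0, #False)
    EndSelect
  ForEver
EndIf
```

With this layout the "Working...." text shows up immediately, because the main loop is free to repaint the button while the thread runs.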
STARGÅTE
Addict
Posts: 2226
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke

Re: Large files , NewList vs Array ?

Post by STARGÅTE »

millie78526 wrote: Sun May 05, 2024 9:00 pm
So I am wondering an Array might be faster ?

Faster in what? Sorting? Adding all the words?

millie78526 wrote: Sun May 05, 2024 9:00 pm
I see this Example in the Help .
Dim MyArray(41)
MyArray(0) = 1
MyArray(1) = 2

What would I have to code for 300,000 lines ?

Code: Select all

Dim MyArray(299999)
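
For text lines the array also needs the string type ('.s'), and the line count is usually not known in advance. A minimal sketch, assuming a text file named words.txt next to the program; LoadLines() and the chunk size of 4096 are made up for illustration:

```purebasic
; Sketch: load every line of a text file into a string array, then sort it.
; LoadLines() and the chunk size of 4096 are made up for illustration.
EnableExplicit

Procedure.i LoadLines(FileName$, Array Lines.s(1))
  Protected File, Count = 0
  File = ReadFile(#PB_Any, FileName$)
  If File = 0
    ProcedureReturn -1                 ; file could not be opened
  EndIf
  While Not Eof(File)
    If Count > ArraySize(Lines())
      ReDim Lines(Count + 4096)        ; grow in chunks, existing content is kept
    EndIf
    Lines(Count) = ReadString(File)
    Count + 1
  Wend
  CloseFile(File)
  If Count > 0
    ReDim Lines(Count - 1)             ; trim to the number of lines read
  EndIf
  ProcedureReturn Count
EndProcedure

Dim Lines.s(4095)
Define Count = LoadLines("words.txt", Lines())
If Count > 0
  SortArray(Lines(), #PB_Sort_Ascending | #PB_Sort_NoCase)
  Debug Str(Count) + " lines loaded, first after sorting: " + Lines(0)
EndIf
```

Growing the array in chunks keeps the number of ReDim calls small; a fixed 'Dim Lines.s(299999)' works just as well when the upper limit is known.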
millie78526
User
Posts: 23
Joined: Thu Apr 18, 2024 9:12 pm

Re: Large files , NewList vs Array ?

Post by millie78526 »

Sorry, I was speaking generically: is an Array faster than a NewList?
The program starts here:

Code: Select all


Procedure StartProgram()
  Debug "Procedure StartProgram()"
  SetGadgetText(StartButton , "Working....")
  Delay(500)
  MessageRequester("Still TO DO" , "1) Stop any 'Clear Address_String' " + 
                                   "2) Delete any numeric only in Words_List")
  Open_Different_File()
  Create_Words_List()
  Sort_Words_List()
  Remove_Words_List_Duplicates()
  SetGadgetText(StartButton , "Start")
EndProcedure ; StartProgram()

Then Procedure Open_Different_File()
reads the file into the Input_Editor EditorGadget.
When I get all the parts working properly,
I plan to read the file into an Array|NewList.

Code: Select all

Procedure Open_Different_File()
  Debug "Procedure Open_Different_File()"
  NumberOfLines = 0
  SeqNumber = 1
  SetGadgetText(NumberOfLinez , " " )
  ClearOutputFile()
  ClearGadgetItems(Index_List)

  fileName$ = OpenFileRequester("Select a file to open", CurrentDIR$ , "All files (*.*)|*.*", 0)
  If fileName$ <> " " And fileName$ <> ""
    ; ReadFile() opens the file read-only and fails if it does not exist
    If ReadFile(0, fileName$)
      ; Clear the EditorGadget before loading new content
      ClearGadgetItems(Input_Editor)
      SetGadgetText(StartButton , "Working....")
      Delay(500)
      While Not Eof(0)
        FileLine$ = ReadString(0)
        ; Append each line to the EditorGadget, prefixed with its sequence number
        AddGadgetItem(Input_Editor, -1, Str(SeqNumber) + " " + FileLine$ )
        SeqNumber = SeqNumber + 1
        NumberOfLines = NumberOfLines + 1
      Wend
      CloseFile(0)
    Else
      MessageRequester("Error", "Failed to open file: " + fileName$ , #PB_MessageRequester_Error)
    EndIf
  EndIf
  SetGadgetText(StartButton , "Start")
  SetGadgetText(Address_String , fileName$ )
  NumberOfLines$ = Str(NumberOfLines)
  SetGadgetText(NumberOfLinez , "Number of Input Lines =  " + NumberOfLines$ )
EndProcedure ; Open_Different_File()

Here is the source of my question, 'Array|NewList?':

Code: Select all

Procedure Create_Words_List() 
  Debug "Procedure Create_Words_List()"
    Global  EventID , 
                 WordsListCOUNT = 0 , Number_Times_Thru = 0 ,
                 aLine$ , POS = 0 , StartPOS= 0 , LastPOD = 0 ,
                 Editor_Current_Line_Number = 0
; ============================================================
; Extract lines from Input_Editor and Populate NewList WordsList.s() line-by-line
; ============================================================
;  If Number_Times_Thru > 0  : Goto ReSort_WordList  : EndIf  
  Input_Editor_Line_Count = CountGadgetItems(Input_Editor)
;  Number_Times_Thru = Number_Times_Thru + 1
  ClearList(WordsList()) 

  If Input_Editor_Line_Count > 0
    
    Global aLine$ , aLineLEN = 0 , BlankPOS = 0 , StartPOS= 1 , LastPOD = 0 ,
           StartMid = 0 , BlankPOS = 0 , MidWord$ , Input_Editor_Line_Count = 0 ,
           LastWord$ , LastWordLEN = 0 , RepeatCount = 0 , 
           Input_Editor_Lines_Count = 0 
    
  RepeatCount = 0 : Input_Editor_Lines_Count = CountGadgetItems(Input_Editor)
  Repeat ; for each line
    BlankPOS = 0 : StartPOS= 1 
    aLine$ = GetGadgetItemText(Input_Editor, RepeatCount) 
    RepeatCount = RepeatCount + 1
    aLineLEN = Len(aLine$)
;============================================================================
;    CreateRegularExpression(0, "^[0-9 .]+$") ; First Try "^[0-9]+$" 
    If MatchRegularExpression(0, aLine$)  
;       Debug "RegularExpressionError() = " + RegularExpressionError()
       Debug  "SkipThisLine = " +  aLine$
        Goto SkipThisLine      
     EndIf
;============================================================================     
;       Debug  "This Line Not Numeric = " +  aLine$
; MatchRegularExpression()
; parse/collect Words_List  
    Repeat ; for each word 
      BlankPOS = FindString(aLine$, " " , StartPOS) ; a space needs no case-insensitive search
      If BlankPOS > 0 
        MidWord$ = Mid(aLine$, StartPOS , BlankPOS - StartPOS + 1 )
        Trimmed_MidWords$ = RTrim(MidWord$ , " ")
        AddElement(WordsList())
        WordsList() = Trimmed_MidWords$
        WordsListCOUNT = WordsListCOUNT + 1
        StartPOS = BlankPOS + 1
        
      EndIf 
    Until BlankPOS < 1   ;   
; ============================================================
    If BlankPOS < 1  ;  No more blanks/spaces: the rest of the line is the last word
      LastWordLEN = aLineLEN - StartPOS + 1
      LastWord$ = Mid(aLine$, StartPOS , LastWordLEN)
      AddElement(WordsList())
      WordsList() = LastWord$
      WordsListCOUNT = WordsListCOUNT + 1 ; count the last word as well
    EndIf ; If BlankPOS < 1 
SkipThisLine:      
    Until RepeatCount = Input_Editor_Lines_Count   ;  Repeat ; for each line
  EndIf 
EndProcedure ; Create_Words_List()
;
Procedure Sort_Words_list()
  Debug "Procedure Sort_Words_list()"
  ; ============================================================
  Descend_State = GetGadgetState(Descend_CheckBox) ; #PB_Checkbox_Checked
  If Descend_State = #PB_Checkbox_Checked
    SortList(WordsList() , #PB_Sort_Descending | #PB_Sort_NoCase)
    SortInfo$ = "Sorted in Descending order:"
  EndIf
  ; ============================================================
  Ascend_State = GetGadgetState(Ascend_CheckBox) ; #PB_Checkbox_Checked
  If Ascend_State = #PB_Checkbox_Checked
    SortList(WordsList() , #PB_Sort_Ascending | #PB_Sort_NoCase)
    SortInfo$ = "Sorted in Ascending order:"
  EndIf
  ; ============================================================
  ClearGadgetItems(Index_List)
  ForEach WordsList()
    AddGadgetItem(Index_List , -1 , WordsList() )
  Next
EndProcedure ; Sort_Words_list()

Here is the program screenshot:
https://vmars.us/ShowMe/Study-Up-Screen-Shot.png
pjay
Enthusiast
Posts: 251
Joined: Thu Mar 30, 2006 11:14 am

Re: Large files , NewList vs Array ?

Post by pjay »

The sorting will be relatively fast; it's adding the word list to the ListView that's slowing you down here.

If you're on Windows and you're looking to add 300,000+ items to a gadget, then you should look at alternative methods, such as a VirtualListIcon (search the forum). It's a little more complex, with a huge performance payoff.

Edit - Added example (Windows only):

Code: Select all

; virtual list icon example with simulated word data - PJ 2024.

EnableExplicit

Enumeration ; windows
  #myWin
EndEnumeration
Enumeration ; gadgets
  #myGad_SortCombo
  #myGad_ListView
  #myGad_List_Update
  #myGad_Messages
EndEnumeration
#Wordcount = 300000

Global Dim wordArray.s(#wordcount - 1)

Macro messageAdd(txt) : AddGadgetItem(#myGad_Messages,-1,txt) : EndMacro

Procedure WinCallback(hwnd, msg, wParam, lParam)
  Protected result = #PB_ProcessPureBasicEvents
  Select msg
    Case #WM_NOTIFY
      Define *pnmh.NMHDR = lParam
      Select *pnmh\code
        Case #LVN_FIRST - 13 : result = 0 ; LVN_ODCACHEHINT: nothing to pre-cache here
        Case #LVN_GETDISPINFO
          Define *pnmlvdi.NMLVDISPINFO = lParam
          If *pnmlvdi\item\mask & #LVIF_TEXT
            *pnmlvdi\item\pszText = @wordArray(*pnmlvdi\item\iItem)
          EndIf         
      EndSelect
  EndSelect
  ProcedureReturn result
EndProcedure

Procedure WordArray_Create()
  Protected myLoop, wordLen, word.s, onChar, *s.ascii : RandomSeed(0)
  For myLoop = 0 To #Wordcount - 1
    wordLen = Random(7)+3 : wordArray(myLoop) = Space(wordLen) : *s = @wordArray(myLoop)
    For onChar = 1 To wordLen : *s\a = Random(25)+97 : *s + SizeOf(character) : Next
  Next
  messageAdd("Words in array: "+Str(#Wordcount))
EndProcedure

Procedure Populate_ListView_Array(gadget,Array wordArr.s(1), Sort = 2)
  Protected count = ArraySize(wordArr(),1), myLoop, timePop, timeSort
  If IsGadget(gadget)
    If GadgetType(gadget) = #PB_GadgetType_ListIcon
      
      ; sort the array?
      timeSort = ElapsedMilliseconds()
      Select Sort
        Case #PB_Sort_Ascending : SortArray(wordArr(),#PB_Sort_Ascending)
        Case #PB_Sort_Descending : SortArray(wordArr(),#PB_Sort_Descending)
      EndSelect
      timeSort = ElapsedMilliseconds() - timeSort

      ; populate the gadget with word array
      timePop = ElapsedMilliseconds()
      SendMessage_(GadgetID(gadget), #LVM_SETITEMCOUNT, 0, 1)          ; reset the virtual item count
      SendMessage_(GadgetID(gadget), #LVM_SETITEMCOUNT, Count + 1, 1)  ; ArraySize() returns the last index, so items = Count + 1
      timePop = ElapsedMilliseconds() - timePop
      
      messageAdd("Time to sort: "+Str(timeSort)+"ms, Time to populate: "+Str(timePop)+"ms")
    EndIf
  EndIf
  
EndProcedure
Procedure Init_Main()
  OpenWindow(#myWin,0,0,520,480,"List sort",#PB_Window_ScreenCentered|#PB_Window_SystemMenu)
  SetWindowCallback(@WinCallback())
  LoadFont(0,"Segoe UI",12) : SetGadgetFont(#PB_Default,FontID(0))
  ComboBoxGadget(#myGad_SortCombo,2,2,166,30)
  AddGadgetItem(#myGad_SortCombo,0,"Sort: Ascending")
  AddGadgetItem(#myGad_SortCombo,1,"Sort: Descending")
  AddGadgetItem(#myGad_SortCombo,2,"Sort: None") : SetGadgetState(#myGad_SortCombo,2)
  ButtonGadget(#myGad_List_Update,2,40,166,30,"Populate now...")
  EditorGadget(#myGad_Messages,170,2,348,476)
  ListIconGadget(#myGad_ListView,2,80,166,398,"",150,#LVS_OWNERDATA|#LVS_NOCOLUMNHEADER)
  
  WordArray_Create()
EndProcedure
Init_Main()

Repeat
  Select WaitWindowEvent()
    Case #PB_Event_CloseWindow : End
    Case #PB_Event_Gadget
      Select EventGadget()
        Case #myGad_List_Update
          Populate_ListView_Array(#myGad_ListView, wordArray(),GetGadgetState(#myGad_SortCombo))
      EndSelect
  EndSelect
ForEver
jacdelad
Addict
Posts: 1992
Joined: Wed Feb 03, 2021 12:46 pm
Location: Riesa

Re: Large files , NewList vs Array ?

Post by jacdelad »

Add

Code: Select all

SendMessage_(GadgetID(Index_List),#WM_SETREDRAW,0,0)

before filling the list and

Code: Select all

SendMessage_(GadgetID(Index_List),#WM_SETREDRAW,1,0)

after it and see if that's fast enough.
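
Put together, the two calls wrap the fill loop. A minimal, Windows-only sketch with made-up demo data; FillGadget() is a hypothetical helper, not part of the original program, and the InvalidateRect_() call is an addition here to force one repaint at the end:

```purebasic
; Sketch (Windows only): suspend redrawing while a gadget is filled.
; FillGadget() and the demo data are made up for illustration.
EnableExplicit

Procedure FillGadget(Gadget, List Words.s())
  SendMessage_(GadgetID(Gadget), #WM_SETREDRAW, #False, 0)  ; stop repainting
  ClearGadgetItems(Gadget)
  ForEach Words()
    AddGadgetItem(Gadget, -1, Words())
  Next
  SendMessage_(GadgetID(Gadget), #WM_SETREDRAW, #True, 0)   ; allow repainting again
  InvalidateRect_(GadgetID(Gadget), #Null, #True)           ; repaint once with the final content
EndProcedure

NewList Words.s()
Define i
For i = 1 To 20000
  AddElement(Words()) : Words() = "Word " + Str(i)
Next

If OpenWindow(0, 0, 0, 320, 270, "WM_SETREDRAW sketch", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
  ListViewGadget(0, 10, 10, 300, 250)
  FillGadget(0, Words())
  Repeat : Until WaitWindowEvent() = #PB_Event_CloseWindow
EndIf
```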
NicTheQuick
Addict
Posts: 1504
Joined: Sun Jun 22, 2003 7:43 pm
Location: Germany, Saarbrücken

Re: Large files , NewList vs Array ?

Post by NicTheQuick »

You should also consider removing the 'Delay(500)'.
millie78526
User
Posts: 23
Joined: Thu Apr 18, 2024 9:12 pm

Re: Large files , NewList vs Array ?

Post by millie78526 »

Hi NicTheQuick,
Yes, processing a large file can take a while.
I do that to give the user notice that something is happening.
If I do the MessageRequester without a Delay, it doesn't show right away.
millie78526
User
Posts: 23
Joined: Thu Apr 18, 2024 9:12 pm

Re: Large files , NewList vs Array ?

Post by millie78526 »

Thanks guys, I appreciate your help...

https://vmars.us/freeware/Study-Up/Study-Up-HELP.html


Thanks Grampa for letting me Post it on your sites :)
DeanH
Enthusiast
Posts: 274
Joined: Wed May 07, 2008 4:57 am
Location: Adelaide, South Australia

Re: Large files , NewList vs Array ?

Post by DeanH »

My 2c here. I've worked a lot with very large lists. They're a big part of my commercial software. Author authority lists can potentially have several hundred thousand entries loaded from a database table into a ListView. In my experience, filling arrays is faster than lists. Lists are more flexible and I don't have to worry about a size limit. I use lists when speed isn't critical, and arrays when it is. That's me. Others may disagree. It may vary a great deal with user hardware, networking, etc.

Filling ListView controls is slow. For a large list, I resort to disabling the control before filling, and even hiding it, then enabling and showing it afterwards. Maybe add a progress bar so the user knows something is happening during the fill. If the list is auto-sorted, disable that when loading. I've found simply hiding the control can give a reasonable speed-up. A virtual list is a nice trick but it is complex. I've seen other commercial systems resort to trickery and only show a small part of the list at a time instead of filling the whole control with all the data.

Jacdelad's suggestion helps, too.

Calling WindowEvent() within a processing loop lets a progress bar show up right away instead of only after the loop has finished.

This has been my experience working with arrays, lists, and the ListView control.
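
A minimal sketch of the hide-while-filling approach, with a timer around the fill so the effect can be measured. The gadget numbers, the item count and the window layout are made up for illustration; how much time this saves will depend on the system.

```purebasic
; Sketch: hide a ListView while filling it and time the fill.
; Gadget numbers, the item count and the window layout are made up for illustration.
EnableExplicit

#ItemCount = 20000
Define i, Start, Elapsed

If OpenWindow(0, 0, 0, 320, 300, "Hidden fill sketch", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
  ListViewGadget(0, 10, 10, 300, 250)
  TextGadget(1, 10, 270, 300, 20, "")

  Start = ElapsedMilliseconds()
  HideGadget(0, #True)                 ; nothing is drawn while items are added
  For i = 1 To #ItemCount
    AddGadgetItem(0, -1, "Word " + Str(i))
  Next
  HideGadget(0, #False)                ; show the finished list once
  Elapsed = ElapsedMilliseconds() - Start
  SetGadgetText(1, "Filled in " + Str(Elapsed) + " ms")

  Repeat : Until WaitWindowEvent() = #PB_Event_CloseWindow
EndIf
```

Commenting out the two HideGadget() lines and running it again gives a rough before/after comparison on the same machine.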
AZJIO
Addict
Posts: 2143
Joined: Sun May 14, 2017 1:48 am

Re: Large files , NewList vs Array ?

Post by AZJIO »

millie78526 wrote: Mon May 06, 2024 7:44 pm
I do that to give the user notice that something in happening .

Use another notification method that is more economical and does not force the user to wait several times longer. Use a progress bar: divide the number of elements by 100 so you know how many elements make up 1%, then add 1 to the progress each time that many elements have been added.

A list is fast here because ForEach simply follows each element's pointer to the next element. An array is intended for quick access by index, and while filling you do not need random access to an arbitrary element, since the elements are added sequentially. Overall the difference is not that big; you will hardly notice it.

Using WM_SETREDRAW will give you a noticeable difference, since the OS will not redraw the ListIconGadget surface after each element is added. The speed-up is several times, for example 1 second versus 10 seconds, or 1 minute versus 10 minutes. I hope you understand how long you are making the user wait. You can put a time counter in your code and measure the speed of each code section.
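
A minimal sketch of the progress bar and time counter, assuming a made-up item count and window layout. The progress is updated once per 1% of the elements, and pending window events are processed at that moment so the bar is actually repainted.

```purebasic
; Sketch: progress bar updated once per 1% of the elements, plus a time counter.
; Gadget numbers, the item count and the window layout are made up for illustration.
EnableExplicit

#ItemCount = 20000
Define i, OnePercent, Start

If OpenWindow(0, 0, 0, 320, 330, "Progress sketch", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
  ListViewGadget(0, 10, 10, 300, 250)
  ProgressBarGadget(1, 10, 270, 300, 20, 0, 100)
  TextGadget(2, 10, 300, 300, 20, "")

  OnePercent = #ItemCount / 100        ; number of elements that make up 1%
  If OnePercent < 1 : OnePercent = 1 : EndIf

  Start = ElapsedMilliseconds()
  For i = 1 To #ItemCount
    AddGadgetItem(0, -1, "Word " + Str(i))
    If i % OnePercent = 0
      SetGadgetState(1, i * 100 / #ItemCount)   ; progress in percent
      While WindowEvent() : Wend                ; process pending events so the bar repaints
    EndIf                                       ; (events arriving during the fill are discarded here)
  Next
  SetGadgetText(2, "Filled in " + Str(ElapsedMilliseconds() - Start) + " ms")

  Repeat : Until WaitWindowEvent() = #PB_Event_CloseWindow
EndIf
```

The same ElapsedMilliseconds() pattern can be placed around the reading, parsing and sorting steps to see which section actually takes the time.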