How does PureBasic parse files?

Post by mao »

Given a file like this:
"abcd","abcdsssss"
"aaaaaaaabbs","ffffffffffffffff"
.......
1.098 2.754 3.777 5.777
1.999 3.777 4.567
2.888 4.675
1.897


This file is divided into two main parts.
In the first part, each line holds two fields separated by a comma; each field is enclosed in double quotes and can be of any length. I want to put the first field into one array and the second into another. I figured out how to do that, but only with rather long code. Could anyone give some guidelines on how to get the same result in fewer lines?

The second part of the file holds numeric values of a specific length, separated by spaces, and each line is one value shorter than the line before it. I need to put these values into a matrix. I have no idea how PureBasic parses files compared with other languages, for example how it recognizes delimiters.

Could someone help me with this or recommend some links or sources about how PB parses files?
Thanks.

Post by Marc56us »

Post by IdeasVacuum »

Well, it's how you parse files with the myriad of functions that PB provides.

Personally, I would use a Linked List for each data section, rather than an array for each, since with Lists you do not need to know how many data elements there are.

Read each file line as a string.

When reading, test the last character of the line: if it is not Chr(34) (a double quote), start adding data to your second List.

By the way, with a structured List, you do not need to have two Lists, you can read any data type into the list.

So, how you want to parse the file is dependent on how you will use the data later in the application.
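
A minimal sketch of the structured-list idea described above, assuming the rule that a text line ends with a double quote. The structure name FileLine, its fields, and the in-memory sample string are hypothetical; in a real program each line would come from ReadString().

```purebasic
EnableExplicit

; Hypothetical structure: one list element per line of the file
Structure FileLine
  IsText.i   ; #True if the line ends with a double quote
  Raw.s      ; the line as read, trimmed of surrounding spaces
EndStructure

NewList Lines.FileLine()
Define sample.s, i.i, txt.s

; Sample data standing in for lines returned by ReadString()
sample = #DQUOTE$ + "abcd" + #DQUOTE$ + "," + #DQUOTE$ + "abcdsssss" + #DQUOTE$ + #LF$
sample + "1.098 2.754 3.777 5.777" + #LF$
sample + "1.999 3.777 4.567"

For i = 1 To CountString(sample, #LF$) + 1
  txt = Trim(StringField(sample, i, #LF$))
  If txt <> ""
    AddElement(Lines())
    Lines()\Raw = txt
    If Right(txt, 1) = #DQUOTE$   ; last character decides the line type
      Lines()\IsText = #True
    EndIf
  EndIf
Next

; One list holds both kinds of line; the flag tells them apart
ForEach Lines()
  If Lines()\IsText
    Debug "text:    " + Lines()\Raw
  Else
    Debug "numbers: " + Lines()\Raw
  EndIf
Next
```

Because the list grows with AddElement(), the number of lines does not have to be known in advance, which is the advantage over a fixed-size array mentioned above.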
IdeasVacuum
If it sounds simple, you have not grasped the complexity.

Post by mao »

Marc56us wrote:Parse string: StringField
http://www.purebasic.com/documentation/ ... field.html

Remove "" : Trim
http://www.purebasic.com/documentation/string/trim.html

Matrix:
See chapter Arrays, Lists & Structure in
http://www.purebasic.com/documentation/index.html

and more fun: Regular Expression
http://www.purebasic.com/documentation/ ... index.html
Regular expression is what I am looking for!
Thanks. :D
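
As an illustration of the two routes mentioned above, the sketch below pulls the quoted fields out of one sample line, first with StringField() and Trim(), then with a regular expression. The sample string and the pattern are assumptions made for this example only.

```purebasic
EnableExplicit

Define txt.s = #DQUOTE$ + "abcd" + #DQUOTE$ + "," + #DQUOTE$ + "abcdsssss" + #DQUOTE$

; 1) StringField + Trim: split on the comma, then strip the quotes
Debug Trim(StringField(txt, 1, ","), #DQUOTE$)   ; first field
Debug Trim(StringField(txt, 2, ","), #DQUOTE$)   ; second field

; 2) Regular expression: capture whatever sits between each pair of quotes
;    The pattern built here is "([^"]*)"
If CreateRegularExpression(0, #DQUOTE$ + "([^" + #DQUOTE$ + "]*)" + #DQUOTE$)
  If ExamineRegularExpression(0, txt)
    While NextRegularExpressionMatch(0)
      Debug RegularExpressionGroup(0, 1)         ; text inside the quotes
    Wend
  EndIf
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf
```

The StringField() route is shorter, but it splits on every comma, so a comma inside a quoted field would break it; the pattern-based route does not have that limitation.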

Post by mao »

IdeasVacuum wrote:Well, it's how you parse files with the myriad of functions that PB provides.

Personally, I would use a Linked List for each data section, rather than an array for each, since with Lists you do not need to know how many data elements there are.

Read each file line as a string.

When reading, test the last char of the line - if it is not Chr(34) (speech), start adding data to your second List.

By the way, with a structured List, you do not need to have two Lists, you can read any data type into the list.

So, how you want to parse the file is dependant on how you will use the data later in the application.
You are right. It is going to take me a while to choose the appropriate commands.
I wish I could handle these problems as well as you do some day!
Thanks!

Post by Zebuddi123 »

Hi mao. This parses the file as you have described and prints the results to the debug window. It reads the file line by line until the separator is reached, at which point the sep variable is set to 1; all subsequent lines are placed in the numbers linked list, which is of type string so that it can be passed to the procedure SortData(). Just an example for you to look at.

Zebuddi. :)

Code:

; save this data in a file named w.txt in the same directory as the code, without the leading ; on each line

; "abcd","abcdsssss"
; "aaaaaaaabbs","ffffffffffffffff"
; .......
; 1.098 2.754 3.777 5.777
; 1.999 3.777 4.567
; 2.888 4.675
; 1.897




Global NewList string.s()
Global NewList numbers.s() 


Procedure SortData(List l.s(), delim$) ; takes a linked list of type .s and the delimiter used by StringField()
	commas=CountString(l.s(),delim$) ; counts the delimiters in the current list element
	If commas 
		For i=1 To commas+1 ; +1 parses to the end of the string
			Debug StringField(l.s(), i, delim$) 
		Next
	Else
		Debug l.s() ; only one element in the string
	EndIf
EndProcedure		


If ReadFile(0,"w.txt")
	sep=0 ; to show separator not yet reached
	format=ReadStringFormat(0) ; detect the encoding (BOM) once, at the start of the file
	While Not Eof(0)
		a$=ReadString(0,format)  ; read in the current line in the detected format ascii/utf8 etc
		If FindString(a$, Chr(34)) ; looking for Chr(34), which is the quote mark "
			AddElement(string()) : string()=a$  ; adds an element to the linked list and stores the line (text lines are identified by quote marks " ")
		Else 
			If sep = 0 ; we have encountered the separator
				a$=ReadString(0,format) ; ditch the separator line and read the next line
				sep+1 ; record that we have passed the separator line 
			EndIf	
			AddElement(numbers())  : numbers()=a$ ; adds an element to the linked list and stores the line (numbers)
		EndIf	
	Wend
	
	CloseFile(0)

	
	ForEach string()
		SortData(string(), ",") ; loop through linkedlist and process the data
	Next
	
	ForEach numbers()
		SortData(numbers(), " ") ; loop through linkedlist and process the data
	Next
	
EndIf
malleo, caput, bang. Ego, comprehendunt in tempore

Post by TI-994A »

mao wrote:Given a file like this:
"abcd","abcdsssss"
"aaaaaaaabbs","ffffffffffffffff"
.......
1.098 2.754 3.777 5.777
1.999 3.777 4.567
2.888 4.675
1.897
Hi mao. Without some fixed structure, it can be tricky to handle files with mixed data types. Although the best approach would be to use custom-defined structures and read/write the file through the ReadData() and WriteData() functions, it wouldn't work with dynamic strings. As such, the next best approach would be to read and write the strings and numbers separately, albeit in a structured manner.

This example does just that, displaying the read and written values in list views:

Code:

Global.s fileName, text, doubles.d

Procedure readSampleFile()
  If OpenFile(1, fileName)
    ClearGadgetItems(2)
    For r = 1 To 3
      AddGadgetItem(2, -1, "Record #" + Str(r))
      AddGadgetItem(2, -1, "=========")
      For i = 1 To 4
        AddGadgetItem(2, -1, ReadString(1))
      Next i
      For i = 1 To 10
        AddGadgetItem(2, -1, StrD(ReadDouble(1)))
      Next i
      AddGadgetItem(2, -1, "")
    Next r
    CloseFile(1)
    DisableGadget(4, 1)
  Else
    MessageRequester("File System", "Unable to read the created file.")
  EndIf
EndProcedure

Procedure writeSampleFile()
  fileName = OpenFileRequester("Select file name and location:", "", "", 0)
  If fileName And OpenFile(1, fileName)
    ClearGadgetItems(1)
    Restore sampleData:
    For r = 1 To 3
      AddGadgetItem(1, -1, "Record #" + Str(r))
      AddGadgetItem(1, -1, "=========")
      For i = 0 To 3
        Read.s text
        WriteStringN(1, text)
        AddGadgetItem(1, -1, text)
      Next i
      For i = 0 To 9
        Read.d doubles
        WriteDouble(1, doubles)
        AddGadgetItem(1, -1, StrD(doubles))
      Next i
      AddGadgetItem(1, -1, "")    
    Next r
    CloseFile(1)
    DisableGadget(3, 1)
    DisableGadget(4, 0)
  Else
    MessageRequester("File System", "Unable to create the file.")
  EndIf 
EndProcedure

wFlags = #PB_Window_SystemMenu | #PB_Window_ScreenCentered
OpenWindow(0, #PB_Any, #PB_Any, 430, 420, "Structured File Example", wFlags)
ListViewGadget(1, 10, 60, 200, 350)
ListViewGadget(2, 220, 60, 200, 350)
ButtonGadget(3, 10, 10, 200, 40, "WRITE FILE")
ButtonGadget(4, 220, 10, 200, 40, "READ FILE")
DisableGadget(4, 1)
DeleteFile("testFile.mao")   ;the sample file is deleted with every run

Repeat
  Select WaitWindowEvent()
    Case #PB_Event_CloseWindow
      appQuit = 1
    Case #PB_Event_Gadget
      Select EventGadget()
        Case 3
          writeSampleFile()
        Case 4
          readSampleFile()
      EndSelect
  EndSelect
Until appQuit = 1 

DataSection
  sampleData:
  Data.s "first abcd", "first abcdefgh", "first pqrstuvwxyz", "first lmnopqrstuvwxyz"
  Data.d  1.111111, 1.22222, 1.3333, 1.444444, 1.55555, 1.6666, 1.777777, 1.88888, 1.9999, 1.101
  Data.s "second abcd", "second abcdefgh", "second pqrstuvwxyz", "second lmnopqrstuvwxyz"
  Data.d  2.111111, 2.22222, 2.3333, 2.444444, 2.55555, 2.6666, 2.777777, 2.88888, 2.9999, 2.101
  Data.s "third abcd", "third abcdefgh", "third pqrstuvwxyz", "third lmnopqrstuvwxyz"
  Data.d  3.111111, 3.22222, 3.3333, 3.444444, 3.55555, 3.6666, 3.777777, 3.88888, 3.9999, 3.101
EndDataSection
This is only meant as a guide, to give you some idea.

Hope it helps. :D
Texas Instruments TI-99/4A Home Computer: the first home computer with a 16bit processor, crammed into an 8bit architecture. Great hardware - Poor design - Wonderful BASIC engine. And it could talk too! Please visit my YouTube Channel :D

Post by IdeasVacuum »

Another approach: browse to the file, parse it, display results on Window:

Code:

EnableExplicit

Enumeration
#FileIO
#WinMain
#EdStrings
#EdMatrix
#TxtStrings
#TxtMatix
#BtnGetVals
EndEnumeration

Global NewList StringVals.s() ;strings
Global NewList MatrixVals.d() ;doubles

Procedure Msg(sMsg.s)
;--------------------
              MessageRequester("Problem", sMsg)
EndProcedure

Procedure ImportVals()
;---------------------
Protected          sPat.s = "Text File|*.txt;*.txt|All Files (*.*)|*.*;"
Protected iStringFormat.i = 0
Protected          sVal.s = "", sChar.s = "", sStringValDelim.s = Chr(44), sMatValDelim.s = Chr(32)
Protected    iTotalVals.i = 0, iIndex.i = 0

;Ask the User to browse to the data file
Protected     sDataFile.s = OpenFileRequester("Browse and select Data File", "C:\", sPat, 0)

ClearList(StringVals())
ClearList(MatrixVals())

              If(sDataFile)

                     If ReadFile(#FileIO, sDataFile)

                              iStringFormat = ReadStringFormat(#FileIO) ;e.g. file could be ASCII, Unicode etc

                              While(Eof(#FileIO) = 0) ;loop until end of file reached

                                          ;On read, remove any Prefix/Postfix space chars
                                          sVal = Trim(ReadString(#FileIO, iStringFormat)) 

                                         sChar = Right(sVal, 1) ;Is sVal a string?
                                      If sChar = Chr(34)
                                                                 ;Count the commas which delimit the String values
                                                                 iTotalVals = CountString(sVal, sStringValDelim) 
                                              For iIndex = 1 To (iTotalVals + 1)

                                                   AddElement(StringVals())
                                                              StringVals() = Trim(StringField(sVal, iIndex, sStringValDelim), Chr(34))
                                              Next
                                      Else
                                              ;sVal contains Matrix Values (floats or doubles)
                                                                 ;Count the spaces which delimit the Matrix values
                                                                 iTotalVals = CountString(sVal, sMatValDelim) 
                                              For iIndex = 1 To (iTotalVals + 1)

                                                     AddElement(MatrixVals())
                                                                MatrixVals() = ValD(StringField(sVal, iIndex, sMatValDelim))
                                              Next
                                      EndIf
                              Wend

                              CloseFile(#FileIO) ;release the file once all lines have been read
                     Else
                              Msg("Could not read data File: " + sDataFile)
                     EndIf
              Else
                     Msg("Data file was not selected")
              EndIf

              ClearGadgetItems(#EdStrings)

              ForEach StringVals()

                      AddGadgetItem(#EdStrings, -1, StringVals())
              Next

              ClearGadgetItems(#EdMatrix)

              ForEach MatrixVals()

                      AddGadgetItem(#EdMatrix, -1, StrD(MatrixVals(), 4))
              Next
EndProcedure

Procedure WinMain()
;------------------
Protected iflags.i = #PB_Window_SystemMenu|#PB_Window_ScreenCentered

              If OpenWindow(#WinMain, 0, 0, 520, 252, "Parse Data File",  iflags)

                        TextGadget(#TxtStrings, 10,   2, 500, 18, "String Vals")
                      EditorGadget(#EdStrings,  10,  20, 500, 84)
                        TextGadget(#TxtMatix,   10, 112, 500, 18, "Matrix Vals")
                      EditorGadget(#EdMatrix,   10, 130, 500, 84)
                      ButtonGadget(#BtnGetVals, 10, 222, 500, 26, "Browse to Data File")
              EndIf
EndProcedure

Procedure WaitForUser()
;----------------------
Protected iExit.i = #False

              Repeat
                       Select WaitWindowEvent(1)

                                Case #PB_Event_CloseWindow

                                       If EventWindow() = #WinMain: iExit = #True : EndIf

                                Case #PB_Event_Gadget

                                       Select EventGadget()

                                                 Case #BtnGetVals: ImportVals()
                                       EndSelect
                       EndSelect

              Until iExit = #True
EndProcedure

;Startup the App
    WinMain()
WaitForUser()

End

Edit: Once you have pasted the code into PB, it is much easier to follow because of the syntax highlighting.
Last edited by IdeasVacuum on Wed May 27, 2015 8:27 pm, edited 5 times in total.

Post by Danilo »


Post by mao »

Thanks all!
I am just curious why it takes only 19 short lines of code (two DO-LOOPs and one FOR-NEXT loop) in QuickBasic to parse this file, while in PureBasic we have to write something much longer. PureBasic is supposed to be more powerful than QB, right? :?

Post by IdeasVacuum »

Hello Mao

If you study everybody's examples, the parsing is done with very little code. All the extra lines of code are there to:

1) Make the code safe/durable for use with different files;
2) Define a Window to display the Results;
3) Allow any number of files to be opened and the results displayed;
4) Show you the value of using Procedures in terms of making any amount of code more robust, easier to follow, easier to debug, easier to re-use in other projects.

Post by IdeasVacuum »

...you can code really dirty and squeeze brilliant code into very few lines: Xmas Punch Contest

Post by mao »

IdeasVacuum wrote:Hello Mao

If you study everybody's examples, the parsing is done with very little code. All the extra lines of code are there to:

1) Make the code safe/durable for use with different files;
2) Define a Window to display the Results;
3) Allow any amount of files to be opened and the results displayed;
4) Show you the value of using Procedures in terms of making any amount of code more robust, easier to follow, easier to debug, easier to re-use in other projects.

Thanks! I will spend more time and slow down.

Post by sancho2 »

Zebuddi123 wrote:Its based on parsing the file till the separator is reached, whereby the sep variable is set to 1

Code:

If ReadFile(0,"w.txt")
	sep=0 ; to show separator not yet reached
	While Not Eof(0)
		a$=ReadString(0,ReadStringFormat(0))  ; read in the current line in the correct format ascii/utf8 etc
		If FindString(a$, Chr(34)) ; looking for chr(44) which is quotemark "
			AddElement(string()) : string()=a$  ; add an element to the linkedlist and stores the string (strings identified by quotemarks" ")
		Else 
			If sep = 0 ; we have encountered the seperator
				a$=ReadString(0,ReadStringFormat(0)) ; ditch the seperator line and read the next line
				sep+1 ; set var that we have pasted the seperator line 
			EndIf	
			AddElement(numbers())  : numbers()=a$ ; add an element to the linkedlist and stores the string (numbers)
		EndIf	
	Wend
	
	CloseFile(0)

I like that you're reading in the file and doing the parsing afterwards.
I know it's for example purposes only...

I don't know if it's any faster (or even needs to be), but it looks like testing only the first character of each line for a quote could give you satisfactory results:

Code:

;If FindString(a$, Chr(34)) ; looking for Chr(34), which is the quote mark "
If Left(a$, 1) = Chr(34) 
Perhaps add a second While loop once you reach the separation, so that FindString() is not repeated unnecessarily when you know the quote character will no longer appear.

There was no mention of a separator line in the original post.
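
One way to read that suggestion, as a sketch only: a first loop collects the quoted lines and stops at the first line that does not start with a quote, and a second loop reads the rest with no quote test. The temporary file name and the list names are hypothetical, and the sketch assumes there is no separator line between the two parts.

```purebasic
EnableExplicit

Define path.s = GetTemporaryDirectory() + "pb_parse_demo.txt"  ; hypothetical file name
Define format.i, txt.s
NewList Texts.s()
NewList Numbers.s()

; Write a small sample file so the sketch runs on its own
If CreateFile(0, path)
  WriteStringN(0, #DQUOTE$ + "abcd" + #DQUOTE$ + "," + #DQUOTE$ + "abcdsssss" + #DQUOTE$)
  WriteStringN(0, "1.098 2.754 3.777 5.777")
  WriteStringN(0, "1.999 3.777 4.567")
  CloseFile(0)
EndIf

If ReadFile(0, path)
  format = ReadStringFormat(0)   ; check for a BOM once, at the start of the file

  ; First loop: quoted lines only
  While Not Eof(0)
    txt = ReadString(0, format)
    If Left(txt, 1) <> #DQUOTE$
      Break                      ; first non-quoted line: the numeric part begins
    EndIf
    AddElement(Texts()) : Texts() = txt
    txt = ""                     ; line has been stored, nothing pending
  Wend

  ; The line that ended the first loop is already the first numeric line
  If txt <> ""
    AddElement(Numbers()) : Numbers() = txt
  EndIf

  ; Second loop: no quote test needed any more
  While Not Eof(0)
    txt = ReadString(0, format)
    If txt <> ""
      AddElement(Numbers()) : Numbers() = txt
    EndIf
  Wend

  CloseFile(0)
  DeleteFile(path)               ; remove the sample file again
EndIf

Debug "text lines: " + Str(ListSize(Texts()))
Debug "numeric lines: " + Str(ListSize(Numbers()))
```

If the real file does contain a separator line, the pending line kept in txt after the first loop would be that separator and could simply be discarded instead of stored.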

Post by infratec »

19 Lines (without tricks):

Code:

If ReadFile(0, "test.txt")
  While Not Eof(0)
    Line$ = ReadString(0, #PB_Ascii)
    i = 1
    If Left(Line$, 1) = #DQUOTE$
      Delimiter$ = ","
    Else
      Delimiter$ = " "
    EndIf
    Repeat
      Part$ = StringField(Line$, i, Delimiter$)
      If Len(Part$)
        Debug Part$
      EndIf
      i + 1
    Until Len(Part$) = 0
  Wend
  CloseFile(0)
EndIf
And it runs on Windows, Linux and Mac OS X.
So PB is more powerful than QB :mrgreen:

Bernd
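
None of the examples in the thread stores the numeric part in a two-dimensional array, which is what the original question asked for. A possible sketch, assuming at most four rows of at most four values as in the sample data (the names Matrix, Rows and #MaxSize are made up for this example):

```purebasic
EnableExplicit

#MaxSize = 4   ; assumption: at most 4 rows and 4 columns, as in the sample

Dim Matrix.d(#MaxSize - 1, #MaxSize - 1)
Dim Rows.s(#MaxSize - 1)
Define row.i, col.i, count.i, out.s

; Sample numeric lines standing in for what ReadString() would return
Rows(0) = "1.098 2.754 3.777 5.777"
Rows(1) = "1.999 3.777 4.567"
Rows(2) = "2.888 4.675"
Rows(3) = "1.897"

For row = 0 To #MaxSize - 1
  count = CountString(Rows(row), " ") + 1   ; number of values on this line
  If count > #MaxSize
    count = #MaxSize                        ; never write past the array bounds
  EndIf
  For col = 1 To count
    Matrix(row, col - 1) = ValD(StringField(Rows(row), col, " "))
  Next
Next

; Print the matrix row by row; cells that were never filled stay 0.0
For row = 0 To #MaxSize - 1
  out = ""
  For col = 0 To #MaxSize - 1
    out + StrD(Matrix(row, col), 3) + " "
  Next
  Debug out
Next
```

For files of unknown size, the array could be resized with ReDim or replaced by a list of rows; the fixed size here only keeps the sketch short.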