Does anyone have or know of a good parser?

RichAlgeni
Addict
Posts: 935
Joined: Wed Sep 22, 2010 1:50 am
Location: Bradenton, FL

Post by RichAlgeni »

I used to work in a database language called Unidata, a variant of Pick. It was old and clumsy, but it had an automatic parser that was nice to use. A record could be accessed by its fields, using numbers. For instance, if you wanted the 15th field of a record, you could use the statement 'record<15>'.

Now, I'm not looking for that sort of capability, but, in keeping with my philosophy of not reinventing the wheel if possible, I was wondering if anyone had or knew of a good parser. Maybe even a DLL from Microsoft where you could pass a string and a delimiter (preferably one character), and it would return a linked list of the fields. Why would anyone need this? There are a number of network sources that send comma-delimited data that I need to use.

If no one knows of such a beast, I will probably create a DLL to do this. Just FYI: I've gotten into the habit of never passing a string to a procedure, and instead always passing a pointer.
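
For illustration, here is a minimal sketch of what such a procedure might look like in PB itself: it takes a pointer to the string and a single delimiter character, and fills a linked list supplied by the caller. SplitToList() and the sample line are hypothetical names made up for this sketch, not a PB built-in or a Microsoft API, and it does nothing special for quoted fields.

```purebasic
; SplitToList() is a hypothetical helper, not a PB built-in.
; It walks the string at *text and appends each field to the caller's list.
Procedure.i SplitToList(*text.Character, delim.c, List fields.s())
  Protected *start.Character = *text
  Protected length.i

  ClearList(fields())
  If *text = #Null                      ; an empty string variable can yield a null pointer
    ProcedureReturn 0
  EndIf

  Repeat
    If *text\c = delim Or *text\c = 0   ; a field ends at a delimiter or at the end of the string
      AddElement(fields())              ; a new element starts out as ""
      length = (*text - *start) / SizeOf(Character)
      If length > 0
        fields() = PeekS(*start, length)
      EndIf
      If *text\c = 0
        Break
      EndIf
      *start = *text + SizeOf(Character) ; next field begins after the delimiter
    EndIf
    *text + SizeOf(Character)
  ForEver

  ProcedureReturn ListSize(fields())
EndProcedure

; Usage: pass the address of the string, never the string itself
NewList parts.s()
line$ = "alpha,beta,,delta"
If SplitToList(@line$, ',', parts())
  ForEach parts()
    Debug "[" + parts() + "]"
  Next
EndIf
```

Empty fields are kept as empty list elements, so field positions stay aligned with the incoming data.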

Thanks as always!
jassing
Addict
Posts: 1885
Joined: Wed Feb 17, 2010 12:00 am

Re: Does anyone have or know of a good parser?

Post by jassing »

What are you trying to parse?

If you use a database, you can issue a query to retrieve the nth record and then grab the 15th field.
skywalk
Addict
Posts: 4316
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Does anyone have or know of a good parser?

Post by skywalk »

Search for 'Split', or use PB's StringField()?
Or, as jassing mentioned, dump the entire CSV into an in-memory SQLite table and execute a query.
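
For data without quoted fields, the StringField() route can be very short. A minimal sketch, with a made-up sample line; note that StringField() counts fields from 1.

```purebasic
; Walk every field of a simple delimited line with the built-in StringField().
line$ = "12,34,56"
count = CountString(line$, ",") + 1   ; number of fields = delimiters + 1

For i = 1 To count                    ; field indexes start at 1
  Debug Str(i) + ": [" + StringField(line$, i, ",") + "]"
Next
```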
jassing
Addict
Posts: 1885
Joined: Wed Feb 17, 2010 12:00 am

Re: Does anyone have or know of a good parser?

Post by jassing »

skywalk wrote: PB's StringField()?
Be careful with StringField(). When parsing a CSV file, a line like this is perfectly valid:

"Male","Smith, John", "1180 main street"

Yet StringField(c$, 3, ",") returns [ John"], not the street address, because the comma inside the quoted name also counts as a delimiter.
Michael Vogel
Addict
Posts: 2867
Joined: Thu Feb 09, 2006 11:27 pm

Re: Does anyone have or know of a good parser?

Post by Michael Vogel »

StringField() is a good start for simple parsing. For data with quoted fields, you can also check the following functions:

Code:

Procedure.s StringFieldPlus(input.s,startfield,delimiter.s,special.s=#DQUOTE$)

	Protected count.l=1
	Protected ret$ = ""
	Protected *p.Character = @input
	Protected delim.c = Asc(delimiter)
	Protected speci.c = Asc(special)

	While *p\c
		If *p\c = delim
			*p+SizeOf(Character)
			count + 1
		ElseIf *p\c = speci
			*p+SizeOf(Character)
			While *p\c And *p\c <> speci
				If count = startfield
					ret$ + Chr(*p\c)
				EndIf
				*p+SizeOf(Character)
			Wend
			If *p\c = speci
				*p+SizeOf(Character)
			EndIf
		Else
			If count = startfield
				ret$ + Chr(*p\c)
			EndIf
			*p+SizeOf(Character)
		EndIf
	Wend

	ProcedureReturn ret$

EndProcedure
Procedure.s Field(x.s,nr.l,delim.s=" ",quot.s="")

	Protected v.l=-1,b.l=0
	Protected flag.l=1
	Protected len=Len(x)
	Protected d.b=Asc(delim)
	Protected c.c

	While b<len
		c=PeekC(@x+b*SizeOf(Character)) ; read one character, not one byte, so Unicode builds work too
		;Debug Str(nr)+": "+Str(v)+", "+Str(b)+" ("+Chr(c)+")"
		If FindString(delim,Chr(c),1)=0
			If FindString(quot,Chr(c),1)
				flag=1-flag
			EndIf
		ElseIf flag
			nr-1
			Select nr
			Case 1
				v=b
			Case 0
				ProcedureReturn Mid(x,v+2,b-v-1)
			EndSelect
		EndIf
		b+1
	Wend
	If nr>1
		ProcedureReturn ""
	Else
		ProcedureReturn Mid(x,v+2,#MAXSHORT)
	EndIf
EndProcedure

t.s=#DQUOTE$+"Male"+#DQUOTE$+","+#DQUOTE$+"Smith, John"+#DQUOTE$+", "+#DQUOTE$+"1180 main street"+#DQUOTE$
delimiter.s=","

Debug "Parse the following string ["+t+"] :"

Debug "-------------------- Default ------------------"
For i=0 To 7
	Debug Str(i)+": ["+field(t,i,delimiter)+"]"
Next i

Debug "----------------- Enhanced -----------------"
For i=0 To 5
	Debug Str(i)+": ["+field(t,i,delimiter,#DQUOTE$)+"]"
Next i


Debug "----------------- Alternative -----------------"
For i=0 To 5
	Debug Str(i)+": ["+StringFieldPlus(t,i,delimiter,#DQUOTE$)+"]"
Next i

Debug "----------------------------------------------------"
RichAlgeni
Addict
Posts: 935
Joined: Wed Sep 22, 2010 1:50 am
Location: Bradenton, FL

Re: Does anyone have or know of a good parser?

Post by RichAlgeni »

All excellent posts, thanks all! Most of what I am parsing is data delimited by a single character, coming in over a network connection. Some of it is delimited by CR-LF.

Understood about StringField(), and the pitfalls of using delimiter characters such as a comma. Whenever I create such data, I use a non-printable character as a delimiter. That way I can search and change any delimiter character that ends up in the data string. Then I change the non-printable to the delimiter needed.
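
Roughly, that approach can be sketched like this. The choice of Chr(31) (the ASCII unit separator) and of ';' as the stand-in for stray commas are assumptions made for this example only, as is the sample data.

```purebasic
; Build the record with a non-printable separator, clean the data, then swap in the real delimiter.
sep$ = Chr(31)   ; assumed never to occur in the data itself

record$ = "Smith, John" + sep$ + "1180 main street" + sep$ + "Bradenton, FL"

record$ = ReplaceString(record$, ",", ";")    ; neutralise commas that are part of the data
record$ = ReplaceString(record$, sep$, ",")   ; now the comma is safe to use as the delimiter

Debug record$

For i = 1 To CountString(record$, ",") + 1
  Debug Str(i) + ": [" + StringField(record$, i, ",") + "]"
Next
```

After the two replacements the record holds exactly three comma-delimited fields, so StringField() can be used on it without the quoting problem mentioned earlier.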

Forgive me, my question was not so much about how to accomplish it, since that should be something any moderately experienced programmer can do. But I can't help thinking that there must be a function in a DLL somewhere, especially in Microsoft code, that does exactly what I am looking for. In fact, I'm sure there is. I'm sure that somewhere, buried in Microsoft's DLLs, there are functions that can do a lot of what we all look for on a daily basis. It's just that documenting their DLLs better wasn't a priority at Microsoft. It would have been nice had they added a description and a list of the offsets for any parameters passed. True, there is some good information in the Windows development API; IMHO, it could just be a lot better.