
Does anyone have or know of a good parser?

Posted: Wed Dec 05, 2012 3:22 am
by RichAlgeni
I used to work in a database language called Unidata, which was a variant of Pick. It was old and clumsy, but it had an automatic parser which was nice to use. A record could be accessed by its fields, using numbers. For instance, if you wanted the 15th field of a record, you could use the statement 'record<15>'.

Now, I'm not looking for that sort of capability, but, in keeping with my philosophy of not reinventing the wheel if possible, I was wondering if anyone had or knew of a good parser? Maybe even a dll from Microsoft where you could pass a string and a delimiter (preferably one character), and it would return a linked list of the fields. Why might anyone need this? There are a number of network sources that send comma-delimited data that I need to use.

If no one knows of such a beast, I will probably create a dll to do this. Just FYI: I've gotten into the habit of never passing a string to a procedure, instead always passing a pointer.
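For illustration only, here is a minimal sketch of the kind of routine described above: it takes a pointer to the string plus a one-character delimiter and fills a linked list. The name SplitToList() and the sample data are made up for this example; it is not a PureBasic built-in or a Microsoft API, and it does no quote handling.

```purebasic
; Hypothetical helper, not a built-in: splits the null-terminated string
; at *text on a one-character delimiter and stores the pieces in fields().
; Returns the number of fields found.
Procedure.i SplitToList(*text.Character, delimiter.s, List fields.s())
  Protected delim.c = Asc(delimiter)
  Protected *start.Character = *text
  Protected length.i

  ClearList(fields())
  If *text = 0
    ProcedureReturn 0
  EndIf

  While *text\c
    If *text\c = delim
      ; Length of the field that just ended, in characters
      length = (*text - *start) / SizeOf(Character)
      AddElement(fields())            ; new element starts out empty
      If length > 0
        fields() = PeekS(*start, length)
      EndIf
      *start = *text + SizeOf(Character)
    EndIf
    *text + SizeOf(Character)
  Wend

  ; The last field has no trailing delimiter
  length = (*text - *start) / SizeOf(Character)
  AddElement(fields())
  If length > 0
    fields() = PeekS(*start, length)
  EndIf

  ProcedureReturn ListSize(fields())
EndProcedure

NewList parts.s()
line$ = "alpha,beta,,delta"          ; made-up sample data
SplitToList(@line$, ",", parts())
ForEach parts()
  Debug "[" + parts() + "]"          ; [alpha] [beta] [] [delta]
Next
```

Empty fields are kept as empty list elements so that field positions stay aligned with the incoming data.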

Thanks as always!

Re: Does anyone have or know of a good parser?

Posted: Wed Dec 05, 2012 4:52 am
by jassing
What are you trying to parse?

If you use a database, you can issue a query to retrieve the nth Record and then grab the 15th field.

Re: Does anyone have or know of a good parser?

Posted: Wed Dec 05, 2012 6:41 am
by skywalk
Search for 'Split' or use PB's StringField()?
or as jassing mentioned, dump the entire csv to a SQLite mem table and execute a query.
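As a rough sketch of the StringField() route for simple data with no quoted fields (the sample line below is invented for the example):

```purebasic
line$ = "10.0.0.7,443,OPEN,2012-12-05"   ; made-up sample data

; CountString() counts the delimiters, so there is one more field than that
fieldCount = CountString(line$, ",") + 1

; StringField() indexes start at 1
For i = 1 To fieldCount
  Debug Str(i) + ": [" + StringField(line$, i, ",") + "]"
Next
```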

Re: Does anyone have or know of a good parser?

Posted: Wed Dec 05, 2012 7:32 pm
by jassing
skywalk wrote: PB's StringField()
Be careful with StringField() when parsing a CSV, because a line like this is perfectly valid:

"Male","Smith, John", "1180 main street"

StringField(c$, 3, ",")
returns [ John"], not the street address, because the comma inside the quoted name is treated as a delimiter.
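A short sketch that reproduces the problem, assuming c$ holds the line above:

```purebasic
c$ = #DQUOTE$ + "Male" + #DQUOTE$ + "," + #DQUOTE$ + "Smith, John" + #DQUOTE$ + ", " + #DQUOTE$ + "1180 main street" + #DQUOTE$

; StringField() splits on every comma, including the one inside the quotes,
; so the three logical fields come back as four pieces.
For i = 1 To CountString(c$, ",") + 1
  Debug Str(i) + ": [" + StringField(c$, i, ",") + "]"
Next
; 1: ["Male"]
; 2: ["Smith]
; 3: [ John"]
; 4: [ "1180 main street"]
```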

Re: Does anyone have or know of a good parser?

Posted: Wed Dec 05, 2012 8:12 pm
by Michael Vogel
StringField is a good start for simple parsing; for quoted fields you can also check the following functions:

Code: Select all

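; Returns field number startfield (1-based). Delimiters between a pair of
; 'special' characters are ignored, and the special characters are dropped.
Procedure.s StringFieldPlus(input.s,startfield,delimiter.s,special.s=#DQUOTE$)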

	Protected count.l=1
	Protected ret$ = ""
	Protected *p.Character = @input
	Protected delim.c = Asc(delimiter)
	Protected speci.c = Asc(special)

	While *p\c
		If *p\c = delim
			*p+SizeOf(Character)
			count + 1
		ElseIf *p\c = speci
			*p+SizeOf(Character)
			While *p\c And *p\c <> speci
				If count = startfield
					ret$ + Chr(*p\c)
				EndIf
				*p+SizeOf(Character)
			Wend
			If *p\c = speci
				*p+SizeOf(Character)
			EndIf
		Else
			If count = startfield
				ret$ + Chr(*p\c)
			EndIf
			*p+SizeOf(Character)
		EndIf
	Wend

	ProcedureReturn ret$

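EndProcedure

; Returns field number nr (1-based). Delimiters between quot characters are
; ignored; the quot characters themselves are kept in the result.
Procedure.s Field(x.s,nr.l,delim.s=" ",quot.s="")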

	Protected v.l=-1,b.l=0
	Protected flag.l=1
	Protected len=Len(x)
	Protected d.b=Asc(delim)
	Protected c.c

	While b<len
		c=PeekC(@x+b*SizeOf(Character)) ; read one character; PeekB with a plain byte offset breaks in Unicode builds
		;Debug Str(nr)+": "+Str(v)+", "+Str(b)+" ("+Chr(c)+")"
		If FindString(delim,Chr(c),1)=0
			If FindString(quot,Chr(c),1)
				flag=1-flag
			EndIf
		ElseIf flag
			nr-1
			Select nr
			Case 1
				v=b
			Case 0
				ProcedureReturn Mid(x,v+2,b-v-1)
			EndSelect
		EndIf
		b+1
	Wend
	If nr>1
		ProcedureReturn ""
	Else
		ProcedureReturn Mid(x,v+2,#MAXSHORT)
	EndIf
EndProcedure

t.s=#DQUOTE$+"Male"+#DQUOTE$+","+#DQUOTE$+"Smith, John"+#DQUOTE$+", "+#DQUOTE$+"1180 main street"+#DQUOTE$
delimiter.s=","

Debug "Parse the following string ["+t+"] :"

Debug "-------------------- Default ------------------"
For i=0 To 7
	Debug Str(i)+": ["+field(t,i,delimiter)+"]"
Next i

Debug "----------------- Enhanced -----------------"
For i=0 To 5
	Debug Str(i)+": ["+field(t,i,delimiter,#DQUOTE$)+"]"
Next i


Debug "----------------- Alternative -----------------"
For i=0 To 5
	Debug Str(i)+": ["+StringFieldPlus(t,i,delimiter,#DQUOTE$)+"]"
Next i

Debug "----------------------------------------------------"

Re: Does anyone have or know of a good parser?

Posted: Wed Dec 05, 2012 10:24 pm
by RichAlgeni
All excellent posts, thanks all! Most of what I am parsing is data delimited by a single character, coming in over a network connection. Some of it is delimited by cr-lf.

Understood about StringField(), and the pitfalls of using delimiter characters such as a comma. Whenever I create such data, I use a non-printable character as a delimiter. That way I can search and change any delimiter character that ends up in the data string. Then I change the non-printable to the delimiter needed.
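Roughly, that approach could look like the sketch below. The choice of Chr(31) (the ASCII unit separator) as the placeholder and ";" as the substitute for commas found in the data are assumptions made for this example.

```purebasic
#SEP = 31 ; ASCII unit separator, assumed never to occur in the data itself

; Build the record with the non-printable placeholder between fields
record$ = "Male" + Chr(#SEP) + "Smith, John" + Chr(#SEP) + "1180 main street"

; 1. Change any real comma that ended up in the data (";" is an arbitrary choice)
record$ = ReplaceString(record$, ",", ";")

; 2. Swap the placeholder for the delimiter the receiver expects
record$ = ReplaceString(record$, Chr(#SEP), ",")

Debug record$ ; Male,Smith; John,1180 main street
```

After step 2, every comma left in the string is a genuine field separator, so a plain StringField() call is safe on the receiving side.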

Forgive me, my question was not so much how to accomplish it, as that should be something any moderately experienced programmer can do. But I can't help thinking that there is a function in a dll somewhere, especially in Microsoft code, that does exactly what I am looking for. In fact, I'm sure there is. I'm sure that buried somewhere in Microsoft's dlls there are functions that can do a lot of what we all look for on a daily basis. It's just that it wasn't a priority at Microsoft to document their dlls better. It would have been nice had they added a description and a list of any offsets for parameters passed. True, there is some good information in the Windows development API; it's just that, IMHO, it could be a lot better.