Parser line (?)

AZJIO
Addict
Posts: 2141
Joined: Sun May 14, 2017 1:48 am

Post by AZJIO »

What am I misunderstanding? Arch Linux x64, PB 6.04.
Number.u = ReadUnicodeCharacter(#File)
Here is a simplified example. In the debugger I get gibberish instead of the characters that are in the file.

Code:

EnableExplicit

Procedure GetString(InputFile$)
	Protected id_file, Format, Text$

	id_file = ReadFile(#PB_Any, InputFile$)
	If id_file
		Format = ReadStringFormat(id_file)
		While Not Eof(id_file)
			Debug Chr(ReadUnicodeCharacter(id_file))
		Wend
		CloseFile(id_file)
	EndIf
EndProcedure

Define path$ = GetPathPart(ProgramFilename()) + "1.pb"
If FileSize(path$) > 0
	GetString(path$)
Else
	Debug "not found"
EndIf

A more realistic version of what I am trying to do:

Code:

EnableExplicit

Procedure GetString(InputFile$)
	Protected flgOpnQt, *mem, c, i, flgSemicolon
	Protected id_file, Format, Text$, *c.Unicode, *b.Unicode, *t.Unicode
	*mem = AllocateMemory(8000)
	*b = *mem
	*t = @c ; point *t at a local variable so that *t\u has storage to write to

	id_file = ReadFile(#PB_Any, InputFile$)
	If id_file
		Format = ReadStringFormat(id_file)
		; Alternatively, the whole file could be read into a string here, then the file pointer moved back to 0:
		; sTextOrig$ = ReadString(id_file, Format | #PB_File_IgnoreEOL)
		; FileSeek(id_file, 0)
		; Format = ReadStringFormat(id_file) ; called again to move the pointer past the BOM
		While Not Eof(id_file)
; 			Debug Chr(ReadUnicodeCharacter(id_file))
			*t\u = ReadUnicodeCharacter(id_file)
			If flgSemicolon
				If *t\u <> #CR And *t\u <> #LF
					Continue ; Inside a comment: skip everything until the end of the line.
				Else
					flgOpnQt = 0 ; flag reset
					flgSemicolon = 0 ; flag reset
				EndIf
			EndIf
			Select *t\u
				Case '"'
; 					Debug "opening or closing quote"
; 					Bool(Not flgOpnQt)
					If flgOpnQt
						flgOpnQt = 0
						Debug PeekS(*mem, (*b - *mem) / SizeOf(Unicode)) ; PeekS takes a length in characters, not bytes; check the result before sending it to the map
					Else
						flgOpnQt = 1
						*b = *mem ; reset the filling buffer to write new data when opening a quote
					EndIf
				Case #CR, #LF
; 					Debug "resetting the open quote and fill buffer flag"
					flgOpnQt = 0 ; flag reset
					flgSemicolon = 0 ; flag reset
				Case ';'
; 					Debug "comments if there is no open quote"
					If flgOpnQt = 0 ; if there is no open quote, then comments go to the end of the line
						flgSemicolon = 1
					EndIf
				Case '~'
; 					Debug "skip line if there is no open quote"
					If flgOpnQt = 0 ; For now, treat everything after a tilde as a comment; the quotes that follow it do not obey the normal rules
						flgSemicolon = 1
					EndIf
				Default
; 					Debug "filling the buffer if a quote is open"
					If flgOpnQt
						*b\u = *t\u
						*b + SizeOf(Unicode)
					EndIf
			EndSelect
			
		Wend
		CloseFile(id_file)
	EndIf
	
	FreeMemory(*mem)
	ProcedureReturn
EndProcedure


Define path$ = GetPathPart(ProgramFilename()) + "1.pb"
If FileSize(path$) > 0
	GetString(path$)
Else
	Debug "not found"
EndIf
STARGÅTE
Addict
Posts: 2226
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke

Re: Parser line (?)

Post by STARGÅTE »

The function ReadUnicodeCharacter() reads exactly 2 bytes from the file and interprets them as a number.
This is independent of the file format.
Usually your file is ASCII or UTF-8 encoded, which means 1 byte per character (ASCII) or one or more bytes per character (UTF-8), but not exactly 2 bytes per character.
So you cannot use ReadUnicodeCharacter() this way.
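
To see the effect in isolation, here is a minimal sketch. It assumes a little-endian machine (such as x64), and the scratch file name demo_utf8.txt is made up for the illustration. It writes four ASCII characters as UTF-8 and then reads them back two bytes at a time:

```purebasic
EnableExplicit

; Hypothetical scratch file, used only for this demonstration.
Define path$ = GetTemporaryDirectory() + "demo_utf8.txt"
Define file

; Write "ABCD" as UTF-8 without a BOM: the file holds the bytes 41 42 43 44.
file = CreateFile(#PB_Any, path$)
If file
  WriteString(file, "ABCD", #PB_UTF8)
  CloseFile(file)
EndIf

file = ReadFile(#PB_Any, path$)
If file
  While Not Eof(file)
    ; Each call consumes two bytes, so two characters are fused into one value.
    Debug Hex(ReadUnicodeCharacter(file), #PB_Unicode)
  Wend
  CloseFile(file)
EndIf

DeleteFile(path$)
```

On a little-endian system this should show 4241 and 4443 rather than the four separate codes 41, 42, 43 and 44, which is why Chr() on those values produces unrelated symbols.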

I think you should try ReadString(id_file, Format, 1), which reads one character at a time according to the file format.
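
A rough sketch of that idea follows. The procedure name DumpCharacters is made up, the UTF-8 fallback for files without a BOM is an assumption, and adding #PB_File_IgnoreEOL so that line breaks are returned as characters is my own addition; it is worth verifying the behaviour on your PB version:

```purebasic
EnableExplicit

; Hypothetical helper: prints a file one character at a time.
Procedure DumpCharacters(InputFile$)
  Protected file, Format, Char$

  file = ReadFile(#PB_Any, InputFile$)
  If file
    Format = ReadStringFormat(file) ; detects the BOM and moves past it
    If Format = #PB_Ascii
      Format = #PB_UTF8 ; assumption: a file without a BOM is UTF-8
    EndIf
    While Not Eof(file)
      ; Length = 1 asks for a single character, decoded according to Format.
      ; #PB_File_IgnoreEOL keeps CR and LF in the stream instead of ending the read there.
      Char$ = ReadString(file, Format | #PB_File_IgnoreEOL, 1)
      Select Asc(Char$)
        Case #CR, #LF
          Debug "<end of line>"
        Default
          Debug Char$
      EndSelect
    Wend
    CloseFile(file)
  EndIf
EndProcedure

DumpCharacters(GetPathPart(ProgramFilename()) + "1.pb")
```

Because each character arrives as a string, a parser like yours can compare Asc(Char$) against '"', ';' and '~' in the same Select block it already uses.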
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and more
Typeface - Sprite-based font include/module