Get Text From HTML
Posted: Fri Jan 24, 2020 11:22 am
I was just playing around, trying to get the text from an HTML file, and came up with the code below.
It reads the file until the opening <body> tag is found, then removes anything enclosed by <> and returns the remaining text line by line.
Any improvements welcome.
CD
Code:
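EnableExplicit

Define MyString.s, ReturnString.s, Char.s
Define iLoop.i, Ignore.i, BodyFound.i

If ReadFile(0, "My Test.html") ; Your HTML file
  While Not Eof(0)
    MyString = ReadString(0)

    ; Ignore everything up to and including the line with the opening body tag.
    ; Searching for "<body" also matches a tag with attributes, e.g. <body class="x">.
    ; Checking Eof() in the outer loop avoids an endless loop when no body tag exists.
    If BodyFound = #False
      If FindString(MyString, "<body", 1, #PB_String_NoCase)
        BodyFound = #True
      EndIf
      Continue
    EndIf

    ; Copy every character that is not enclosed by < >.
    ; Ignore is reset for each line, so a tag that spans several lines is not handled.
    Ignore = #False
    ReturnString = ""
    For iLoop = 1 To Len(MyString)
      Char = Mid(MyString, iLoop, 1)
      If Char = "<"
        Ignore = #True
      EndIf
      If Ignore = #False
        ReturnString = ReturnString + Char
      EndIf
      If Char = ">"
        Ignore = #False
      EndIf
    Next
    Debug ReturnString
  Wend
  CloseFile(0)
Else
  Debug "Could not open the file"
EndIf
End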