#PB_Web_PlainText for GetGadgetItemText()

Post by MachineCode

Can we have a #PB_Web_PlainText flag for GetGadgetItemText() to return the plain text of the web page? We already have #PB_Web_HtmlCode, so a plain-text alternative would be great. And yes, I know we can use #PB_Web_SelectedText, but that's different and too cumbersome: all the text has to be highlighted first, which is annoying for the user.

Re: #PB_Web_PlainText for GetGadgetItemText()

Post by TomS

I know it's a feature request, but here's a possible solution.

It simply strips the tags, so there's no formatting at all in this example (not even line breaks for <br>, which would be easy to add).
But when it comes to divs and tables you're lost.

This code would also return any JS code there is in the body (but only if it's not wrapped in CDATA).

Code: Select all

<body><script type="javascript">document.write('hello world<br>');</script>
How are you?</body>
Would indeed return "document.write('hello world'); How are you?"

Code: Select all

<body><script type="javascript">
<![CDATA[
	 document.write('hello world<br>');
]]>
</script>
How are you?</body>
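Would return "How are you?", provided the CDATA section contains no ">". In this exact snippet the ">" of the embedded <br> ends the skipped part early, so the fragments "');" and "]]" leak into the result as well.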
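The script caveat above could be sidestepped by cutting out whole script blocks before any tags are stripped. The following is only a rough sketch under that assumption: RemoveScriptBlocks() is a hypothetical helper name, not an existing PureBasic command, and it makes no attempt to handle "</script>" appearing inside a JS string.

```purebasic
; Hypothetical helper (an assumption for illustration, not a built-in command).
; Removes everything from "<script" up to and including "</script>".
Procedure.s RemoveScriptBlocks(html.s)
  Protected lower.s, startPos.i, endPos.i

  Repeat
    lower    = LCase(html)                       ; search case-insensitively
    startPos = FindString(lower, "<script", 1)
    If startPos = 0
      Break                                      ; no (more) script blocks
    EndIf

    endPos = FindString(lower, "</script>", startPos)
    If endPos = 0
      html = Left(html, startPos - 1)            ; unterminated block: drop the rest
      Break
    EndIf

    ; Keep the text before the block and the text after its closing tag.
    html = Left(html, startPos - 1) + Mid(html, endPos + Len("</script>"))
  ForEver

  ProcedureReturn html
EndProcedure

Debug RemoveScriptBlocks("<body><script>document.write('hi<br>');</script>How are you?</body>")
```

The result still contains the remaining tags, so it would have to go through a tag stripper afterwards.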

Code: Select all

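Procedure.s HTML2PlainText(input.s)
	; Start at the body tag, thus ignoring all the styles and JS in the head.
	; Searching for "<body" (no trailing space) also matches a plain <body> tag.
	Protected bodyPos.i = FindString(LCase(input), "<body", 1)
	If bodyPos = 0
		bodyPos = 1 ; no body tag found: process the whole string
	EndIf
	Protected workString.s = Mid(input, bodyPos)

	Protected *c.Character = @workString
	Protected outside.i    = #True ; #True while we are outside of a tag
	Protected lt.i         = Asc("<")
	Protected gt.i         = Asc(">")

	Protected result.s

	While *c\c <> 0
		Select *c\c
			Case lt
				outside = #False
			Case gt
				outside = #True
		EndSelect

		; Copy everything outside of tags, except the closing ">" itself.
		If outside = #True And *c\c <> gt
			result + Chr(*c\c)
		EndIf
		*c + SizeOf(Character)
	Wend

	ProcedureReturn result
EndProcedure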



Debug HTML2PlainText("<html><body background-color=#FFFFFF><h1>Welcome</h1> <p>To the jungle!</p> <p>of html Code</p></body></html>")
Debug HTML2PlainText("<h1>Welcome</h1> <p>To the jungle!</p> <p>of html Code</p>")
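
As for the <br> handling mentioned earlier, one possible way is to turn the tags into real line breaks before the tag stripping runs. This is just a sketch: BrToNewline() is a hypothetical helper name, it assumes a PureBasic version whose ReplaceString() accepts the #PB_String_NoCase mode, and it only covers the three most common spellings of the tag.

```purebasic
; Hypothetical helper (an assumption for illustration, not a built-in command).
; Replaces <br>, <br/> and <br /> with a CRLF line break.
Procedure.s BrToNewline(html.s)
  ; #PB_String_NoCase makes the search match <BR> as well.
  html = ReplaceString(html, "<br>",   #CRLF$, #PB_String_NoCase)
  html = ReplaceString(html, "<br/>",  #CRLF$, #PB_String_NoCase)
  html = ReplaceString(html, "<br />", #CRLF$, #PB_String_NoCase)
  ProcedureReturn html
EndProcedure

Debug BrToNewline("Line one<br>Line two<BR />Line three")
```

Calling it first, as in HTML2PlainText(BrToNewline(html)), should keep the line breaks, since the stripper copies everything outside of tags unchanged.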