#PB_Web_PlainText for GetGadgetItemText()

Posted: Thu May 26, 2011 12:58 pm
by MachineCode
Can we have a #PB_Web_PlainText flag for GetGadgetItemText() to return the plain text of the web page? We've got a #PB_Web_HtmlCode so a plain text alternative would be great. And yes, I know we can use #PB_Web_SelectedText but that's different and too cumbersome, as we need to highlight all the text first and that's annoying for the user.

Re: #PB_Web_PlainText for GetGadgetItemText()

Posted: Thu May 26, 2011 1:26 pm
by TomS
I know it's a feature request, but here's a possible workaround in the meantime.

There's no formatting at all in this example (not even <br>, which would be easy to add).
But when it comes to divs and tables, you're lost.
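
As a rough sketch of how <br> support could be added: convert the line-break tags to line feeds first, then strip whatever tags remain. HTML2PlainTextBr is a hypothetical name made up for this illustration, and only the three plain spellings of the tag are handled (no attributes); treat it as a starting point rather than a finished solution.

```purebasic
; Sketch only: HTML2PlainTextBr is a hypothetical name, not a PureBasic function.
Procedure.s HTML2PlainTextBr(html.s)
	; Turn the common spellings of the line-break tag into real line feeds
	; before the tags are stripped. A <br> with attributes is not handled.
	html = ReplaceString(html, "<br>", #LF$, #PB_String_NoCase)
	html = ReplaceString(html, "<br/>", #LF$, #PB_String_NoCase)
	html = ReplaceString(html, "<br />", #LF$, #PB_String_NoCase)
	
	Protected result.s
	Protected ch.s
	Protected i.i
	Protected outside.i = #True ; #True while we are outside of a tag
	
	For i = 1 To Len(html)
		ch = Mid(html, i, 1)
		If ch = "<"
			outside = #False
		ElseIf ch = ">"
			outside = #True
		ElseIf outside = #True
			result + ch ; Keep only characters that are not part of a tag.
		EndIf
	Next
	
	ProcedureReturn result
EndProcedure

Debug HTML2PlainTextBr("<p>Line one<br>Line two<BR />Line three</p>")
```

The Mid() loop is slower than walking a pointer through the string, but it keeps the sketch short and also copes with an empty string.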

This code would also return any JS code there is in the body (unless it's wrapped in CDATA, which is skipped like a tag up to its first ">").

Code: Select all

<body><script type="javascript">document.write('hello world<br>');</script>
How are you?</body>

This would indeed return "document.write('hello world'); How are you?"

Code: Select all

<body><script type="javascript">
<![CDATA[
	 document.write('hello world<br>');
]]>
</script>
How are you?</body>
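
This would return just "How are you?" only if the script contained no ">". Here the ">" of the <br> ends the skipped span early, so the leftover "');" and "]]" end up in the output as well.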

Code: Select all

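Procedure.s HTML2PlainText(input.s)
	If input = ""
		ProcedureReturn "" ; Nothing to do, and avoids reading through a null pointer.
	EndIf
	
	; Start at the body tag, thus ignoring all the styles and JS in the head tag.
	; Search for "<body" without a trailing space so that a bare <body> matches too.
	Protected bodyPos.i = FindString(LCase(input), "<body", 1)
	If bodyPos = 0
		bodyPos = 1 ; No body tag found: process the whole string.
	EndIf
	Protected workString.s = Mid(input, bodyPos)
	
	Protected *c.Character = @workString
	Protected outside.i = #True ; #True while we are outside of a tag
	Protected lt.i = Asc("<")
	Protected gt.i = Asc(">")
	
	Protected result.s
	
	While *c\c <> 0
		Select *c\c
			Case lt
				outside = #False
			Case gt
				outside = #True
		EndSelect
		
		; Copy the character unless it is inside a tag or is the closing ">" itself.
		If outside = #True
			If *c\c <> gt
				result + Chr(*c\c)
			EndIf
		EndIf
		*c + SizeOf(Character)
	Wend
	
	ProcedureReturn result
	
EndProcedure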



Debug HTML2PlainText("<html><body background-color=#FFFFFF><h1>Welcome</h1> <p>To the jungle!</p> <p>of html Code</p></body></html>")
Debug HTML2PlainText("<h1>Welcome</h1> <p>To the jungle!</p> <p>of html Code</p>")
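
To tie this back to the WebGadget from the original request, here is a rough sketch of fetching the page source with the existing #PB_Web_HtmlCode flag. The window layout, the gadget numbers, the button and the URL are all assumptions made up for illustration. The returned string is what you would then pass to HTML2PlainText().

```purebasic
; Sketch only: window layout, gadget numbers and the URL are illustrative assumptions.
Enumeration
	#Window
	#Web
	#Button
EndEnumeration

Define event.i
Define html.s

If OpenWindow(#Window, 0, 0, 600, 440, "WebGadget text", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
	WebGadget(#Web, 0, 0, 600, 400, "http://www.purebasic.com")
	ButtonGadget(#Button, 10, 405, 120, 30, "Get HTML")
	
	Repeat
		event = WaitWindowEvent()
		If event = #PB_Event_Gadget And EventGadget() = #Button
			; Existing flag: returns the HTML source of the page in the gadget.
			html = GetGadgetItemText(#Web, #PB_Web_HtmlCode)
			Debug html ; Pass this string to HTML2PlainText() to strip the tags.
		EndIf
	Until event = #PB_Event_CloseWindow
EndIf
```

Reading the source on a button click rather than right after WebGadget() gives the page time to load before the HTML is requested.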