text from browser window
Posted: Thu Dec 20, 2012 6:16 pm
by Tomio
hello,
I need to build a tool (Windows 7) to extract the HTML code from a browser window.
So far I can locate the corresponding window by a keyword in the window's title.
But how do I find the HTML text inside? I'm completely stuck.
And to my surprise I can't find anything about it in Coding Questions.
tomio
Re: text from browser window
Posted: Thu Dec 20, 2012 6:44 pm
by luis
If the browser is IE, you could remotely access its IHTMLDocument2 object.
The "difficult" thing is to obtain the address of that object.
See ->
http://support.microsoft.com/kb/249232
After you have done that, you can use the object methods to do almost anything.
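
A rough sketch of the first step from that KB article, in PureBasic: the document object belongs to a child window of class "Internet Explorer_Server", so that child has to be found inside the top-level IE window first. This assumes hWndTop is a handle you have already located (for example by its title). FindIEServer and EnumChildProc are hypothetical names. With tabbed IE versions there may be several such child windows, and this only returns the first one found. The remaining steps from the article (registering the WM_HTML_GETOBJECT message, SendMessageTimeout, ObjectFromLresult from oleacc.dll) are not shown.

```purebasic
; Callback for EnumChildWindows_(): stops at the first child window
; whose class name is "Internet Explorer_Server".
Procedure.i EnumChildProc(hWnd.i, lParam.i)
  Protected class$ = Space(255)
  GetClassName_(hWnd, @class$, 255)
  If class$ = "Internet Explorer_Server"
    PokeI(lParam, hWnd)      ; hand the handle back to the caller
    ProcedureReturn #False   ; stop the enumeration
  EndIf
  ProcedureReturn #True      ; keep looking
EndProcedure

; Returns the handle of the IE rendering window inside hWndTop, or 0.
Procedure.i FindIEServer(hWndTop.i)
  Protected found.i = 0
  EnumChildWindows_(hWndTop, @EnumChildProc(), @found)
  ProcedureReturn found
EndProcedure
```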
Re: text from browser window
Posted: Thu Dec 20, 2012 7:21 pm
by netmaestro
If you use the webgadget (not mozilla) on Windows it's fairly simple:
Code: Select all
If OpenWindow(0, 0, 0, 600, 330, "WebGadget", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
  ButtonGadget(1, 250, 300, 100, 20, "copy html code")
  WebGadget(0, 10, 10, 580, 280, "http://www.purebasic.com")
  ; Note: if you want to use a local file, change last parameter to "file://" + path + filename
  Repeat
    ev = WaitWindowEvent()
    Select ev
      Case #PB_Event_Gadget
        If EventGadget() = 1
          txt$ = GetGadgetItemText(0, #PB_Web_HtmlCode)
          OpenWindow(1, 0, 0, 640, 480, "HTML Code:", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
          EditorGadget(2, 0, 0, 640, 480, #PB_Editor_ReadOnly)
          SetGadgetText(2, txt$)
          Repeat : Until WaitWindowEvent() = #PB_Event_CloseWindow
          CloseWindow(1)
        EndIf
    EndSelect
  Until ev = #PB_Event_CloseWindow
EndIf
Alternatively you can pass #PB_Web_SelectedText to get just the current selection.
Re: text from browser window
Posted: Thu Dec 20, 2012 7:41 pm
by luis
But he said "from a browser window". He also said he had "located the window", so I believe he meant an external browser.
From a WebGadget it's obviously easy, as you said.
Re: text from browser window
Posted: Thu Dec 20, 2012 7:51 pm
by IdeasVacuum
It seems to me that both methods posted above need to know the web address. If you can find that, having found the browser window, then the method of extracting the text comes down to whichever best meets your requirements.
Re: text from browser window
Posted: Fri Dec 21, 2012 12:10 am
by Tomio
thanks.
I've read your answers and I have to think about it.
Yes, I can locate the window.
The problem is to read the html code.
Actually, it's not the code I'm interested in, but some short piece of the very large text-output.
tomio
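
Once the large text is available as a string, pulling out a short piece can be sketched as below, assuming the piece sits between two known marker strings. ExtractBetween and the markers in the example call are hypothetical, not something from the thread.

```purebasic
; Return the text between the first startMark$ and the next endMark$,
; or "" if either marker is missing.
Procedure.s ExtractBetween(text$, startMark$, endMark$)
  Protected p1, p2
  p1 = FindString(text$, startMark$, 1)
  If p1 = 0 : ProcedureReturn "" : EndIf
  p1 + Len(startMark$)                  ; first character after the start marker
  p2 = FindString(text$, endMark$, p1)
  If p2 = 0 : ProcedureReturn "" : EndIf
  ProcedureReturn Mid(text$, p1, p2 - p1)
EndProcedure

Debug ExtractBetween("Order: [12345] shipped", "[", "]") ; shows 12345
```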
Re: text from browser window
Posted: Fri Dec 21, 2012 12:28 am
by luis
Tomio wrote:
Actually, it's not the code I'm interested in, but some short piece of the very large text-output.
And that's different from the original question, since the output is not the HTML code but the final rendering of it.
So you probably don't need what I outlined in the previous answer (it can work too, but may be overkill) and you can pull it off in a simpler way.
You could try sending a #WM_COPY message, but I don't know if IE honors the request.
Alternatively you could try SendInput() or keybd_event() to make the browser save the output text to a file, or copy it to the clipboard with CTRL + C.
Re: text from browser window
Posted: Fri Dec 21, 2012 5:17 pm
by Tomio
Sounds good.
The tool is to run as an office tool.
The user is a secretary (a lady).
If possible, the whole stuff should run in the background.
The tool is to be started once and check the available windows periodically for a special url.
If found it should grab the whole output (or part) somehow for further processing.
I have built several tools in the past that deal with windows, like positioning+[no]header+etc.
Just design. But I have never sent a key command.
"SendInput() or keybd_event() to make the browser save the output text to file"
Question: how do I send a command like the one mentioned, or any other that could help me, to the window?
tomio
Re: text from browser window
Posted: Fri Dec 21, 2012 5:49 pm
by luis
Based on what you are saying, you don't need an external browser then.
1) If the thing must run in the background, it's probably better to create a self-contained solution using PB's WebGadget. In that case you can find many examples on the forum on how to interact with it (search for IHTMLDocument2).
2) You could also simply download the data from the URL, and then process the file to extract the data (maybe the simplest solution?).
3) If you want to use an external browser for some reason anyway, you can also find SendInput() examples, just do a search. Basically you execute the target program and, while it has the focus, you start to synthesize keyboard and mouse input. But in your case I wouldn't go down this road.
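
Option 3 could look roughly like the sketch below: bring the window to the front, synthesize CTRL+A and CTRL+C with keybd_event(), then read the clipboard. This assumes hWnd is a valid handle of the browser window and that the page area accepts the keystrokes. GrabWindowText is a hypothetical name and the delays are guesses. Note that it steals the focus and overwrites the user's clipboard, and Windows may refuse the SetForegroundWindow request, which is part of why this road fits a background tool poorly.

```purebasic
; Copy the whole visible text of a window via the clipboard.
; Returns "" if hWnd is 0 or nothing was copied.
Procedure.s GrabWindowText(hWnd.i)
  If hWnd = 0 : ProcedureReturn "" : EndIf

  ClearClipboard()                 ; so stale clipboard content is not returned
  SetForegroundWindow_(hWnd)       ; keystrokes go to the foreground window
  Delay(200)                       ; assumed: give the window time to activate

  keybd_event_(#VK_CONTROL, 0, 0, 0)            ; CTRL down
  keybd_event_('A', 0, 0, 0)                    ; A down   (select all)
  keybd_event_('A', 0, #KEYEVENTF_KEYUP, 0)     ; A up
  keybd_event_('C', 0, 0, 0)                    ; C down   (copy)
  keybd_event_('C', 0, #KEYEVENTF_KEYUP, 0)     ; C up
  keybd_event_(#VK_CONTROL, 0, #KEYEVENTF_KEYUP, 0) ; CTRL up

  Delay(300)                       ; assumed: let the browser fill the clipboard
  ProcedureReturn GetClipboardText()
EndProcedure
```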
Re: text from browser window
Posted: Fri Dec 21, 2012 7:50 pm
by Tomio
Perhaps I don't understand your reply.
The secretary has several windows open. None, one or more can be IE browsers, others can be Word or whatever.
When she decides to check that particular URL (from outside, another office department), my tool will notice shortly afterwards and grab the text output. The secretary does not have to take care of it. So what do you mean by external browser?
And I can't use a gadget. I don't manipulate/create a window for her, I just want to read out the text from her browser whenever she clicks on that favorite URL. In principle, if things work out, she doesn't even need to know about the tool.
Your solution 2) is what we will do if there is no other solution.
This would mean she always has to do a copy/paste. Depending on the situation, that could be 1-10 times a day. We would prefer to avoid this.
By the way, the extracted text is not only to be saved but will be manipulated in some complex way. This is what I want to do in PB anyway. So extract + process is what I would like to do in PB in one go.
tomio
Re: text from browser window
Posted: Fri Dec 21, 2012 8:48 pm
by luis
Tomio wrote:When she decides to check that particular url my tool shortly after will notice ...
So you need to monitor browser windows and, when a specific URL is visited, act upon that in some way.
And you cannot get any help from the user, for example by making her drag the URL to your client.
Is that correct? If it is, then THAT is the main hurdle.
You must admit it wasn't really evident from your original post, anyway I started to reply so I'll give it one more shot.
You cannot reliably poll all the windows of the browsers to check for a specific url and hope to catch it at the right time.
If the PC is connected to the internet through a proxy, maybe you can inspect the log and, when a particular URL is visited, take it from there.
You can also write your own simple proxy, run it on the secretary's PC and filter/examine all the traffic.
If you can retrieve that information from a similar source (some kind of log) probably it's the easiest way.
If not, you need some collaboration from the browser. Maybe a plugin or something similar.
For IE you can create a special DLL called Browser Helper Object (BHO). I'm not up to date with it so I don't know if it's still feasible.
See ->
http://msdn.microsoft.com/en-us/library ... 85%29.aspx
Maybe someone else has some other ideas.
Re: text from browser window
Posted: Fri Dec 21, 2012 9:23 pm
by ostapas
Just my 2 cents.
luis wrote:
If not, you need some collaboration from the browser. Maybe a plugin or something similar.
Have a look at http://crossrider.com/. Very useful framework for creating cross-browser extensions.
luis wrote:
You can also write your own simple proxy, run it on the secretary's PC and filter/examine all the traffic
I haven't tried it in practice, but to make things simpler, what about setting up Small HTTP server to act as a middleman between the browser and the internet (control/filter the links you need) and controlling it from PB via the command line?
Re: text from browser window
Posted: Sat Dec 22, 2012 12:01 am
by Tomio
hm,
with this little code running in the background periodically, I can check for a keyword set by the called url with the <title> tag.
Code: Select all
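Procedure FindPartWin(part$)
  ; Walk the top-level windows and return the handle of the first one
  ; whose title contains part$, or 0 if none matches.
  Protected r, w, t$
  r = GetWindow_(GetDesktopWindow_(), #GW_CHILD)
  While r <> 0 And w = 0
    t$ = Space(999) : GetWindowText_(r, @t$, 999)
    If FindString(t$, part$, 1) <> 0
      w = r
    Else
      r = GetWindow_(r, #GW_HWNDNEXT)
    EndIf
  Wend
  ProcedureReturn w
EndProcedure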
Debug FindPartWin("A Keyword")
This works. So for me the problem is not to find the window but to extract the text output.
tomio
Re: text from browser window
Posted: Sat Dec 22, 2012 10:07 pm
by Nico
If you want to get the HTML page from Internet Explorer, luis gave you the solution; I use it too.
See here:
http://www.purebasic.fr/english/viewtopic.php?t=24570
From Mozilla Firefox, I use MozRepl extension.
For other browsers it will be difficult!
Re: text from browser window
Posted: Sun Dec 23, 2012 7:12 pm
by Tomio
As I said, I can locate the page in question.
And I can save the text I have selected (for further processing).
I'm still working on getting the tool to select the text itself.
But I'm confident I'll solve this soon.
Thanks for any help so far
tomio