text from browser window

Tomio
Enthusiast
Posts: 291
Joined: Sun Apr 27, 2003 4:54 pm
Location: Germany

text from browser window

Post by Tomio »

hello,
I need to build a tool (Windows 7) to extract the HTML code from a browser window.
So far I can locate the corresponding window by a keyword in the window's title.

But how do I get at the HTML text inside? I'm completely stuck.
And to my surprise I can't find anything about this in Coding Questions.
tomio
luis
Addict
Posts: 3893
Joined: Wed Aug 31, 2005 11:09 pm
Location: Italy

Re: text from browser window

Post by luis »

If the browser is IE, you could remotely access its IHTMLDocument2 object.

The "difficult" thing is to obtain the address of that object.

See -> http://support.microsoft.com/kb/249232

After you have done that, you can use the object methods to do almost anything.
"Have you tried turning it off and on again ?"
A little PureBasic review
netmaestro
PureBasic Bullfrog
Posts: 8451
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Re: text from browser window

Post by netmaestro »

If you use the WebGadget (not the Mozilla one) on Windows, it's fairly simple:

Code:

If OpenWindow(0, 0, 0, 600, 330, "WebGadget", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
  ButtonGadget(1, 250, 300,100,20,"copy html code")
  WebGadget(0, 10, 10, 580, 280, "http://www.purebasic.com")
  ; Note: if you want to use a local file, change last parameter to "file://" + path + filename
  Repeat
    ev= WaitWindowEvent()
    Select ev
      Case #PB_Event_Gadget
        If EventGadget() = 1 ; the "copy html code" button was clicked
          txt$ = GetGadgetItemText(0,#PB_Web_HtmlCode) ; HTML source of the page shown in the WebGadget
          ; Show the source in a second, read-only window
          OpenWindow(1,0,0,640,480,"HTML Code:",#PB_Window_SystemMenu | #PB_Window_ScreenCentered)
          EditorGadget(2,0,0,640,480,#PB_Editor_ReadOnly)
          SetGadgetText(2, txt$)
          Repeat:Until WaitWindowEvent()=#PB_Event_CloseWindow ; wait until the viewer is closed
          CloseWindow(1)
        EndIf
    EndSelect
  Until ev = #PB_Event_CloseWindow
EndIf
Alternatively you can pass #PB_Web_SelectedText to get just the current selection.
BERESHEIT

Re: text from browser window

Post by luis »

But he said, "from a browser window". Moreover he said he "located the window", so I believe he meant an external browser.

From a WebGadget it's obviously easy, as you said.
"Have you tried turning it off and on again ?"
IdeasVacuum
Always Here
Posts: 6426
Joined: Fri Oct 23, 2009 2:33 am
Location: Wales, UK

Re: text from browser window

Post by IdeasVacuum »

It seems to me that both methods posted above need to know the web address. So if you can find that, having found the browser window, then the method of extracting the text comes down to whichever best meets your requirements.
IdeasVacuum
If it sounds simple, you have not grasped the complexity.

Re: text from browser window

Post by Tomio »

thanks.
I've read your answers and I have to think about it.
Yes, I can locate the window.
The problem is to read the html code.
Actually, it's not the code I'm interested in, but some short piece of the very large text-output.

tomio

Re: text from browser window

Post by luis »

Tomio wrote: Actually, it's not the code I'm interested in, but some short piece of the very large text-output.
And that's different from the original question... since the output is not the html code but the final rendering of it.

So you probably don't need what was outlined in the previous answer (it can work too, but may be overkill), and you can pull it off in a simpler way.

You could try sending a #WM_COPY message, but I don't know whether IE honors the request.

Alternatively you could try SendInput() or keybd_event() to make the browser save the output text to file or to the clipboard with a CTRL + C.
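
For what it's worth, the CTRL + C idea could be sketched roughly like this in PB (Windows only). This is a minimal sketch under assumptions, not a finished solution: GrabTextViaClipboard() is a made-up name, the window title in the usage lines is a placeholder, and the delays are guesses. It assumes the target window reacts to CTRL + A and CTRL + C, it steals the focus, and it overwrites whatever the user had on the clipboard. Windows may also refuse the SetForegroundWindow_() call in some situations.

```purebasic
; Virtual key codes for the letter keys are their uppercase ASCII values
#KEY_A = 65
#KEY_C = 67

; Hypothetical helper: select everything in the given window, copy it,
; and return the clipboard text. Returns "" if hWnd is not a valid window.
Procedure.s GrabTextViaClipboard(hWnd)
  Protected text$ = ""

  If IsWindow_(hWnd)
    ClearClipboard()                ; so stale clipboard content is not returned
    SetForegroundWindow_(hWnd)      ; synthesized keys go to the foreground window
    Delay(200)                      ; guessed delay: let the window take the focus

    keybd_event_(#VK_CONTROL, 0, 0, 0)                 ; CTRL down
    keybd_event_(#KEY_A, 0, 0, 0)                      ; A down  (select all)
    keybd_event_(#KEY_A, 0, #KEYEVENTF_KEYUP, 0)       ; A up
    keybd_event_(#KEY_C, 0, 0, 0)                      ; C down  (copy)
    keybd_event_(#KEY_C, 0, #KEYEVENTF_KEYUP, 0)       ; C up
    keybd_event_(#VK_CONTROL, 0, #KEYEVENTF_KEYUP, 0)  ; CTRL up

    Delay(300)                      ; guessed delay: let the target fill the clipboard
    text$ = GetClipboardText()
  EndIf

  ProcedureReturn text$
EndProcedure

; Usage with a placeholder window title
hWnd = FindWindow_(0, "Untitled - Notepad")
If hWnd
  Debug GrabTextViaClipboard(hWnd)
EndIf
```

Since the keys go to whatever has the focus, this cannot really run "in the background", which is one reason to prefer a different approach if one is available.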
"Have you tried turning it off and on again ?"

Re: text from browser window

Post by Tomio »

Sounds good.

The tool is to run as an office tool.
The user is a secretary (a lady).
If possible, the whole stuff should run in the background.
The tool is to be started once and check the available windows periodically for a special url.
If found it should grab the whole output (or part) somehow for further processing.

I have built several tools in the past that deal with windows, like positioning+[no]header+etc.
Just layout, though. I have never sent a key command.
luis wrote: SendInput() or keybd_event() to make the browser save the output text to file
Question: how do I send a command like the one mentioned to the window, or whichever other command could help me?

tomio

Re: text from browser window

Post by luis »

Based on what you are saying you don't need an external browser then.

1) If the thing must run in the background, it's probably better to create a self-contained solution using PB's WebGadget. In that case you can find many examples on the forum on how to interact with it (search for IHTMLDocument2).

2) You could also simply download the data from the URL, and then process the file to extract the data (maybe the simplest solution?).

3) If you want to use an external browser anyway for some reason, you can also find SendInput() examples, just do a search. Basically you execute the target program and, while it has the focus, you synthesize keyboard and mouse input. But in your case I wouldn't go down this road.
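
Option 2) could look roughly like this. It's a minimal sketch under assumptions: the URL and the temp file name are placeholders, InitNetwork() is required on the PB versions current at the time of this thread (drop that line if your version no longer has it), and the regular expression is only a crude way to strip tags, so script and style content will survive it.

```purebasic
; Placeholder values, replace with the real page and a suitable file name
url$  = "http://www.purebasic.com"
file$ = GetTemporaryDirectory() + "page.html"

If InitNetwork() = 0
  Debug "Network could not be initialized"
  End
EndIf

If ReceiveHTTPFile(url$, file$)
  ; Read the downloaded file back into one string
  html$ = ""
  If ReadFile(0, file$)
    While Eof(0) = 0
      html$ = html$ + ReadString(0) + #LF$
    Wend
    CloseFile(0)
  EndIf

  ; Crude tag removal: replace anything that looks like <...> with a space
  text$ = html$
  If CreateRegularExpression(0, "<[^>]*>")
    text$ = ReplaceRegularExpression(0, html$, " ")
    FreeRegularExpression(0)
  EndIf

  Debug text$
Else
  Debug "Download failed"
EndIf
```

The download happens independently of any browser, so it only fits if the page can be fetched without the user's browser session.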
"Have you tried turning it off and on again ?"

Re: text from browser window

Post by Tomio »

perhaps I don't understand your reply.

The secretary has several windows open. None, one or more can be IE browser windows, others can be Word or whatever.
When she decides to check that particular URL (from outside, another office department), my tool will notice shortly afterwards and grab the text output. The secretary does not have to take care of it. So what do you mean by external browser?

And I can't use a gadget. I don't manipulate/create a window for her, but just want to read out the text from her browser whenever she clicks on that favorite URL. In principle, if things work out, she doesn't even need to know about the tool.

Your solution 2) is what we will do if there is no other solution.
This would mean she always has to do a copy/paste. Depending on the situation, that could be 1-10 times a day. We would prefer to avoid this.

By the way, the extracted text is not only to be saved but will be manipulated in some complex way. This is what I want to do in PB anyway. So extract + process is what I would like to do in PB in one go.

tomio

Re: text from browser window

Post by luis »

Tomio wrote:When she decides to check that particular url my tool shortly after will notice ...
So you need to monitor browser windows and, when a specific URL is visited, act upon that in some way.
And you cannot get any help from the user, for example by having her drag the URL to your client.

Is that correct? If it is, then THAT is the main hurdle.
You must admit it wasn't really evident from your original post, anyway I started to reply so I'll give it one more shot.

You cannot reliably poll all the windows of the browsers to check for a specific url and hope to catch it at the right time.

If the PC is connected to the internet through a proxy, maybe you can inspect its log and, when a particular URL is visited, take it from there.

You can also write your own simple proxy, run it on the secretary's PC and filter/examine all the traffic.

If you can retrieve that information from a similar source (some kind of log) probably it's the easiest way.

If not, you need some collaboration from the browser. Maybe a plugin or something similar.

For IE you can create a special DLL called a Browser Helper Object (BHO). I'm not up to date with it, so I don't know if it's still feasible.

See -> http://msdn.microsoft.com/en-us/library ... 85%29.aspx

Maybe someone else has some other ideas.
"Have you tried turning it off and on again ?"
ostapas
Enthusiast
Posts: 192
Joined: Thu Feb 18, 2010 11:10 pm

Re: text from browser window

Post by ostapas »

Just my 2 cents.
luis wrote:
If not, you need some collaboration from the browser. Maybe a plugin or something similar.
Have a look at http://crossrider.com/ . It's a very useful framework for creating cross-browser extensions.
luis wrote:
You can also write your own simple proxy, run it on the secretary's PC and filter/examine all the traffic
I haven't tried it in practice, but to keep things simpler, what about setting up Small HTTP server as a middleman between the browser and the internet (to control/filter the links you need) and controlling it from PB via the command line?

Re: text from browser window

Post by Tomio »

hm,
With this little piece of code running periodically in the background, I can check for a keyword that the called URL sets with its <title> tag.

Code:

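Procedure FindPartWin(part$)
  ; Returns the handle of the first top-level window whose title contains part$, or 0
  Protected r, w, t$
  r=GetWindow_(GetDesktopWindow_(),#GW_CHILD) ; first top-level window
  Repeat
    t$=Space(999) : GetWindowText_(r,t$,999) ; read the window title into the buffer
    If FindString(t$,part$,1)<>0
      w=r ; match found
    Else
      r=GetWindow_(r,#GW_HWNDNEXT) ; try the next window
    EndIf
  Until r=0 Or w<>0
  ProcedureReturn w
EndProcedure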

Debug FindPartWin("A Keyword")
This works. So for me the problem is not to find the window but to extract the text output.

tomio
Nico
Enthusiast
Posts: 274
Joined: Sun Jan 11, 2004 11:34 am
Location: France

Re: text from browser window

Post by Nico »

If you want to get the HTML page from Internet Explorer, luis gave you the solution; I use it as well.
See here: http://www.purebasic.fr/english/viewtopic.php?t=24570

From Mozilla Firefox, I use the MozRepl extension.

For other browsers it will be difficult!

Re: text from browser window

Post by Tomio »

As I said, I can locate the page in question
and save the text I have selected (for further processing).
I'm still busy getting the tool to select the text by itself,
but I'm confident I'll solve this soon.

Thanks for any help so far
tomio