Re: Read the contents of a Web Page

Posted: Mon Apr 16, 2012 7:26 pm
by charvista
I am now on another computer, with Windows 7 64-bit, PB 4.61 Beta 1 and Firefox 11.0.

At first glance it seems to work, because there is output, BUT the content is NOT that of http://ip.xxoo.net/!
Instead, the output is an error page. It is not an illusion: it really contains "This program cannot display the webpage".
Here is the source code:

Code: Select all

    Procedure.s GetHtmlCode(URL.s)
        GhostWin=OpenWindow(#PB_Any,0,0,600,300,"",#PB_Window_Invisible)
        WebGad=WebGadget(#PB_Any,10,10,580,280,URL.s,#PB_Web_Mozilla)
        While WindowEvent():Wend
        While GetGadgetAttribute(WebGad,#PB_Web_Busy)<>0
            While WindowEvent():Wend
        Wend
        While WindowEvent():Wend
        WebPage.s=GetGadgetItemText(WebGad,#PB_Web_HtmlCode)   
        CloseWindow(GhostWin)
        ProcedureReturn WebPage.s
    EndProcedure


    C$=GetHtmlCode("http://ip.xxoo.net")
    ; CreateFile() overwrites any existing file, so no DeleteFile() is needed
    If CreateFile(1,"c:/temp/xxoo.htm")
        WriteStringN(1,C$)
        CloseFile(1)
    EndIf
    
    
    If OpenConsole()
        PrintN(C$)
        PrintN(Str(Len(C$)))
        Input()
    EndIf
What now??? To me, #PB_Web_HtmlCode does NOT work.
If it worked for you, please check also the contents! ;)

Re: Read the contents of a Web Page

Posted: Mon Apr 16, 2012 7:57 pm
by charvista
Foz wrote:
What is wrong with using ReceiveHTTPFile(url, filename)?
Ok, let's try.

Code: Select all

  InitNetwork()
  
  Filename$="c:\temp\xxooinfo.htm"
  If ReceiveHTTPFile("http://ip.xxoo.net/", Filename$)
    Debug "Success"
  Else
    Debug "Failed"
  EndIf
This works. Thanks, Foz!
But the downloaded file is not 100% complete.
Nubcake wrote:
I've noticed GetGadgetItemText(#PB_Web_HtmlCode) doesn't return everything in the webgadget. Anyone care to explain why ?
Nubcake is correct, although here we used ReceiveHTTPFile() rather than the WebGadget. Comparing the downloaded file with the source code shown by Ctrl+U in the browser, the following portion, located between <script> and </head>, is missing from the file:

Code: Select all

<script type="text/javascript"></script>
<link rel='stylesheet' type='text/css' href='/B1D671CF-E532-4481-99AA-19F420D90332/netdefender/hui/ndhui.css' />
<!--[if lt IE 8]><link rel='stylesheet' type='text/css' href='/B1D671CF-E532-4481-99AA-19F420D90332/netdefender/hui/ndhui_ie7.css' /><![endif]-->
:shock:

Re: Read the contents of a Web Page

Posted: Tue Apr 17, 2012 3:40 am
by MachineCode
charvista wrote:
What now??? To me, #PB_Web_HtmlCode does NOT work.
If it worked for you, please check also the contents! ;)
Look at the contents I posted. Works fine here. Don't know why it's failing for you. Must be either a firewall or ISP issue.

Re: Read the contents of a Web Page

Posted: Tue Apr 17, 2012 6:48 am
by charvista
:) It is a real miracle that it works on your computer, dear MachineCode! :)
I have no clue what is happening, because xxoo is working fine when using Firefox or IE directly....

Re: Read the contents of a Web Page

Posted: Tue Apr 17, 2012 8:13 am
by Shardik
I can confirm that charvista's code example from April 14th 3:33 pm
doesn't return any HTML code for the web site "http://ip.xxoo.net"
although "http://www.purebasic.com" works just fine (tested in
Windows XP SP3). But by selecting all the text on "ip.xxoo.net" and
then reading the selected text, it is at least possible to get that text
(for example to retrieve the displayed IP address): :wink:

Code: Select all

#OLECMDID_SELECTALL = 17
#OLECMDEXECOPT_DONTPROMPTUSER = 2

Procedure.s GetText(URL.s)
  GhostWin=OpenWindow(#PB_Any,0,0,600,300,"",#PB_Window_Invisible)
  WebGad=WebGadget(#PB_Any,10,10,580,280,URL.s)
  While GetGadgetAttribute(WebGad,#PB_Web_Busy)<>0
    WindowEvent()
  Wend
  
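  ; The gadget stores its IWebBrowser2 pointer in the window user data.
  ; GetWindowLongPtr_() keeps the full pointer on 64-bit Windows.
  WebObject.IWebBrowser2 = GetWindowLongPtr_(GadgetID(WebGad), #GWL_USERDATA)
  WebObject\ExecWB(#OLECMDID_SELECTALL, #OLECMDEXECOPT_DONTPROMPTUSER, 0, 0)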
  
  Text$=GetGadgetItemText(WebGad,#PB_Web_SelectedText)   
  CloseWindow(GhostWin)
  ProcedureReturn Text$
EndProcedure

Debug GetText("http://ip.xxoo.net")

Re: Read the contents of a Web Page

Posted: Tue Apr 17, 2012 9:27 am
by Foz
*Sigh*, I hadn't realised that the truncation issue with ReceiveHTTPFile() hadn't been resolved yet.

Until then, use http://www.purebasic.fr/english/viewtopic.php?p=217199.