Read the contents of a Web Page

charvista
Addict
Posts: 949
Joined: Tue Sep 23, 2008 11:38 pm
Location: Belgium

Re: Read the contents of a Web Page

Post by charvista »

I am now on another computer, with Windows 7 64-bit, PB 4.61 Beta 1, and Firefox 11.0.

Now, at first glance, it "seems" to work, because there is output, BUT the content is NOT that of http://ip.xxoo.net/ !
Instead, it contains this:
[Image]
It is not an illusion: it contains "This program cannot display the webpage".
Here is the source code:

Code:

    ; Load URL in a hidden WebGadget and return the HTML code it reports
    Procedure.s GetHtmlCode(URL.s)
        GhostWin=OpenWindow(#PB_Any,0,0,600,300,"",#PB_Window_Invisible)
        WebGad=WebGadget(#PB_Any,10,10,580,280,URL,#PB_Web_Mozilla)
        While WindowEvent():Wend ; flush pending events
        ; Keep processing events until the gadget is no longer busy loading
        While GetGadgetAttribute(WebGad,#PB_Web_Busy)<>0
            While WindowEvent():Wend
        Wend
        While WindowEvent():Wend
        WebPage.s=GetGadgetItemText(WebGad,#PB_Web_HtmlCode)
        CloseWindow(GhostWin)
        ProcedureReturn WebPage
    EndProcedure


    C$=GetHtmlCode("http://ip.xxoo.net")
    ; Save the result; CreateFile() overwrites an existing file
    If CreateFile(1,"c:/temp/xxoo.htm")
        WriteStringN(1,C$)
        CloseFile(1)
    EndIf


    ; Show the result and its length in a console
    If OpenConsole()
        PrintN(C$)
        PrintN(Str(Len(C$)))
        Input()
    EndIf
What now??? To me, #PB_Web_HtmlCode does NOT work.
If it worked for you, please also check the contents! ;)
- Windows 11 Home 64-bit
- PureBasic 6.10 LTS (x64)
- 64 Gb RAM
- 13th Gen Intel(R) Core(TM) i9-13900K 3.00 GHz
- 5K monitor with DPI @ 200%
charvista
Addict
Posts: 949
Joined: Tue Sep 23, 2008 11:38 pm
Location: Belgium

Re: Read the contents of a Web Page

Post by charvista »

Foz wrote:
What is wrong with using ReceiveHTTPFile(url, filename)?
Ok, let's try.

Code:

  InitNetwork()
  
  Filename$="c:\temp\xxooinfo.htm"
  If ReceiveHTTPFile("http://ip.xxoo.net/", Filename$)
    Debug "Success"
  Else
    Debug "Failed"
  EndIf
Result:
Image
This works. Thanks Foz!
But, it is not 100% complete.
Nubcake wrote:
I've noticed GetGadgetItemText(#PB_Web_HtmlCode) doesn't return everything in the webgadget. Anyone care to explain why ?
Nubcake is correct (although here we used ReceiveHTTPFile()). Comparing the downloaded file with the source code obtained with Ctrl+U in the browser shows that the following portion, located between <script> and </head>, is missing from the file:

Code:

<script type="text/javascript"></script><link rel='stylesheet' type='text/css' href='/B1D671CF-E532-4481-99AA-19F420D90332/netdefender/hui/ndhui.css' /><!--[if lt IE 8]><link rel='stylesheet' type='text/css' href='/B1D671CF-E532-4481-99AA-19F420D90332/netdefender/hui/ndhui_ie7.css' /><![endif]-->
:shock:
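
A comparison like this can also be done in code rather than by eye. The following is only a sketch, not code from this thread: it reads the downloaded file back into a string and uses FindString() to test whether a given fragment is present. The procedure name ReadTextFile() and the search fragment "ndhui.css" are picked for illustration; the file name is the one used in the post above.

```purebasic
; Hypothetical helper: read a text file line by line into one string
Procedure.s ReadTextFile(Filename$)
  Protected Text$, File
  File = ReadFile(#PB_Any, Filename$)
  If File
    While Eof(File) = 0
      Text$ + ReadString(File) + #CRLF$
    Wend
    CloseFile(File)
  EndIf
  ProcedureReturn Text$ ; empty if the file could not be opened
EndProcedure

Html$ = ReadTextFile("c:\temp\xxooinfo.htm")
; "ndhui.css" is taken from the missing portion quoted above
If FindString(Html$, "ndhui.css", 1)
  Debug "Fragment found in the downloaded file"
Else
  Debug "Fragment missing (or the file could not be read)"
EndIf
```

Note that this only shows whether the fragment is in the file; it does not show where the fragment seen in the browser comes from.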
MachineCode
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Re: Read the contents of a Web Page

Post by MachineCode »

charvista wrote: What now??? To me, #PB_Web_HtmlCode does NOT work.
If it worked for you, please also check the contents! ;)
Look at the contents I posted. Works fine here. Don't know why it's failing for you. Must be either a firewall or ISP issue.
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
charvista
Addict
Posts: 949
Joined: Tue Sep 23, 2008 11:38 pm
Location: Belgium

Re: Read the contents of a Web Page

Post by charvista »

:) It is a real miracle that it works on your computer, dear MachineCode! :)
I have no clue what is happening, because xxoo is working fine when using Firefox or IE directly....
Shardik
Addict
Posts: 2058
Joined: Thu Apr 21, 2005 2:38 pm
Location: Germany

Re: Read the contents of a Web Page

Post by Shardik »

I can confirm that charvista's code example from April 14th 3:33 pm
doesn't return any HTML code for the web site "http://ip.xxoo.net"
although "http://www.purebasic.com" works just fine (tested in
Windows XP SP3). But by selecting all the text on "ip.xxoo.net" and
then reading the selected text, it is at least possible to get that text
(for example to retrieve the displayed IP address): :wink:

Code:

#OLECMDID_SELECTALL = 17
#OLECMDEXECOPT_DONTPROMPTUSER = 2

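; Load URL in a hidden WebGadget, select the whole page and return the selected text
Procedure.s GetText(URL.s)
  GhostWin=OpenWindow(#PB_Any,0,0,600,300,"",#PB_Window_Invisible)
  WebGad=WebGadget(#PB_Any,10,10,580,280,URL)
  ; Process events until the gadget is no longer busy loading
  While GetGadgetAttribute(WebGad,#PB_Web_Busy)<>0
    WindowEvent()
  Wend

  ; This code relies on the IWebBrowser2 pointer being stored in the gadget's user data.
  ; GetWindowLongPtr_() is used so that the pointer is not truncated on x64.
  WebObject.IWebBrowser2 = GetWindowLongPtr_(GadgetID(WebGad), #GWL_USERDATA)
  WebObject\ExecWB(#OLECMDID_SELECTALL, #OLECMDEXECOPT_DONTPROMPTUSER, 0, 0)

  Text$=GetGadgetItemText(WebGad,#PB_Web_SelectedText)
  CloseWindow(GhostWin)
  ProcedureReturn Text$
EndProcedure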

Debug GetText("http://ip.xxoo.net")
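
Once the page text is available as a string, the IP address mentioned above could be picked out with a regular expression. This is a minimal sketch under assumptions: the procedure name FirstIPv4() and the sample text are made up for illustration, and the pattern only looks for four dot-separated groups of 1 to 3 digits without checking that each group is in the range 0-255.

```purebasic
; Hypothetical helper: return the first IPv4-looking token in Text$, or "" if none
Procedure.s FirstIPv4(Text$)
  Protected Regex, Result$
  Protected Dim Match$(0)
  ; Four groups of 1-3 digits separated by dots (no 0-255 range check)
  Regex = CreateRegularExpression(#PB_Any, "\b\d{1,3}(\.\d{1,3}){3}\b")
  If Regex
    If ExtractRegularExpression(Regex, Text$, Match$()) > 0
      Result$ = Match$(0) ; first complete match
    EndIf
    FreeRegularExpression(Regex)
  EndIf
  ProcedureReturn Result$
EndProcedure

; Sample text made up for illustration; in practice, pass the text returned by GetText()
Debug FirstIPv4("Your IP address is 203.0.113.7 today")
```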
Foz
Addict
Posts: 1359
Joined: Tue Nov 13, 2007 12:42 pm
Location: Manchester, UK

Re: Read the contents of a Web Page

Post by Foz »

*Sigh*, I hadn't realised that the truncating issue hadn't been resolved yet with ReceiveHTTPFile().

Until then, use http://www.purebasic.fr/english/viewtopic.php?p=217199.