Read the contents of a Web Page

charvista
Addict
Posts: 949
Joined: Tue Sep 23, 2008 11:38 pm
Location: Belgium

Post by charvista »

I am trying to load an existing web page into a variable, but with no luck.

The manual says under WebGadget():
- GetGadgetItemText(): The following constants can be used to get information (Windows only):
#PB_Web_HtmlCode : Get the html code from the gadget.

Plus:
Note: The following features do not work with the Mozilla ActiveX on windows (#PB_Web_Mozilla flag)

So, I have two questions:
1. How do I get the content of a web page? (syntax...)
2. How can I tell which browser is used? (IE, Firefox, Safari,...)

So far, I have:

Code:

OpenWindow(0, 0, 0, 600, 300, "WebGadget", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
WebGadget(0, 10, 10, 580, 280, "http://www.purebasic.com")
WebPage.s=GetGadgetItemText(0,#PB_Web_HtmlCode|#PB_Web_Mozilla)
Debug WebPage

Repeat
Until WaitWindowEvent() = #PB_Event_CloseWindow

The Debug output is not what I expected. I am using Firefox on Windows 7.
Thanks for any suggestions :)
- Windows 11 Home 64-bit
- PureBasic 6.10 LTS (x64)
- 64 Gb RAM
- 13th Gen Intel(R) Core(TM) i9-13900K 3.00 GHz
- 5K monitor with DPI @ 200%

Re: Read the contents of a Web Page

Post by charvista »

I have since found out that the #PB_Web_Mozilla flag has to be passed to WebGadget(), not to GetGadgetItemText()!
But still no luck...

Code:

OpenWindow(0, 0, 0, 600, 300, "WebGadget", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
WebGadget(0, 10, 10, 580, 280, "http://www.purebasic.com",#PB_Web_Mozilla)
WebPage.s=GetGadgetItemText(0,#PB_Web_HtmlCode)
Debug WebPage
Debug Len(WebPage)

Repeat
Until WaitWindowEvent() = #PB_Event_CloseWindow
Kiffi
Addict
Posts: 1484
Joined: Tue Mar 02, 2004 1:20 pm
Location: Amphibios 9

Re: Read the contents of a Web Page

Post by Kiffi »

You have to wait until the page has loaded completely:

Code:

WebGadget(0, 10, 10, 580, 280, "http://www.purebasic.com")

While GetGadgetAttribute(0, #PB_Web_Busy) <> 0
  WindowEvent()
Wend

WebPage.s=GetGadgetItemText(0,#PB_Web_HtmlCode)
Greetings ... Kiffi
Hygge

Re: Read the contents of a Web Page

Post by charvista »

Indeed, Kiffi! You are right: the page had not finished downloading yet, so the HTML could not be retrieved. :oops:
I had assumed that GetGadgetItemText() would wait internally until the page was ready, so I did not think of that!
Thank you, Kiffi!!! :D

Re: Read the contents of a Web Page

Post by charvista »

OK, now that it works, let me share the procedure I was writing.
It simply fetches the web page's HTML in an invisible window, so the user sees nothing.

Code:

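Procedure.s GetHtmlCode(URL.s)
    Protected GhostWin, WebGad, WebPage.s
    ; Host the WebGadget in an invisible window so nothing appears on screen
    GhostWin=OpenWindow(#PB_Any,0,0,600,300,"",#PB_Window_Invisible)
    WebGad=WebGadget(#PB_Any,10,10,580,280,URL)
    ; Keep processing events until the gadget reports that it is no longer busy
    While GetGadgetAttribute(WebGad,#PB_Web_Busy)<>0
        WindowEvent()
    Wend
    WebPage=GetGadgetItemText(WebGad,#PB_Web_HtmlCode)
    CloseWindow(GhostWin)
    ProcedureReturn WebPage
EndProcedure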


Debug GetHtmlCode("http://www.purebasic.com")
Have fun! :)

Re: Read the contents of a Web Page

Post by charvista »

Hmm, it works very well, but not with all web pages. Why not?
Please try it with: http://ip.xxoo.net/

Cheers
MachineCode
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Re: Read the contents of a Web Page

Post by MachineCode »

charvista wrote: Please try with: http://ip.xxoo.net/

Works here.
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!

Re: Read the contents of a Web Page

Post by charvista »

charvista wrote: Please try with: http://ip.xxoo.net/

MachineCode wrote: Works here.
I tried again, and still no luck with ip.xxoo.net (or with a few other sites).
That is both with *and* without the #PB_Web_Mozilla flag in WebGadget().
MachineCode, are you using Mozilla Firefox as well?

My test computer: Windows 7 32-bit, PB 4.60, Firefox 11.0

Code:

    Procedure.s GetHtmlCode(URL.s)
        GhostWin=OpenWindow(#PB_Any,0,0,600,300,"",#PB_Window_Invisible)
        WebGad=WebGadget(#PB_Any,10,10,580,280,URL.s,#PB_Web_Mozilla)
        While GetGadgetAttribute(WebGad,#PB_Web_Busy)<>0
            WindowEvent()
        Wend
        WebPage.s=GetGadgetItemText(WebGad,#PB_Web_HtmlCode)   
        CloseWindow(GhostWin)
        ProcedureReturn WebPage.s
    EndProcedure


    C$=GetHtmlCode("http://ip.xxoo.net")
    
    Debug C$
    Debug Len(C$)

It still returns an empty string (Len = 0).
Kiffi's "wait until ready" addition checks #PB_Web_Busy, so I don't see what I am missing here, especially since it seems to work on MachineCode's computer...
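
One way to narrow down an empty result like this is to bound the wait and report whether it timed out. The following is only a sketch under stated assumptions: WaitForWebGadget() and its TimeoutMs parameter are hypothetical names, not part of PureBasic or of the code posted above. Nothing in the thread confirms that timing is the cause on this machine.

```purebasic
; Hypothetical helper, not from the thread: wait for a WebGadget, but give up after TimeoutMs.
; Returns #True once #PB_Web_Busy reports 0, or #False if the timeout expires first.
Procedure WaitForWebGadget(Gadget, TimeoutMs)
  Protected Start.q = ElapsedMilliseconds()
  Repeat
    While WindowEvent() : Wend   ; process all pending events first
    Delay(10)                    ; avoid a tight busy loop
    If ElapsedMilliseconds() - Start > TimeoutMs
      ProcedureReturn #False
    EndIf
  Until GetGadgetAttribute(Gadget, #PB_Web_Busy) = 0
  ProcedureReturn #True
EndProcedure
```

It could replace the While loop in GetHtmlCode(), for example as `If WaitForWebGadget(WebGad, 10000)`. The events are processed at least once before the busy state is first checked, and a #False result separates "never finished loading" from "loaded but returned nothing".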

Re: Read the contents of a Web Page

Post by MachineCode »

charvista wrote: MachineCode, are you using Mozilla Firefox as well?

I just used the code snippet from your 3:33 pm post. Here is a copy of the "Debug Output" window:

Code:

<html>
<head>
<meta name="viewport" content="initial-scale=1.0, user-scalable=no" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="keywords" content="IP,NSLOOKUP,IP Address,IP City,IP Country" />
<meta name="description" content="IP Information" />
<meta name="robots" content="all">
<meta name="programmed" content="C.K. Yang" />
<meta name="copyright" content="C.K. Yang" />
<title>IP Information</title>
<style type="text/css">
<!--
body,td,th {
	font-family: Verdana, Arial, Georgia, 微軟正黑體, sans-serif;
	FONT-SIZE: 13px;
}
input, textarea, select, button {
	FONT-SIZE: 13px;
	font-family: Verdana, Arial, Georgia, 微軟正黑體, sans-serif;
}
.table {
	border-top: thin solid #CCCCCC;
	border-right: thick solid #CCCCCC;
	border-bottom: thick solid #CCCCCC;
	border-left: thin solid #CCCCCC;
}
a:link {
	color: #003247;
}
a:visited {
	color: #003247;
}
a:hover {
	color: #10212C;
}
a:active {
	color: #10212C;
}
-->
</style>

<script type="text/javascript" src="http://maps.google.com/maps/api/js?sensor=false"></script>
<script type="text/javascript">
var map;
function initialize() {
	var myLatlng = new google.maps.LatLng(-27, 133);
    var myOptions = {
		zoom: 4,
		center: myLatlng,
		mapTypeId: google.maps.MapTypeId.ROADMAP
	}
	map = new google.maps.Map(document.getElementById("gMap"), myOptions);
	var infowindow = new google.maps.InfoWindow({ 
		content: '<font size="1"><B>Australia</B><BR><BR>123.200.192.77</font>'
    });
	var marker = new google.maps.Marker({
		position: myLatlng,
		map: map,
		title:"Australia"
	});
	google.maps.event.addListener(marker, 'click', function() {
		infowindow.open(map,marker);
	});
}
</script>

<script type="text/javascript">
window.google_analytics_uacct = "UA-359219-7";
</script>
<script type="text/javascript">

  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-359219-7']);
  _gaq.push(['_trackPageview']);

  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();

</script>
</head>

<body onload="initialize()">

<table width="100%"  border="0" cellpadding="0" cellspacing="0">
  <tr>
    <td align="center" valign="middle">

<table width="900" border="0" cellpadding="5" cellspacing="1" bgcolor="#ffffff">
  <tr height="30">
	<td bgcolor="#ffffff" align="left"><div align="left"><img src="./images/flags/us.png" align="absmiddle" border="0">&nbsp;<a href="?L=en">English</a> | <img src="./images/flags/tw.png" align="absmiddle" border="0">&nbsp;<a href="?L=tw">正體中文</a> | <img src="./images/flags/cn.png" align="absmiddle" border="0">&nbsp;<a href="?L=cn">简体中文</a></div></td>
	<td width="500" bgcolor="#ffffff" align="right"><div align="right">
	<!-- AddThis Button BEGIN -->
	<div class="addthis_toolbox addthis_default_style " addthis:url="http://ip.xxoo.net">
	<a class="addthis_button_facebook_like" fb:like:layout="button_count"></a>
	<a class="addthis_button_tweet"></a>
	<a class="addthis_button_google_plusone" g:plusone:size="medium"></a>
	<a class="addthis_counter addthis_pill_style"></a>
	</div>
	<script type="text/javascript">var addthis_config = {"data_track_addressbar":true};</script>
	<script type="text/javascript" src="http://s7.addthis.com/js/300/addthis_widget.js#pubid=chikaeyang"></script>
	<!-- AddThis Button END -->

	</div></td>
  </tr>
</table>

<table width="900" border="0" cellpadding="5" cellspacing="1" bgcolor="#cccccc">
  <tr height="50"><form method="POST" action="">
    <td width="40%" bgcolor="#ffffff"><div align="right"><B>Search:</B></div></td>
	<td width="60%" bgcolor="#ffffff"><div align="left"><input type="text" name="ip" size="20" value=123.200.192.77>&nbsp;&nbsp;<input type="submit" name="Mode" value="Go"></div></td></form>
  </tr>
  <tr height="30">
    <td bgcolor="#ffffff"><div align="right"><B>IP Address:</B></div></td>
    <td bgcolor="# [...]
Danilo
Addict
Posts: 3036
Joined: Sat Apr 26, 2003 8:26 am
Location: Planet Earth

Re: Read the contents of a Web Page

Post by Danilo »

Works here, too.

Maybe insert "While WindowEvent():Wend" to make sure all events are processed.

Code:

Procedure.s GetHtmlCode(URL.s)
    GhostWin=OpenWindow(#PB_Any,0,0,600,300,"",#PB_Window_Invisible)
    WebGad=WebGadget(#PB_Any,10,10,580,280,URL.s,#PB_Web_Mozilla)
    While WindowEvent():Wend
    While GetGadgetAttribute(WebGad,#PB_Web_Busy)<>0
        While WindowEvent():Wend
    Wend
    While WindowEvent():Wend
    WebPage.s=GetGadgetItemText(WebGad,#PB_Web_HtmlCode)   
    CloseWindow(GhostWin)
    ProcedureReturn WebPage.s
EndProcedure


C$=GetHtmlCode("http://ip.xxoo.net")

If OpenConsole()
    PrintN(C$)
    PrintN(Str(Len(C$)))
    Input()
EndIf
Nubcake
Enthusiast
Posts: 195
Joined: Thu Feb 03, 2011 7:44 pm

Re: Read the contents of a Web Page

Post by Nubcake »

I've noticed that GetGadgetItemText() with #PB_Web_HtmlCode doesn't return everything in the WebGadget. Would anyone care to explain why?

Re: Read the contents of a Web Page

Post by charvista »

@MachineCode
Yes, that is the output I am supposed to get.

@Danilo
I tried your modified code. Still no luck, see picture.
[Image]

@Nubcake
Correct, the page contents that MachineCode pasted are not complete. But if they were copied from the Debug window, that is normal, because the Debug window truncates very long lines.


The problem lies in PureBasic, because a function from another language that does exactly the same thing works well... :evil:

Re: Read the contents of a Web Page

Post by Nubcake »

charvista wrote: The problem lies in PureBasic, because a function from another language that does exactly the same thing works well... :evil:

Will anyone look into this issue, if it is one? Anyway, I was searching and found a very useful thread that shows how to get the exact HTML code of the WebGadget, instead of the altered version that GetGadgetItemText() returns :D

http://www.purebasic.fr/english/viewtop ... +html+code
MachineCode
Addict
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Re: Read the contents of a Web Page

Post by MachineCode »

charvista wrote:The problem lies in PureBasic
What problem? It works fine for two of us.
Foz
Addict
Posts: 1359
Joined: Tue Nov 13, 2007 12:42 pm
Location: Manchester, UK

Re: Read the contents of a Web Page

Post by Foz »

What is wrong with using ReceiveHTTPFile(url, filename)?
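
A minimal sketch of that approach, for comparison with the WebGadget version. It assumes the page can be fetched with a plain HTTP download. GetHtmlViaHttp() and the temporary file name are hypothetical names chosen for illustration. The InitNetwork() guard assumes that PureBasic versions before 6.00 need that call before network functions are used.

```purebasic
; Sketch only: GetHtmlViaHttp() and "gethtml_page.tmp" are illustrative names.
CompilerIf #PB_Compiler_Version < 600
  InitNetwork()   ; assumption: required before network calls in older versions
CompilerEndIf

Procedure.s GetHtmlViaHttp(URL.s)
  Protected Html.s, File
  Protected TempFile.s = GetTemporaryDirectory() + "gethtml_page.tmp"

  ; Download the page to a temporary file, then read it back line by line
  If ReceiveHTTPFile(URL, TempFile)
    File = ReadFile(#PB_Any, TempFile)
    If File
      While Eof(File) = 0
        Html + ReadString(File) + #CRLF$
      Wend
      CloseFile(File)
    EndIf
    DeleteFile(TempFile)
  EndIf

  ProcedureReturn Html   ; empty string if the download or the read failed
EndProcedure

Debug Len(GetHtmlViaHttp("http://www.purebasic.com"))
```

Unlike the WebGadget route, this returns the source as the server sends it: no browser engine is involved, so no scripts run and there is no busy state to wait for.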