Download Website to String

DerProgrammierer78
New User
Posts: 3
Joined: Thu Nov 11, 2010 6:41 pm

Post by DerProgrammierer78 »

Hi ...

I want to download a website into a string and parse it for links. The problem is that this has to be OS- and browser-independent.

Is there a way to write code that can be compiled and used on Linux, Windows and MacOS?

At home I only use Ubuntu, but at work I have to use Windows, so I want to write code that I can use on both systems.

And when it's ready, other people in our company want to use it too. Three of them use MacOS.

So I need a function that loads a website into a string on all three systems. The rest of the program is not the problem; I just need this function.


Greetings from Germany

Frank
STARGÅTE
Addict
Posts: 2227
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke

Post by STARGÅTE »

My code:

Code: Select all

Procedure.s ReceiveHTTPString(URL$, TimeOut=5000)
	Protected Event, Time, Size, String$, Inhalt
	Protected BufferSize = $1000, *Buffer = AllocateMemory(BufferSize)
	Protected ServerName$ = GetURLPart(URL$, #PB_URL_Site) 
	Protected ConnectionID = OpenNetworkConnection(ServerName$, 80) 
	If ConnectionID 
		SendNetworkString(ConnectionID, "GET "+URL$+" HTTP/1.0"+#LFCR$+#LFCR$) 
		Time = ElapsedMilliseconds()
		Repeat 
			Delay(10)
			Event = NetworkClientEvent(ConnectionID)
			If Event = #PB_NetworkEvent_Data
				Repeat
					Size = ReceiveNetworkData(ConnectionID, *Buffer, BufferSize)
					String$ + PeekS(*Buffer, Size, #PB_Ascii) 
				Until Not Size
				Inhalt = FindString(String$, #LFCR$, 1)
				If Inhalt
					String$ = Mid(String$, Inhalt+3)
				EndIf
			EndIf   
		Until ElapsedMilliseconds()-Time > TimeOut Or String$
		CloseNetworkConnection(ConnectionID)
	EndIf 
	FreeMemory(*Buffer)
	ProcedureReturn String$
EndProcedure

InitNetwork()
Debug ReceiveHTTPString("http://data.unionbytes.de/ip.php")
Last edited by STARGÅTE on Sat Jan 08, 2011 6:08 pm, edited 1 time in total.
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and more
Typeface - Sprite-based font include/module
greyhoundcode
Enthusiast
Posts: 112
Joined: Sun Dec 30, 2007 7:24 pm

Post by greyhoundcode »

I like that. X-platform and short, neat code.
PB
PureBasic Expert
Posts: 7581
Joined: Fri Apr 25, 2003 5:24 pm

Post by PB »

Doesn't work with www.yahoo.com.
I compile using 5.31 (x86) on Win 7 Ultimate (64-bit).
"PureBasic won't be object oriented, period" - Fred.

Post by STARGÅTE »

Since there is a problem with the redirect, please use direct links:

Code: Select all

InitNetwork()
Debug ReceiveHTTPString("http://de.yahoo.com/")
or http://us.yahoo.com/

Post by greyhoundcode »

PB wrote: Doesn't work with www.yahoo.com.
Works for me: it correctly returns a header with a 302 redirect, which matches what happens when I visit that address in a browser, where I end up at uk.yahoo.com (this obviously varies according to where in the world it thinks you are).

If you want your code to follow the redirect, you need to parse the response for the Location: header (e.g. Location: 'xyz.com') and request that address instead.
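
Pulling the redirect target out of the headers could look roughly like this. It is only a sketch: GetLocationHeader is a made-up helper name, the sample response in the Debug line is invented for illustration, and it assumes the header block is passed in as one string with the lines separated by CRLF (or just LF).

```purebasic
; Hypothetical helper (not part of STARGÅTE's code):
; returns the value of the Location header, or "" if there is none.
Procedure.s GetLocationHeader(Header$)
  Protected i, Line$
  Protected Count = CountString(Header$, #LF$) + 1
  For i = 1 To Count
    ; Split on LF and drop the CR, so CRLF and bare LF both work
    Line$ = RemoveString(StringField(Header$, i, #LF$), #CR$)
    ; Header names are case-insensitive, so compare in lower case
    If LCase(Left(Line$, 9)) = "location:"
      ProcedureReturn Trim(Mid(Line$, 10))
    EndIf
  Next
  ProcedureReturn ""
EndProcedure

; Invented sample response, for illustration only
Debug GetLocationHeader("HTTP/1.1 302 Found" + #CRLF$ + "Location: http://uk.yahoo.com/" + #CRLF$)
```

If the result is not empty, the caller could simply request that URL again, ideally with a limit on the number of hops so that a redirect loop cannot hang the program.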

Post by DerProgrammierer78 »

Hi ...

I have now tested it on all three systems, on 11 computers, and have one last problem ...

When I try to load websites like this: http://www.mysmartphoneinfo.com/verizon ... rld/399884

then I get no data, only headers like these:

HTTP/1.1 200 OK
Date: Sat, 27 Nov 2010 14:28:38 GMT
Server: Apache
X-Powered-By: PHP/5.2.14
X-Pingback: http://www.mysmartphoneinfo.com/xmlrpc.php
Link: <http://www.mysmartphoneinfo.com/?p=399884>; rel=shortlink
Vary: Accept-Encoding
Content-Type: text/html; charset=UTF-8

So what can I do to make my program load such websites?

Post by greyhoundcode »

DerProgrammierer78 wrote: Now I tested it on all three systems on 11 computers and have one last problem ... When I try to load websites like this: http://www.mysmartphoneinfo.com/verizon ... rld/399884 Then I get no data
Strange one. These are the headers I get:

HTTP/1.1 301 Moved Permanently
Date: Sat, 27 Nov 2010 18:22:52 GMT
Server: Apache
X-Powered-By: PHP/5.2.14
X-Pingback: http://www.mysmartphoneinfo.com/xmlrpc.php
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Last-Modified: Sat, 27 Nov 2010 18:22:52 GMT
Location: http://http/www.mysmartphoneinfo.com/ve ... rld/399884
Vary: Accept-Encoding
Content-Length: 0
Connection: close
Content-Type: text/html; charset=UTF-8


Two things seem odd: A) it is redirecting, apparently to the very same URL, and B) there seems to be an error in the formatting of the URL within the Location header.

Can anyone else shed any light?

Post by greyhoundcode »

Using the HTTP stream wrapper in PHP and the fopen() function, it loads the website right off the bat. I still get the same redirect headers coming back at me using STARGÅTE's code in PB, though. I don't understand how a redirect to http://http/www.mysmartphoneinfo.com/ve ... rld/399884 works - that surely isn't a valid address!
Marlin
Enthusiast
Posts: 406
Joined: Sun Sep 17, 2006 1:24 pm
Location: Germany

Post by Marlin »

I did not test or even look through your code thoroughly,
but this part looks wrong to me:
STARGÅTE wrote:

Code: Select all

...
SendNetworkString(ConnectionID, "GET "+URL$+" HTTP/1.0"+#LFCR$+#LFCR$)
...
It should probably be:

SendNetworkString(ConnectionID, "GET "+URL$+" HTTP/1.0"+#CRLF$+#CRLF$)

Also, URL$ should be the absolute path (for example /ip.php), not the absoluteURI.
rfc1945 wrote: The absoluteURI form is only allowed when the request is being made to a proxy.
See RFC 1945.

Also, if you are addressing a server that uses virtual hosting (one IP address serving multiple hostnames), which I consider normal these days, you need to provide a "Host: " header field!

Depending on the server, other header fields might be needed.
This could even include cookies.
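
Put together, the points above might look something like this. It is a sketch only: BuildGetRequest is a made-up helper name, and it assumes that GetURLPart() returns the path without a leading slash, so check that against your PureBasic version.

```purebasic
; Hypothetical helper (not from STARGÅTE's code): builds an HTTP/1.0 request
; with an absolute path, CRLF line endings and a Host header.
Procedure.s BuildGetRequest(URL$)
  Protected Request$
  Protected Host$  = GetURLPart(URL$, #PB_URL_Site)
  ; Assumption: #PB_URL_Path comes back without the leading "/"
  Protected Path$  = "/" + GetURLPart(URL$, #PB_URL_Path)
  Protected Query$ = GetURLPart(URL$, #PB_URL_Parameters)
  If Query$ <> ""
    Path$ + "?" + Query$
  EndIf
  Request$ = "GET " + Path$ + " HTTP/1.0" + #CRLF$
  Request$ + "Host: " + Host$ + #CRLF$
  Request$ + #CRLF$ ; the empty line ends the header block
  ProcedureReturn Request$
EndProcedure

Debug BuildGetRequest("http://data.unionbytes.de/ip.php")
```

The returned string could then be passed to SendNetworkString() in place of the hand-built "GET ..." line. Cookies or other header fields, if a server needs them, would go in as further lines before the final empty one.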

When I faced the problem of getting the source code of a website into a string myself,
I ended up using wget, called via RunProgram().

Wget is available for Windows as well as for Linux
(and probably for Mac too).
It can also follow redirects automatically, and you can give it the path to a cookies.txt file to use.
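
Roughly, the wget route could look like this. It is a sketch under assumptions: WgetToString is a made-up name, and it assumes a wget executable is installed and can be found on the PATH of every machine.

```purebasic
; Hypothetical helper: lets wget do the download and collects its output.
; Assumption: "wget" is installed and reachable via the PATH.
Procedure.s WgetToString(URL$)
  Protected Output$, Program
  ; -q = quiet, -O - = write the downloaded document to standard output
  Program = RunProgram("wget", "-q -O - " + #DQUOTE$ + URL$ + #DQUOTE$, "", #PB_Program_Open | #PB_Program_Read | #PB_Program_Hide)
  If Program
    While ProgramRunning(Program)
      If AvailableProgramOutput(Program)
        ; ReadProgramString() returns one line at a time
        Output$ + ReadProgramString(Program) + #LF$
      Else
        Delay(10)
      EndIf
    Wend
    CloseProgram(Program)
  EndIf
  ProcedureReturn Output$
EndProcedure

Debug WgetToString("http://data.unionbytes.de/ip.php")
```

The trade-off is an external dependency: the program stays short and wget handles redirects itself, but wget has to be shipped or installed alongside it on each system.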