Download Website to String
Posted: Thu Nov 11, 2010 7:11 pm
by DerProgrammierer78
Hi ...
I want to download a website to a string and parse it for links. The problem is that this should be OS- and browser-independent.
Is there a way to write code that can be compiled and used on Linux, Windows and MacOS?
At home I just use Ubuntu, but at work I have to use Windows. So I want to write code I can use on both systems.
And when it's ready, other people in our company want to use it too. Three of them use MacOS.
So I need a function that loads a website into a string on all three systems. The rest of the program is not the problem. I just need this function.
Greetings from Germany
Frank
Re: Download Website to String
Posted: Thu Nov 11, 2010 8:43 pm
by STARGÅTE
My code:
Code:
Procedure.s ReceiveHTTPString(URL$, TimeOut=5000)
  Protected Event, Time, Size, String$, Inhalt
  Protected BufferSize = $1000, *Buffer = AllocateMemory(BufferSize)
  Protected ServerName$ = GetURLPart(URL$, #PB_URL_Site)
  Protected ConnectionID = OpenNetworkConnection(ServerName$, 80)
  If ConnectionID
    SendNetworkString(ConnectionID, "GET "+URL$+" HTTP/1.0"+#LFCR$+#LFCR$)
    Time = ElapsedMilliseconds()
    Repeat
      Delay(10)
      Event = NetworkClientEvent(ConnectionID)
      If Event = #PB_NetworkEvent_Data
        Repeat
          Size = ReceiveNetworkData(ConnectionID, *Buffer, BufferSize)
          String$ + PeekS(*Buffer, Size, #PB_Ascii)
        Until Not Size
        Inhalt = FindString(String$, #LFCR$, 1)
        If Inhalt
          String$ = Mid(String$, Inhalt+3)
        EndIf
      EndIf
    Until ElapsedMilliseconds()-Time > TimeOut Or String$
    CloseNetworkConnection(ConnectionID)
  EndIf
  FreeMemory(*Buffer)
  ProcedureReturn String$
EndProcedure

InitNetwork()
Debug ReceiveHTTPString("http://data.unionbytes.de/ip.php")
Re: Download Website to String
Posted: Sat Nov 20, 2010 3:11 pm
by greyhoundcode
I like that. X-platform and short, neat code.
Re: Download Website to String
Posted: Sat Nov 20, 2010 3:29 pm
by PB
Doesn't work with www.yahoo.com.
Re: Download Website to String
Posted: Mon Nov 22, 2010 1:45 am
by STARGÅTE
Since there is a problem with the transmission, please use direct links:
Code:
InitNetwork()
Debug ReceiveHTTPString("http://de.yahoo.com/")
or http://us.yahoo.com/
Re: Download Website to String
Posted: Mon Nov 22, 2010 7:54 pm
by greyhoundcode
Works for me - correctly returns a header with a 302 redirect, which reflects what happens when I visit that address in a browser - I end up at uk.yahoo.com (obviously, varies according to where in the world it thinks you are).
If you wanted your code to follow the redirect you'd obviously need to parse for the Location: 'xyz.com' instruction and follow appropriately.
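A minimal sketch of that parsing step, assuming the raw response (status line and headers included) is still available as a string. GetHeaderValue is a hypothetical helper name, not a PureBasic function, and the sample response is made up for illustration:

```purebasic
; Sketch: return the value of one header field from a raw HTTP response,
; or "" if the field is not present. GetHeaderValue is a hypothetical name.
Procedure.s GetHeaderValue(Response$, Name$)
  Protected Line$, Colon, i, Count

  Count = CountString(Response$, #LF$) + 1
  For i = 2 To Count                      ; line 1 is the status line
    ; Split on LF and drop the CR so that CRLF line ends are handled
    Line$ = RemoveString(StringField(Response$, i, #LF$), #CR$)
    If Line$ = ""
      Break                               ; empty line: end of the headers
    EndIf
    Colon = FindString(Line$, ":", 1)
    If Colon
      ; Header names are case-insensitive, so compare in lower case
      If LCase(Trim(Left(Line$, Colon - 1))) = LCase(Name$)
        ProcedureReturn Trim(Mid(Line$, Colon + 1))
      EndIf
    EndIf
  Next
  ProcedureReturn ""
EndProcedure

; Made-up response for illustration only
Response$ = "HTTP/1.1 302 Found" + #CRLF$ + "Location: http://uk.yahoo.com/" + #CRLF$ + #CRLF$
Debug GetHeaderValue(Response$, "Location")
```

If the returned value is not empty, the caller could request that address instead, ideally with a limit on the number of hops so that a redirect loop cannot run forever.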
Re: Download Website to String
Posted: Sat Nov 27, 2010 6:10 pm
by DerProgrammierer78
Hi ...
Now I have tested it on all three systems, on 11 computers, and have one last problem ...
When I try to load websites like this:
http://www.mysmartphoneinfo.com/verizon ... rld/399884
then I get no data, only headers like this:
HTTP/1.1 200 OK
Date: Sat, 27 Nov 2010 14:28:38 GMT
Server: Apache
X-Powered-By: PHP/5.2.14
X-Pingback: http://www.mysmartphoneinfo.com/xmlrpc.php
Link: <http://www.mysmartphoneinfo.com/?p=399884>; rel=shortlink
Vary: Accept-Encoding
Content-Type: text/html; charset=UTF-8
So what can I do to make my program load such websites?
Re: Download Website to String
Posted: Sat Nov 27, 2010 7:27 pm
by greyhoundcode
Strange one. These are the headers I get:
HTTP/1.1 301 Moved Permanently
Date: Sat, 27 Nov 2010 18:22:52 GMT
Server: Apache
X-Powered-By: PHP/5.2.14
X-Pingback: http://www.mysmartphoneinfo.com/xmlrpc.php
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Last-Modified: Sat, 27 Nov 2010 18:22:52 GMT
Location: http://http/www.mysmartphoneinfo.com/ve ... rld/399884
Vary: Accept-Encoding
Content-Length: 0
Connection: close
Content-Type: text/html; charset=UTF-8
Seems odd A) that it is redirecting, apparently to the very same URL and B) there seems to be an error in the formatting of the URL within the Location header.
Anyone else shed any light?
Re: Download Website to String
Posted: Mon Nov 29, 2010 6:47 pm
by greyhoundcode
Using the HTTP stream wrapper in PHP and the fopen() command it loads the website right off the bat. I still get the same redirect headers coming back at me using STARGÅTE's code in PB though. Don't understand how a redirect to http://http/www.mysmartphoneinfo.com/ve ... rld/399884 works - that surely isn't a valid address!
Re: Download Website to String
Posted: Sun Jan 09, 2011 5:52 pm
by Marlin
I did not test or even look through your code thoroughly,
but this part looks wrong to me:
STARGÅTE wrote:
Code:
...
SendNetworkString(ConnectionID, "GET "+URL$+" HTTP/1.0"+#LFCR$+#LFCR$)
...
It should probably be:
SendNetworkString(ConnectionID, "GET "+URL$+" HTTP/1.0"+#CRLF$+#CRLF$)
URL$ should be the absolute path and not the absoluteURI.
rfc1945 wrote: The absoluteURI form is only allowed when the request is being made to a proxy.
See RFC 1945.
Also, if you are addressing a server using virtual hosting (1 IP address mapping to multiple hostnames)
- I consider this to be normal at this time -
you need to provide a "Host: " header field!
Depending on the server, other header fields might be needed.
This could even include cookies.
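Put together, the points above could look roughly like this. It is only a sketch: BuildHTTPRequest is a hypothetical helper name, and it assumes the default port 80 and sends no header fields other than "Host":

```purebasic
; Sketch: build an HTTP/1.0 request that uses the absolute path
; (not the full URL) and a Host header. BuildHTTPRequest is a hypothetical name.
Procedure.s BuildHTTPRequest(URL$)
  Protected Host$  = GetURLPart(URL$, #PB_URL_Site)
  Protected Path$  = GetURLPart(URL$, #PB_URL_Path)
  Protected Query$ = GetURLPart(URL$, #PB_URL_Parameters)
  Protected Request$

  ; Make sure the path starts with a slash ("/" alone for the root page)
  If Left(Path$, 1) <> "/"
    Path$ = "/" + Path$
  EndIf
  If Query$ <> ""
    Path$ + "?" + Query$
  EndIf

  Request$ = "GET " + Path$ + " HTTP/1.0" + #CRLF$
  Request$ + "Host: " + Host$ + #CRLF$
  Request$ + #CRLF$                       ; empty line ends the header block
  ProcedureReturn Request$
EndProcedure

Debug BuildHTTPRequest("http://data.unionbytes.de/ip.php")
```

The resulting string could be passed to SendNetworkString() in place of the hand-built "GET" line. Servers that expect cookies or other header fields would need more lines added before the final empty one.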
When I addressed the problem of getting the source code of a website into a string, I ended up using wget (I called it using RunProgram()...).
Wget is available for Windows as well as for Linux (probably also for Mac).
Wget can also follow redirects automatically and you can tell it the path to a cookies.txt file to use.
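A rough sketch of the RunProgram() approach, assuming a wget executable can be found on the PATH. DownloadWithWget is a hypothetical name, and the reading loop follows the usual pattern for capturing program output, so treat it as a starting point rather than a tested solution:

```purebasic
; Sketch: fetch a page by running wget and collecting its standard output.
; Assumes wget is installed and on the PATH. DownloadWithWget is a hypothetical name.
Procedure.s DownloadWithWget(URL$)
  Protected Program, Result$, Parameter$

  ; -q: no progress output, -O -: write the document to standard output
  Parameter$ = "-q -O - " + Chr(34) + URL$ + Chr(34)
  Program = RunProgram("wget", Parameter$, "", #PB_Program_Open | #PB_Program_Read | #PB_Program_Hide)
  If Program
    While ProgramRunning(Program)
      If AvailableProgramOutput(Program)
        Result$ + ReadProgramString(Program) + #LF$   ; read one line at a time
      Else
        Delay(10)                                     ; avoid a busy loop
      EndIf
    Wend
    CloseProgram(Program)
  EndIf
  ProcedureReturn Result$
EndProcedure

Debug DownloadWithWget("http://data.unionbytes.de/ip.php")
```

An empty result means either that wget could not be started or that it returned nothing. For sites that need cookies, wget's --load-cookies option takes the path to the cookies file and could be added to the parameter string.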