Download Website to String
- DerProgrammierer78
- New User
- Posts: 3
- Joined: Thu Nov 11, 2010 6:41 pm
Download Website to String
Hi ...
I want to download a website to a string and parse it for links. The problem is that this should be OS- and browser-independent.
Is there a way to write code that can be compiled and used on Linux, Windows and MacOS?
At home I only use Ubuntu, but at work I have to use Windows, so I want to write code I can use on both systems.
And when it's ready, other people in our company want to use it too. Three of them use MacOS.
So I need a function that loads a website into a string on all three systems. The rest of the program is not the problem. I just need this function.
Greetings from Germany
Frank
Re: Download Website to String
MyCode:
Procedure.s ReceiveHTTPString(URL$, TimeOut=5000)
  Protected Event, Time, Size, String$, Inhalt
  Protected BufferSize = $1000, *Buffer = AllocateMemory(BufferSize)
  Protected ServerName$ = GetURLPart(URL$, #PB_URL_Site)
  Protected ConnectionID = OpenNetworkConnection(ServerName$, 80)
  If ConnectionID
    SendNetworkString(ConnectionID, "GET "+URL$+" HTTP/1.0"+#LFCR$+#LFCR$)
    Time = ElapsedMilliseconds()
    Repeat
      Delay(10)
      Event = NetworkClientEvent(ConnectionID)
      If Event = #PB_NetworkEvent_Data
        Repeat
          Size = ReceiveNetworkData(ConnectionID, *Buffer, BufferSize)
          String$ + PeekS(*Buffer, Size, #PB_Ascii)
        Until Not Size
        Inhalt = FindString(String$, #LFCR$, 1)
        If Inhalt
          String$ = Mid(String$, Inhalt+3)
        EndIf
      EndIf
    Until ElapsedMilliseconds()-Time > TimeOut Or String$
    CloseNetworkConnection(ConnectionID)
  EndIf
  FreeMemory(*Buffer)
  ProcedureReturn String$
EndProcedure

InitNetwork()
Debug ReceiveHTTPString("http://data.unionbytes.de/ip.php")
Last edited by STARGÅTE on Sat Jan 08, 2011 6:08 pm, edited 1 time in total.
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and more ― Typeface - Sprite-based font include/module
- greyhoundcode
- Enthusiast
- Posts: 112
- Joined: Sun Dec 30, 2007 7:24 pm
Re: Download Website to String
I like that. X-platform and short, neat code.
Re: Download Website to String
Doesn't work with www.yahoo.com.
I compile using 5.31 (x86) on Win 7 Ultimate (64-bit).
"PureBasic won't be object oriented, period" - Fred.
Re: Download Website to String
Since there is a problem with the redirect, please use direct links such as http://us.yahoo.com/ or:
InitNetwork()
Debug ReceiveHTTPString("http://de.yahoo.com/")
- greyhoundcode
- Enthusiast
- Posts: 112
- Joined: Sun Dec 30, 2007 7:24 pm
Re: Download Website to String
PB wrote: Doesn't work with http://www.yahoo.com.
Works for me: it correctly returns a header with a 302 redirect, which reflects what happens when I visit that address in a browser. I end up at uk.yahoo.com (obviously, this varies according to where in the world it thinks you are).
If you wanted your code to follow the redirect you'd obviously need to parse for the Location: 'xyz.com' instruction and follow appropriately.
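A rough sketch of the parsing step, in case it helps. GetRedirectLocation() is a hypothetical helper name, and it assumes you already have the raw response headers in a string (the example header text is made up for illustration). It only extracts the Location value; requesting the new address and limiting the number of hops is left out.

```purebasic
; Hypothetical helper: returns the value of the Location header, or "" if there is none.
Procedure.s GetRedirectLocation(Header$)
  Protected i, Line$
  Protected Count = CountString(Header$, #LF$) + 1
  For i = 1 To Count
    ; Take one line and strip the carriage return and surrounding spaces
    Line$ = Trim(RemoveString(StringField(Header$, i, #LF$), #CR$))
    If Line$ = ""
      Break ; an empty line marks the end of the headers
    EndIf
    ; Header names are case-insensitive, so compare in lower case
    If LCase(Left(Line$, 9)) = "location:"
      ProcedureReturn Trim(Mid(Line$, 10))
    EndIf
  Next
  ProcedureReturn ""
EndProcedure

; Made-up example response, for illustration only
Debug GetRedirectLocation("HTTP/1.1 302 Found" + #CRLF$ + "Location: http://uk.yahoo.com/" + #CRLF$ + #CRLF$)
```

If the result is not empty, you could pass it to the download procedure again, ideally with a counter so that a redirect loop cannot run forever.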
- DerProgrammierer78
- New User
- Posts: 3
- Joined: Thu Nov 11, 2010 6:41 pm
Re: Download Website to String
Hi ...
I have now tested it on all three systems, on 11 computers, and have one last problem ...
When I try to load websites like this: http://www.mysmartphoneinfo.com/verizon ... rld/399884
Then I get no data, but headers like this:
HTTP/1.1 200 OK
Date: Sat, 27 Nov 2010 14:28:38 GMT
Server: Apache
X-Powered-By: PHP/5.2.14
X-Pingback: http://www.mysmartphoneinfo.com/xmlrpc.php
Link: <http://www.mysmartphoneinfo.com/?p=399884>; rel=shortlink
Vary: Accept-Encoding
Content-Type: text/html; charset=UTF-8
So what can I do to make my program load such websites?
- greyhoundcode
- Enthusiast
- Posts: 112
- Joined: Sun Dec 30, 2007 7:24 pm
Re: Download Website to String
DerProgrammierer78 wrote: Now I tested it on all three systems on 11 computers and have one last problem ... When I try to load websites like this: http://www.mysmartphoneinfo.com/verizon ... rld/399884 Then I get no data
Strange one. These are the headers I get:
HTTP/1.1 301 Moved Permanently
Date: Sat, 27 Nov 2010 18:22:52 GMT
Server: Apache
X-Powered-By: PHP/5.2.14
X-Pingback: http://www.mysmartphoneinfo.com/xmlrpc.php
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Last-Modified: Sat, 27 Nov 2010 18:22:52 GMT
Location: http://http/www.mysmartphoneinfo.com/ve ... rld/399884
Vary: Accept-Encoding
Content-Length: 0
Connection: close
Content-Type: text/html; charset=UTF-8
Seems odd A) that it is redirecting, apparently to the very same URL and B) there seems to be an error in the formatting of the URL within the Location header.
Anyone else shed any light?
- greyhoundcode
- Enthusiast
- Posts: 112
- Joined: Sun Dec 30, 2007 7:24 pm
Re: Download Website to String
Using the HTTP stream wrapper in PHP and the fopen() command it loads the website right off the bat. I still get the same redirect headers coming back at me using STARGÅTE's code in PB though. Don't understand how a redirect to http://http/www.mysmartphoneinfo.com/ve ... rld/399884 works - that surely isn't a valid address!
Re: Download Website to String
I did not test or even look through your code thoroughly,
but this part looks wrong to me:
STARGÅTE wrote:
... SendNetworkString(ConnectionID, "GET "+URL$+" HTTP/1.0"+#LFCR$+#LFCR$) ...
It should probably be:
SendNetworkString(ConnectionID, "GET "+URL$+" HTTP/1.0"+#CRLF$+#CRLF$)
URL$ should be the absolute path and not the absoluteURI. See rfc1945.
rfc1945 wrote: The absoluteURI form is only allowed when the request is being made to a proxy.
Also, if you are addressing a server using virtual hosting (1 IP address mapping to multiple hostnames)
- I consider this to be normal at this time -
you need to provide a "Host: " header field!
Depending on the server, other header fields might be needed.
This could even include cookies.
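To illustrate the two points above (absolute path in the request line, plus a Host header), here is a minimal sketch. BuildHTTPRequest() is a hypothetical helper name, not part of the code posted earlier, and it does not handle ports, cookies or any other header fields a particular server might want.

```purebasic
; Hypothetical helper: builds an HTTP/1.0 GET request that uses the
; absolute path in the request line and names the server in a Host header.
Procedure.s BuildHTTPRequest(URL$)
  Protected Host$  = GetURLPart(URL$, #PB_URL_Site)
  Protected Path$  = GetURLPart(URL$, #PB_URL_Path)
  Protected Query$ = GetURLPart(URL$, #PB_URL_Parameters)
  Protected Request$
  ; Make sure the path starts with exactly one slash,
  ; whether or not GetURLPart() returned a leading one
  If Left(Path$, 1) <> "/"
    Path$ = "/" + Path$
  EndIf
  Request$ = "GET " + Path$
  If Query$ <> ""
    Request$ + "?" + Query$
  EndIf
  Request$ + " HTTP/1.0" + #CRLF$
  Request$ + "Host: " + Host$ + #CRLF$
  Request$ + #CRLF$ ; an empty line ends the header section
  ProcedureReturn Request$
EndProcedure

Debug BuildHTTPRequest("http://data.unionbytes.de/ip.php")
```

The returned string could be passed to SendNetworkString() in place of the hand-built "GET ..." line.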
When I had to solve the problem of getting the source code of a website into a string,
I ended up using wget. (I called it using RunProgram()...)
Wget is available for Windows, as well as for Linux.
(Probably also for Mac)
Wget can also follow redirects automatically and you can tell it the path to a cookies.txt file to use.
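Roughly what I mean, as an untested sketch. WgetToString() is a hypothetical name, and the code assumes a wget executable can be found on the system's PATH. The options -q (quiet) and "-O -" (write the document to standard output) are standard wget options.

```purebasic
; Hypothetical helper: fetches a page by running wget and reading its standard output.
; Assumes wget is installed and can be found on the PATH.
Procedure.s WgetToString(URL$)
  Protected Output$
  ; -q suppresses progress messages, "-O -" sends the document to stdout
  Protected Program = RunProgram("wget", "-q -O - " + Chr(34) + URL$ + Chr(34), "", #PB_Program_Open | #PB_Program_Read | #PB_Program_Hide)
  If Program
    While ProgramRunning(Program)
      If AvailableProgramOutput(Program)
        ; ReadProgramString() returns one line without its line break
        Output$ + ReadProgramString(Program) + #LF$
      Else
        Delay(5) ; avoid busy-waiting while wget is still working
      EndIf
    Wend
    CloseProgram(Program)
  EndIf
  ProcedureReturn Output$
EndProcedure

Debug WgetToString("http://data.unionbytes.de/ip.php")
```

For cookies, wget's --load-cookies option takes the path to the cookies.txt file and could be added to the parameter string.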