Multithreaded HTTP downloading?

Post by pureballs »

I'd like to write some code or a component that downloads a batch of HTTP links simultaneously, with 8 or 16 threads at once. Has anyone already got something like that working?

Something like Wget, but parallel instead of serial.

Thanks in advance... :wink:
Pureballs.

Post by Captn. Jinguji »

I managed to do that with YouTube videos. I haven't actually tried 8 or 16, but 3 or 4 downloads in parallel haven't posed a problem so far.
I just had to use URLDownloadToFile_().
Is this an artifact or should it be disposed of ?

Post by pureballs »

Captn. Jinguji wrote: "Just had to use Urldownloadtofile_()"
URLDownloadToFile_() uses Internet Explorer components, right?

I'd prefer to use PB's built-in TCP functions for this, so I can change the User-Agent field etc., and also because I'd like to use this on Linux.

Post by Captn. Jinguji »

I'm not sure about the Internet Explorer dependency. In the comments on the MSDN URLDownloadToFile() article, a user claims that he used it with Firefox flawlessly until recently.

But it's probably confined to Windows. I didn't see a Linux requirement in your original post.

I pointed out URLDownloadToFile_() because PB's own HTTPdownloadtofile() wouldn't work with YouTube video files (at least for me).

I'm going to keep watching this thread, because a platform-neutral solution seems preferable to me, too.

Post by Hi-Toro »

For platform-independent downloading, you could take a look at this Blitz code I wrote:

http://www.blitzbasic.com/codearcs/code ... ?code=2566

You'd have to translate it to use PB's networking functions, but it does show the basic method of interacting with an HTTP server to download a file. You'd then just have to wrap it up so it can be called via CreateThread().

It's basically a case of sending something like this once connected to the server:

Code:

WriteLine www, "GET " + file + " HTTP/1.1"
WriteLine www, "Host: " + host
WriteLine www, "User-Agent: BlitzGet Deluxe"
WriteLine www, "Accept: */*"
WriteLine www, ""

Note that host is the host name, e.g. "www.purebasic.com", and file is the path and file name on the host, e.g. "/images/whatever.jpg" (note the leading "/").
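
For anyone who wants the request spelled out byte for byte, here is a minimal sketch in Python rather than Blitz or PB, since the text sent over the wire is the same in any language. The function name build_request and its parameters are made up for this example. It also adds a "Connection: close" header, which is my assumption and not part of the Blitz snippet; it asks the server to close the socket once the response is complete. Each line ends with CR LF, and an empty line finishes the request.

```python
def build_request(host, path, user_agent="BlitzGet Deluxe"):
    """Return the raw bytes of a simple HTTP GET request (hypothetical helper)."""
    lines = [
        "GET " + path + " HTTP/1.1",
        "Host: " + host,
        "User-Agent: " + user_agent,
        "Accept: */*",
        "Connection: close",  # assumption: not in the original snippet
        "",  # the two empty strings produce the blank line
        "",  # that ends the request
    ]
    return "\r\n".join(lines).encode("ascii")
```

You would send the returned bytes over an already connected TCP socket, e.g. build_request("www.purebasic.com", "/images/whatever.jpg").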

Then you read the response. The first line will be something like "HTTP/1.0 200 OK"; parse it so you just get the status code, e.g. "200" meaning the file was found, or "404" for not found. After that, keep reading lines until the server sends a blank line. That gives you the headers, which will look something like this (different for every server/request):

Code:

Content-Type: image/gif
Last-Modified: Thu, 25 Mar 2010 09:42:43 GMT
Date: Wed, 26 May 2010 22:53:11 GMT
Expires: Wed, 26 May 2010 22:53:11 GMT
Cache-Control: private, max-age=31536000
X-Content-Type-Options: nosniff
Server: sffe
Content-Length: 7963
X-Cache: MISS from .
Via: 1.0 .:8000 (squid)
Connection: close

You can parse each line for anything you're interested in, e.g. "Content-Length: " (the number of bytes in the file), then decide what to do based on the response code you got at the start.
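
The parsing step might look roughly like this. Again a Python sketch, not PB code, and parse_response_head is a hypothetical name. It assumes you have already collected everything up to and including the blank line into one byte string. Header names are stored in lower case because HTTP header names are case-insensitive.

```python
def parse_response_head(head):
    """Split a raw response head into (status_code, headers) (hypothetical helper)."""
    lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like "HTTP/1.0 200 OK"; the code is the second field.
    code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # a blank line marks the end of the headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, headers
```

With the example headers above, headers["content-length"] would be the string "7963", which you convert to a number before reading the body.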

For example, with response code 200 you can then just read "Content-Length" bytes from the server to download the file. With code 404, the file doesn't exist, so you just print a message and abort.
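
A sketch of that decision, in Python, with made-up names (read_exactly, handle_response). It assumes stream is a binary file-like object wrapped around the socket, and that the server actually sent a Content-Length header. Chunked responses, which a server may send in reply to an HTTP/1.1 request, are not handled here.

```python
def read_exactly(stream, length, chunk_size=4096):
    """Read exactly `length` bytes from a binary file-like object."""
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            raise ConnectionError("server closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def handle_response(code, headers, stream):
    """Return the file bytes for a 200 response, or None for anything else."""
    if code != 200:
        print("Download failed, server answered", code)
        return None
    # Assumption: the server sent Content-Length (keys stored in lower case).
    return read_exactly(stream, int(headers["content-length"]))
```

Reading in a loop matters because a single read on a network stream can return fewer bytes than you asked for.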

Once you've written a function to do this, modify it to suit the requirements of CreateThread and you have a multithreaded downloader!
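
To show the threading side, here is a rough Python sketch of running a fixed number of workers (8 by default, as asked in the original post) over a list of URLs. Python's threading.Thread stands in for PB's CreateThread here, and download_all and fetch are hypothetical names. fetch is whatever single-file download function you wrote; it takes one URL and returns the result.

```python
import queue
import threading


def download_all(urls, fetch, workers=8):
    """Run fetch(url) for every URL using a fixed number of worker threads."""
    jobs = queue.Queue()
    for url in urls:
        jobs.put(url)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = jobs.get_nowait()
            except queue.Empty:
                return  # no work left, let this thread finish
            try:
                outcome = fetch(url)
            except Exception as exc:
                outcome = exc  # record the failure instead of killing the thread
            with lock:
                results[url] = outcome

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each thread pulls the next URL from a shared queue, so a slow download only ties up one worker while the others carry on.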

I strongly recommend writing it from scratch using the above code as a guide, as the way PB handles server interactions is quite different to Blitz, so it's not really possible to convert line-by-line. However, the information you send and receive, and how you parse it, is of course the same in any language.
James Boyd
http://www.hi-toro.com/
Death to the Pixies!