Reconstructing/Extending HTTP library?

Got an idea for enhancing PureBasic? New command(s) you'd like to see?
AND51
Addict
Posts: 1040
Joined: Sun Oct 15, 2006 8:56 pm
Location: Germany
Contact:

Reconstructing/Extending HTTP library?

Post by AND51 »

Hello!

Regarding this thread http://www.purebasic.fr/english/viewtopic.php?t=30671 I want to make some suggestions on how to improve this library.
I would also appreciate it if others gave their comments on what they'd like to see.

Basically, I would suggest introducing the ID-based system, but also adding one or more "quick commands" (I don't know what else to call them).

First, I talk about the "quick commands". Second, I talk about the actual feature request.


-------------------------------------------------------------------------------------

A "quick command" could be ReceiveHTTPFile(). It's task is just to send a quick request that does not require any HTTP knowledge of the programmer.
  • Advantage: Quickly download a HTTP-file without any HTTP knowledge
    Disadvantage: Low accuracy (see 'bugs' in linked topic above) and no preferences can be specified
Of course there could also be more "quick commands", e.g.
  • ReceiveHTTPHeader()
    Already existing; should send a "HEAD" request to gain general information about an HTTP file
  • GetHTTPFileAttribute(URL$, Flag)
    Should acquire some detailed information about an HTTP file, controlled via flags. These flags could be, for example, #PB_HTTP_FileSize (get the file size), #PB_HTTP_GetDate (get the last modification date) or #PB_HTTP_PartialDownloadAvailable (detect whether the file may be downloaded partially, which is useful to continue an aborted download, e.g. for download managers).
    All this information can be acquired by using "HEAD" instead of "GET". HEAD tells the server to send only the HTTP response headers without the file itself, which is useful if you want the information quickly (see the sketch below).
    It is also useful to know the file size before downloading, and you can find out whether a file can be downloaded partially: if yes, the server sends the "Accept-Ranges" header.
Summing up, these "quick commands" should do their job quickly and without any interference from the programmer. They're a very comfortable way to access the most important HTTP functions.
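
To illustrate what such a HEAD-based helper would have to do under the hood, here is a minimal sketch using only the existing network commands. The procedure name HTTPHeadFileSize() is invented for this example, host and path are placeholders, and the Delay() is a crude stand-in for proper NetworkClientEvent() handling:

Code: Select all

InitNetwork()

; Invented helper: ask the server for the Content-Length of a file via a HEAD request
Procedure.q HTTPHeadFileSize(Host$, Path$)
  Protected Connection, Bytes, Pos, Response$, Size.q = -1
  Protected *Buffer = AllocateMemory(4096)
  Connection = OpenNetworkConnection(Host$, 80)
  If Connection
    ; HEAD = "send only the response headers, not the file itself"
    SendNetworkString(Connection, "HEAD /" + Path$ + " HTTP/1.0" + #CRLF$ + "Host: " + Host$ + #CRLF$ + #CRLF$)
    Delay(1000) ; crude: give the server a moment to answer
    Bytes = ReceiveNetworkData(Connection, *Buffer, MemorySize(*Buffer) - 1)
    If Bytes > 0
      Response$ = PeekS(*Buffer, Bytes)
      Pos = FindString(LCase(Response$), "content-length:", 1)
      If Pos
        Size = Val(Trim(StringField(Mid(Response$, Pos + 15, 32), 1, #CRLF$)))
      EndIf
    EndIf
    CloseNetworkConnection(Connection)
  EndIf
  FreeMemory(*Buffer)
  ProcedureReturn Size
EndProcedure

Debug HTTPHeadFileSize("www.purebasic.com", "index.php")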


-------------------------------------------------------------------------------------


For HTTP experts and for those who want more control, there should be the actual HTTP library, based on the ID system.
For each HTTP transaction you create your own HTTP job through special commands.
The "new"/reconstructed HTTP library should offer (at least) the following possibilities:
  • download a HTTP document/file
    • to disk
    • to memory
  • get specific and detailed information even before the transaction starts
    • file size
    • partial download possible
    • last modification date
    • ...
  • programmer is able to intervene:
    He can manipulate the HTTP request, e.g. the User-Agent header, the HTTP version, etc.
  • Cookie handling
  • POST-data handling (for data and file upload)
  • Inform the user by passing through the HTTP status codes
  • Authentication handling (don't confuse this with HTTPS!)
  • Callback handling (to enable the programmer to calculate download speed, remaining time, etc.)
  • Proxy handling and HTTPS support
    (Attention: This is only an idea; I don't know how/if this can be realized. Furthermore, this last point has a low priority IMHO.)

As I said, I would vote for an ID system, just like you assign IDs to files, images, sprites, etc.
The advantage is that the HTTP management is simple, but very powerful!
Thanks to the different IDs, you can distinguish between many HTTP transactions. You can also download several files simultaneously, e.g. by using threads (see the sketch below).
Remember that if you don't need this extended stuff, you can still use the "quick commands" described above.
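
Just to sketch the "several transactions at once" idea with what is available today, the following uses the existing ReceiveHTTPFile() quick command in a couple of threads. URL and target file names are placeholders, and the program would have to be compiled with the thread-safe option:

Code: Select all

InitNetwork()

Procedure DownloadThread(Index)
  ; every thread runs its own independent transfer
  ReceiveHTTPFile("http://www.purebasic.com/index.php", "C:\file" + Str(Index) + ".html")
EndProcedure

Define i
Dim Threads(2)
For i = 0 To 2
  Threads(i) = CreateThread(@DownloadThread(), i)
Next
For i = 0 To 2
  WaitThread(Threads(i))
Next
Debug "All downloads finished"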

I don't know how PB handles this ID system internally; I would do it with a (global/shared) LinkedList.
Nevertheless, let's have a look at how an extended HTTP transaction in PureBasic could look:

Code: Select all

InitNetwork()


; A very simple download
AllocateHTTP(0, "http://www.and51.de/index.html")
   Debug DownloadHTTP(0, "C:\index.htm")
FreeHTTP(0)




; A more complex download
Define *greetings=AllocateMemory(100)
PokeS(*greetings, "Hello from PureBasic!")

AllocateHTTP(1, "http://www.and51.de/index.html?language=de")
   Debug ExamineHTTP(1)
   Define SIZE=GetHTTPAttribute(1, #PB_HTTP_FileSize)
   Debug FormatDate("%hh:%ii:%ss %dd-%mm-%yyyy", GetHTTPAttribute(1, #PB_HTTP_ModificationDate))
   
   ; Okay, we got our information and we can "prepare" the download
   ; Let's say we depend on HTTP/1.0, we only want the first 50 bytes and we want to send some POST data
   SetHTTPRequest(1, #PB_HTTP_UserAgent, "myPureBasic Program/1.0")
   SetHTTPRequest(1, #PB_HTTP_Range, "50")
   SetHTTPRequest(1, #PB_HTTP_CacheControl, "only-if-cached")
   SetHTTPRequest(1, #PB_HTTP_Referer, "http://www.purebasic.com/")
   SetHTTPRequest(1, #PB_HTTP_HTTPVersion, "1.0")
   SetHTTPRequestPost(1, #PB_HTTP_PostData, *greetings)
   

   ; No more ideas so let's accomplish the job!
   Debug DownloadHTTP(1, "C:\temp.txt")
   ; Second download, but the complete file this time
   SetHTTPRequest(1, #PB_HTTP_Range, "")
   Define *buffer=AllocateMemory(SIZE)
   Debug ReceiveHTTP(1, *buffer)
FreeHTTP(1)
Simple download
Well, this could easily be done by a "quick command", but it should show you the basic idea. Similar to the file library, for example, you create an HTTP job and free it afterwards. In the meantime, you can do many things with it.

The more complex example
First, we need a buffer for the POST data we want to send later on.
Second, we prepare the HTTP transaction. For this we use the AllocateHTTP() command, which expects an ID (this could also be #PB_Any) and the URI (Uniform Resource Identifier; in practice, URI = URL).
The URI can also contain the port (if port <> 80) and GET variables.
Internally, AllocateHTTP() could add an element to a global HTTP list which is accessed by the other HTTP functions (see the sketch below). This would be my way of realizing it; as I said, I don't know how PB manages IDs internally. If PB's own system is better/different, that can be used instead.
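
As a rough illustration of that bookkeeping, this is how such a job list could look in plain user code. Everything here (the *_Mock procedure names, the attribute numbers) is hypothetical and only mirrors the proposal; it is not existing PB library code:

Code: Select all

; One element per HTTP job, addressed by its ID
Structure HTTPJob
  ID.l
  URL.s
  UserAgent.s
  Range.s
  FileSize.q   ; would be filled once by ExamineHTTP()
EndStructure

Global NewList HTTPJobs.HTTPJob()

; Position the list on the element with the given ID
Procedure FindHTTPJob(ID.l)
  ForEach HTTPJobs()
    If HTTPJobs()\ID = ID
      ProcedureReturn #True
    EndIf
  Next
  ProcedureReturn #False
EndProcedure

Procedure AllocateHTTP_Mock(ID.l, URL$)
  AddElement(HTTPJobs())
  HTTPJobs()\ID  = ID
  HTTPJobs()\URL = URL$
EndProcedure

Procedure SetHTTPRequest_Mock(ID.l, Attribute.l, Value$)
  If FindHTTPJob(ID)
    Select Attribute
      Case 0 : HTTPJobs()\UserAgent = Value$ ; stands in for #PB_HTTP_UserAgent
      Case 1 : HTTPJobs()\Range     = Value$ ; stands in for #PB_HTTP_Range
    EndSelect
  EndIf
EndProcedure

Procedure.q GetHTTPAttribute_Mock(ID.l, Attribute.l)
  If FindHTTPJob(ID) And Attribute = 2       ; stands in for #PB_HTTP_FileSize
    ProcedureReturn HTTPJobs()\FileSize      ; read from the cached data, no new connection needed
  EndIf
EndProcedure

Procedure FreeHTTP_Mock(ID.l)
  If FindHTTPJob(ID)
    DeleteElement(HTTPJobs())
  EndIf
EndProcedure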

In order to acquire information about a URI before the actual download begins, there are two possibilities:
Either each GetHTTPAttribute() call creates its own connection to the server to fetch the specified information, or each GetHTTPAttribute() call reads from the global LinkedList, which has been filled with information by a previous call of ExamineHTTP().
I prefer the second choice, because it saves time: only one connection is necessary.
Of course, the return value of ExamineHTTP() should be checked to detect whether the information could be retrieved successfully; I left this out to simplify the example.

Once the information has been retrieved, you can read it. First, we save the file size to a variable for later use.
Then you might want to know the modification date, so we Debug it after passing it through FormatDate().
At this point, you could also detect whether the download is resumable (#PB_HTTP_PartialDownloadAvailable). Remember this constant? I used it above when I talked about the "quick commands".

Before the actual download, we specify the parameters: our own User-Agent (yabadabadoo!) and the Referer. Note: some sites require the Referer, so this is another reason to allow the programmer to intervene.
When we save the file to disk, we only want the first 50 bytes of the URI.
Just because this is example code, we also want cached-only content and, because we're so funny, we set the HTTP version to 1.0.
Another important thing is the ability to append POST data. Remember that when you append POST data, two HTTP headers must be added:
1) Content-Length (tell the server how large your data bombardment is :) )
2) Content-Type (tell the server what kind of data you're sending; typically "multipart/form-data" or "application/x-www-form-urlencoded")
Exactly this is the task of the HTTP library: it adds these two headers on its own (while the programmer can still change the MIME type later on). A sketch of such a request follows below.
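
For illustration, the raw request the library would have to build for such POST data could look roughly like this (host, path and form fields are placeholders; the point is only the two extra headers):

Code: Select all

Define PostData$ = "language=de&greeting=Hello+from+PureBasic"
Define Request$

Request$ = "POST /index.html HTTP/1.0" + #CRLF$
Request$ = Request$ + "Host: www.and51.de" + #CRLF$
Request$ = Request$ + "User-Agent: myPureBasic Program/1.0" + #CRLF$
Request$ = Request$ + "Content-Type: application/x-www-form-urlencoded" + #CRLF$ ; added by the library, changeable by the programmer
Request$ = Request$ + "Content-Length: " + Str(Len(PostData$)) + #CRLF$          ; added by the library
Request$ = Request$ + #CRLF$ + PostData$

Debug Request$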

This is an example, so we download the URI with the given parameters. The Debug in front of DownloadHTTP() should remind us of the return value. Remember, it would be useful to pass information back to the programmer.
In this case, the return value could be 1 or 0 (self-explanatory).

Attention: Please distinguish between DownloadHTTP() and ReceiveHTTP()! The first command saves the result to disk, the second writes the file into memory.

We didn't free the HTTP job, so we're able to repeat it as often as we want. In this case we want the whole file in memory.
All we need to do is slightly change the parameters: not only the first 50 bytes, but the whole file, so we need to delete the Range header. We saved the file size to a variable before, so we're able to allocate a dynamic memory block which is filled by ReceiveHTTP().
Remember that this is an example!
I don't know if it's best to feed the procedure with a user-created buffer or if the procedure should create its own buffer. This is one of many points we have to discuss (we = PB Team & PB community).
So don't be shy and post your comments here!
Decide on your own:
  • the procedure creates the buffer
    If the procedure creates the buffer, it must return the address. The concrete advantage is that this is more comfortable for the programmer.
  • the user provides a buffer
    If the user provides a buffer, the procedure is able to pass through the very important HTTP status codes, so the programmer can detect which code has been returned! He can easily find out whether the download was successful (200/OK), whether a partial download was successful (206/Partial Content) or whether the server processed the request but didn't send data back (204/No Content). Of course other codes can be returned as well, for example a server error (500/Internal Server Error), an error in the request (400/Bad Request) or the very famous 404/Not Found (see the sketch below).
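
A short sketch of how a program could react to such a passed-through status code (the value is hard-coded here, because the proposed ReceiveHTTP() does not exist yet):

Code: Select all

Define StatusCode = 206 ; pretend this came back from the proposed ReceiveHTTP()

Select StatusCode
  Case 200 : Debug "OK - complete download"
  Case 204 : Debug "No Content - request processed, but no data sent back"
  Case 206 : Debug "Partial Content - partial/resumed download succeeded"
  Case 400 : Debug "Bad Request - error in our request"
  Case 404 : Debug "Not Found"
  Case 500 : Debug "Internal Server Error"
  Default  : Debug "Other status code: " + Str(StatusCode)
EndSelect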

-------------------------------------------------------------------------------------


Closing words:
  • This bloody thread took me >1 hour, so read it!!!! :P
  • Feel free to post your own comment!
    Please make suggestions, propose changes, or ask if something is unclear!
  • I'm looking forward to reading your feedback, @PB-Team and @PB-Community.

Happy coding, AND51
PB 4.30

Code: Select all

onErrorGoto(?Fred)
Trond
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

I would be happy if it didn't crash. :wink:
Fred
Administrator
Posts: 18154
Joined: Fri May 17, 2002 4:39 pm
Location: France
Contact:

Post by Fred »

These are all good ideas; we'll put that on the TODO list (this could take time though).
AND51
Addict
Posts: 1040
Joined: Sun Oct 15, 2006 8:56 pm
Location: Germany
Contact:

Post by AND51 »

Trond wrote:I would be happy if it didn't crash. :wink:
Hm... I haven't had any crashes yet...? But I must honestly say that I've hardly used the HTTP library, because there's no documentation yet.
Fred wrote: These are all good ideas; we'll put that on the TODO list (this could take time though).
Hey Fred!
Thank you very, very much!
May I ask what your opinion is? Do you find my example code good, or would you rearrange things?
About the time: Nice to know that!
PB 4.30

Code: Select all

onErrorGoto(?Fred)
Kazmirzak
User
Posts: 92
Joined: Fri Jun 18, 2004 5:44 pm
Location: Germany

Post by Kazmirzak »

I would opt for the easiest way:

DownloadHTTP(netaddress, target$ [, referrer [, user-agent [, proxy [, whatever ]]]])

So that you can do a simple or a complex download with one command:
DownloadHTTP("www.purebasic.com/index.htm","C:\index.htm")
DownloadHTTP("www.purebasic.com/index.htm","C:\index. ... gle.de",-1, 5050)

:-)
AND51
Addict
Posts: 1040
Joined: Sun Oct 15, 2006 8:56 pm
Location: Germany
Contact:

Post by AND51 »

This would be a good idea for the "quick commands" (as I call them; Fred calls them "helper commands").

Remember that PureBasic currently doesn't support multi-line statements, so you would have all parameters on a single line!
Moreover, you would need several commands anyway, because your 'target$' could also be a memory address, and you can't pass a pointer where a string is expected.

There are a lot of things you can specify when creating an HTTP request, so I recommend using several commands. Furthermore, it's much easier for you as a programmer to get information before/while/after downloading if we split it into several commands. But this is (only) my opinion; maybe there are others who want to share theirs.
PB 4.30

Code: Select all

onErrorGoto(?Fred)
dracflamloc
Addict
Posts: 1648
Joined: Mon Sep 20, 2004 3:52 pm
Contact:

Post by dracflamloc »

May I add a request to download the file to memory as well as a file?
AND51
Addict
Posts: 1040
Joined: Sun Oct 15, 2006 8:56 pm
Location: Germany
Contact:

Post by AND51 »

Of course you may add a request. :wink:

I'm not sure how to interpret your request: do you want to download a file to a memory buffer and to a file at the same time?
If not, this has already been requested. I suggested one command that downloads a file to disk and another command that saves a file to a memory buffer.

Maybe there could also be a command that streams a file into a memory buffer? Is that what you meant?
PB 4.30

Code: Select all

onErrorGoto(?Fred)
dracflamloc
Addict
Posts: 1648
Joined: Mon Sep 20, 2004 3:52 pm
Contact:

Post by dracflamloc »

Actually I just missed your idea :oops: But yeah, I just meant downloading to a memory buffer.
PMV
Enthusiast
Posts: 727
Joined: Sat Feb 24, 2007 3:15 pm
Location: Germany

Post by PMV »

hm ... my wishes for the http-lib:
like this include :D
http://www.purebasic.fr/german/viewtopi ... 101#189101
(this include is only for demonstration ... it's not finished, of course)

Best regards, PMV
Tranquil
Addict
Posts: 952
Joined: Mon Apr 28, 2003 2:22 pm
Location: Europe

Post by Tranquil »

Here, and also for the Email library, it would be very nice if the connection could be established with OpenNetworkConnection(), so that connections through a proxy server are possible.
Tranquil
Rescator
Addict
Posts: 1769
Joined: Sat Feb 19, 2005 5:05 pm
Location: Norway

Post by Rescator »

This is code I'm using in a few of my projects; it's still evolving.
It should handle http and https (and some certificate "quirks") and ftp just fine, and it uses whatever default proxy settings the system has (basically whatever the IE settings are).

Not the prettiest thing out there, but it works and seems stable.
It supports a callback (not really tested much; I basically just download to a file in the Windows temp dir and don't track progress or do other things).
The callback procedure must match the prototype used in the code:
Procedure __priv_Download_Url_Callback__()
A prototype is used to simplify things and to let the download procedure call the callback "directly"; it really shows you the power of PureBasic prototypes :P

Stuff missing that could improve this code: resuming downloads (HTTP 1.1 byte ranges), perhaps, and so on. There is no direct-to-memory download yet, but the code has been prepared for that (the plan was to add two more arguments to the procedure, etc.). I never liked direct-to-memory anyway, as you could end up needing a lot of memory for large files, and you would need to set a maximum size limit to prevent the buffer from growing out of control.

Oh, and consider this public domain; a lot of this stuff was inspired by existing code on the forum, plus some research of my own.

Fred, Freak: the user agent, the https/certificate tweaks and the default proxy could easily be added to the current PB lib, right?
The only reason I "rolled my own" was that the PB lib seemed to lack most of these features ;)

Code: Select all

Enumeration 0
 #DURL_ERROR_OK
 #DURL_ERROR_ABORTED
 #DURL_ERROR_PROTOCOL
 #DURL_ERROR_INTERNET
 #DURL_ERROR_OPEN
 #DURL_ERROR_QUERY
 #DURL_ERROR_STATUS
 #DURL_ERROR_ALLOCATEBUFFER
 #DURL_ERROR_DOWNLOAD
EndEnumeration
Procedure.s Download_Url_Error(error.l)
Protected text$
 Select error
  Case #DURL_ERROR_OK
   text$="Download complete!"
  Case #DURL_ERROR_ABORTED
   text$="Download aborted!"
  Case #DURL_ERROR_PROTOCOL
   text$="Protocol not supported!"
  Case #DURL_ERROR_INTERNET
   text$="Internet connection not available!"
  Case #DURL_ERROR_OPEN
   text$="Failed to open url!"
  Case #DURL_ERROR_QUERY
   text$="Query failed!"
  Case #DURL_ERROR_STATUS
   text$="Status/server error!"
  Case #DURL_ERROR_ALLOCATEBUFFER
   text$="Unable to allocate buffer!"
  Case #DURL_ERROR_DOWNLOAD
   text$="Download failed!"
  Default
   text$="Unknown error!"
 EndSelect
 ProcedureReturn text$
EndProcedure

;Missing PB WinAPI constants.
#INTERNET_FLAG_RELOAD                   = $80000000
#INTERNET_DEFAULT_HTTP_PORT             = 80
#INTERNET_DEFAULT_HTTPS_PORT            = 443
#INTERNET_DEFAULT_FTP_PORT              = 21
#HTTP_QUERY_FLAG_NUMBER                 = $20000000
#HTTP_QUERY_CONTENT_LENGTH              = 5
#HTTP_QUERY_STATUS_CODE                 = 19
#HTTP_STATUS_OK                         = 200
#INTERNET_OPEN_TYPE_PRECONFIG           = 0
#INTERNET_FLAG_SECURE                   = $00800000
#INTERNET_FLAG_IGNORE_CERT_DATE_INVALID = $2000
#INTERNET_FLAG_IGNORE_CERT_CN_INVALID   = $1000
#INTERNET_FLAG_NO_CACHE_WRITE           = $04000000
Prototype.l __priv_Download_Url_Callback__(filesize.q,sizesofar.q,bytes.l,*buffer,*userdata)
Procedure.l Download_Url(url$,agent$,buffersize.l,destfile$,*callback=#False,*userdata=#False)
 Protected port.l,domain$,file$,l.l,n.l,hinet.l,hurl.l,bufsize.l,filesize$,file.l
 Protected dwordsize.l,scode.l,lpdwindex.l,*databuffer,bytes.l,size.q,filesize.q,error.l
 Protected __priv_Download_Url_Callback__
 error=#DURL_ERROR_OK
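 ;Work out the protocol from the url prefix and select the matching default port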
 If LCase(Left(url$,7))="http://"
  port=#INTERNET_DEFAULT_HTTP_PORT
  l=8
 ElseIf LCase(Left(url$,8))="https://"
  port=#INTERNET_DEFAULT_HTTPS_PORT
  l=9
 ElseIf LCase(Left(url$,6))="ftp://"
  port=#INTERNET_DEFAULT_FTP_PORT
  l=7
 Else
  error=#DURL_ERROR_PROTOCOL
 EndIf
 If (error=#DURL_ERROR_OK)
  n=FindString(url$,"/",l)
  If n=0
   domain$=Mid(url$,l,Len(url$))
   url$=url$+"/"
   file$=""
  Else
   domain$=Mid(url$,l,n-l)
   file$=Mid(url$,n+1,Len(url$))
  EndIf
  n=Val(StringField(domain$,2,":"))
  If n>0 : port=n : EndIf
  domain$=StringField(domain$,1,":")
  hinet=InternetOpen_(agent$,#INTERNET_OPEN_TYPE_PRECONFIG,#Null,#Null,#Null)
  If hinet
   If port=#INTERNET_DEFAULT_HTTP_PORT
    hurl=InternetOpenUrl_(hinet,url$,#Null,0,#INTERNET_FLAG_RELOAD|#INTERNET_FLAG_NO_CACHE_WRITE,0)
    ElseIf port=#INTERNET_DEFAULT_HTTPS_PORT
    hurl=InternetOpenUrl_(hinet,url$,#Null,0,#INTERNET_FLAG_RELOAD|#INTERNET_FLAG_NO_CACHE_WRITE|#INTERNET_FLAG_SECURE|#INTERNET_FLAG_IGNORE_CERT_CN_INVALID|#INTERNET_FLAG_IGNORE_CERT_DATE_INVALID,0)
   Else
    hurl=InternetOpenUrl_(hinet,url$,#Null,0,#INTERNET_FLAG_RELOAD|#INTERNET_FLAG_NO_CACHE_WRITE,0)
   EndIf
   If hurl
    dwordsize=4
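     ;Query the HTTP status code; only continue if the server answered 200 OK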
    If HttpQueryInfo_(hurl,#HTTP_QUERY_FLAG_NUMBER|#HTTP_QUERY_STATUS_CODE,@scode,@dwordsize,@lpdwindex)
     If scode=#HTTP_STATUS_OK
      filesize$=Space(1024)
      dwordsize=1024*SizeOf(Character)
      If HttpQueryInfo_(hurl,#HTTP_QUERY_CONTENT_LENGTH,@filesize$,@dwordsize,@lpdwindex)=0
       filesize=0
      Else
       filesize=ValQ(Trim(filesize$))
      EndIf
      If *callback
       __priv_Download_Url_Callback__.__priv_Download_Url_Callback__=*callback
      EndIf
      If destfile$<>""
       file=CreateFile(#PB_Any,destfile$)
      EndIf
      bufsize=buffersize
      *databuffer=AllocateMemory(bufsize)
      If *databuffer
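        ;Read the url in chunks: call the callback, write to the file if one is open, and grow the buffer on ERROR_INSUFFICIENT_BUFFER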
       Repeat
        If InternetReadFile_(hurl,*databuffer,bufsize,@bytes)
         If bytes
          size+bytes
          If *callback
           If __priv_Download_Url_Callback__(filesize,size,bytes,*databuffer,*userdata)=#False
            bytes=0
            error=#DURL_ERROR_ABORTED
           EndIf
          EndIf
          If IsFile(file)
           WriteData(file,*databuffer,bytes)
          EndIf
         EndIf
        Else
         If GetLastError_()=#ERROR_INSUFFICIENT_BUFFER
          l=bufsize+1024
          If l>buffersize*2
           bytes=0
           error=#DURL_ERROR_ALLOCATEBUFFER
          Else
           n=ReAllocateMemory(*databuffer,l)
           If n
            *databuffer=n
            bufsize=l
           Else
            bytes=0
            error=#DURL_ERROR_ALLOCATEBUFFER
           EndIf
          EndIf
         Else
          bytes=0
          error=#DURL_ERROR_DOWNLOAD
         EndIf
        EndIf
       Until bytes=0
       FreeMemory(*databuffer)
       If IsFile(file) : CloseFile(file) : EndIf
       file=#Null
      Else
       error=#DURL_ERROR_ALLOCATEBUFFER
      EndIf
     Else
      error=#DURL_ERROR_STATUS
     EndIf
    Else
     error=#DURL_ERROR_QUERY
    EndIf
    InternetCloseHandle_(hurl)
   Else
    error=#DURL_ERROR_OPEN
   EndIf
   InternetCloseHandle_(hinet)
  Else
   error=#DURL_ERROR_INTERNET
  EndIf
 EndIf
 ProcedureReturn error
EndProcedure
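
In case it helps, here is a minimal usage sketch for the Download_Url() procedure above. URL, agent string, buffer size and file name are placeholders; the callback just has to match the prototype and is passed by address, and returning #False from it aborts the transfer:

Code: Select all

Procedure.l MyProgressCallback(filesize.q, sizesofar.q, bytes.l, *buffer, *userdata)
  Debug sizesofar        ; bytes received so far (filesize is 0 if the server sent no Content-Length)
  ProcedureReturn #True  ; return #False here to abort the download
EndProcedure

Define result.l
result = Download_Url("http://www.purebasic.com/index.php", "MyApp/1.0", 16384, GetTemporaryDirectory() + "index.html", @MyProgressCallback(), #Null)
Debug Download_Url_Error(result)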
Karbon
PureBasic Expert
Posts: 2010
Joined: Mon Jun 02, 2003 1:42 am
Location: Ashland, KY
Contact:

Post by Karbon »

SetHTTPRequest(1, #PB_HTTP_UserAgent, "myPureBasic Program/1.0")

Is an absolute must. Not sending a user agent at all makes requests made with the HTTP library look invalid to almost any web application (search engines especially).
-Mitchell
Check out kBilling for all your billing software needs!
http://www.k-billing.com
Code Signing / Authenticode Certificates (Get rid of those Unknown Publisher warnings!)
http://codesigning.ksoftware.net
nemo.lechat
New User
Posts: 6
Joined: Fri Dec 05, 2008 7:18 am
Location: FRANCE

Post by nemo.lechat »

Thank you, Rescator, for your function, you've saved my day!