How to read from a website?

sabater
User
Posts: 33
Joined: Wed Jul 13, 2005 3:40 am

How to read from a website?

Post by sabater »

Is it possible to read a string from a website? Example: search for the string @yahoo.com, then capture everything from the preceding space up to the end of the wanted string (in this case, yahoo.com).

Is this possible?
This would be very good for making an e-mail list.

Sorry for my English... :cry:
fweil
Enthusiast
Posts: 725
Joined: Thu Apr 22, 2004 5:56 pm
Location: France
Contact:

Post by fweil »

sabater,

I often parse web pages, mostly to extract records from phone books or company databases, using the URLDownloadToFile_() API function.

Here is some code adapted from El_Choni's post, because in some cases dynamic pages won't load with URLDownloadToFile_() and can only be fetched with lower-level API calls.

The Internet_Download_to_File(URL.s, FileName.s) procedure I now use is more convenient for processing any kind of page.

Code: Select all

;
; From El_Choni
; http://forums.purebasic.com/english/viewtopic.php?t=15891
;
; Adapted F.Weil 20050719
;
Enumeration
  #File
EndEnumeration

#INTERNET_FLAG_RELOAD = $80000000
#INTERNET_DEFAULT_HTTP_PORT = 80
#INTERNET_SERVICE_HTTP = 3
#HTTP_QUERY_FLAG_NUMBER = $20000000
#HTTP_QUERY_CONTENT_LENGTH = 5
#HTTP_QUERY_STATUS_CODE = 19
#HTTP_STATUS_OK = 200
#INTERNET_OPEN_TYPE_DIRECT = 1

Procedure CheckError(value, sMessage.s, terminate)
  If value = 0
      Debug "Error : " + sMessage
      If terminate
          End
      EndIf
  EndIf
EndProcedure

Procedure Internet_Download_to_File(URL.s, FileName.s)
  If URLDownloadToFile_(#NULL, URL, FileName, #NULL, #NULL) <> 0
      Debug "Using low level API code"
      Domain.s = RemoveString(Left(URL, FindString(URL, "/", 8) - 1), "http://")
      dwordSize = 4
      hInet = InternetOpen_("Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.7.8) Gecko/20050511 Firefox/1.0.4", #INTERNET_OPEN_TYPE_DIRECT, #NULL, #NULL, 0)
      CheckError(hInet, "Internet connection not available.", #TRUE)
      hURL = InternetOpenUrl_(hInet, URL, #NULL, 0, #INTERNET_FLAG_RELOAD, 0)
      CheckError(hURL, "InternetOpenUrl_() failed", #TRUE)
      hInetCon = InternetConnect_(hInet, Domain, #INTERNET_DEFAULT_HTTP_PORT, #NULL, #NULL, #INTERNET_SERVICE_HTTP, 0, 0)
      CheckError(hInetCon, "Unable to connect to " + Domain, #TRUE)
      hHttpOpenRequest = HttpOpenRequest_(hInetCon, "HEAD", RemoveString(URL, "http://" + Domain + "/"), "http/1.0", #NULL, 0, #INTERNET_FLAG_RELOAD, 0)
      CheckError(hHttpOpenRequest, "Http open request to " + Domain + " failed", #TRUE)
      CheckError(HttpSendRequest_(hHttpOpenRequest, #NULL, 0, 0, 0), "Http send request to " + Domain + " failed.", #TRUE)
      CheckError(HttpQueryInfo_(hHttpOpenRequest, #HTTP_QUERY_FLAG_NUMBER | #HTTP_QUERY_STATUS_CODE, @sCode, @dwordSize, @lpdwIndex), "Http query failed.", #FALSE)
      CheckError(sCode = #HTTP_STATUS_OK, "Status code query failed.", #FALSE)
      CheckError(HttpQueryInfo_(hHttpOpenRequest, #HTTP_QUERY_FLAG_NUMBER | #HTTP_QUERY_CONTENT_LENGTH, @sCode, @dwordSize, @lpdwIndex), "CONTENT_LENGTH query failed.", #FALSE)
      If sCode
          DataBufferLength = sCode
        Else
          DataBufferLength = 4096
      EndIf
      *DataBuffer = AllocateMemory(DataBufferLength)
      CheckError(*DataBuffer, "Not enough memory.", #TRUE)
      CheckError(CreateFile(#File, FileName), "Unable to create file.", #TRUE)
      Repeat
        CheckError(InternetReadFile_(hURL, *DataBuffer, DataBufferLength, @Bytes), "Download failed.", #TRUE)
        If Bytes
            WriteData(*DataBuffer, Bytes)
        EndIf
      Until Bytes=0
      CloseFile(#File)
      FreeMemory(*DataBuffer)
      InternetCloseHandle_(hInetCon)
      InternetCloseHandle_(hURL)
      InternetCloseHandle_(hInet)
    Else
      Debug "Using URLDownloadToFile_() API code"
  EndIf
EndProcedure

;
; Test URLs, some working with URLDownloadToFile_(), some not.
;
;  URL.s = "http://xoap.weather.com/weather/local/USNY0181?cc=*&dayf=1"
;  URL.s = "http://forums.purebasic.com/english/viewtopic.php?t=15891"
;  URL.s = "http://www.paroles.net/"
;  URL.s = "http://www.voila.fr/PagesJaunes/"
  URL.s = "http://www.societe.com/cgi-bin/liste?nom=cl+marketing&dirig=&pre=&ape=&dep=&image2.x=0&image2.y=0"
  FileName.s = "CacheFile.txt"
  Internet_Download_to_File(URL, FileName)
  If ReadFile(#File, FileName)
      Repeat
        a$ = ReadString()
        Debug a$
      Until Eof(#File)
      CloseFile(#File)
  EndIf
  DeleteFile(FileName)
End

; http://www.pagesjaunes.fr/pj.cgi?lang=fr&FRM_NOM=CL%20Marketing&FRM_DEPARTEMENT=64&TYPE_RECHERCHE=zzz
; http://www.pagesjaunes.fr/pb.cgi?lang=fr&FRM_NOM=CL%20Marketing&FRM_DEPARTEMENT=64&TYPE_RECHERCHE=zzz
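Once the page is cached, the extraction sabater asked about can be sketched on top of this. A minimal sketch (the scan logic is illustrative and not from the original post; it assumes the cached page is plain text and that addresses are delimited by spaces):

Code: Select all

; Hypothetical sketch: scan each cached line for "@" and take the
; surrounding token, plus the domain part after the "@".
If ReadFile(#File, "CacheFile.txt")
  While Eof(#File) = 0
    Line$ = ReadString()
    Position = FindString(Line$, "@", 1)
    While Position
      ; Walk left to the preceding space (or start of line)
      Start = Position
      While Start > 1 And Mid(Line$, Start - 1, 1) <> " "
        Start - 1
      Wend
      ; Walk right to the next space (or end of line)
      Stop = Position
      While Stop < Len(Line$) And Mid(Line$, Stop + 1, 1) <> " "
        Stop + 1
      Wend
      Debug Mid(Line$, Start, Stop - Start + 1)       ; whole address
      Debug Mid(Line$, Position + 1, Stop - Position) ; domain only, e.g. yahoo.com
      Position = FindString(Line$, "@", Stop + 1)
    Wend
  Wend
  CloseFile(#File)
EndIf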
Please tell me whether this answers your question.

Rgrds
Rescator
Addict
Posts: 1769
Joined: Sat Feb 19, 2005 5:05 pm
Location: Norway

Post by Rescator »

You might want to update the code a bit and add this

Code: Select all

If FindString(URL, "/", 8) = 0
  URL = URL + "/"
EndIf
as the first lines in the procedure, as it otherwise seems to fail when a root-path URL does not end with a /.

for example: http://www.google.com

You might also wish to redo the DataBufferLength = sCode line, as you could end up with out-of-memory errors;
use an actual buffer size instead of the entire file size reported back (if any).
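Following that remark, the content-length branch could be replaced by a fixed chunk size. A sketch (the 16 KB size is an assumption, any moderate value works; the read loop mirrors the one already in the procedure):

Code: Select all

; Sketch of the suggestion: read in fixed-size chunks instead of
; allocating a buffer as large as the reported content length.
#ChunkSize = 16384 ; assumed chunk size, not from the original post
*DataBuffer = AllocateMemory(#ChunkSize)
CheckError(*DataBuffer, "Not enough memory.", #TRUE)
Repeat
  CheckError(InternetReadFile_(hURL, *DataBuffer, #ChunkSize, @Bytes), "Download failed.", #TRUE)
  If Bytes
    WriteData(*DataBuffer, Bytes)
  EndIf
Until Bytes = 0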