
CAPTURE DATA

Posted: Thu Apr 20, 2006 7:07 pm
by Rikuk
Is there a simple way of capturing a table from a web page?

The web table I'm looking at is a horse racing card; an example is at the following page.

http://www.attheraces.com/card.asp?race ... =racecards

Rik :D

Posted: Fri Apr 21, 2006 6:42 am
by josku_x
Sure there is. Look on Purearea.net for code that fetches the HTML of a page, and use it to get the HTML of the page you posted. Then, in your PB program, look for the table with the horse racing stuff. Basically it's just a search for "<table", but many pages use tables as part of their site template, so you have to inspect the page's HTML yourself, find the table you need, and do a more specific search, like "<table id='HorseRacing'". Then just look for the next </table>, copy that text into a buffer, and after that you can save the buffer to a file or do whatever you want with it.

I think this isn't hard.

EDIT: I looked at the source of that horse racing site; you should do the equivalent of this:

Code: Select all

URL$="http://www.attheraces.com/card.asp?raceid=105462&meetingid=18436&date=2006-04-21&ref=FastFixtures&nav=racecards"
HTML$=GetPageHTML(URL$)
StartPos=FindString(HTML$, "<table id="+Chr(34)+"oddsComparison"+Chr(34), 1) ; FindString positions are 1-based
EndPos=FindString(HTML$, "</table>", StartPos)
HorseTable$=Mid(HTML$, StartPos, EndPos-StartPos)
; Now you have HorseTable$ to play around with.
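
Once you have HorseTable$ you may only want the visible text, not the raw HTML. Here is a rough sketch that simply drops everything between < and > (an assumption on my part that the tags are well formed; this is not a full HTML parser, it won't handle < inside attribute values or comments):

Code: Select all

Procedure$ StripTags(In$)
  Protected Out$, c$, i, InTag
  For i = 1 To Len(In$)
    c$ = Mid(In$, i, 1)
    If c$ = "<"
      InTag = 1        ; entering a tag, stop copying
    ElseIf c$ = ">"
      InTag = 0        ; tag closed, resume copying
    ElseIf InTag = 0
      Out$ = Out$ + c$ ; visible text, keep it
    EndIf
  Next
  ProcedureReturn Out$
EndProcedure

; PlainTable$ = StripTags(HorseTable$)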
EDIT2: I have made this procedure, which gets the HTML of a page.

Code: Select all

Procedure$ GetPageHTML(URL$)
  Protected hOpen, hInternet, HTML$, Buffer$, Bytes
  If LCase(Left(URL$, 7))="http://"
    hOpen=InternetOpen_("Agent", 0, 0, 0, 0)
    hInternet=InternetOpenUrl_(hOpen, @URL$, "", 0, 0, 0)
    If hInternet
      Buffer$=Space(10000) ; 10 KB chunk buffer
      Repeat ; read in chunks so pages bigger than the buffer aren't cut off
        If InternetReadFile_(hInternet, @Buffer$, Len(Buffer$), @Bytes)=0
          Break
        EndIf
        HTML$=HTML$+Left(Buffer$, Bytes) ; keep only the bytes actually read
      Until Bytes=0
      InternetCloseHandle_(hInternet)
    EndIf
    InternetCloseHandle_(hOpen) ; close the session handle too, or it leaks
  EndIf
  ProcedureReturn HTML$
EndProcedure

Posted: Sun Apr 23, 2006 7:14 pm
by Rikuk
Thanks. How do you write the data to a file?

Rik

Posted: Sun Apr 23, 2006 10:42 pm
by netmaestro
Here's a complete working program. You'll want to do a fair amount of string parsing on the output to format it the way you like, but this gives you something to work with:

Code: Select all

ProcedureDLL.s GetPageHTML(URL.s)  
  #INTERNET_FLAG_RELOAD = $80000000 
  Protected Bytes.l, hInet.l, hURL.l
  Protected Html.s = Space(100000) ; fixed 100 KB buffer, enlarge for bigger pages
  hInet = InternetOpen_(URL, 1, #Null, #Null, 0) 
  If hInet 
    hURL = InternetOpenUrl_(hInet, URL, #Null, 0, #INTERNET_FLAG_RELOAD, 0) 
    If hURL 
      If InternetReadFile_(hURL, @Html, Len(Html), @Bytes) 
        Html = Left(Html, Bytes) ; keep only the bytes actually read
      Else 
        Html = "Failed"
      EndIf 
      InternetCloseHandle_(hURL) ; only close handles that were opened
    Else 
      Html = "Failed" 
    EndIf    
    InternetCloseHandle_(hInet)
  Else 
    Html = "Failed" 
  EndIf 
  ProcedureReturn Html
EndProcedure

OpenConsole()
PrintN("Reading web page, please wait...")
URL$="http://www.attheraces.com/card.asp?raceid=105462&meetingid=18436&date=2006-04-21&ref=FastFixtures&nav=racecards" 
HTML$=GetPageHTML(URL$) 
PrintN("Read completed!")
StartPos=FindString(HTML$, "<table id="+Chr(34)+"racecardtable"+Chr(34), 1) ; positions are 1-based
EndPos=FindString(HTML$, "</table>", StartPos) 
HorseTable$=Mid(HTML$, StartPos, EndPos-StartPos) 

If CreateFile(0, "horsetable.txt") ; check the file actually opened
  WriteStringN(0, HorseTable$)
  CloseFile(0)
  PrintN("Output saved to horsetable.txt")
Else
  PrintN("Could not create horsetable.txt")
EndIf
PrintN("Press <enter> to close")
Input()
CloseConsole()


Posted: Sat Feb 11, 2012 1:50 pm
by spreadz
Hi,
I have very limited knowledge of PureBasic web interaction, so very many thanks for the HTML$=GetPageHTML(URL$) code in this thread; it has been very useful to me.

Can you or anyone else please help with one further step?
Currently I have to click each link manually in my browser to get the page URLs I need, then paste each one into my code.
The reason is that the page link I have to use is not a simple URL; instead it executes JavaScript.

e.g. following any "Fast Cards" link in my browser from "http://horses.sportinglife.com/Meetings/",
I can then see the destination URL in the new window, something like "http://horses.sportinglife.com/SL_Fast_ ... 50,00.html".
By assigning this to URL$ I can then use HTML$=GetPageHTML(URL$) for my parsing.

However, if I try to automate it by parsing the /Meetings/ page to get the JavaScript element and do ...
URL$="http://horses.sportinglife.com/Meetings ... ,'128050')"
... the page returned is nothing like the information I get when copying and pasting manually.

Maybe what I need is a two-stage process?
Get JavaURL$ from the javascript action
URL$=JavaURL$
Then use HTML$=GetPageHTML(URL$)

. . . but how??

Spreadz
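
One possible sketch of the two-stage idea above: such links usually carry the destination page as the first quoted argument inside the href's JavaScript call, so you can pull it out with FindString/Mid and prepend the site root. The function name and path in the example below are made up for illustration; inspect the real /Meetings/ page source for the actual pattern:

Code: Select all

Procedure$ ExtractJsURL(Href$, Base$)
  Protected p1, p2
  p1 = FindString(Href$, "'", 1)      ; opening quote of the first argument
  p2 = FindString(Href$, "'", p1 + 1) ; its closing quote
  If p1 And p2
    ProcedureReturn Base$ + Mid(Href$, p1 + 1, p2 - p1 - 1)
  EndIf
EndProcedure

; Example with a made-up href:
; ExtractJsURL("javascript:openCard('/SL_Fast_Card/0,,12345,00.html','128050')", "http://horses.sportinglife.com")
; would give http://horses.sportinglife.com/SL_Fast_Card/0,,12345,00.html

The result can then be fed straight into GetPageHTML(). Note this only works if the destination is a plain page; if the site builds the URL inside the JavaScript function itself, you would have to reproduce that logic too.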
