CAPTURE DATA

Rikuk
User
Posts: 24
Joined: Mon May 30, 2005 11:36 am

Post by Rikuk »

Is there a simple way of capturing a table from a web page?

The table I'm looking at is a horse racing card; an example is at the following page:

http://www.attheraces.com/card.asp?race ... =racecards

Rik :D
josku_x
Addict
Posts: 997
Joined: Sat Sep 24, 2005 2:08 pm

Post by josku_x »

Sure there is. Look on PureArea.net for code that gets the HTML of a page, and use it to get the HTML of the page you posted. Then, in your PB program, look for the table with the horse racing data. Basically it's just a search for "<table", but many pages use tables as part of their site template, so you have to look at the page's HTML yourself, find the table you need and search for something more specific, like "<table id='HorseRacing'". Then look for the next </table> and copy the text in between into a buffer. After that you can save the buffer to a file, or do whatever you want with it.

I don't think this is hard.

EDIT: I looked at the source of that horse racing site; you should do the equivalent of what I'm doing here:

Code:

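URL$="http://www.attheraces.com/card.asp?raceid=105462&meetingid=18436&date=2006-04-21&ref=FastFixtures&nav=racecards"
HTML$=GetPageHTML(URL$)
; Locate the opening tag of the wanted table (the id comes from the page source)
StartPos=FindString(HTML$, "<table id="+Chr(34)+"oddsComparison"+Chr(34), 1)
If StartPos
  ; Find the first closing tag after it and include it in the copy
  EndPos=FindString(HTML$, "</table>", StartPos)
  If EndPos
    HorseTable$=Mid(HTML$, StartPos, EndPos+Len("</table>")-StartPos)
  EndIf
EndIf
; Now you have HorseTable$ to play around with.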
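Searching for the first "</table>" after the start tag stops too early if the race table itself contains a nested table. A minimal sketch of a depth-counting alternative follows; the procedure name ExtractTable is hypothetical, and it assumes the tags are written in lower case, as in the page source.

```purebasic
; Hypothetical helper: copy the table that starts with StartTag$, counting
; nested <table ...> / </table> pairs so an inner table does not end it early.
; Assumes lower-case tags, as in the page source.
Procedure.s ExtractTable(HTML$, StartTag$)
  Protected StartPos, Pos, NextOpen, NextClose, Depth
  StartPos = FindString(HTML$, StartTag$, 1)
  If StartPos = 0
    ProcedureReturn ""                          ; start tag not found
  EndIf
  Depth = 1
  Pos = StartPos + Len(StartTag$)
  While Depth > 0
    NextOpen = FindString(HTML$, "<table", Pos)
    NextClose = FindString(HTML$, "</table>", Pos)
    If NextClose = 0
      ProcedureReturn ""                        ; unbalanced markup: give up
    EndIf
    If NextOpen And NextOpen < NextClose
      Depth = Depth + 1                         ; a nested table opens first
      Pos = NextOpen + Len("<table")
    Else
      Depth = Depth - 1                         ; a table closes
      Pos = NextClose + Len("</table>")
    EndIf
  Wend
  ProcedureReturn Mid(HTML$, StartPos, Pos - StartPos) ; includes the final </table>
EndProcedure
```

For example, HorseTable$=ExtractTable(HTML$, "<table id="+Chr(34)+"oddsComparison"+Chr(34)) would replace the FindString/Mid lines, and returns an empty string when the table is missing or its tags are unbalanced.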
EDIT2: I have written this procedure, which gets the HTML of a page.

Code:

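Procedure.s GetPageHTML(URL$)
  Protected hSession, hUrl, Bytes.l, HTML$, *Buffer
  If LCase(Left(URL$, 7)) = "http://"
    *Buffer = AllocateMemory(4096)                  ; read the page in 4 KB chunks
    hSession = InternetOpen_("Agent", 0, 0, 0, 0)   ; 0 = INTERNET_OPEN_TYPE_PRECONFIG
    If *Buffer And hSession
      hUrl = InternetOpenUrl_(hSession, @URL$, "", 0, 0, 0)
      If hUrl
        ; One read may return only part of the page, so keep reading
        ; until WinINet reports 0 bytes read (end of the page)
        While InternetReadFile_(hUrl, *Buffer, 4096, @Bytes) And Bytes > 0
          HTML$ = HTML$ + PeekS(*Buffer, Bytes, #PB_Ascii)
        Wend
        InternetCloseHandle_(hUrl)
      EndIf
    EndIf
    If hSession : InternetCloseHandle_(hSession) : EndIf ; close the session handle too
    If *Buffer : FreeMemory(*Buffer) : EndIf
  EndIf
  ProcedureReturn HTML$
EndProcedure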
Rikuk
User
Posts: 24
Joined: Mon May 30, 2005 11:36 am

Post by Rikuk »

Thanks! How do you write the data to a file?

Rik
netmaestro
PureBasic Bullfrog
Posts: 8452
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Post by netmaestro »

Here's a complete working program. You will need to do a fair amount of string parsing on the output to format it the way you want, but this gives you something to work with:

Code:

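Procedure.s GetPageHTML(URL.s)
  #INTERNET_FLAG_RELOAD = $80000000          ; bypass the cache and fetch a fresh copy
  Protected hInet, hURL, Bytes.l, Html.s = "Failed"
  Protected *Buffer = AllocateMemory(8192)  ; read buffer, reused for every chunk
  hInet = InternetOpen_("GetPageHTML", 1, #Null, #Null, 0) ; 1 = INTERNET_OPEN_TYPE_DIRECT
  If hInet And *Buffer
    hURL = InternetOpenUrl_(hInet, URL, #Null, 0, #INTERNET_FLAG_RELOAD, 0)
    If hURL
      Html = ""
      ; A single InternetReadFile_ call may return only part of the page,
      ; so keep reading until it reports 0 bytes (end of data)
      While InternetReadFile_(hURL, *Buffer, 8192, @Bytes) And Bytes > 0
        Html = Html + PeekS(*Buffer, Bytes, #PB_Ascii)
      Wend
      InternetCloseHandle_(hURL)             ; close the URL handle first...
    EndIf
  EndIf
  If hInet : InternetCloseHandle_(hInet) : EndIf ; ...then the session handle
  If *Buffer : FreeMemory(*Buffer) : EndIf
  ProcedureReturn Html
EndProcedure

OpenConsole()
PrintN("Reading web page, please wait...")
URL$="http://www.attheraces.com/card.asp?raceid=105462&meetingid=18436&date=2006-04-21&ref=FastFixtures&nav=racecards"
HTML$=GetPageHTML(URL$)
PrintN("Read completed!")
; Locate the race card table and the first </table> after it
StartPos=FindString(HTML$, "<table id="+Chr(34)+"racecardtable"+Chr(34), 1)
EndPos=0
If StartPos
  EndPos=FindString(HTML$, "</table>", StartPos)
EndIf

If EndPos
  HorseTable$=Mid(HTML$, StartPos, EndPos+Len("</table>")-StartPos) ; include the closing tag
  If CreateFile(0,"horsetable.txt")
    WriteStringN(0,HorseTable$)
    CloseFile(0)
    PrintN("Output saved to horsetable.txt")
  Else
    PrintN("Could not create horsetable.txt")
  EndIf
Else
  PrintN("Table "+Chr(34)+"racecardtable"+Chr(34)+" not found in the page")
EndIf
PrintN("Press <enter> to close")
Input()
CloseConsole()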
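As one possible starting point for the string parsing mentioned above (a sketch, not part of netmaestro's program; the procedure name TableToText is hypothetical), the saved table HTML can be flattened into tab-separated text by dropping the tags and turning cell and row ends into tabs and line breaks:

```purebasic
; Hypothetical helper: flatten table HTML into tab-separated lines.
; </td> and </th> end a cell (tab), </tr> ends a row (new line);
; every other tag is dropped and entities such as &amp; are left as-is.
Procedure.s TableToText(Table$)
  Protected i, InTag, c$, Tag$, Out$
  For i = 1 To Len(Table$)
    c$ = Mid(Table$, i, 1)
    If InTag
      If c$ = ">"
        InTag = #False
        Tag$ = LCase(Trim(Tag$))
        If Tag$ = "/td" Or Tag$ = "/th"
          Out$ = Out$ + Chr(9)                 ; end of a cell
        ElseIf Tag$ = "/tr"
          If Right(Out$, 1) = Chr(9)           ; drop the tab after the last cell
            Out$ = Left(Out$, Len(Out$) - 1)
          EndIf
          Out$ = Out$ + #CRLF$                 ; end of a row
        EndIf
      Else
        Tag$ = Tag$ + c$                       ; collect the tag text
      EndIf
    ElseIf c$ = "<"
      InTag = #True
      Tag$ = ""
    ElseIf c$ <> Chr(13) And c$ <> Chr(10)
      Out$ = Out$ + c$                         ; visible text; source line breaks dropped
    EndIf
  Next
  ProcedureReturn Out$
EndProcedure
```

Writing WriteStringN(0, TableToText(HorseTable$)) instead of the raw HTML would save text that a spreadsheet can open as tab-separated columns. Spaces between tags in the page source are kept, so some trimming may still be needed.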
spreadz
User
Posts: 10
Joined: Sat Feb 11, 2012 1:13 pm
Location: Chesterfield UK

Re:

Post by spreadz »

Hi,
I have very limited knowledge of PureBasic web interaction, so many thanks for the HTML$=GetPageHTML(URL$) code posted earlier in this thread; it has been very useful to me.

Can you or anyone else please help with one further step?
Currently I have to click each link manually in my browser to get the page URLs I need, then paste each one into my code.
The reason is that the page link I have to use is not a simple URL; instead it executes JavaScript.

For example, when I follow any "Fast Cards" link from "http://horses.sportinglife.com/Meetings/" in my browser,
the new window shows a destination URL like "http://horses.sportinglife.com/SL_Fast_ ... 50,00.html".
By assigning this to URL$ I can then use HTML$=GetPageHTML(URL$) for my parsing.

However, if I try to automate this by parsing the /Meetings/ page to get the JavaScript element and doing ...
URL$="http://horses.sportinglife.com/Meetings ... ,'128050')"
... the page returned is nothing like the information I get when copy & pasting manually.

Maybe what I need is a two-stage process?
Get JavaURL$ from the javascript action
URL$=JavaURL$
Then use HTML$=GetPageHTML(URL$)

. . . but how??

Spreadz
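One way to approach the two-stage idea, sketched under assumptions: a javascript: link often just calls a function on the page with arguments such as the '128050' value seen above, and that function builds the real URL from them. If the site's own script shows how, the argument can be pulled out of the link and substituted into the same pattern before calling GetPageHTML. The function name openCard, the helper LastQuotedArg and the URL template below are all hypothetical placeholders; the real pattern has to be read from the site's JavaScript.

```purebasic
; Hypothetical helper: return the text inside the LAST pair of single quotes,
; e.g. "128050" from "javascript:openCard('x','128050')".
Procedure.s LastQuotedArg(Link$)
  Protected i, QuoteEnd, QuoteStart
  For i = Len(Link$) To 1 Step -1     ; scan backwards from the end
    If Mid(Link$, i, 1) = "'"
      If QuoteEnd = 0
        QuoteEnd = i                  ; closing quote of the last argument
      Else
        QuoteStart = i                ; its opening quote
        Break
      EndIf
    EndIf
  Next
  If QuoteStart = 0
    ProcedureReturn ""                ; fewer than two quotes found
  EndIf
  ProcedureReturn Mid(Link$, QuoteStart + 1, QuoteEnd - QuoteStart - 1)
EndProcedure

; HYPOTHETICAL link and URL template: replace both with what the real page uses
Link$ = "javascript:openCard('x','128050')"
Template$ = "http://www.example.com/fastcard_{ID}.html"
URL$ = ReplaceString(Template$, "{ID}", LastQuotedArg(Link$))
Debug URL$   ; this URL$ would then be passed to GetPageHTML(URL$)
```

If the site's script uses more than one argument to build the address, the same backward scan can be repeated, or the link can be split on commas with StringField().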
