How to get webpage source and search for specific data

viiartz
User
Posts: 70
Joined: Tue Mar 28, 2006 2:00 am

Post by viiartz »

Wow, the new forum looks great!

Question for you gurus. I would like to get exchange rates and use them in another application.

With this url http://www.oanda.com/convert/fxdaily?da ... directed=1

I can get a web page from http://www.oanda.com which has the following information embedded
Currency,Code,AUD/1 Unit,Units/1 AUD

Euro,EUR,1.9758,0.5064
British Pound,GBP,2.1831,0.4584
Nigerian Naira,NGN,0.01,101.072
US Dollar,USD,1.4879,0.6724
How could I download this web page's source using the URL and then search the source to find ONLY the text above? I would then parse the info into workable data to be used in a future app.

Once I have the source I can, for example, search for the string "Euro,EUR," and then get the exchange rate info, I guess.

Any suggestions/ideas/hints on how to go about it? (Mainly downloading the HTML source from the URL.)

In advance...
Thanks,
ViiArtz
luis
Addict
Posts: 3893
Joined: Wed Aug 31, 2005 11:09 pm
Location: Italy

Post by luis »

If you search the forum you'll find a lot of examples of HTTP downloads, including cross-platform ones.

If your program is Windows-only, maybe you can use my routine based on the WinInet API:
http://www.purebasic.fr/english/viewtop ... 99#p217199

It was written for PB 4.02, so it would probably be better to change most, if not all, of the .l to .i

Once you have the HTML inside a string, it all comes down to parsing the string.
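
As a rough sketch of that parsing step (it is not taken from the linked routine): assuming the page text is already in a string and the rate rows look like the CSV lines quoted in the first post, FindString() can locate the row and StringField() can split it on commas. The sample text and the variable names below are made up for illustration.

```purebasic
; Hypothetical sample standing in for the downloaded page text
Page$ = "Currency,Code,AUD/1 Unit,Units/1 AUD" + #LF$
Page$ + "Euro,EUR,1.9758,0.5064" + #LF$
Page$ + "British Pound,GBP,2.1831,0.4584" + #LF$

StartPos = FindString(Page$, "Euro,EUR,", 1)          ; 0 means "not found"
If StartPos
  EndPos = FindString(Page$, #LF$, StartPos)          ; end of that row
  If EndPos = 0 : EndPos = Len(Page$) + 1 : EndIf     ; last row without a line feed
  Row$ = Mid(Page$, StartPos, EndPos - StartPos)      ; just the "Euro,EUR,..." row
  AudPerUnit.d  = ValD(StringField(Row$, 3, ","))     ; third comma-separated field
  UnitsPerAud.d = ValD(StringField(Row$, 4, ","))     ; fourth comma-separated field
  Debug Row$
  Debug StrD(AudPerUnit, 4)
  Debug StrD(UnitsPerAud, 4)
Else
  Debug "Row not found"
EndIf
```

The same pattern should work for any of the other currencies by changing the search text.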
"Have you tried turning it off and on again ?"
A little PureBasic review

Post by viiartz »

I saw your reply, took a look at the code and thought "Is this PB code? It looks like comments", then I realised there was a scroll bar on the right. Oops! I'm not used to the new code window in the new-look forum. Anyway, thanks for replying... I'll take a closer look at the code.

I actually did do a search but I was not sure what I was searching for... anyway, I appreciate the help.
Thanks,
ViiArtz
UserOfPure
Enthusiast
Posts: 469
Joined: Sun Mar 16, 2008 9:18 am

Post by UserOfPure »


Post by luis »

You are right, that's certainly the simpler solution, and it's cross-platform!

That is, if you don't mind having to save the page to a file and don't need any of the additional features.
"Have you tried turning it off and on again ?"

Post by viiartz »

One word: brilliant! Now, how do I save it to memory instead of a file? :D
Thanks,
ViiArtz
naw
Enthusiast
Posts: 573
Joined: Fri Apr 25, 2003 4:57 pm

Post by naw »

Once you have your HTML file, you should be able to strip out all the HTML with a regular expression.

RESULT1$ simply has all the HTML stripped out.
RESULT2$ (like RESULT1$) has the HTML stripped out, but also ensures there is exactly one space between words.

Code: Select all

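; An HTML tag: "<", then one or more characters that are not ">", then ">"
If CreateRegularExpression(0, "<[^>]+>")
  STRING$ = "hELLO<P ALIGN=left HEIGHT=22>Hello World</p>     wORLD <Br>"   ; deliberately sloppy HTML
  RESULT1$ = ReplaceRegularExpression(0, STRING$, "")    ; tags simply removed
  RESULT2$ = ReplaceRegularExpression(0, STRING$, " ")   ; tags replaced by a space
  While FindString(RESULT2$, "  ", 1)                    ; collapse runs of spaces into one
    RESULT2$ = ReplaceString(RESULT2$, "  ", " ")
  Wend
  RESULT2$ = Trim(RESULT2$)
  Debug RESULT1$
  Debug RESULT2$
EndIf
End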
Ta - N

Post by viiartz »

OK, I should be able to save the HTML file, but how (and I've searched but found nothing to guide me) do you load a text file into memory... how do I work on/search/replace the data on the fly without using files? Is this possible?
Thanks,
ViiArtz

Post by luis »

viiartz wrote:OK, I should be able to save the HTML file, but how (and I've searched but found nothing to guide me) do you load a text file into memory... how do I work on/search/replace the data on the fly without using files? Is this possible?
If you used the PB function mentioned above, you have to process the file.

Look in the help for ReadFile() and ReadString().

You loop until the read operation reaches the end of the file, appending each line you read to a string.

Then you process the string one character at a time, looping through its contents, or search it with FindString().

In any case, look in the help at the sections about file and string operations.

This is the easiest way.
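
A minimal sketch of that loop, assuming the page was already saved by ReceiveHTTPFile(). The file name and the search text are placeholders; adjust them to your own setup.

```purebasic
; Sketch only: Filename$ is a placeholder for wherever the page was saved
Filename$ = "URLDown.txt"
Html$ = ""

If ReadFile(0, Filename$)
  While Eof(0) = 0
    Html$ + ReadString(0) + #LF$     ; append each line, keeping a line break between them
  Wend
  CloseFile(0)

  ; From here on everything happens in memory, on Html$
  Position = FindString(Html$, "Euro,EUR,", 1)   ; 0 if the text is not present
  If Position
    Debug "Found at character " + Str(Position)
  Else
    Debug "Not found"
  EndIf
Else
  Debug "Could not open " + Filename$
EndIf
```

Once the loop has finished, the file is no longer needed and all further searching and replacing works on the string.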
"Have you tried turning it off and on again ?"

Post by viiartz »

Thanks luis, but I know how to load a file and process each line etc. What I meant was: how can I get the HTML file into memory and do all the processing in memory (i.e. strip/search/parse etc.)? :)

Code: Select all

pseudo code

Get HTML file into LargeStringVariable$

; Loop 1 - strip HTML
Do until end of data (in LargeStringVariable$)
   Load data from LargeStringVariable$ one line at a time into TempString$
   Strip HTML from TempString$
   NewLargeStringVariable$ = NewLargeStringVariable$ + TempString$
Loop

; Loop 2
Do
   Using NewLargeStringVariable$, search for the text wanted and parse the data
Loop

; Finally
Maybe save to a file or display on screen
End.
I've always wanted to write a program that does all its processing in memory but never had the need until now. Yes, using files would be easy, but then again, working in memory would be so much faster :D

I just want to know if it is possible for this case and how to do it.
Thanks,
ViiArtz

Post by luis »

Rick, if you used this

http://www.purebasic.com/documentation/ ... pfile.html

you can't. The help file says the function above saves the downloaded data to a file.

That's exactly why I said:
luis wrote:You are right, that's certainly the simpler solution, and it's cross-platform!

That is, if you don't mind having to save the page to a file and don't need any of the additional features.
If you want to download the file from the internet directly into memory, you have to code what you need yourself.

That's why I wrote the function I linked in my original post. If you look at the documentation or try the samples I wrote, you'll see it can do what you ask and a lot more.
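
A note for anyone on a later PureBasic release than the one this thread was written for: newer versions, as far as I know, add ReceiveHTTPMemory(), which returns the download in a memory buffer. Whether your version has it is an assumption to check in the help. If it does, something like the sketch below keeps the whole job in memory. The URL is a placeholder, and treating the page as ASCII text is also an assumption.

```purebasic
; Sketch, assuming a PureBasic version that provides ReceiveHTTPMemory()
InitNetwork()                                ; needed on older versions; recent ones no longer require it
Url$ = "http://www.example.com/rates.html"   ; placeholder, not the real query from this thread

*Buffer = ReceiveHTTPMemory(Url$)
If *Buffer
  ; Treat the downloaded bytes as ASCII text (an assumption about the page encoding)
  Html$ = PeekS(*Buffer, MemorySize(*Buffer), #PB_Ascii)
  FreeMemory(*Buffer)

  If CreateRegularExpression(0, "<[^>]+>")
    Text$ = ReplaceRegularExpression(0, Html$, "")   ; strip the tags in memory
    Text$ = ReplaceString(Text$, #CR$, "")           ; normalise line endings to LF only
    Count = CountString(Text$, #LF$) + 1
    For i = 1 To Count                               ; walk the text one line at a time
      Row$ = Trim(StringField(Text$, i, #LF$))
      If FindString(Row$, "Euro,EUR,", 1) = 1        ; row starts with the text we want
        Debug Row$
      EndIf
    Next
    FreeRegularExpression(0)
  EndIf
Else
  Debug "Download failed"
EndIf
```

On versions without that function, the WinInet routine linked in the first reply remains the way to avoid the temporary file.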
"Have you tried turning it off and on again ?"

Post by viiartz »

Luis, sure, I see. I just got sidetracked by the ReceiveHTTPFile function and actually forgot about your initial post. :oops: I did realise that this function can only be used with files.

Anyway, I will take a closer look at your code...I do appreciate your help thus far.
Thanks,
ViiArtz

Post by viiartz »

I thought I might share what I have been testing (for those who, like me, want to learn). It is by no means efficient code, but it at least gets the concept across. Yes, it uses files... the easy way for now! It is based on the ideas given in this thread so far. Thanks, gurus :wink:

Code: Select all

  InitNetwork()
  Url$ = "http://www.oanda.com/convert/fxdaily?date=09/14/09&date_fmt=us&exch=AUD&lang=en&sel_list=EUR_GBP_NGN_USD_ZWD&value=1&format=CSV&redirected=1"
  Filename$ = "c:\users\rick\URLDown.txt" ; <--- change the path to match the username you log in with
  CreateRegularExpression(0, "<[^>]+>")   ; matches any HTML tag
  TempString$ = ""
  Temp$ = ""
  ; *** This is what I'm looking to extract from the web page:
  ; Base Currency: Australian Dollar, AUD on Monday, September 14, 2009
  ; Currency,Code,AUD/1 Unit,Units/1 AUD
  ;
  ; Euro,EUR,1.6888,0.5932
  ; British Pound,GBP,1.931,0.5188
  ; Nigerian Naira,NGN,0.007545,133.143
  ; US Dollar,USD,1.1587,0.864
  ; *******  ignore the last line, it is only used as an end-of-block marker
  ; Zimbabwe Dollar,ZWD,0.003264,306.756

  If ReceiveHTTPFile(Url$, Filename$)
    If ReadFile(0, Filename$)   ; if the file could be read, we continue...
      While Eof(0) = 0          ; loop as long as the 'end of file' isn't reached
        TempString$ = ReplaceRegularExpression(0, ReadString(0), "")
        If FindString(TempString$, "Base Currency:", 1) > 0 ; <-- first thing to find
          Temp$ = Trim(TempString$) + Chr(13)
          While Eof(0) = 0      ; stop at end of file even if the marker never shows up
            TempString$ = Trim(ReplaceRegularExpression(0, ReadString(0), ""))
            If TempString$ <> ""  ; skip blank lines
              If FindString(TempString$, "ZWD", 1) > 0 ; <-- end of the text block marker
                MessageRequester("What I found", Temp$, #MB_OK|$80) ; <--- $80 added to the flags parameter to stop the beep (Windows only)
                Break
              EndIf
              Temp$ = Temp$ + TempString$ + Chr(13) ; <-- Chr(13) separates the lines in the MessageRequester
            EndIf
          Wend
          Break ; block handled: leave the outer loop so the file gets closed
        EndIf
      Wend
      CloseFile(0)
    EndIf
  Else
    Debug "Failed"
  EndIf
  
Thanks,
ViiArtz