How to get webpage source and search for specific data
Posted: Mon Sep 14, 2009 12:29 pm
by viiartz
Wow the new forum looks great!
Question for you gurus. I would like to get exchange rates and use them in another application... anyway.
With this URL
http://www.oanda.com/convert/fxdaily?da ... directed=1
I can get a web page from http://www.oanda.com which has the following information embedded:
Currency,Code,AUD/1 Unit,Units/1 AUD
Euro,EUR,1.9758,0.5064
British Pound,GBP,2.1831,0.4584
Nigerian Naira,NGN,0.01,101.072
US Dollar,USD,1.4879,0.6724
How could I download the source of this web page using the URL and then search it to find ONLY the text above? I would then parse the info into workable data to be used in a future app.
Once I have the source, I can for example search for the string "Euro,EUR," and then get the exchange rate info, I guess.
Any suggestions/ideas/hints on how to go about it? (Mainly downloading the HTML source from the URL.)
Thanks in advance...
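Assuming the page text has already been downloaded into a string, the "search for Euro,EUR," idea could be sketched in PureBasic as below. LookupRate() is a hypothetical helper (not a PureBasic command), and it assumes one rate per line in the Name,Code,PerUnit,UnitsPer layout shown above.

```purebasic
; Hypothetical helper: scans Text$ line by line and returns the third field
; ("AUD/1 Unit") of the first line whose second field equals Code$,
; or -1 if no such line exists.
; Assumes one rate per line in the form "Name,Code,PerUnit,UnitsPer".
Procedure.d LookupRate(Text$, Code$)
  Protected i, Line$
  For i = 1 To CountString(Text$, #LF$) + 1
    ; StringField() picks the i-th line; RemoveString() drops a CR left by CRLF endings
    Line$ = RemoveString(StringField(Text$, i, #LF$), #CR$)
    If StringField(Line$, 2, ",") = Code$
      ProcedureReturn ValD(StringField(Line$, 3, ","))
    EndIf
  Next
  ProcedureReturn -1
EndProcedure

; Sample text built from the rates quoted above
Sample$ = "Currency,Code,AUD/1 Unit,Units/1 AUD" + #LF$
Sample$ = Sample$ + "Euro,EUR,1.9758,0.5064" + #LF$
Sample$ = Sample$ + "British Pound,GBP,2.1831,0.4584" + #LF$
Sample$ = Sample$ + "Nigerian Naira,NGN,0.01,101.072" + #LF$
Sample$ = Sample$ + "US Dollar,USD,1.4879,0.6724"

Debug StrD(LookupRate(Sample$, "EUR"), 4)
```

Matching on the second field rather than on the raw text "Euro,EUR," means the header line and currency names containing commas elsewhere cannot produce a false match.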
Re: How to get webpage source and search for specific data
Posted: Mon Sep 14, 2009 12:43 pm
by luis
If you search the forum you'll find a lot of examples of HTTP downloading, cross-platform ones too.
If your program is windows only maybe you can use my routine based on the WinInet API
http://www.purebasic.fr/english/viewtop ... 99#p217199
It was written for PB 4.02, so it would probably be better to change most, if not all, of the .l types to .i.
Once you have the HTML inside a string, it's all down to parsing the string.
Re: How to get webpage source and search for specific data
Posted: Mon Sep 14, 2009 1:00 pm
by viiartz
I saw your reply and was looking at the code, thinking "Is this PB code? It looks like comments," then I realised there was a scroll bar on the right. Oops! I'm not used to the code window on the new-look forum. Anyway, thanks for replying... I'll take a closer look at the code.
I actually did do a search but I was not sure what I was searching for... anyway, I appreciate the help.
Re: How to get webpage source and search for specific data
Posted: Mon Sep 14, 2009 1:48 pm
by luis
Re: How to get webpage source and search for specific data
Posted: Mon Sep 14, 2009 1:52 pm
by UserOfPure
Re: How to get webpage source and search for specific data
Posted: Mon Sep 14, 2009 2:06 pm
by luis
You are right, that's certainly the simpler solution and it's cross-platform!
If you don't mind the need to save the page to a file and don't need any of the additional features.
Re: How to get webpage source and search for specific data
Posted: Mon Sep 14, 2009 2:15 pm
by viiartz
One word: brilliant! Now, how do I save it to memory instead of a file?

Re: How to get webpage source and search for specific data
Posted: Mon Sep 14, 2009 4:30 pm
by naw
Once you have your HTML file, you should be able to strip out all the HTML tags with a regular expression.
RESULT1$ simply has all the HTML stripped out.
RESULT2$ (like RESULT1$) has the HTML stripped out, but also ensures exactly one space between words.
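; Matches one HTML tag: "<", then any characters except ">", then ">"
CreateRegularExpression(0, "<[^>]+>")
; Matches any run of whitespace (spaces, tabs, line breaks)
CreateRegularExpression(1, "\s+")
STRING$="hELLO<P ALIGN=left HEIGHT=22>Hello World</p> wORLD <Br>" ; Yes I know this is bullsh*t HTML
RESULT1$=ReplaceRegularExpression(0, STRING$,"")
; Replace each tag with a space, collapse whitespace runs to one space, then trim the ends
RESULT2$=Trim(ReplaceRegularExpression(1, ReplaceRegularExpression(0, STRING$," ")," "))
Debug RESULT1$
Debug RESULT2$
End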
Re: How to get webpage source and search for specific data
Posted: Mon Sep 14, 2009 10:44 pm
by viiartz
OK, I should be able to save the HTML file, but how (and I've searched but found nothing to guide me) do you load a text file into memory? How do I work with/search/replace the data on the fly without using files? Is this possible?
Re: How to get webpage source and search for specific data
Posted: Mon Sep 14, 2009 11:10 pm
by luis
viiartz wrote:OK, I should be able to save the HTML file, but how (and I've searched but found nothing to guide me) do you load a text file into memory? How do I work with/search/replace the data on the fly without using files? Is this possible?
If you used the PB function mentioned above, you have to process the file.
Look in the help for ReadFile() and ReadString().
You loop until the read operation returns end of file, appending each line you read to a string.
Then you process the string one char at a time, looping through its contents, or using FindString().
In any case, look at the sections about FILE and STRING operations in the help.
This is the easiest way.
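A minimal sketch of the approach described above (ReadFile(), then ReadString() until Eof(), appending each line to one string). LoadTextFile() is a hypothetical helper name, and "URLDown.txt" stands in for whatever file ReceiveHTTPFile() saved; #PB_Any is used so the file number cannot clash with other open files.

```purebasic
; Hypothetical helper: reads every line of a text file into one string,
; ending each line with #LF$. Returns "" if the file cannot be opened.
Procedure.s LoadTextFile(Filename$)
  Protected File, Text$
  File = ReadFile(#PB_Any, Filename$)   ; #PB_Any: let PureBasic pick a free file number
  If File
    While Eof(File) = 0                 ; loop until the read operation reaches end of file
      Text$ = Text$ + ReadString(File) + #LF$
    Wend
    CloseFile(File)
  EndIf
  ProcedureReturn Text$
EndProcedure

; Usage: load the saved page, then search it in memory
Page$ = LoadTextFile("URLDown.txt")      ; hypothetical file name
If FindString(Page$, "Euro,EUR,", 1) > 0
  Debug "Found the Euro rate"
EndIf
```

From here on, everything (stripping, searching, parsing) can work on Page$ without touching the file again.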
Re: How to get webpage source and search for specific data
Posted: Mon Sep 14, 2009 11:40 pm
by viiartz
Thanks luis, but I know how to load a file and process each line etc. What I meant was: how can I get the HTML file into memory and do all the processing in memory (i.e. strip/search/parse etc.)?
pseudo code
Get HTML file into LargeStringVariable$
;Loop 1 - strip HTML
Do until end of data (in LargeStringVariable$)
  Load data from LargeStringVariable$ one line at a time into TempString$
  Strip HTML
  NewLargeStringVariable$ = NewLargeStringVariable$ + TempString$
Loop
;Loop 2
Do
  Using NewLargeStringVariable$, search for the wanted text and parse the data
Loop
;Finally
Maybe save to a file or display on screen
End.
I've always wanted to write a program that does all the processing in memory but never had the need until now. Yes, using files would be easy, but then again doing it in memory would be so much faster.
I just want to know if it is possible for this case and how to do it.
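The plan in the pseudo code above can be done entirely on strings once the page is in a string variable. A sketch, assuming the page is already in Page$ (here a tiny made-up sample); StripHtmlLines() is a hypothetical helper, and the tag pattern is a simplification that will not cope with every HTML construct (for example a ">" inside an attribute value).

```purebasic
; Regular expression #0 matches one HTML tag: "<", any characters except ">", then ">"
CreateRegularExpression(0, "<[^>]+>")

; Hypothetical helper (Loop 1 of the plan): strips the tags from every line of
; Html$ and returns the lines that still contain text, each followed by #LF$.
Procedure.s StripHtmlLines(Html$)
  Protected i, Line$, Result$
  For i = 1 To CountString(Html$, #LF$) + 1
    Line$ = StringField(Html$, i, #LF$)                   ; i-th line of the page
    Line$ = Trim(RemoveString(ReplaceRegularExpression(0, Line$, ""), #CR$))
    If Line$ <> ""
      Result$ = Result$ + Line$ + #LF$                    ; keep only non-empty lines
    EndIf
  Next
  ProcedureReturn Result$
EndProcedure

; Made-up sample page, already in memory
Page$ = "<html><body>" + #LF$ + "<pre>Euro,EUR,1.9758,0.5064</pre>" + #LF$ + "</body></html>"
Clean$ = StripHtmlLines(Page$)

; Loop 2 of the plan: search the cleaned text for the wanted data
Pos = FindString(Clean$, "Euro,EUR,", 1)
If Pos > 0
  ; every line in Clean$ ends with #LF$, so the FindString() below always succeeds
  Debug "Found: " + Mid(Clean$, Pos, FindString(Clean$, #LF$, Pos) - Pos)
EndIf
```

StringField() has to scan from the start of the string on every call, so for a very large page a single pass with FindString() would likely be faster; for a page of a few kilobytes it should not matter.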
Re: How to get webpage source and search for specific data
Posted: Tue Sep 15, 2009 12:01 am
by luis
Rick, if you used this
http://www.purebasic.com/documentation/ ... pfile.html
you can't. The help file says the function above saves the downloaded data to a file.
That's exactly why I said:
luis wrote:You are right, that's certainly the simpler solution and it's cross-platform!
If you don't mind the need to save the page to a file and don't need any of the additional features.
If you want to download the file from the internet directly into memory, you have to code it yourself.
That's why, at the time, I wrote the function I linked in my original post. If you look at the documentation or try the samples I wrote, you'll see it can do what you ask and a lot more.
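For later readers: as far as I know, newer PureBasic releases (well after the 4.x versions discussed in this thread) provide ReceiveHTTPMemory(), which returns the downloaded data in a memory buffer instead of writing a file. A sketch, assuming such a release and a page encoded as UTF-8 or plain ASCII; DownloadToString() is a hypothetical helper and the example.com address is a placeholder URL.

```purebasic
; Required by the network library in older releases; assumed harmless in newer ones
InitNetwork()

; Hypothetical helper: downloads Url$ into a memory buffer and returns its
; contents as a string, or "" if the download fails.
; Assumes the page is UTF-8 or plain ASCII.
Procedure.s DownloadToString(Url$)
  Protected *Buffer, Text$
  *Buffer = ReceiveHTTPMemory(Url$)      ; returns 0 on failure
  If *Buffer
    ; #PB_ByteLength: the length argument is a byte count (the buffer size)
    Text$ = PeekS(*Buffer, MemorySize(*Buffer), #PB_UTF8 | #PB_ByteLength)
    FreeMemory(*Buffer)                  ; ReceiveHTTPMemory() allocated it; free it once copied
  EndIf
  ProcedureReturn Text$
EndProcedure

Page$ = DownloadToString("http://www.example.com/")   ; placeholder URL
If Page$ = ""
  Debug "Download failed"
Else
  Debug Left(Page$, 200)   ; first 200 characters of the page source
EndIf
```

The returned string can then go straight into the same string parsing discussed above, with no temporary file involved.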
Re: How to get webpage source and search for specific data
Posted: Tue Sep 15, 2009 12:21 am
by viiartz
Luis, sure, I see. I just got sidetracked by the ReceiveHTTPFile function and actually forgot about your initial post.

I did realise that this function can only be used with files.
Anyway, I will take a closer look at your code...I do appreciate your help thus far.
Re: How to get webpage source and search for specific data
Posted: Tue Sep 15, 2009 1:47 am
by viiartz
I thought I might share what I have been testing (for those like me who want to learn)... It's by no means efficient code, but it gets the concept across at least. Yes, it uses files... the easy way for now!! It's based on the ideas given in this thread so far. Thanks, gurus.
InitNetwork()
Url$="http://www.oanda.com/convert/fxdaily?date=09/14/09&date_fmt=us&exch=AUD&lang=en&sel_list=EUR_GBP_NGN_USD_ZWD&value=1&format=CSV&redirected=1"
Filename$ = "c:\users\rick\URLDown.txt" ;<--- change this path to match the user name you log in with
CreateRegularExpression(0, "<[^>]+>") ;<--- matches a single HTML tag
TempString$=""
Temp$=""
Found = #False
; *** This is what I'm looking for to extract from the webpage.
; Base Currency: Australian Dollar, AUD on Monday, September 14, 2009
; Currency,Code,AUD/1 Unit,Units/1 AUD
;
; Euro,EUR,1.6888,0.5932
; British Pound,GBP,1.931,0.5188
; Nigerian Naira,NGN,0.007545,133.143
; US Dollar,USD,1.1587,0.864
; ******* ignore the last line, it is only used as the end-of-block marker
; Zimbabwe Dollar,ZWD,0.003264,306.756
If ReceiveHTTPFile(Url$, Filename$)
  If ReadFile(0, Filename$) ; if the file could be read, we continue...
    While Eof(0) = 0 And Found = #False ; stop at end of file or once the block has been collected
      TempString$=ReplaceRegularExpression(0, ReadString(0),"")
      If FindString(TempString$, "Base Currency:", 1) > 0 ;<-- first thing to find (FindString returns 0 when not found)
        Temp$=Trim(TempString$)+Chr(13)
        While Eof(0) = 0 ;<-- never read past the end of the file
          TempString$=Trim(ReplaceRegularExpression(0, ReadString(0),""))
          If FindString(TempString$, "ZWD", 1) > 0 ;<-- end of the text block marker
            Found = #True
            Break
          EndIf
          Temp$ = Temp$+TempString$+Chr(13) ;<-- Chr(13) (carriage return) separates the lines shown by MessageRequester
        Wend
      EndIf
    Wend
    CloseFile(0) ;<-- always reached now, so the file is closed
  EndIf
  If Found
    MessageRequester("What I found",Temp$,#MB_OK|$80) ;<--- $80 added to the flags parameter to stop beeping
  Else
    Debug "Currency block not found"
  EndIf
Else
  Debug "Failed"
EndIf
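To turn the collected block into workable data, as asked at the start of the thread, each line can be split into fields with StringField() and stored in a linked list. A sketch; the ExchangeRate structure and ParseRates() are hypothetical, and the row filter (four comma-separated fields with a three-letter code in the second field) is an assumption based on the sample output in the comments above.

```purebasic
; Hypothetical structure for one row of the rates table
Structure ExchangeRate
  Name$
  Code$
  PerUnit.d    ; "AUD/1 Unit" column
  UnitsPer.d   ; "Units/1 AUD" column
EndStructure

; Hypothetical helper: adds one element to Rates() for every line of Text$ that
; has four comma-separated fields and a three-letter code in the second field.
; Lines may be separated by CR (as in Temp$ above), LF or CRLF.
; Returns the number of rows added.
Procedure ParseRates(Text$, List Rates.ExchangeRate())
  Protected i, Count, Line$
  Text$ = ReplaceString(Text$, #CR$, #LF$)   ; normalize separators; empty lines are skipped below
  For i = 1 To CountString(Text$, #LF$) + 1
    Line$ = Trim(StringField(Text$, i, #LF$))
    If CountString(Line$, ",") = 3 And Len(StringField(Line$, 2, ",")) = 3
      AddElement(Rates())
      Rates()\Name$    = StringField(Line$, 1, ",")
      Rates()\Code$    = StringField(Line$, 2, ",")
      Rates()\PerUnit  = ValD(StringField(Line$, 3, ","))
      Rates()\UnitsPer = ValD(StringField(Line$, 4, ","))
      Count = Count + 1
    EndIf
  Next
  ProcedureReturn Count
EndProcedure

; Example input in the same shape as the block collected above
Block$ = "Currency,Code,AUD/1 Unit,Units/1 AUD" + Chr(13)
Block$ = Block$ + "Euro,EUR,1.6888,0.5932" + Chr(13)
Block$ = Block$ + "US Dollar,USD,1.1587,0.864" + Chr(13)

NewList Rates.ExchangeRate()
ParseRates(Block$, Rates())
ForEach Rates()
  Debug Rates()\Code$ + ": " + StrD(Rates()\PerUnit, 4) + " AUD per unit"
Next
```

The three-letter check is what skips both the "Currency,Code,..." header and the "Base Currency: ..." line, which also happens to contain three commas.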