Page 1 of 4

Redirected URLs

Posted: Sat May 24, 2008 1:56 pm
by Little John
Works also with PB 5.20

Hello all!

Note: The code requires at least PB 4.20 final!
I wrote this in order to be able to download files from redirected URLs.

// edit 2008-05-30:
Based on an idea by Trond, I wrote another version which I think is better.
// edit 2010-03-10: Corrected the link, which had changed due to the new phpBB version. :(

Code: Select all

; successfully tested on PureBasic 4.20 final

EnableExplicit

Procedure.s RealURL (url.s, recursionDepth.l=10)
   ; Very useful, because ReceiveHTTPFile() does
   ; not handle redirected URLs.
   ; in : URL
   ; out: - if not redirected: original address
   ;      - if     redirected: target address
   Protected header.s, line.s, index.l

   header = GetHTTPHeader(url)
   index = 0
   Repeat
      index + 1
      line = StringField(header, index, #LF$)
;       Debug line
      If FindString(line, "Location:", 1) = 1
         If recursionDepth > 0           ; protection against endless loops
            line = LTrim(Mid(line, 10))
            If Right(line, 1) = #CR$
               line = Left(line, Len(line)-1)
            EndIf
            url = RealURL(line, recursionDepth-1)
         Else
            url = ""                     ; error
         EndIf
         Break
      EndIf
   Until line = ""

   ProcedureReturn url
EndProcedure


;-- Demo
InitNetwork()
Debug RealURL("http://forum.purebasic.com/")
Regards, Little John

Posted: Sun May 25, 2008 9:58 pm
by ar-s
Great stuff, thanks for sharing :P

Posted: Mon May 26, 2008 1:16 am
by AND51
Har Har, my code has only 13 lines and it is better, because it has the depth-parameter alreaddy included!
Infinite depth is also possible! No loops and Breaks, just 3 Ifs!

Check this out:

Code: Select all

InitNetwork()

Procedure.s traceURL(URL$, RecursionDepth=#INFINITE)
	Static location.l
	If Not location
		location=CreateRegularExpression(#PB_Any, "(?im)^Location: .*$")
	EndIf
	If RecursionDepth
		Protected Dim subPattern$(0)
		If ExtractRegularExpression(location, GetHTTPHeader(URL$), subPattern$())
			ProcedureReturn traceURL(Mid(subPattern$(0), 11), RecursionDepth-1)
		EndIf
	EndIf
	ProcedureReturn URL$
EndProcedure

Debug traceURL("http://tinyurl.com/5ev9wt") ; English PB Forum

Posted: Mon May 26, 2008 5:18 am
by Little John
AND51 wrote:Infinite depth is also possible!
Infinite depth is unwanted. That's the whole point why the procedure does check the depth.
Passing the depth as parameter is a good idea, though. I changed my code accordingly.

Regards, Little John

Posted: Mon May 26, 2008 8:00 am
by ABBKlaus
@AND51 : i don´t see any FreeRegularExpression :shock:

Posted: Mon May 26, 2008 10:03 am
by AND51
@ ABBKlaus:
FreeRegularExpression() is not neccessary. Why?
It is only declared once, because I use a static variable.
There is no memory leak or something like that.
Furthemore, the expression is freed automatically at program end. :wink:

Can't see any disadvantages of my code.


@ Little John:
Why is infinite depth unwanted? Normally, a website has 1 or 2 redirections. I consider the depth as an additional, but optional feature. Moreover, your code does not offer infinite depth, even if the user wants it, because your check is "if recusion > 0", my check is "if recursion".

The user who adopts our code can easily override the setting and adjust the code, so there's no need to discuss about this. The discussion should be about the performance and that's why I wonder why you're uising FindString(), Right(), Left() and so on...
IIRC, a HTTP-header does not nee to be seperated by #CRLF$'s, it can depend on the server software. If there are only #LF$'s for example, your code fails.

Posted: Mon May 26, 2008 10:46 am
by Kiffi
AND51 wrote:Furthemore, the expression is freed automatically at program end.
yes, @Fred: please remove all Free* - Commands!
AND51 showed us, that they are unnecessary :roll:
PB-Help wrote:If a regular expression isn't used anymore, use FreeRegularExpression().
Greetings ... Kiffi

Posted: Mon May 26, 2008 11:30 am
by Fred
Well, here AND51 is right, just take a look to FreeRegularExpression doc:
Note: all remaining regular expressions are automatically freed when the program ends.
It's the same for all PB objects, as PB does objects tracking.

Posted: Mon May 26, 2008 11:44 am
by AND51
> AND51 showed us, that they are unnecessary

Hey come on, guy!
The command is unnecessary in this case! You know very well, what I mean. In this case it's ok to leave out the Free-Command, because the expression is only allocated once. [...]
This is not an act of lazyness, but an act of performance optimizing, because the allocation is only proceeded once.

[...]

// Edit:
Oh, thank your Fred! BTW, I've shortend this post.

Posted: Mon May 26, 2008 11:48 am
by Kiffi
sorry for
<german_mode>
AND51 wrote:Simma jetzt durch mit dieser Diskussion? :?
ist halt nicht mein persönliches Verständnis von sauberer Programmierung. Aber mach Du mal...
</german_mode>

Posted: Mon May 26, 2008 11:59 am
by AND51
sorry for
<german_mode>
  • Kiffi wrote:ist halt nicht mein persönliches Verständnis von sauberer Programmierung.
    Du willst mich wohl veralbern?
    In the german forum Kiffi wrote:wer einfache Chart-Funktionalitäten in sein Programm integrieren möchte, kann sich ja mal folgendes zu Gemüte führen:

    Code: Select all

    If OpenWindow(0, #PB_Ignore, #PB_Ignore, 350, 200, "Google-Chart") And CreateGadgetList(WindowID(0)) 
      WebGadget(0, 10, 10, 330, 180, "http://chart.apis.google.com/chart?cht=p3&chd=s:hW&chs=250x100&chl=Hello|world") 
      Repeat 
      Until WaitWindowEvent() = #PB_Event_CloseWindow 
    EndIf  
    Wenn du doch so sauber arbeitest, wo ist denn hier bitte schön CloseWindow()? Und das ist nur ein Beispiel... :wink:
P.S.: Als Zeichen guten Willens habe ich mein Posting gekürzt, bzw. die Meinung von Fred (der ich auch bin) als die "einzig wahre" im Raum stehen lassen. Wir sollten uns wieder zusammenreißen, sind doch schließlich PB-Kollegen. Und wenn du ehrlich bist, "nötige" Free-Commandaufrufe lasse ich nie weg, z. B. mein Arbeiten mit Dateien. :wink:
</german_mode>

Posted: Mon May 26, 2008 4:39 pm
by Little John
AND51 wrote:Why is infinite depth unwanted?
Because resources are not infinite.
Imagine an URL "A" which is redirected to URL "B" which is again redirected to URL "A". Sure, this should not happen, but who knows. I don't want my programms to rely on unproved assumtions. That's the reason why I let the procedure check the recursion depth. If I wouldn't care about this issue, I hadn't introduced the depth parameter at all.
AND51 wrote:Moreover, your code does not offer infinite depth
You have written that before, and I had replied to it before. I can't see the point in repeating the same "argument" over and over again.
AND51 wrote:The user who adopts our code can easily override the setting and adjust the code, so there's no need to discuss about this.
And exactly who started this "discussion"?
AND51 wrote:IIRC, a HTTP-header does not nee to be seperated by #CRLF$'s, it can depend on the server software.
IYRC ... Sorry, I am only interested in facts, not in rumours.
AND51 wrote:If there are only #LF$'s for example, your code fails.
Huuuu?
You might want to read my code again. Hint: Look for #LF$.

Regards, Little John

Posted: Mon May 26, 2008 5:54 pm
by AND51
> Imagine an URL "A" which is redirected to URL "B" ...
Yes, okay. As I said, the user can easily override this setting and pass a number to the procedure, so there shouldn't be a problem.

> That's the reason why I let the procedure check the recursion depth
This is not the onliest reason. Image URL A leads to B, that leads to C, that leads to D....... In spite of that, this parameter is a good idea.

> And exactly who started this "discussion"?
You. Your words were: "Infinite depth is unwanted." This sentence lead us to the discussion about the sense of the depth-parameter.

> Sorry, I am only interested in facts, not in rumours
I looked that up, the only allowed new-line character is #CRLF$.
Nevertheless, you misunderstood me. If there were only #LF$'s in the response then your code fails, because it cannot find any #CR$'s. Ergo, it'd fail.

> You might want to read my code again.
Ah, I see. Saying that I'm repeating myself, but you do it, too... :wink:


Thanks for reading. I've nothing to add at the moment.

Posted: Mon May 26, 2008 5:55 pm
by Trond
#CRLF$ is necessary.

Since recursion seems to be the latest fashion:

Code: Select all

InitNetwork()

Declare.s ExtractLocation(URL.s)

Procedure.s Redirect(URL.s)
  Protected Location.s = ExtractLocation(GetHTTPHeader(Url))
  If Location
    ProcedureReturn Location
  EndIf
  ProcedureReturn URL
EndProcedure

Procedure.s ExtractLocation(Header.s)
  Protected Pos = FindString(Header, #CRLF$ + "Location: ", 1)
  If Pos And PokeL(@Pos, Pos+12)
    ProcedureReturn Redirect( Mid(Header, Pos, FindString(Header, #CRLF$, Pos)-Pos) )
  EndIf
EndProcedure

Debug Redirect("http://tinyurl.com/5ev9wt") ; English PB Forum

Posted: Mon May 26, 2008 6:13 pm
by ABBKlaus
AND51 wrote:FreeRegularExpression() is not neccessary. Why?
It is only declared once, because I use a static variable.
There is no memory leak or something like that.
Furthemore, the expression is freed automatically at program end.

Can't see any disadvantages of my code.
Agree´d on that (i did not read the hole code), btw. i like both versions :wink: