Redirected URLs

Share your advanced PureBasic knowledge/code with the community.
Little John
Addict
Addict
Posts: 4791
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Post by Little John »

AND51 wrote:> And exactly who started this "discussion"?
You.
Ohhh ...
AND51 wrote: If there were only #LF$'s in the response then your code fails, because it cannot find any #CR$'s. Ergo, it'd fail.
Sigh ....
Please get your facts first.

Little John
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

In case people don't want to read the thread and just needs a function like this: There is only one sane and working code so far, and that's the one in the first post.
FreeRegularExpression() is not neccessary. Why?
It is only declared once, because I use a static variable.
And that is why it's not threadsafe. So if it's called multiple times it'd fail in rare cases.
AND51
Addict
Addict
Posts: 1040
Joined: Sun Oct 15, 2006 8:56 pm
Location: Germany
Contact:

Post by AND51 »

So? Can you give me an example?
Before I proof the opposite case, I want to hear your story.
PB 4.30

Code: Select all

onErrorGoto(?Fred)
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

You're right, it doesn't crash. But, it could potentially create any number (as many as you have concurrent threads) of regular expressions and lose track of all but one of them, which I personally consider bad.
AND51
Addict
Addict
Posts: 1040
Joined: Sun Oct 15, 2006 8:56 pm
Location: Germany
Contact:

Post by AND51 »

Yes, I'm right; it doesn't crash. I had 5 identical threads with ForEver-loops running for ca. 5 minutes and nothing happened.

> it could potentially create any number (as many as you have concurrent threads) of regular expressions and lose track of all but one of them
No. I had 5 threads running on my 3.4 GHz Dual Core and it never created the expression more than 2 times. in very most cases it was created once only. No matter how often/fast I call the procedure in the 5 threads. hey, this case is very rare and this expression only uses <500 byte memory.

> Please get your facts first.
Errare humanum est. :roll: Nobody is perfect. :wink:
PB 4.30

Code: Select all

onErrorGoto(?Fred)
User avatar
pdwyer
Addict
Addict
Posts: 2813
Joined: Tue May 08, 2007 1:27 pm
Location: Chiba, Japan

Post by pdwyer »

AND51 wrote:> AND51 showed us, that they are unnecessary

Hey come on, guy!
The command is unnecessary in this case! You know very well, what I mean. In this case it's ok to leave out the Free-Command, because the expression is only allocated once. [...]
This is not an act of lazyness, but an act of performance optimizing, because the allocation is only proceeded once.

[...]

// Edit:
Oh, thank your Fred! BTW, I've shortend this post.
I don't understand how you can say "in this case" :?:
This is a function, surely if I'm creating some sort of website rip program that calls this thousands of times this is going to be a leak isn't it?

Could you explain a little more the logic of leaving it off

How could you know how it's intended to be used?
Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

AND51 wrote: > it could potentially create any number (as many as you have concurrent threads) of regular expressions and lose track of all but one of them
No. I had 5 threads running on my 3.4 GHz Dual Core and it never created the expression more than 2 times. in very most cases it was created once only. No matter how often/fast I call the procedure in the 5 threads. hey, this case is very rare and this expression only uses <500 byte memory.
Just because it doesn't happen to you doesn't mean it can't happen. Theoretically it could happen.

Let's say you spawn 10000 threads. All threads could enter the if clause. Once they're in it, they create a regular expression each. It doesn't matter how long you run it, as this can only happen on the first calls.

Of course, it's not a big problem, but I just don't think you should post a code with a resource leak and claim it's "better". It's more the attitute "mine leaks resources and doesn't follow the correct standard, so it's better" that I don't like.
AND51
Addict
Addict
Posts: 1040
Joined: Sun Oct 15, 2006 8:56 pm
Location: Germany
Contact:

Post by AND51 »

pdwyer wrote:I don't understand how you can say "in this case" :?:
[...] this is going to be a leak isn't it?
Could you explain a little more the logic of leaving it off
No this is not a memory leak. Of course, I want to share my PB-knowledge with you, so I explain it to you. :)

If you call the procedure for the first time, then the If checks the value of the variable 'location'. location is 0, so the If is true. Now, inside the If, the variable location is filled with a number. This number is the ID of the RefularExpression.

The point is, that 'location' is Static and not Protected. You could also use a Global variable instead. The advantage of the static variable is, that it even keeps it value when the procedure is over.

So, what happens, when you call the procedure for the second time? When you call the procedure, the If at the beginning checks, if 'location' has got a value. Yes, it has (from the last time you called the procedure). But the If does not trigger, because there is a Not. In fact, you could also write If location <> 0 which is the same.

So the If does only fire once - consequence: The expression is only created once. That's why there is no memory leak. This method offers 2 advantages:
1. You don't have to create the regular expression every time; this saves time.
2. You don't have to free it at the end of your procedure. Why? Because PureBasic automatically frees everything at program exit, as Fred said.


@ Trond
> post a code with a resource leak and claim it's "better"
I don't say my code is better, because it "offers" a leak. I claim the "better-status", because it's faster (less code), simpler (recursion instead of loop) and more comfortable (integrated depth-param instead of a constant).

> Let's say you spawn 10000 threads
The limit is 2.000 :wink: Furthemore, if you want to create 2.000 threads, then I ask you: Why? Creating 2.000 threads - THAT is bad coding.
[sentence removed]
If you want to use so many threads, please feel free to use this code instead:

Code: Select all

Procedure.s traceURL(URL$, RecursionDepth=#INFINITE)
	If RecursionDepth
		Protected location.l=CreateRegularExpression(#PB_Any, "(?im)^Location: .*$")
		Protected Dim subPattern$(0)
		If ExtractRegularExpression(location, GetHTTPHeader(URL$), subPattern$())
			URL$=traceURL(Mid(subPattern$(0), 11), RecursionDepth-1)
		EndIf
		FreeRegularExpression(location)
	EndIf
	ProcedureReturn URL$
EndProcedure
PB 4.30

Code: Select all

onErrorGoto(?Fred)
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

@ Trond
> post a code with a resource leak and claim it's "better"
I don't say my code is better, because it "offers" a leak.
Actually, that's exactly what you said. You said:
It's better, because it's shorter and faster. Also, you said that it's shorter and faster because you don't free the regular expression.
I claim the "better-status", because it's faster (less code), simpler (recursion instead of loop) and more comfortable (integrated depth-param instead of a constant).
It's actually not so much faster. But if you only judge speed and number of lines, then it's better. But there are other judging criteria, where your code is worse:
- Potential resource leak with threads
- Makes the executable 105 kb bigger on Linux compared to the original code. That's quite a bloat, and will defintely not make the program startup any faster.

I think your code is OK, but it is not better than the original code, and is in fact a lot worse in some areas. Skidding in and shouting "it's better" is bad taste. Especially accompanied by false statements like #LF$ is a legal separator and you can only create 2000 threads.
Little John
Addict
Addict
Posts: 4791
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Post by Little John »

Trond, you are expressing exactly what I am thinking. Thanks for your support.

What AND51 is doing here looks to me like "premature optimization". I personally prefer to write reliable code, which is also good readable, and easily maintainable. My code is about 572 times ;) better readable than AND51's.

My code probably can be made faster. I like the approach with FindString(Header, #CRLF$ + "Location: ", 1) that you posted earlier.
There will be a problem in the rare (but not impossible, I think) case when "Location" is at the very beginning of the header, so that it is not preceeded by #CRLF$. So for safety's sake, I personally would prefer something like

Code: Select all

FindString(#CRLF$ + Header, #CRLF$ + "Location: ", 1)
Unfortunately I am very busy these days with other things. (I shouldn't even "waste time" by writing this right now.)

Regards, Little John
User avatar
pdwyer
Addict
Addict
Posts: 2813
Joined: Tue May 08, 2007 1:27 pm
Location: Chiba, Japan

Post by pdwyer »

Actually I think I'm understanding this now.

I'm not using any of the beta's so I haven't looked at how regular expressions work in PB but I guess the logic is that since the expression never changes for this procedure, even if proc is called by lots of threads there is only one expression created and (ExtractRegularExpression is assumably threadsafe under the hood with the same expression ID) reused.

So there won't be a leak in a sense of an increasing memory drain, just a one time usage. It's not freed deliberately in the proc as it assumes it WILL be reused continually in the same process.

I didn't notice the static var before, nor it's significance
Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
Little John
Addict
Addict
Posts: 4791
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Post by Little John »

Works also with PB 5.20

Here is a new version, based on Trond's idea of applying FindString() to the whole header, without splitting it into individual lines before. Thank you! This is probably faster than the original code, but I didn't actually compare the speed.

//edit 2010-03-10:
Removed trailing .l from variables, so that now the default integer type is used instead.

Code: Select all

; successfully tested on PureBasic 4.20 final

EnableExplicit

Procedure.s RealURL (url.s, recursionDepth=10)
   ; Very useful, because ReceiveHTTPFile() does
   ; not handle redirected URLs.
   ; in : URL
   ; out: - if not redirected: original address
   ;      - if     redirected: target address
   Protected header.s, f, g

   header = GetHTTPHeader(url)

   f = FindString(#CRLF$ + header, #CRLF$ + "Location:", 1)
   If f = 0
      ProcedureReturn url             ; no redirection
   EndIf

   If recursionDepth <= 0             ; protection against endless loops
      ProcedureReturn ""              ; error
   EndIf

   f + 9                              ; right behind ':'
   g = FindString(header, #CRLF$, f)
   If g = 0
      g = Len(header) + 1
   EndIf
   ProcedureReturn RealURL(Trim(Mid(header, f, g-f)), recursionDepth-1)
EndProcedure


;-- Demo
InitNetwork()
Debug RealURL("http://forum.purebasic.com/")
Regards, Little John
Last edited by Little John on Sun Aug 18, 2013 8:36 pm, edited 3 times in total.
User avatar
pdwyer
Addict
Addict
Posts: 2813
Joined: Tue May 08, 2007 1:27 pm
Location: Chiba, Japan

Post by pdwyer »

What is the performance of PB's new regular expression like? In my experience these kinds of libraries are generally quite slow for simple matching as they are desgined to simplify complex matching. (and this doesn't seem to be that complex)

QuickSearch() would be faster than FindString() in this case I think :P
Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
Little John
Addict
Addict
Posts: 4791
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Post by Little John »

pdwyer wrote:What is the performance of PB's new regular expression like? In my experience these kinds of libraries are generally quite slow for simple matching as they are desgined to simplify complex matching. (and this doesn't seem to be that complex)
It's also my understanding, that using Regular Expressions is overkill here.
pdwyer wrote:QuickSearch() would be faster than FindString() in this case I think :P
What is QuickSearch()? A secret weapon? ;)

Regards, Little john
User avatar
pdwyer
Addict
Addict
Posts: 2813
Joined: Tue May 08, 2007 1:27 pm
Location: Chiba, Japan

Post by pdwyer »

Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
Post Reply