Page 2 of 4

Posted: Mon May 26, 2008 6:16 pm
by Little John
AND51 wrote:> And exactly who started this "discussion"?
You.
Ohhh ...
AND51 wrote: If there were only #LF$'s in the response then your code fails, because it cannot find any #CR$'s. Ergo, it'd fail.
Sigh ....
Please get your facts first.

Little John

Posted: Mon May 26, 2008 6:27 pm
by Trond
In case people don't want to read the thread and just needs a function like this: There is only one sane and working code so far, and that's the one in the first post.
FreeRegularExpression() is not neccessary. Why?
It is only declared once, because I use a static variable.
And that is why it's not threadsafe. So if it's called multiple times it'd fail in rare cases.

Posted: Mon May 26, 2008 7:42 pm
by AND51
So? Can you give me an example?
Before I proof the opposite case, I want to hear your story.

Posted: Mon May 26, 2008 8:01 pm
by Trond
You're right, it doesn't crash. But, it could potentially create any number (as many as you have concurrent threads) of regular expressions and lose track of all but one of them, which I personally consider bad.

Posted: Mon May 26, 2008 8:40 pm
by AND51
Yes, I'm right; it doesn't crash. I had 5 identical threads with ForEver-loops running for ca. 5 minutes and nothing happened.

> it could potentially create any number (as many as you have concurrent threads) of regular expressions and lose track of all but one of them
No. I had 5 threads running on my 3.4 GHz Dual Core and it never created the expression more than 2 times. in very most cases it was created once only. No matter how often/fast I call the procedure in the 5 threads. hey, this case is very rare and this expression only uses <500 byte memory.

> Please get your facts first.
Errare humanum est. :roll: Nobody is perfect. :wink:

Posted: Tue May 27, 2008 12:47 am
by pdwyer
AND51 wrote:> AND51 showed us, that they are unnecessary

Hey come on, guy!
The command is unnecessary in this case! You know very well, what I mean. In this case it's ok to leave out the Free-Command, because the expression is only allocated once. [...]
This is not an act of lazyness, but an act of performance optimizing, because the allocation is only proceeded once.

[...]

// Edit:
Oh, thank your Fred! BTW, I've shortend this post.
I don't understand how you can say "in this case" :?:
This is a function, surely if I'm creating some sort of website rip program that calls this thousands of times this is going to be a leak isn't it?

Could you explain a little more the logic of leaving it off

How could you know how it's intended to be used?

Posted: Tue May 27, 2008 1:04 pm
by Trond
AND51 wrote: > it could potentially create any number (as many as you have concurrent threads) of regular expressions and lose track of all but one of them
No. I had 5 threads running on my 3.4 GHz Dual Core and it never created the expression more than 2 times. in very most cases it was created once only. No matter how often/fast I call the procedure in the 5 threads. hey, this case is very rare and this expression only uses <500 byte memory.
Just because it doesn't happen to you doesn't mean it can't happen. Theoretically it could happen.

Let's say you spawn 10000 threads. All threads could enter the if clause. Once they're in it, they create a regular expression each. It doesn't matter how long you run it, as this can only happen on the first calls.

Of course, it's not a big problem, but I just don't think you should post a code with a resource leak and claim it's "better". It's more the attitute "mine leaks resources and doesn't follow the correct standard, so it's better" that I don't like.

Posted: Tue May 27, 2008 4:47 pm
by AND51
pdwyer wrote:I don't understand how you can say "in this case" :?:
[...] this is going to be a leak isn't it?
Could you explain a little more the logic of leaving it off
No this is not a memory leak. Of course, I want to share my PB-knowledge with you, so I explain it to you. :)

If you call the procedure for the first time, then the If checks the value of the variable 'location'. location is 0, so the If is true. Now, inside the If, the variable location is filled with a number. This number is the ID of the RefularExpression.

The point is, that 'location' is Static and not Protected. You could also use a Global variable instead. The advantage of the static variable is, that it even keeps it value when the procedure is over.

So, what happens, when you call the procedure for the second time? When you call the procedure, the If at the beginning checks, if 'location' has got a value. Yes, it has (from the last time you called the procedure). But the If does not trigger, because there is a Not. In fact, you could also write If location <> 0 which is the same.

So the If does only fire once - consequence: The expression is only created once. That's why there is no memory leak. This method offers 2 advantages:
1. You don't have to create the regular expression every time; this saves time.
2. You don't have to free it at the end of your procedure. Why? Because PureBasic automatically frees everything at program exit, as Fred said.


@ Trond
> post a code with a resource leak and claim it's "better"
I don't say my code is better, because it "offers" a leak. I claim the "better-status", because it's faster (less code), simpler (recursion instead of loop) and more comfortable (integrated depth-param instead of a constant).

> Let's say you spawn 10000 threads
The limit is 2.000 :wink: Furthemore, if you want to create 2.000 threads, then I ask you: Why? Creating 2.000 threads - THAT is bad coding.
[sentence removed]
If you want to use so many threads, please feel free to use this code instead:

Code: Select all

Procedure.s traceURL(URL$, RecursionDepth=#INFINITE)
	If RecursionDepth
		Protected location.l=CreateRegularExpression(#PB_Any, "(?im)^Location: .*$")
		Protected Dim subPattern$(0)
		If ExtractRegularExpression(location, GetHTTPHeader(URL$), subPattern$())
			URL$=traceURL(Mid(subPattern$(0), 11), RecursionDepth-1)
		EndIf
		FreeRegularExpression(location)
	EndIf
	ProcedureReturn URL$
EndProcedure

Posted: Tue May 27, 2008 5:59 pm
by Trond
@ Trond
> post a code with a resource leak and claim it's "better"
I don't say my code is better, because it "offers" a leak.
Actually, that's exactly what you said. You said:
It's better, because it's shorter and faster. Also, you said that it's shorter and faster because you don't free the regular expression.
I claim the "better-status", because it's faster (less code), simpler (recursion instead of loop) and more comfortable (integrated depth-param instead of a constant).
It's actually not so much faster. But if you only judge speed and number of lines, then it's better. But there are other judging criteria, where your code is worse:
- Potential resource leak with threads
- Makes the executable 105 kb bigger on Linux compared to the original code. That's quite a bloat, and will defintely not make the program startup any faster.

I think your code is OK, but it is not better than the original code, and is in fact a lot worse in some areas. Skidding in and shouting "it's better" is bad taste. Especially accompanied by false statements like #LF$ is a legal separator and you can only create 2000 threads.

Posted: Tue May 27, 2008 7:16 pm
by Little John
Trond, you are expressing exactly what I am thinking. Thanks for your support.

What AND51 is doing here looks to me like "premature optimization". I personally prefer to write reliable code, which is also good readable, and easily maintainable. My code is about 572 times ;) better readable than AND51's.

My code probably can be made faster. I like the approach with FindString(Header, #CRLF$ + "Location: ", 1) that you posted earlier.
There will be a problem in the rare (but not impossible, I think) case when "Location" is at the very beginning of the header, so that it is not preceeded by #CRLF$. So for safety's sake, I personally would prefer something like

Code: Select all

FindString(#CRLF$ + Header, #CRLF$ + "Location: ", 1)
Unfortunately I am very busy these days with other things. (I shouldn't even "waste time" by writing this right now.)

Regards, Little John

Posted: Wed May 28, 2008 1:41 am
by pdwyer
Actually I think I'm understanding this now.

I'm not using any of the beta's so I haven't looked at how regular expressions work in PB but I guess the logic is that since the expression never changes for this procedure, even if proc is called by lots of threads there is only one expression created and (ExtractRegularExpression is assumably threadsafe under the hood with the same expression ID) reused.

So there won't be a leak in a sense of an increasing memory drain, just a one time usage. It's not freed deliberately in the proc as it assumes it WILL be reused continually in the same process.

I didn't notice the static var before, nor it's significance

Posted: Wed May 28, 2008 6:11 am
by Little John
Works also with PB 5.20

Here is a new version, based on Trond's idea of applying FindString() to the whole header, without splitting it into individual lines before. Thank you! This is probably faster than the original code, but I didn't actually compare the speed.

//edit 2010-03-10:
Removed trailing .l from variables, so that now the default integer type is used instead.

Code: Select all

; successfully tested on PureBasic 4.20 final

EnableExplicit

Procedure.s RealURL (url.s, recursionDepth=10)
   ; Very useful, because ReceiveHTTPFile() does
   ; not handle redirected URLs.
   ; in : URL
   ; out: - if not redirected: original address
   ;      - if     redirected: target address
   Protected header.s, f, g

   header = GetHTTPHeader(url)

   f = FindString(#CRLF$ + header, #CRLF$ + "Location:", 1)
   If f = 0
      ProcedureReturn url             ; no redirection
   EndIf

   If recursionDepth <= 0             ; protection against endless loops
      ProcedureReturn ""              ; error
   EndIf

   f + 9                              ; right behind ':'
   g = FindString(header, #CRLF$, f)
   If g = 0
      g = Len(header) + 1
   EndIf
   ProcedureReturn RealURL(Trim(Mid(header, f, g-f)), recursionDepth-1)
EndProcedure


;-- Demo
InitNetwork()
Debug RealURL("http://forum.purebasic.com/")
Regards, Little John

Posted: Wed May 28, 2008 7:08 am
by pdwyer
What is the performance of PB's new regular expression like? In my experience these kinds of libraries are generally quite slow for simple matching as they are desgined to simplify complex matching. (and this doesn't seem to be that complex)

QuickSearch() would be faster than FindString() in this case I think :P

Posted: Wed May 28, 2008 8:49 am
by Little John
pdwyer wrote:What is the performance of PB's new regular expression like? In my experience these kinds of libraries are generally quite slow for simple matching as they are desgined to simplify complex matching. (and this doesn't seem to be that complex)
It's also my understanding, that using Regular Expressions is overkill here.
pdwyer wrote:QuickSearch() would be faster than FindString() in this case I think :P
What is QuickSearch()? A secret weapon? ;)

Regards, Little john

Posted: Wed May 28, 2008 1:58 pm
by pdwyer