URL validation
Posted: Fri Nov 19, 2010 11:29 am
by greyhoundcode
Thought I'd share a snippet I use for validating URLs.
Code:
; Validates URLs
; --------------
; Must include a scheme such as http:// or ftp://
; Support for port numbers and numeric IPs
;
; Returns bool (#True or #False)
; -----------------------------------------------
Procedure.b ValidURL(url.s)
  Protected regex.i, result.b = #False
  Protected pattern.s = "^([a-z0-9]+://)(([0-9a-z_!~*'().&=+$%-]+:)?[0-9a-z_!~*'().&=+$%-]+@)?(([0-9]{1,3}\.){3}[0-9]{1,3}|([0-9a-z_!~*'()-]+\.)*([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\.[a-z]{2,6})(:[0-9]{1,4})?((/?)|(/[0-9a-z_!~*'().;?:@&=+$,%#-]+)+/?)$"

  ; #PB_Any lets PureBasic pick the regex number; the result is 0 if compilation failed
  regex = CreateRegularExpression(#PB_Any, pattern)
  If regex
    If MatchRegularExpression(regex, url)
      result = #True
    EndIf
    ; Free the regex only when it was actually created
    FreeRegularExpression(regex)
  EndIf

  ProcedureReturn result
EndProcedure
25/11/10 edited to remove accidental whitespace in the regex
Re: URL validation
Posted: Fri Nov 19, 2010 5:07 pm
by JHPJHP
Very cool - thanks for sharing...
Re: URL validation
Posted: Fri Nov 19, 2010 10:30 pm
by greyhoundcode
Pleased to.

Re: URL validation
Posted: Sun Nov 21, 2010 12:38 am
by DarkPlayer
hi
This is nice, short code, but it does not recognize all URLs. Some time ago I wrote a little snippet to convert such exotic URLs to a more common format, and I copied some examples from it that your code does not accept:
This does not work:
http://test:hehe@80.237.159.41:80
This is just a little error, which can be fixed by removing the space in the following part of your regex:
This is a hex-encoded IP address. If you don't believe that this is valid, click on it and see what your browser does. (IE and Chrome will automatically convert it to decimal when opening the page; Firefox will show the hex.)
http://0x50ed9f29/blog/
This is also valid:
http://gOoGlE.de
Some time ago these also became valid:
http://www.müller.de/
http://straße.de/
Also a nice example:
http://உதாரணம்.பரிட்சை/
localhost and any other bare hostname (not a DNS name!) are not recognized:
http://localhost/
This is forbidden by most registrars, but it is defined:
http://example_test.test
An IPv6 address would not be valid either:
http://[::1]
A DNS top-level domain with more than 6 characters is also possible. Such names are not used on the internet, but they can be set up locally for an internal network:
http://example.myownnetwork
The BB code parser does not recognize all of them either.
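For anyone who wants to experiment, here is a rough sketch of a more permissive check that should accept the ASCII examples above (mixed case, hex host, localhost, underscores, IPv6 literals, long top-level names). The procedure name ValidURLRelaxed and the pattern are assumptions made up for illustration, not part of greyhoundcode's snippet, and the internationalized names are deliberately left out:

```purebasic
; Hypothetical relaxed variant (name and pattern are illustrative assumptions).
; It only checks the overall shape: scheme://[user@]host[:port][/path]
Procedure.b ValidURLRelaxed(url.s)
  Protected regex.i, result.b = #False
  Protected pattern.s = "^[a-z][a-z0-9+.-]*://"              ; scheme
  pattern + "([^/@\s]+@)?"                                   ; optional user:password@
  pattern + "(\[[0-9a-f:.]+\]|[0-9a-z_-]+(\.[0-9a-z_-]+)*)"  ; [IPv6] or dotted host
  pattern + "(:[0-9]{1,5})?"                                 ; optional port
  pattern + "(/\S*)?$"                                       ; optional path without spaces

  ; NoCase makes the scheme and host match regardless of letter case
  regex = CreateRegularExpression(#PB_Any, pattern, #PB_RegularExpression_NoCase)
  If regex
    If MatchRegularExpression(regex, url)
      result = #True
    EndIf
    FreeRegularExpression(regex)
  EndIf

  ProcedureReturn result
EndProcedure

; Examples taken from the list above
Debug ValidURLRelaxed("http://test:hehe@80.237.159.41:80")
Debug ValidURLRelaxed("http://0x50ed9f29/blog/")
Debug ValidURLRelaxed("http://gOoGlE.de")
Debug ValidURLRelaxed("http://localhost/")
Debug ValidURLRelaxed("http://example_test.test")
Debug ValidURLRelaxed("http://[::1]")
Debug ValidURLRelaxed("http://example.myownnetwork")
Debug ValidURLRelaxed("not a url")  ; no scheme, so this one should be rejected
```

The trade-off is that this sketch no longer checks label lengths or that a numeric IP has exactly four parts, so it is looser than the original in those respects.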
DarkPlayer
Re: URL validation
Posted: Sun Nov 21, 2010 4:24 pm
by greyhoundcode
Good points

Re: URL validation
Posted: Sun Nov 21, 2010 9:08 pm
by Joakim Christiansen
greyhoundcode wrote: Thought I'd share a snippet I use for validating URLs.
You could also make it do an HTTP request to check whether the URL actually points to a real website.
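A minimal sketch of that idea, assuming a PureBasic version in which InitNetwork() must be called before the HTTP functions are used (newer versions may not need it). The procedure name URLResponds and the test URL are made up for illustration:

```purebasic
; Hypothetical helper: reports whether a server answered a header request for the URL.
; GetHTTPHeader() returns an empty string when the request fails.
Procedure.b URLResponds(url.s)
  Protected header.s = GetHTTPHeader(url)
  If header <> ""
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure

; Assumption: the network library has to be initialized first on this PureBasic version
If InitNetwork()
  Debug URLResponds("http://www.purebasic.com/")
EndIf
```

Note that a non-empty header only shows that some server replied. The reply could still be an error status such as 404, so the status line would need to be inspected for a stricter check.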
Re: URL validation
Posted: Sun Nov 21, 2010 10:09 pm
by greyhoundcode
Joakim Christiansen wrote: You could also make it do an HTTP request to check whether the URL actually points to a real website.
That's true, although my intent was basically to avoid making unnecessary requests (the URLs coming from potentially untrusted sources) where a URL is badly formed. But yeah, good point.
Re: URL validation
Posted: Mon Nov 22, 2010 5:38 am
by kvitaliy
Test this address:
http://россия.рф/main/page8.htm
Your code does not work with it.

URL validation - regular expressions and non-Latin characters
Posted: Mon Nov 22, 2010 1:29 pm
by greyhoundcode
No, my code wouldn't work with something like россия.рф. However, I'd suggest creating a separate procedure to implement IDNA if this were a concern for an individual application, seeing as non-Latin or accented Latin characters are encoded as ASCII anyway (like xn--h1alffa9f.xn--p1ai in the case of россия.рф).
So something like ValidURL( TransformIDNA_URL(url.s) ) maybe. Joakim's suggestion would probably be far easier in this case!
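One caveat worth checking before going down that road: the original pattern ends the host with \.[a-z]{2,6}, and an encoded top-level domain such as xn--p1ai contains digits and hyphens and is 8 characters long. The sketch below tests only that top-level rule in isolation. The procedure name TLDPassesRule is made up for illustration, and xn--mller-kva.de is assumed here to be the encoded form of müller.de:

```purebasic
; Hypothetical helper: applies only the top-level-domain rule from ValidURL's
; pattern (a dot followed by 2 to 6 lowercase letters at the end of the host).
Procedure.b TLDPassesRule(host.s)
  Protected regex.i, result.b = #False

  regex = CreateRegularExpression(#PB_Any, "\.[a-z]{2,6}$")
  If regex
    If MatchRegularExpression(regex, host)
      result = #True
    EndIf
    FreeRegularExpression(regex)
  EndIf

  ProcedureReturn result
EndProcedure

Debug TLDPassesRule("xn--mller-kva.de")        ; encoded name, plain ASCII top-level domain
Debug TLDPassesRule("xn--h1alffa9f.xn--p1ai")  ; encoded name with an encoded top-level domain
```

The first host should pass and the second should not, so the top-level part of the pattern would probably need loosening before encoded names like россия.рф could validate.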
Actually, I have to say I don't know much about UTF-8/UTF-16 in regular expressions. Is anyone aware of good tutorials or resources for this? I wonder if POSIX notations like [:upper:] apply irrespective of character set.