URL validation

greyhoundcode
Enthusiast
Posts: 112
Joined: Sun Dec 30, 2007 7:24 pm

URL validation

Post by greyhoundcode »

Thought I'd share a snippet I use for validating URLs.

Code:

; Validates URLs
; --------------
; Must include a scheme such as http:// or ftp://
; Support for port numbers and numeric IPs
; 
; Returns bool (#True or #False)
; -----------------------------------------------
Procedure.b ValidURL(url.s)

    Protected regex.i
    Protected result.b = #False
    ; Port part allows up to 5 digits, since port numbers go up to 65535
    Protected pattern.s = "^([a-z0-9]+://)(([0-9a-z_!~*'().&=+$%-]+:)?[0-9a-z_!~*'().&=+$%-]+@)?(([0-9]{1,3}\.){3}[0-9]{1,3}|([0-9a-z_!~*'()-]+\.)*([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\.[a-z]{2,6})(:[0-9]{1,5})?((/?)|(/[0-9a-z_!~*'().;?:@&=+$,%#-]+)+/?)$"
    
    ; #PB_Any lets PureBasic pick a free regular expression number
    regex = CreateRegularExpression(#PB_Any, pattern)
    
    If regex
    
        If MatchRegularExpression(regex, url)
            result = #True
        EndIf
        
        ; Free the expression only when it was actually created
        FreeRegularExpression(regex)
        
    EndIf
    
    ProcedureReturn result

EndProcedure
25/11/10 edited to remove accidental whitespace in the regex
Last edited by greyhoundcode on Thu Nov 25, 2010 10:23 am, edited 1 time in total.
JHPJHP
Addict
Posts: 2258
Joined: Sat Oct 09, 2010 3:47 am

Re: URL validation

Post by JHPJHP »

Very cool - thanks for sharing...

greyhoundcode
Enthusiast
Posts: 112
Joined: Sun Dec 30, 2007 7:24 pm

Re: URL validation

Post by greyhoundcode »

Pleased to. :)
DarkPlayer
Enthusiast
Posts: 107
Joined: Thu May 06, 2010 11:36 pm

Re: URL validation

Post by DarkPlayer »

Hi,

This is nice, short code, but it does not recognize all URLs. Some time ago I wrote a little code snippet to convert such exotic URLs to a more common format, and I have copied some examples from it which your code does not accept:

This does not work:
http://test:hehe@80.237.159.41:80
This is just a small error, which can be fixed by removing the space in the following part of your regex:

Code:

&=+$%-]+: )
This is a hex-encoded IP address. If you don't believe that it is valid, click on it and see what your browser does (IE and Chrome will automatically convert it to decimal when opening the page; Firefox will show the hex):
http://0x50ed9f29/blog/

This is also valid:
http://gOoGlE.de

For some time now, these have also been valid:
http://www.müller.de/
http://straße.de/

Also a nice example:
http://உதாரணம்.பரிட்சை/

localhost and any other plain hostname (as opposed to a DNS name) is not recognized:
http://localhost/

This is forbidden by most registrars, but it is defined:
http://example_test.test

An IPv6 address would not be valid either:
http://[::1]

A DNS top-level domain with more than 6 characters is also possible. Such domains are not used on the internet, but they can be set up locally for an internal network:
http://example.myownnetwork

The BBCode parser does not recognize all of them either :D

DarkPlayer
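
Several of the cases listed in this post (mixed case, user info before a numeric IP, single-label hosts such as localhost, underscores, bracketed IPv6 literals, longer top-level names) could probably be covered by a looser pattern. What follows is only a sketch and not greyhoundcode's regex: the pattern, the Report helper, and the use of #PB_RegularExpression_NoCase are assumptions made for illustration. Internationalised hosts such as müller.de are deliberately left out, and the bracketed IPv6 part only checks the allowed characters, not the address structure.

```purebasic
; Hypothetical helper: prints whether a URL matches the given expression
Procedure Report(regex.i, url.s)
  If MatchRegularExpression(regex, url)
    Debug "accepted: " + url
  Else
    Debug "rejected: " + url
  EndIf
EndProcedure

; Looser pattern (an assumption, not the original one):
;   scheme://  [userinfo@]  host or [IPv6 literal]  [:port]  [/path]
Define pattern.s = "^[a-z][a-z0-9+.-]*://"
pattern + "([^@/\s]+@)?"
pattern + "(\[[0-9a-f:.]+\]|[0-9a-z_]([0-9a-z_-]*[0-9a-z_])?(\.[0-9a-z_]([0-9a-z_-]*[0-9a-z_])?)*)"
pattern + "(:[0-9]{1,5})?(/\S*)?$"

; NoCase makes the match case-insensitive, so gOoGlE.de is treated like google.de
Define regex.i = CreateRegularExpression(#PB_Any, pattern, #PB_RegularExpression_NoCase)

If regex
  Report(regex, "http://test:hehe@80.237.159.41:80")
  Report(regex, "http://0x50ed9f29/blog/")
  Report(regex, "http://gOoGlE.de")
  Report(regex, "http://localhost/")
  Report(regex, "http://example_test.test")
  Report(regex, "http://[::1]")
  Report(regex, "http://example.myownnetwork")
  Report(regex, "not a url")
  FreeRegularExpression(regex)
EndIf
```

A looser pattern accepts more malformed input as well, so how strict to be depends on what the URLs are used for afterwards.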
greyhoundcode
Enthusiast
Posts: 112
Joined: Sun Dec 30, 2007 7:24 pm

Re: URL validation

Post by greyhoundcode »

Good points :D
Joakim Christiansen
Addict
Posts: 2452
Joined: Wed Dec 22, 2004 4:12 pm
Location: Norway

Re: URL validation

Post by Joakim Christiansen »

greyhoundcode wrote: Thought I'd share a snippet I use for validating URLs.
You could also make it do an HTTP request to check whether the URL actually points to a real website.
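
As a rough illustration of that idea, the sketch below asks the server for the page and looks at the status code. It assumes PureBasic 5.70 or later, where HTTPRequest() is available. The procedure name URLResponds, the accepted status range, and the example.com address are assumptions for the example, not something from this thread.

```purebasic
; Sketch only. Older PureBasic versions may also need InitNetwork()
; to be called once before any network command is used.

; Hypothetical helper: #True when the server answers with a 2xx or 3xx status
Procedure.i URLResponds(url.s)
  Protected request.i, status.i
  Protected ok.i = #False

  request = HTTPRequest(#PB_HTTP_Get, url)
  If request
    ; HTTPInfo() returns the status code as a string
    status = Val(HTTPInfo(request, #PB_HTTP_StatusCode))
    FinishHTTP(request)
    If status >= 200 And status < 400
      ok = #True
    EndIf
  EndIf

  ProcedureReturn ok
EndProcedure

If URLResponds("http://example.com/")
  Debug "server responded"
Else
  Debug "no usable response"
EndIf
```

This downloads the whole response body just to read the status, so it is slower than a pattern check and depends on the network being available.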
greyhoundcode
Enthusiast
Posts: 112
Joined: Sun Dec 30, 2007 7:24 pm

Re: URL validation

Post by greyhoundcode »

Joakim Christiansen wrote: You could also make it do an HTTP request to check whether the URL actually points to a real website.
That's true, although my intent was basically to avoid making unnecessary requests where a URL is badly formed (the URLs may come from untrusted sources). But yeah, good point.
kvitaliy
Enthusiast
Posts: 162
Joined: Mon May 10, 2010 4:02 pm

Re: URL validation

Post by kvitaliy »

Test this address:
http://россия.рф/main/page8.htm
Your code does not work with it :D
greyhoundcode
Enthusiast
Posts: 112
Joined: Sun Dec 30, 2007 7:24 pm

URL validation - regular expressions and non-Latin characters

Post by greyhoundcode »

No, my code wouldn't work with something like россия.рф. However, if this were a concern for an individual application, I'd suggest creating a separate procedure to implement IDNA, seeing as non-Latin or accented Latin characters are converted to an ASCII form anyway (xn--h1alffa9f.xn--p1ai in the case of россия.рф).

So something like ValidURL( TransformIDNA_URL(url.s) ) maybe. Joakim's suggestion would probably be far easier in this case!
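
A full IDNA conversion is beyond a short snippet, but a first step could be to detect whether a URL needs one at all. The sketch below is only that detection step; the name HostNeedsIDNA is hypothetical, and it assumes a Unicode build so that the Cyrillic literal survives in the source file.

```purebasic
; Hypothetical helper: #True when the host part of the URL contains
; characters outside ASCII, i.e. when an IDNA step would be needed
; before an ASCII-only pattern can be applied.
Procedure.i HostNeedsIDNA(url.s)
  Protected host.s = GetURLPart(url, #PB_URL_Site)
  Protected i.i

  For i = 1 To Len(host)
    If Asc(Mid(host, i, 1)) > 127
      ProcedureReturn #True
    EndIf
  Next

  ProcedureReturn #False
EndProcedure

Debug HostNeedsIDNA("http://россия.рф/main/page8.htm")
Debug HostNeedsIDNA("http://xn--h1alffa9f.xn--p1ai/main/page8.htm")
```

URLs flagged this way could then be passed to a conversion routine, or simply rejected if the application only deals with ASCII hostnames.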

Actually, I have to say I don't know much about UTF-8/UTF-16 in regular expressions. Is anyone aware of good tutorials or resources on this? I wonder whether POSIX notations like [:upper:] apply irrespective of character set.
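
One way to find out is to try it. PureBasic's regular expression library is based on PCRE, where POSIX classes are written inside a bracket expression, as in [[:upper:]]. The small probe below makes no claim about the result for the accented string; it is only a sketch for checking the behaviour on a given PureBasic version, and the test strings are arbitrary.

```purebasic
; Probe: does [[:upper:]] match accented upper-case letters as well?
Define regex.i = CreateRegularExpression(#PB_Any, "^[[:upper:]]+$")

If regex
  Debug MatchRegularExpression(regex, "ABC")   ; plain ASCII upper-case letters
  Debug MatchRegularExpression(regex, "ÄÖÜ")   ; accented upper-case letters
  FreeRegularExpression(regex)
EndIf
```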