Determining if a URL is a web page or file download

MachineCode
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Re: Determining if a URL is a web page or file download

Post by MachineCode »

TomS wrote: Sigh. It's NOT reliable.

Just revisiting this now, as my app nears completion... so there's no reliable way other than downloading the target locally first, and then checking if it's a web page versus a binary file?
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
kenmo
Addict
Posts: 2047
Joined: Tue Dec 23, 2003 3:54 am

Post by kenmo »

MachineCode wrote: Just revisiting this now, as my app nears completion... so there's no reliable way other than downloading the target locally first, and then checking if it's a web page versus a binary file?

Well, it looks like you have three options...

1. Check the URL -- not reliable.
2. Check the HTTP content type -- usually reliable, but can be intentionally/unintentionally incorrect.
3. Check the file itself -- always works, but requires a full download and parse of the file.

In theory, if all you know about a file is its name/extension/size, you CAN'T determine if it's text or binary, unless you look inside it.

But it's sometimes unreasonable to download and scan every file. Depending on your application, option 2 might be fine.
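For option 2, here is a minimal sketch in Python, used only for illustration. It is not kenmo's PureBasic ContentType() procedure, which isn't shown in this thread. It sends an HTTP HEAD request, which asks for the headers only, and treats `text/html` and `application/xhtml+xml` as "web pages". The helper names and that set of types are assumptions.

```python
import urllib.request

# Media types treated as "web pages" (an assumption; adjust for your app).
PAGE_TYPES = {"text/html", "application/xhtml+xml"}


def media_type(content_type: str) -> str:
    """Strip parameters: 'text/html; charset=UTF-8' -> 'text/html'."""
    return content_type.split(";", 1)[0].strip().lower()


def is_web_page_type(content_type: str) -> bool:
    """True if a Content-Type header value names an HTML document."""
    return media_type(content_type) in PAGE_TYPES


def fetch_content_type(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request (headers only) and return Content-Type, or ''."""
    request = urllib.request.Request(url, method="HEAD")
    # urlopen follows redirects and raises HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", "")
```

Usage would be along the lines of `is_web_page_type(fetch_content_type(url))`. Some servers answer HEAD badly or not at all, so a fallback to a GET that is closed without reading the body may be needed.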

(Also, theoretically, a "binary" file written by some other program might, just by chance, contain only valid ASCII characters... Then option 3 would not consider it a "binary" file unless the content type is also checked! It's not likely, but possible. Depending on what your program does with such files, it may not really matter.)
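Option 3, with the caveat above, can be sketched as a byte heuristic (again in Python, for illustration only). It flags data as binary if it contains a NUL byte, or if more than an assumed 10% of the bytes are control characters other than tab, LF, FF and CR. Bytes 0x80 and up are allowed so that UTF-8 text passes. As kenmo notes, a binary file that happens to be all printable ASCII will still be reported as text.

```python
def looks_binary(sample: bytes, max_control_ratio: float = 0.10) -> bool:
    """Heuristic text-vs-binary test on a sample of bytes.

    The 10% threshold is an arbitrary assumption, not a standard value.
    """
    if not sample:
        return False  # nothing to judge; treat empty content as text
    if b"\x00" in sample:
        return True  # NUL bytes are rare in text, common in binary formats
    allowed_controls = {0x09, 0x0A, 0x0C, 0x0D}  # tab, LF, FF, CR
    controls = sum(
        1 for b in sample
        if (b < 0x20 and b not in allowed_controls) or b == 0x7F
    )
    return controls / len(sample) > max_control_ratio
```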
MachineCode
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Post by MachineCode »

It's all too hard. I'm just going to do my own option:

4. Obey the user.

If they submit a binary download instead of a web page URL, then too bad. :P
greyhoundcode
Enthusiast
Posts: 112
Joined: Sun Dec 30, 2007 7:24 pm

Post by greyhoundcode »

Do you expect and/or intend to let your users view resources such as images (by themselves), or rather pages of HTML? If the latter, perhaps some crude testing by regex or even a more complex solution such as attempting to load and inspect the DOM would suffice?
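Here is a crude version of the "inspect the DOM" idea, sketched with Python's standard `html.parser` module. That module is a lenient tag scanner, not a real DOM. The sketch reports HTML only if it sees a document-level tag such as `<html>`, `<head>`, `<body>` or `<title>`. That tag set is an assumption, and bare fragments like `<p>Hi</p>` will not count. It also needs the body text, so on its own it doesn't avoid the download.

```python
from html.parser import HTMLParser

# Tags whose presence suggests a full HTML document (an assumption).
DOCUMENT_TAGS = {"html", "head", "body", "title"}


class _TagCollector(HTMLParser):
    """Records the names of start tags seen while parsing."""

    def __init__(self) -> None:
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already converted to lower case.
        self.tags.add(tag)


def parses_as_html(text: str) -> bool:
    """Crude check: does the text contain any document-level HTML tags?"""
    parser = _TagCollector()
    parser.feed(text)
    parser.close()
    return bool(parser.tags & DOCUMENT_TAGS)
```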
Trond
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

ContentType is the correct way. If the server sends wrong information, then that's a problem with the server and not your program.
citystate
Enthusiast
Posts: 638
Joined: Sun Feb 12, 2006 10:06 pm

Post by citystate »

Couldn't you download the first X bytes and check them? I always thought the first few characters of any page would be "<HTML>" or the like.
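citystate's idea can be sketched like this in Python. The sketch reads at most a few hundred bytes (the 512-byte limit is an arbitrary assumption) and then looks for a typical opening such as `<!DOCTYPE html` or `<html`, ignoring case, leading whitespace and a UTF-8 byte-order mark. Many pages start with a doctype rather than `<HTML>`, and some start with neither, so a miss is better treated as "unknown" than as "binary". The signature list is an assumed, non-exhaustive set.

```python
import urllib.request

# Openings that commonly start an HTML document (assumed, not exhaustive).
HTML_SIGNATURES = (b"<!doctype html", b"<html", b"<head", b"<body")
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_html(prefix: bytes) -> bool:
    """Guess whether the first bytes of a response look like HTML."""
    if prefix.startswith(UTF8_BOM):
        prefix = prefix[len(UTF8_BOM):]
    # Compare case-insensitively after skipping leading ASCII whitespace.
    head = prefix.lstrip().lower()
    return head.startswith(HTML_SIGNATURES)


def read_prefix(url: str, limit: int = 512, timeout: float = 10.0) -> bytes:
    """Read at most `limit` bytes of the body, then close the connection."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read(limit)
```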
there is no sig, only zuul (and the following disclaimer)

WARNING: may be talking out of his hat
MachineCode
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Post by MachineCode »

greyhoundcode wrote: Do you expect and/or intend to let your users view resources such as images (by themselves), or rather pages of HTML?

They can enter any URL, just like a web browser's address bar. But as with a web browser, if you enter a URL that points to a 10 GB file, it's going to take a long time to download. That's what I was hoping to prevent.
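To avoid starting a 10 GB download, one option is to check the Content-Length header from a HEAD request before fetching the body. Below is a minimal Python sketch. The 5 MiB limit and the helper names are hypothetical. Content-Length is optional, and chunked responses usually omit it, so a missing value means "unknown size". A byte cap while reading is still worth having.

```python
import urllib.request
from typing import Mapping, Optional

MAX_BYTES = 5 * 1024 * 1024  # hypothetical limit: 5 MiB


def declared_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return the Content-Length value, or None if missing or malformed."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        length = int(value.strip())
    except ValueError:
        return None
    return length if length >= 0 else None


def too_large(headers: Mapping[str, str], limit: int = MAX_BYTES) -> bool:
    """True only when the server declares a size above the limit."""
    length = declared_length(headers)
    return length is not None and length > limit


def head_headers(url: str, timeout: float = 10.0):
    """Fetch response headers with a HEAD request (no body is requested)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers  # HTTPMessage: case-insensitive .get()
```

Usage might look like `if too_large(head_headers(url)): ...` before deciding whether to download.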

@Trond: I'll look more into ContentType then. Thanks.
ultralazor
Enthusiast
Posts: 186
Joined: Sun Jun 27, 2010 9:00 am

Post by ultralazor »

MIME header...done
so many ideas so little time..
MachineCode
Addict
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Post by MachineCode »

After extensive testing, Kenmo's ContentType() procedure is the most reliable, but it still fails to return a content type for Wikipedia's "Random" link, which is http://en.wikipedia.org/wiki/Special:Random . I'm sure I can work around that, though. Thanks Kenmo! :)
Trond
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

MachineCode wrote: After extensive testing, Kenmo's ContentType() procedure is the most reliable, but it still fails to return a content type for Wikipedia's "Random" link, which is http://en.wikipedia.org/wiki/Special:Random . I'm sure I can work around that, though. Thanks Kenmo! :)

GetHTTPHeader() returns an empty string for this URL. I think it's a bug in this function.
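The thread doesn't establish why the header comes back empty. One thing worth ruling out (an assumption, not a diagnosis) is redirect handling. Special:Random responds with a redirect to a random article, so the Content-Type that matters belongs to the final URL, not to the first response. The Python sketch below follows redirects by hand. `head` is a hypothetical callable that returns `(status, headers)`. It is passed in as a parameter so the logic can be tested without a network. `http_head` is one possible implementation using `http.client`, which does not follow redirects on its own.

```python
import http.client
from typing import Callable, Mapping, Tuple
from urllib.parse import urljoin, urlsplit

# A "head" function takes a URL and returns (status_code, headers).
HeadFunc = Callable[[str], Tuple[int, Mapping[str, str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def final_content_type(url: str, head: HeadFunc, max_hops: int = 5) -> Tuple[str, str]:
    """Follow up to max_hops redirects; return (final_url, content_type)."""
    for _ in range(max_hops + 1):
        status, headers = head(url)
        location = headers.get("Location")
        if status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
            continue
        return url, headers.get("Content-Type", "")
    raise RuntimeError("too many redirects")


def http_head(url: str) -> Tuple[int, Mapping[str, str]]:
    """Send one HEAD request without following redirects (http/https only)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.headers
    finally:
        conn.close()
```

For example, `final_content_type("http://en.wikipedia.org/wiki/Special:Random", http_head)` would report the content type of whichever article the redirect lands on.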