TomS wrote: Sigh. It's NOT reliable.
Just revisiting this now, as my app nears completion... so there's no reliable way other than downloading the target locally first, and then checking if it's a web page versus a binary file?
Determining if a URL is a web page or file download
- MachineCode
- Addict
- Posts: 1482
- Joined: Tue Feb 22, 2011 1:16 pm
Re: Determining if a URL is a web page or file download
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
Re: Determining if a URL is a web page or file download
MachineCode wrote: Just revisiting this now, as my app nears completion... so there's no reliable way other than downloading the target locally first, and then checking if it's a web page versus a binary file?
Well, it looks like you have three options...
1. Check the URL -- not reliable.
2. Check the HTTP content type -- usually reliable, but can be intentionally/unintentionally incorrect.
3. Check the file itself -- always works, but requires a full download and parse of the file.
In theory, if all you know about a file is its name/extension/size, you CAN'T determine if it's text or binary, unless you look inside it.
But it's sometimes unreasonable to download and scan every file. Depending on your application, option 2 might be fine.
(Also, theoretically, a "binary" file output by any other program might contain all valid ASCII characters, just by chance... Then option 3 would not consider it a "binary" file, unless the content-type is also checked! It's not likely, but possible. Depending on what your program does with them, it may not really matter.)
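As a rough illustration of option 2, here is a minimal sketch in Python (not PureBasic, and not the ContentType() procedure mentioned elsewhere in this thread). The function name classify_content_type and the set of media types treated as a "page" are assumptions made for the example; it only interprets a header value you have already obtained.

```python
# Media types treated as "web page" here; this set is an assumption, adjust as needed.
PAGE_TYPES = ("text/html", "application/xhtml+xml")


def classify_content_type(content_type):
    """Classify a Content-Type header value as 'page', 'download' or 'unknown'."""
    if not content_type:
        return "unknown"
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if not media_type:
        return "unknown"
    if media_type in PAGE_TYPES:
        return "page"
    return "download"
```

An "unknown" result (missing or empty header) is the case where falling back to option 3 on a small part of the file may make sense.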
- MachineCode
- Addict
- Posts: 1482
- Joined: Tue Feb 22, 2011 1:16 pm
Re: Determining if a URL is a web page or file download
It's all too hard. I'm just going to do my own option:
4. Obey the user.
If they submit a binary download instead of a web page URL, then too bad.

- greyhoundcode
- Enthusiast
- Posts: 112
- Joined: Sun Dec 30, 2007 7:24 pm
Re: Determining if a URL is a web page or file download
Do you expect and/or intend to let your users view resources such as images (by themselves), or rather pages of HTML? If the latter, perhaps some crude testing by regex or even a more complex solution such as attempting to load and inspect the DOM would suffice?
Re: Determining if a URL is a web page or file download
ContentType is the correct way. If the server sends wrong information, then that's a problem with the server and not your program.
Re: Determining if a URL is a web page or file download
Couldn't you download the first X bytes and check them? I always thought the first few characters of any page would be "<HTML>" or the like.
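A sketch of this idea in Python, assuming you already have the first chunk of the response body as bytes. Real pages often begin with a byte order mark, whitespace, a doctype or a comment rather than "<HTML>", so the check below searches the whole prefix for a few common markers instead of testing only the first characters. The name looks_like_html, the marker list and the 1024-byte window are illustrative choices, not a standard.

```python
# Markers that commonly appear near the top of an HTML document (assumed list).
HTML_MARKERS = (b"<!doctype html", b"<html", b"<head", b"<body")


def looks_like_html(prefix, window=1024):
    """Guess whether the first bytes of a response body are HTML."""
    head = prefix[:window].lower()
    # A NUL byte strongly suggests binary data (this also rejects UTF-16 pages).
    if b"\x00" in head:
        return False
    return any(marker in head for marker in HTML_MARKERS)
```

This is a heuristic only: a binary file can contain one of these markers by chance, and a page can place them beyond the window.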
there is no sig, only zuul (and the following disclaimer)
WARNING: may be talking out of his hat
- MachineCode
- Addict
- Posts: 1482
- Joined: Tue Feb 22, 2011 1:16 pm
Re: Determining if a URL is a web page or file download
greyhoundcode wrote: Do you expect and/or intend to let your users view resources such as images (by themselves), or rather pages of HTML?
They can enter any URL, just like in a web browser's address bar. But as with a web browser, if you enter a URL to a 10 GB file, it's going to take a long time to download. That's what I was hoping to prevent.
@Trond: I'll look more into ContentType then. Thanks.
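One way to avoid pulling down a huge file just to find out what it is: send a HEAD request, which asks the server for the headers only, and look at Content-Type and Content-Length before deciding. Below is a minimal Python sketch using the standard urllib.request module. The helper names and the 5 MB limit are assumptions for illustration, and some servers omit Content-Length or handle HEAD poorly, so treat a missing value as "size unknown" rather than "small".

```python
import urllib.request

MAX_BYTES = 5 * 1024 * 1024  # hypothetical limit chosen for this example


def parse_length(value):
    """Turn a Content-Length header value into an int, or None if absent/invalid."""
    if value is None:
        return None
    value = value.strip()
    return int(value) if value.isascii() and value.isdigit() else None


def too_large(content_length, limit=MAX_BYTES):
    """True only when the server declared a size above the limit."""
    return content_length is not None and content_length > limit


def head_info(url, timeout=10):
    """Send a HEAD request and return (content_type, content_length)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
        length = parse_length(response.headers.get("Content-Length"))
    return content_type, length
```

A caller could run head_info() first and skip the download when too_large() is true or the content type is not an HTML one.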
- ultralazor
- Enthusiast
- Posts: 186
- Joined: Sun Jun 27, 2010 9:00 am
Re: Determining if a URL is a web page or file download
MIME header...done
so many ideas so little time..
- MachineCode
- Addict
- Posts: 1482
- Joined: Tue Feb 22, 2011 1:16 pm
Re: Determining if a URL is a web page or file download
After extensive testing, Kenmo's ContentType() procedure is the most reliable, but it still fails to return a content type for Wikipedia's "Random" link, which is http://en.wikipedia.org/wiki/Special:Random . I'm sure I can work around that, though. Thanks Kenmo! 

Re: Determining if a URL is a web page or file download
MachineCode wrote: After extensive testing, Kenmo's ContentType() procedure is the most reliable, but it still fails to return a content type for Wikipedia's "Random" link, which is http://en.wikipedia.org/wiki/Special:Random . I'm sure I can work around that, though. Thanks Kenmo!
GetHTTPHeader() returns an empty string for this URL. I think it's a bug in this function.