PureBasic Forums - English

Posted: **Tue Nov 01, 2011 12:36 pm**

TomS wrote: Sigh. It's NOT reliable.

Just revisiting this now, as my app nears completion... so there's no reliable way other than downloading the target locally first, and then checking if it's a web page versus a binary file?

Posted: **Wed Nov 02, 2011 1:16 am**

MachineCode wrote: Just revisiting this now, as my app nears completion... so there's no reliable way other than downloading the target locally first, and then checking if it's a web page versus a binary file?

Well, it looks like you have three options:

1. Check the URL: not reliable.
2. Check the HTTP content type: usually reliable, but the server can get it wrong, intentionally or not.
3. Check the file itself: always works, but requires a full download and parse of the file.
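Option 2 only needs the response header, not the body. Here is a minimal Python sketch (not PureBasic, and not any forum member's code). It assumes you already have the Content-Type header value as a string and just classifies it. It drops parameters such as `charset`, compares the media type case-insensitively, and treats `text/html` and `application/xhtml+xml` as "web pages". The name `is_web_page` and that choice of page types are illustrative assumptions. A missing header is reported as `None` (unknown) instead of being guessed.

```python
from typing import Optional

# Media types treated as "web pages" here (an assumption; adjust to taste).
PAGE_TYPES = frozenset({"text/html", "application/xhtml+xml"})


def is_web_page(content_type: Optional[str]) -> Optional[bool]:
    """Classify a Content-Type header value.

    Returns True for HTML/XHTML, False for any other media type, and None
    when the header is missing or empty, so the caller decides what to do
    with an unknown type.
    """
    if content_type is None or not content_type.strip():
        return None
    # Drop parameters such as "; charset=UTF-8" and normalise case/whitespace.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in PAGE_TYPES


if __name__ == "__main__":
    print(is_web_page("text/html; charset=UTF-8"))  # True
    print(is_web_page("application/octet-stream"))  # False
    print(is_web_page(""))                          # None
```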

In theory, if all you know about a file is its name/extension/size, you CAN'T determine if it's text or binary, unless you look inside it.

But it's sometimes unreasonable to download and scan every file. Depending on your application, option 2 might be fine.

(Also, theoretically, a "binary" file output by any other program might contain all valid ASCII characters, just by chance... Then option 3 would not consider it a "binary" file, unless the content-type is also checked! It's not likely, but possible. Depending on what your program does with them, it may not really matter.)
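A common way to approach option 3 is a byte-level heuristic, and it shows the caveat above directly. The following is a Python sketch under assumptions of our own, not a standard algorithm: any NUL byte means "binary". Otherwise the buffer counts as binary when more than a threshold fraction of its bytes (30% here, an arbitrary choice) are control characters other than common whitespace. Bytes 0x80 and above are accepted so that UTF-8 text is not flagged. A binary file made entirely of printable ASCII still passes as text.

```python
# Bytes accepted as "text" (an assumption): tab, LF, FF, CR, printable ASCII,
# and every byte >= 0x80 so that UTF-8 and 8-bit encodings are not flagged.
TEXT_BYTES = (
    frozenset(b"\t\n\f\r")
    | frozenset(range(0x20, 0x7F))
    | frozenset(range(0x80, 0x100))
)


def looks_binary(data: bytes, max_ratio: float = 0.30) -> bool:
    """Guess whether a byte buffer is binary rather than text.

    Heuristic only: any NUL byte means binary; otherwise the buffer is binary
    if more than max_ratio of its bytes fall outside TEXT_BYTES. A binary file
    that happens to contain only printable ASCII is reported as text.
    """
    if not data:
        return False  # nothing to inspect, so no evidence of binary content
    if b"\x00" in data:
        return True
    suspicious = sum(1 for byte in data if byte not in TEXT_BYTES)
    return suspicious / len(data) > max_ratio
```

You could also run it on just the first chunk of a download instead of the whole file. That gives up some accuracy but avoids fetching everything.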

Posted: **Wed Nov 02, 2011 8:20 am**

It's all too hard. I'm just going to go with my own option:

4. Obey the user.

If they submit a binary download instead of a web page URL, then too bad.

Posted: **Wed Nov 02, 2011 10:28 pm**

Do you expect and/or intend to let your users view resources such as images (by themselves), or rather pages of HTML? If the latter, perhaps some crude testing by regex or even a more complex solution such as attempting to load and inspect the DOM would suffice?

Posted: **Wed Nov 02, 2011 11:07 pm**

ContentType is the correct way. If the server sends wrong information, then that's a problem with the server and not your program.

Posted: **Thu Nov 03, 2011 1:16 am**

Couldn't you download the first X bytes and check them? I always thought the first few characters of any page would be "<HTML>" or the like.
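The "first X bytes" idea can be sketched in Python as follows. It is only an illustration, and `starts_like_html` is a hypothetical name. It assumes a page begins, after an optional UTF-8 byte-order mark and whitespace, with a `<!doctype html` declaration or an `<html` tag, compared case-insensitively. Real pages do not always follow that pattern, so a `False` result is best read as "unknown" rather than "not a page".

```python
UTF8_BOM = b"\xef\xbb\xbf"
# Prefixes taken as evidence of HTML (an assumption, compared in lower case).
HTML_PREFIXES = (b"<!doctype html", b"<html")


def starts_like_html(first_bytes: bytes) -> bool:
    """Guess from the first bytes of a response body whether it is HTML.

    Skips an optional UTF-8 byte-order mark and leading ASCII whitespace,
    then checks for a doctype or <html> tag. False means "no evidence",
    not proof that the body is something else.
    """
    data = first_bytes
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    head = data.lstrip().lower()
    return head.startswith(HTML_PREFIXES)
```

In Python's standard library, `urllib.request.urlopen(url).read(1024)` returns at most the first 1024 bytes of the body, which is enough to feed this check. The 1024 is an arbitrary choice.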

Posted: **Thu Nov 03, 2011 9:37 am**

greyhoundcode wrote: Do you expect and/or intend to let your users view resources such as images (by themselves), or rather pages of HTML?

They can enter any URL, just like a web browser's address bar. But like a web browser, if you enter a URL to a 10 GB file, then it's going to be a long time before it downloads. That's what I was hoping to prevent.
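If the real goal is to avoid a 10 GB download, the response header can help before any body is read, because the Content-Length header, when present, states the body size in bytes. Below is a Python sketch under stated assumptions. The header text is a raw block such as `HTTP/1.1 200 OK` followed by `Name: value` lines. The function names and the size limit are hypothetical. A missing or unparsable Content-Length (for example with chunked transfer) is treated as "unknown size" and allowed, so the caller should still stop reading after the limit.

```python
from typing import Dict, Optional


def parse_header_block(raw: str) -> Dict[str, str]:
    """Parse the 'Name: value' lines of one HTTP response header.

    Header names are case-insensitive, so keys are lower-cased. The status
    line (e.g. 'HTTP/1.1 200 OK') is skipped.
    """
    headers: Dict[str, str] = {}
    for line in raw.splitlines():
        if line.startswith("HTTP/"):
            continue
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def content_length(headers: Dict[str, str]) -> Optional[int]:
    """Return the declared body size in bytes, or None if absent/invalid."""
    value = headers.get("content-length")
    if value is None:
        return None
    try:
        size = int(value)
    except ValueError:
        return None
    return size if size >= 0 else None


def allow_download(headers: Dict[str, str], max_bytes: int) -> bool:
    """Hypothetical policy: refuse only when the declared size exceeds max_bytes."""
    size = content_length(headers)
    return size is None or size <= max_bytes
```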

@Trond: I'll look more into ContentType then. Thanks.

Posted: **Thu Nov 03, 2011 3:13 pm**

MIME header...done

Posted: **Sun Nov 13, 2011 7:15 am**

After extensive testing, Kenmo's ContentType() procedure is the most reliable, but it still fails to return a content type for Wikipedia's "Random" link, which is http://en.wikipedia.org/wiki/Special:Random . I'm sure I can work around that, though. Thanks Kenmo!

Posted: **Sun Nov 13, 2011 11:52 am**

MachineCode wrote: After extensive testing, Kenmo's ContentType() procedure is the most reliable, but it still fails to return a content type for Wikipedia's "Random" link, which is http://en.wikipedia.org/wiki/Special:Random . I'm sure I can work around that, though. Thanks Kenmo!

GetHTTPHeader() returns an empty string for this URL. I think it's a bug in this function.
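The thread does not establish why GetHTTPHeader() returns nothing for this URL. One thing worth checking, offered as an assumption and not a diagnosis, is that Special:Random answers with an HTTP redirect to a random article. Any header-based check should therefore look at the final response of the redirect chain, not the first one. The Python sketch below assumes a raw header dump that may contain several responses, one per redirect hop, each starting with an `HTTP/x.y code reason` status line. It keeps only the last response. The function name and the example URL in the usage are made up for illustration.

```python
from typing import Dict, Optional, Tuple


def final_response(raw: str) -> Tuple[Optional[int], Dict[str, str]]:
    """Return (status code, headers) of the last response in a raw header dump.

    Assumes each response starts with an 'HTTP/x.y code reason' status line;
    earlier responses (e.g. 301/302 redirect hops) are discarded.
    """
    status: Optional[int] = None
    headers: Dict[str, str] = {}
    for line in raw.splitlines():
        if line.startswith("HTTP/"):
            # A new response begins: forget the headers of the previous hop.
            parts = line.split()
            try:
                status = int(parts[1])
            except (IndexError, ValueError):
                status = None
            headers = {}
        else:
            name, sep, value = line.partition(":")
            if sep:
                headers[name.strip().lower()] = value.strip()
    return status, headers


if __name__ == "__main__":
    dump = (
        "HTTP/1.1 302 Found\r\n"
        "Location: https://example.org/wiki/Some_article\r\n"
        "\r\n"
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html; charset=UTF-8\r\n"
        "\r\n"
    )
    print(final_response(dump))  # (200, {'content-type': 'text/html; charset=UTF-8'})
```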


Determining if a URL is a web page or file download
