Determing if a URL is a web page or file download
-
- Addict
- Posts: 1482
- Joined: Tue Feb 22, 2011 1:16 pm
In my app, the user can specify a web page to view by directly typing it into a StringGadget. But it was reported to me today that if the user types a filename, such as "http://www.example.com/App.zip", then my app sends back the file as gibberish text.
So, I need to determine if the typed URL is actually a web page or download. What's the best way to test this? I was just going to check if the extension is HTM, HTML, PHP, and so on; but some URLs don't end in these (like these forums, this thread's URL ends in "viewtopic.php?f=13&t=46967" and not just ".php").
What to do? Thanks.
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
Re: Determing if a URL is a web page or file download
I put this in a wrapper to check lan connectivity.
Code: Select all
InitNetwork()
Define.s URL$
URL$ = "http://www.w3.org/Protocols/HTTP/HTRESP.html"
Debug UCase(StringField(GetHTTPHeader(URL$),1,#CRLF$))
URL$ = "http://www.example.com/App.zip"
Debug UCase(StringField(GetHTTPHeader(URL$),1,#CRLF$))
; Checking a file or folder does not work! --> "file://c:\z.txt"
; Even though it will load in a webgadget.
; Use FileSize() instead.
URL$ = "file://c:\z.txt"
;Debug UCase(StringField(GetHTTPHeader(URL$),1,#CRLF$)) ; <--- This will fail after a long timeout
Debug FileSize(URL$) + 1
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
Re: Determing if a URL is a web page or file download
@skywalk: Your code debugs "302 found" for the second file which clearly doesn't exist.
@Machinecode: The simplest way is to check for Chr(0). A textfile won't contain any.
This way you can display .pb etc, too, like Firefox does for example: http://purearea.net/pb/CodeArchiv/Datab ... atabase.pb
Re: Determing if a URL is a web page or file download
TomS wrote: @skywalk: Your code debugs "302 found" for the second file which clearly doesn't exist.
HTTP/1.1 200 OK
HTTP/1.0 302 FOUND
0
You are correct.

The HTTP response status code '302 Found' is the most common way of performing a redirection.
It is up to you to decide if the new URL is valid.
Code: Select all
InitNetwork()
Define.s URL$
URL$ = "http://www.w3.org/Protocols/HTTP/HTRESP.html"
Debug UCase(StringField(GetHTTPHeader(URL$),1,#CRLF$))
URL$ = "http://www.example.com/App.zip"
Debug UCase(StringField(GetHTTPHeader(URL$),1,#CRLF$)) ; <--- Redirected implies failure
URL$ = "http://www.tedia.eu/download/files/udaq_ftdi_w98.zip"
Debug UCase(StringField(GetHTTPHeader(URL$),1,#CRLF$)) ; <--- This file does exist and returns 200
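Deciding what to do with a redirect is easier with the numeric status code than with the whole status line. The following is only a minimal sketch: HTTPStatusCode() is a hypothetical helper name, and the sample header text is made up so the snippet runs without a network connection.

```purebasic
; Hypothetical helper (not from the thread): extracts the numeric status code
; from a raw header string. Returns 0 if no code can be parsed.
Procedure.i HTTPStatusCode(Header.s)
  Protected StatusLine.s = StringField(Header, 1, #LF$)  ; first line, e.g. "HTTP/1.0 302 Found"
  ProcedureReturn Val(StringField(StatusLine, 2, " "))   ; second space-separated field
EndProcedure

; Sample header text, made up for illustration; no network access is needed.
Define Sample.s = "HTTP/1.0 302 Found" + #CRLF$ + "Location: http://www.example.com/" + #CRLF$ + #CRLF$

Select HTTPStatusCode(Sample)
  Case 200 To 299
    Debug "Success: the resource is there"
  Case 300 To 399
    Debug "Redirect: inspect the Location header before trusting it"
  Default
    Debug "Error or unparsable response"
EndSelect
```

In real use, the string returned by GetHTTPHeader(URL$) would be passed in place of Sample.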
Re: Determing if a URL is a web page or file download
You can check the Content-Type of the returned HTTP header, something like this (quick'n'ugly code):
Code: Select all
Procedure.s ContentType(URL.s)
  Protected Header.s, Type.s, LocStart.i, Location.s
  Debug URL
  Repeat
    Header = GetHTTPHeader(URL)
    LocStart = FindString(Header, "Location: ", 1)
    If (LocStart)
      URL = Mid(Header, LocStart + Len("Location: "))
      URL = StringField(URL, 1, #CR$)
    Else
      Break
    EndIf
  ForEver
  Type = Mid(Header, FindString(Header, "Content-Type: ", 1) + Len("Content-Type: "))
  Type = StringField(Type, 1, #CR$)
  Type = LCase(StringField(Type, 1, ";"))
  Debug " ---> " + Type
  ProcedureReturn Type
EndProcedure
InitNetwork()
ContentType("http://www.purebasic.fr/english/viewtopic.php?f=13&p=356948&sid=b4b1d1bf5a2d4aa082ced0b67fff63ff")
ContentType("http://google.com")
ContentType("http://www.purebasic.fr/english/download/file.php?avatar=5039_1292709976.jpg")
ContentType("http://download.savannah.nongnu.org/releases/tinycc/tcc-0.9.25.tar.bz2")
ContentType("http://download.savannah.nongnu.org/releases/tinycc/tcc-0.9.25-win32-bin.zip?dummyparameter=5")
Debug ""
PS. Skywalk, keep in mind that StringField() only works with single-character delimiters (for now...?).
Code: Select all
; Doesn't work as expected!
Debug StringField("Hello World!", 1, "or")
Re: Determing if a URL is a web page or file download
Or something like this... Much simpler and faster, but it doesn't actually check the file online, just its name. Also it doesn't work if the user just enters a path with no file (see third example).
Code: Select all
Procedure.s URLExtension(URL.s)
ProcedureReturn (LCase(GetExtensionPart(GetURLPart(URL, #PB_URL_Path))))
EndProcedure
Debug URLExtension("http://www.purebasic.fr/english/viewtopic.php?f=13&p=356948&sid=b4b1d1bf5a2d4aa082ced0b67fff63ff")
Debug URLExtension("http://download.savannah.nongnu.org/releases/tinycc/tcc-0.9.25.tar.bz2?yes=no")
Debug URLExtension("http://audiere.sourceforge.net/audiere-1.9.4-users-doxygen/") ; Uh oh
Debug ""
Re: Determing if a URL is a web page or file download
kenmo wrote: LCase(GetExtensionPart(GetURLPart(URL, #PB_URL_Path)))
Ah, that's what I need!

But the next question is: is there an official list of web page extensions somewhere? HTM and HTML I know, and PHP I know, but who are you? I mean, who are the rest?
Re: Determing if a URL is a web page or file download
The extension is not reliable at all.
Any extension can be routed to be interpreted by php or any other language interpreter.
And a php file can output any data.
www.example.com/image.php can very well display an image and thus your stringgadget contains the same binary "rubbish" as with a normal image.
You could check the MIME type in the HTTP header and compare it to a list of plain-text MIME types. But that's not reliable either, as it can be changed by the server/PHP.
I could load an image in PHP and output its data using the MIME type for plain text. Every browser will display the binary contents of the image, and so will your program.
Checking that the file does NOT contain any characters 0-31 is the only 100% reliable way.
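A rough sketch of that scan, for data already downloaded into memory. LooksLikeText() is a hypothetical name, and exempting tab (9), line feed (10) and carriage return (13) is an assumption added here, since ordinary text files contain those three even though they fall in the 0-31 range. Text stored as UTF-16 contains zero bytes and would be rejected by this check.

```purebasic
; Hypothetical helper (not from the thread): returns #True if the buffer holds
; no control bytes other than tab (9), line feed (10) and carriage return (13).
Procedure.i LooksLikeText(*Buffer, Length.i)
  Protected i.i, Value.a
  For i = 0 To Length - 1
    Value = PeekA(*Buffer + i)            ; read one unsigned byte
    If Value < 32 And Value <> 9 And Value <> 10 And Value <> 13
      ProcedureReturn #False              ; control byte found: treat as binary
    EndIf
  Next
  ProcedureReturn #True
EndProcedure

; Usage: the first four bytes of a ZIP archive ("PK" followed by 3 and 4).
Define *Mem = AllocateMemory(4)
If *Mem
  PokeA(*Mem, 'P') : PokeA(*Mem + 1, 'K') : PokeA(*Mem + 2, 3) : PokeA(*Mem + 3, 4)
  Debug LooksLikeText(*Mem, 4)
  FreeMemory(*Mem)
EndIf
```

Returning at the first offending byte means a binary download is usually rejected within its first few bytes, although the data still has to be fetched before it can be scanned.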
Re: Determing if a URL is a web page or file download
TomS wrote: To check if the file does NOT contain any characters 0-31 is the only 100% reliable way.
Okay, thanks, that's the approach I will take.

Re: Determing if a URL is a web page or file download
I agree checking the file extension is not correct, but how is scanning a file for Chr(0) easier than UCase(StringField(GetHTTPHeader(URL$),1,#CRLF$))?

Re: Determing if a URL is a web page or file download
What is this supposed to do?
It doesn't tell you if it's a binary file or not. It just tells if the file is there.
It's of course not easy to check the whole content of the file. Well, it is easy; it's just not fast, and you have to download the whole file first. But it's the only reliable method for MachineCode's question.
Re: Determing if a URL is a web page or file download
True, but a redirect implies a stale URL or some other question as to whether to download the file in the 1st place.
How is that not equally or more important?
Re: Determing if a URL is a web page or file download
It's a totally different problem.
Of course it doesn't hurt to check if the file exists before attempting a download.

- RichAlgeni
- Addict
- Posts: 935
- Joined: Wed Sep 22, 2010 1:50 am
- Location: Bradenton, FL
Re: Determing if a URL is a web page or file download
I agree with Kenmo, use the line starting with 'Content-Type:'. The first word will tell you the type of data to expect, after the slash you will get the specifics. This should be on a line by itself in the returning header, according to the RFC.
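Splitting the header value at the slash could look something like the sketch below. MediaMainType() and MediaSubType() are hypothetical names, and the input is assumed to be the text that follows "Content-Type: " on that header line.

```purebasic
; Hypothetical helpers (not from the thread). The input is assumed to be the
; value after "Content-Type: ", e.g. "text/html; charset=UTF-8".
Procedure.s MediaMainType(ContentType.s)
  Protected Value.s = StringField(ContentType, 1, ";")   ; drop parameters such as charset
  ProcedureReturn LCase(Trim(StringField(Value, 1, "/"))) ; part before the slash
EndProcedure

Procedure.s MediaSubType(ContentType.s)
  Protected Value.s = StringField(ContentType, 1, ";")
  ProcedureReturn LCase(Trim(StringField(Value, 2, "/"))) ; part after the slash
EndProcedure

Debug MediaMainType("text/html; charset=UTF-8")
Debug MediaSubType("application/zip")
```

A main type of "text" would then be a hint that the body can be shown as a page. It only reflects what the server claims to be sending.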
Re: Determing if a URL is a web page or file download
RichAlgeni wrote: I agree with Kenmo, use the line starting with 'Content-Type:'. The first word will tell you the type of data to expect, after the slash you will get the specifics. This should be on a line by itself in the returning header, according to the RFC.
Sigh. It's NOT reliable.
Code: Select all
<?php //An image with content type: Text
$my_img = imagecreate( 200, 80 );
$background = imagecolorallocate( $my_img, 0, 0, 255 );
$text_colour = imagecolorallocate( $my_img, 255, 255, 0 );
$line_colour = imagecolorallocate( $my_img, 128, 255, 0 );
imagestring( $my_img, 4, 30, 25, "This is an image",
$text_colour );
imagesetthickness ( $my_img, 5 );
imageline( $my_img, 30, 45, 165, 45, $line_colour );
header( "Content-type: text" );
imagepng( $my_img );
imagecolordeallocate( $my_img, $line_colour );
imagecolordeallocate( $my_img, $text_colour );
imagecolordeallocate( $my_img, $background );
imagedestroy( $my_img );
?>
Code: Select all
Debug GetHTTPHeader("http://purebasicusermap.bplaced.de/contenttype/image_contenttype_text.php")
Debug ReceiveHTTPString("http://purebasicusermap.bplaced.de/contenttype/image_contenttype_text.php")
Code: Select all
<?php //A text-output with content-type: Image/PNG
header( "Content-type: image/png" );
echo("Hello World");
?>
Code: Select all
Debug GetHTTPHeader("http://purebasicusermap.bplaced.de/contenttype/text_contenttype_image.php")
Debug ReceiveHTTPString("http://purebasicusermap.bplaced.de/contenttype/text_contenttype_image.php")
ReceiveHTTPString:
Code: Select all
Procedure.s ReceiveHTTPString(URL$, TimeOut=5000)
  Protected Event, Time, Size, String$, Inhalt
  Protected BufferSize = $1000, *Buffer = AllocateMemory(BufferSize)
  Protected ServerName$ = GetURLPart(URL$, #PB_URL_Site)
  Protected ConnectionID = OpenNetworkConnection(ServerName$, 80)
  If ConnectionID
    ; HTTP lines end with CR+LF; an empty line terminates the request
    SendNetworkString(ConnectionID, "GET "+URL$+" HTTP/1.0"+#CRLF$+#CRLF$)
    Time = ElapsedMilliseconds()
    Repeat
      Delay(10)
      Event = NetworkClientEvent(ConnectionID)
      If Event = #PB_NetworkEvent_Data
        Repeat
          Size = ReceiveNetworkData(ConnectionID, *Buffer, BufferSize)
          String$ + PeekS(*Buffer, Size, #PB_Ascii)
        Until Not Size
        ; LF+CR only occurs inside the CR LF CR LF that separates header and body
        Inhalt = FindString(String$, #LFCR$, 1)
        If Inhalt
          String$ = Mid(String$, Inhalt+3) ; skip LF, CR, LF to reach the body
        EndIf
      EndIf
    Until ElapsedMilliseconds()-Time > TimeOut Or String$
    CloseNetworkConnection(ConnectionID)
  EndIf
  FreeMemory(*Buffer)
  ProcedureReturn String$
EndProcedure