Retrieve header info from Webgadget

elevy
New User
Posts: 4
Joined: Sun Oct 31, 2010 2:12 am

Retrieve header info from Webgadget

Post by elevy »

This is probably easier than I'm making it out to be, but it's been killing me for a while. All I'm trying to do is get the header information (specifically the 'Content-Type') from a web page that's loaded into a webgadget. I've been battling it all day long and it's driving me nuts!!!

I'm trying to do it using COMatePLUS, as I figured that would make it easy to access, but I don't know what the "command string" is supposed to be to get to it.

Can anybody help me? Here's a sample of what I'm trying to do:

Code: Select all

XIncludeFile "COMatePLUS.pbi"

OpenWindow(0, 10,10,1000,1000, "mime test")
TextGadget(1, 10, 10, 45, 20, "Address:")
StringGadget(2, 60, 10, 250, 20, "")
ButtonGadget(3, 320, 10, 50, 20, "Go ->")
ButtonGadget(4, 390, 10, 50, 20, "Exit")
WebGadget(5, 10, 40, 980, 950, "http://www.honda.com", #PB_Web_BlockPopups)
TextGadget(6, 460, 10, 150, 20, "")
TextGadget(7, 630, 10, 100, 20, "Document Type:")
TextGadget(8, 730, 10, 250, 20, "")

Define MimeType.s, x.i
Global MyBrowser.COMateObject

; Wrap the webgadget
MyBrowser = COMate_WrapCOMObject(GetWindowLong_(GadgetID(5), #GWL_USERDATA))

Repeat

    x = WaitWindowEvent()
    Select x
        
        Case #PB_Event_Gadget
        
            Select EventGadget()
            
                Case 3    ; Pushed the 'Go ->' button
                    
                    SetGadgetText(5, GetGadgetText(2))
                    
                Case 4    ; Exiting the program
                
                    End
                    
                Case 5    ; Ah yes, the webgadget
                
                    Select EventType()
                    
                        Case #PB_EventType_DownloadStart
                            SetGadgetText(6, "Webgadget is busy")
                            SetGadgetText(8, "")
                            
                        Case #PB_EventType_DownloadEnd
                            SetGadgetText(6, "Webgadget is finished")
                            SetGadgetText(2, GetGadgetText(5))

                            ; Now get the document type
                            MimeType = MyBrowser\GetStringProperty("Content-Type")
                            If MimeType <> ""
                                SetGadgetText(8, MimeType)
                            Else
                                MessageRequester("COMate Error", COMate_GetLastErrorDescription())
                            EndIf
                            
                    EndSelect
                    
            EndSelect
        
        Case #PB_Event_CloseWindow
        
            End
            
    EndSelect
    
ForEver

I've tried seemingly dozens of variations on "Content-Type" in the 'GetStringProperty()' function call. (e.g. "ResponseHeaders\Content-Type", "GetResponseHeaders\Content-Type", etc.) Maybe that's not even the correct function to use?

Or is there an easy way to do it without COMatePLUS? Or is there not even an easy way to do it?

Any help would be greatly appreciated.

Thank you.

e.levy
IdeasVacuum
Always Here
Posts: 6426
Joined: Fri Oct 23, 2009 2:33 am
Location: Wales, UK

Re: Retrieve header info from Webgadget

Post by IdeasVacuum »

Well, collecting the info you want is not such a simple task, because every page is different. However, you need to be able to parse the source, and PB can grab the HTML code from the web gadget with GetGadgetItemText. This example loads the PureBasic webpage and steps through its source one "<"-delimited field at a time, each time you press the button:

Code: Select all

#Window  = 0
#WebGadg = 1
#Btn     = 3
#Header  = 4

Global igInc = 1

Procedure ShowForm()
;-------------------

  If OpenWindow(#Window, 0, 0, 1024, 800, "PureBasic Website",  #PB_Window_SystemMenu | #PB_Window_MinimizeGadget | #PB_Window_TitleBar | #PB_Window_ScreenCentered)

                iWinColour.i = RGB(128,128,128)

              SetWindowColor(#Window, iWinColour)
                   WebGadget(#WebGadg, 0, 0, 1024, 768, "http://www.purebasic.com")
                ButtonGadget(#Btn, 10, 775, 120, 20, "View Header", #PB_Button_Default)
                StringGadget(#Header, 150, 775, 800, 20, "Header")

  EndIf

EndProcedure

ShowForm()

Repeat
         iEvent.i = WaitWindowEvent()
      iGadgetID.i = EventGadget()

         If iEvent = #PB_Event_Gadget

               If iGadgetID = #Btn

                       sPageSource.s = GetGadgetItemText(#WebGadg, #PB_Web_HtmlCode)
                           sHeader.s = StringField(sPageSource, igInc, "<")
                           igInc + 1
                       SetGadgetText(#Header, sHeader)
               EndIf
         EndIf

Until iEvent = #PB_Event_CloseWindow

End
Edit: Forgot to say that your expansion of this would be to search for "Content-Type", something like this but with more thought about potential gotchas:

Code: Select all

#Window  = 0
#WebGadg = 1
#Btn     = 3
#Header  = 4

Global igInc = 1

Procedure ShowForm()
;-------------------

  If OpenWindow(#Window, 0, 0, 1024, 800, "PureBasic Website",  #PB_Window_SystemMenu | #PB_Window_MinimizeGadget | #PB_Window_TitleBar | #PB_Window_ScreenCentered)

                iWinColour.i = RGB(128,128,128)

              SetWindowColor(#Window, iWinColour)
                   WebGadget(#WebGadg, 0, 0, 1024, 768, "http://www.purebasic.com")
                ButtonGadget(#Btn, 10, 775, 100, 20, "View Header", #PB_Button_Default)
                StringGadget(#Header, 150, 775, 850, 20, "Header")

  EndIf

EndProcedure


ShowForm()

 sHeader.s = ""
   sChar.s = ""
   iStop.i = #False
iCharVal.i = 0
iCharInc.i = 0

Repeat
         iEvent.i = WaitWindowEvent()
      iGadgetID.i = EventGadget()

         If iEvent = #PB_Event_Gadget

               If iGadgetID = #Btn

                       sPageSource.s = GetGadgetItemText(#WebGadg, #PB_Web_HtmlCode)
                             sHeader = ""       ; reset state so repeated clicks do not accumulate text
                               iStop = #False
                          ; search a lower-cased copy so any capitalisation of "Content-Type" matches
                          iCharInc.i = FindString(LCase(sPageSource),"content-type",1)

                       If(iCharInc > 0)

                              Repeat
                                          sChar = Mid(sPageSource,iCharInc,1)
                                       iCharVal = Asc(sChar)

                                       iCharInc + 1

                                       ; stop at ">" (62), or at the end of the source (Mid returns "", Asc gives 0)
                                       If(iCharVal = 62 Or iCharVal = 0)

                                              iStop = #True
                                       Else
                                              sHeader = sHeader + sChar
                                       EndIf

                              Until iStop = #True

                       Else
                                sHeader = "Content-Type not found"
                       EndIf

                       SetGadgetText(#Header, sHeader)

               EndIf
         EndIf

Until iEvent = #PB_Event_CloseWindow

End
IdeasVacuum
If it sounds simple, you have not grasped the complexity.
JackWebb
Enthusiast
Posts: 109
Joined: Wed Dec 16, 2009 1:42 pm
Location: Tampa Florida

Re: Retrieve header info from Webgadget

Post by JackWebb »

As an alternative method you may want to consider ReceiveHTTPFile. These things are never cut and dried because there is always the possibility of variation. You will have to do some parsing to get what you want.

Good Luck!
Jack

Code: Select all

InitNetwork()

FileName$ = "c:\ContentType.txt"

If ReceiveHTTPFile("http://www.honda.com", Filename$)
  If ReadFile(0, FileName$)
    While Not Eof(0)
      Temp$ = LCase(ReadString(0, #PB_Ascii))
      If FindString(Temp$, "content-type", 1)
        Debug Temp$ ;build your parsing routine here
      EndIf
    Wend
    CloseFile(0)
  EndIf
EndIf
Make everything as simple as possible, but not simpler. ~Albert Einstein
srod
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Re: Retrieve header info from Webgadget

Post by srod »

You want the following :

Code: Select all

MimeType = MyBrowser\GetStringProperty("Document\mimeType")
However, this will only work if you ensure that the page has finished loading first. Unfortunately, the PB events in this regard do not seem to work properly.

The following works here :

Code: Select all

XIncludeFile "COMatePLUS.pbi"

OpenWindow(0, 10,10,1000,1000, "mime test")
TextGadget(1, 10, 10, 45, 20, "Address:")
StringGadget(2, 60, 10, 250, 20, "")
ButtonGadget(3, 320, 10, 50, 20, "Go ->")
ButtonGadget(4, 390, 10, 50, 20, "Exit")
WebGadget(5, 10, 40, 980, 950, "http://www.honda.com", #PB_Web_BlockPopups)
TextGadget(6, 460, 10, 150, 20, "")
TextGadget(7, 630, 10, 100, 20, "Document Type:")
TextGadget(8, 730, 10, 250, 20, "")

Define MimeType.s, x.i
Global MyBrowser.COMateObject

; Wrap the webgadget
  MyBrowser = COMate_WrapCOMObject(GetWindowLong_(GadgetID(5), #GWL_USERDATA))

While WindowEvent() : Delay(1) : Wend

MimeType = MyBrowser\GetStringProperty("Document\mimeType")
Debug MimeType

Repeat
  x = WaitWindowEvent()
Until x = #PB_Event_CloseWindow
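
A possible variation on the code above, offered only as a sketch: instead of draining the event queue once, poll the browser's ReadyState property until it reports that loading is complete. It assumes that COMatePLUS provides GetIntegerProperty() alongside GetStringProperty(), and that the wrapped object exposes the IWebBrowser2 ReadyState property, where 4 means READYSTATE_COMPLETE. The constant name #MyReadyStateComplete and the Tries counter are made up for this illustration.

```purebasic
XIncludeFile "COMatePLUS.pbi"

#MyReadyStateComplete = 4   ; hypothetical name; 4 = READYSTATE_COMPLETE in IWebBrowser2

OpenWindow(0, 10, 10, 1000, 1000, "mime test")
WebGadget(5, 10, 40, 980, 950, "http://www.honda.com", #PB_Web_BlockPopups)

Define MimeType.s, Tries.i
Global MyBrowser.COMateObject

; Wrap the webgadget, exactly as in the code above
MyBrowser = COMate_WrapCOMObject(GetWindowLong_(GadgetID(5), #GWL_USERDATA))

If MyBrowser
  ; Keep pumping window events until the browser says it is done,
  ; or give up after roughly 10 seconds (1000 tries x 10 ms).
  While MyBrowser\GetIntegerProperty("ReadyState") <> #MyReadyStateComplete And Tries < 1000
    While WindowEvent() : Wend
    Delay(10)
    Tries + 1
  Wend

  MimeType = MyBrowser\GetStringProperty("Document\mimeType")
  Debug MimeType
EndIf

Repeat
Until WaitWindowEvent() = #PB_Event_CloseWindow
```

The timeout is there so that a page which never finishes loading cannot hang the program; the exact limit is arbitrary.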
I may look like a mule, but I'm not a complete ass.
elevy
New User
Posts: 4
Joined: Sun Oct 31, 2010 2:12 am

Re: Retrieve header info from Webgadget

Post by elevy »

Thanks to everyone who replied. Unfortunately it's still not doing quite what I want. I think that's my fault for not being explicit enough on exactly what I need to do. Well that and the code I originally submitted is apparently not going to cut the mustard, at least not for this purpose.

So let me try it this way. What I need, specifically, is to be able to detect that the page that is loaded in the webgadget is an embedded PDF file. I originally thought that trying to read the mimetype from the page would do the trick, but it doesn't seem to. So is there some other way to get at the DOM of the page to pick that out? I've never had to programmatically dig into the DOM of a web page before, so I'm not quite sure where to start.

Here's a site that has an embedded PDF. http://www.st.com/stonline/products/lit ... /17127.pdf

Thanks again for your help.

e.levy
IdeasVacuum
Always Here
Posts: 6426
Joined: Fri Oct 23, 2009 2:33 am
Location: Wales, UK

Re: Retrieve header info from Webgadget

Post by IdeasVacuum »

Hi elevy

Well, often you will know that the web page is an embedded PDF simply from the URL. Your URL example reveals that it is a PDF.

That would therefore be the first test. However, not all URLs are going to give the content type; the PDF file could be in an iframe, for example, as this one is: http://yings-bling.co.uk/yingsblingpdf.html
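
The URL test just described could be sketched roughly as follows. UrlLooksLikePdf() is a hypothetical helper name, and the first test URL is an invented example; the check relies only on PureBasic's GetURLPart() to isolate the path, so a query string after the file name does not hide the extension.

```purebasic
; Hypothetical helper: returns #True when the path part of the URL ends in ".pdf"
Procedure.i UrlLooksLikePdf(Url$)
  Protected Path$
  Path$ = LCase(GetURLPart(Url$, #PB_URL_Path))
  If Right(Path$, 4) = ".pdf"
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure

Debug UrlLooksLikePdf("http://www.example.com/docs/17127.pdf")       ; invented URL ending in .pdf
Debug UrlLooksLikePdf("http://yings-bling.co.uk/yingsblingpdf.html") ; the iframe page: no .pdf in the URL
```

This only catches the easy case; a page that wraps the PDF in an iframe or serves it from a script will pass straight through.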

However, it seems that all you need to do is search the page source for ".pdf"

Code: Select all

#Window  = 0
#WebGadg = 1
#Btn     = 3
#PDF     = 4

Procedure ShowForm()
;-------------------

  If OpenWindow(#Window, 0, 0, 1024, 800, "Check for embedded PDF",  #PB_Window_SystemMenu | #PB_Window_MinimizeGadget | #PB_Window_TitleBar | #PB_Window_ScreenCentered)

                iWinColour.i = RGB(128,128,128)

              SetWindowColor(#Window, iWinColour)
                   WebGadget(#WebGadg, 0, 0, 1024, 768, "http://yings-bling.co.uk/yingsblingpdf.html")
                ButtonGadget(#Btn, 10, 775, 120, 20, "Check For PDF", #PB_Button_Default)
                StringGadget(#PDF, 170, 775, 800, 20, "")

  EndIf

EndProcedure


ShowForm()

sPDF_File.s = ""
    sChar.s = ""
    iStop.i = #False
 iCharVal.i = 0
 iCharInc.i = 0

Repeat
         iEvent.i = WaitWindowEvent()
      iGadgetID.i = EventGadget()

         If iEvent = #PB_Event_Gadget

               If iGadgetID = #Btn

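                       sPageSource.s = GetGadgetItemText(#WebGadg, #PB_Web_HtmlCode)
                           sPDF_File = ""       ; reset state so repeated clicks do not accumulate text
                               iStop = #False
                          ; search a lower-cased copy so ".PDF" is found as well
                          iCharInc.i = FindString(LCase(sPageSource),".pdf",1)

                       If(iCharInc > 0)

                              ; walk backwards from the "." towards the opening quote of the file name
                              Repeat
                                          sChar = Mid(sPageSource,iCharInc,1)
                                       iCharVal = Asc(sChar)

                                       iCharInc - 1

                                       If(iCharVal = 34) ;Speech mark

                                              iStop = #True
                                       Else
                                              sPDF_File = sPDF_File + sChar
                                              ; no quote before the name: stop at the start of the source
                                              If(iCharInc < 1) : iStop = #True : EndIf
                                       EndIf

                              Until iStop = #True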

                                sPDF_File = ReverseString(sPDF_File)
                                sPDF_File = sPDF_File + "pdf"

                       Else
                                sPDF_File = "No PDF detected"
                       EndIf

                       SetGadgetText(#PDF, sPDF_File)

               EndIf
         EndIf

Until iEvent = #PB_Event_CloseWindow

End
elevy
New User
Posts: 4
Joined: Sun Oct 31, 2010 2:12 am

Re: Retrieve header info from Webgadget

Post by elevy »

@ideasvacuum

The problem is that these pages are being dynamically generated by a server script or program (something with a .do file extension). Any attempt to pull in the source of the page returns absolutely nothing, and the URL just contains the servlet.do name. Yet when I inspect the page via the DOM inspector (in Firefox), it shows up as an embedded document. That's why I need to be able to get at the DOM.

As an example, try it here: http://kcerds.dol-esa.gov/query/getOrgQryResult.do and search on file number 514910. Click the union name, then on the next page click on one of the reports under the "Fiscal Year" heading. It will display a PDF file, but determining via the software that it is a PDF is eluding me.

Any ideas?

Thanks again.

e.levy
IdeasVacuum
Always Here
Posts: 6426
Joined: Fri Oct 23, 2009 2:33 am
Location: Wales, UK

Re: Retrieve header info from Webgadget

Post by IdeasVacuum »

Hi elevy

It seems that JavaScript, Ajax, etc. have special functions to access the DOM elements, specifically the DOM "Object" object:

http://www.w3schools.com/jsref/dom_obj_object.asp
The <object> tag is used to include objects such as images, audio, videos, Java applets, ActiveX, PDF, and Flash into a webpage.
Snag is, if you can't load a dynamically created page into PB's WebGadget, how can you query the DOM with PB? Somewhere near the answer, I think, would be to embed the required JavaScript in the PB app:

http://www.purebasic.fr/english/viewtopic.php?t=36715

A different approach that is potentially closer to the requirement:

http://www.purebasic.fr/english/viewtop ... highlight=
IdeasVacuum
Always Here
Posts: 6426
Joined: Fri Oct 23, 2009 2:33 am
Location: Wales, UK

Re: Retrieve header info from Webgadget

Post by IdeasVacuum »

This snippet from PB Help does get the Header Info from your dynamically created web page example, but the header does not contain any indication that a PDF is embedded, probably because of the use of iframe:

Code: Select all

InitNetwork()

  Header$ = GetHTTPHeader("http://kcerds.dol-esa.gov/query/orgReport.do")

  Repeat

          Index + 1
          Line$ = StringField(Header$, Index, #LF$)

          Debug Line$

  Until Line$ = ""
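
To pull just the Content-Type line out of that header text, something along these lines might do. This is a sketch only: HeaderValue() is a hypothetical helper, not a PB command, and GetHTTPHeader() makes its own separate request, so what it returns is not necessarily what the server sent to the WebGadget for the page currently displayed.

```purebasic
InitNetwork()   ; needed by the PB versions used in this thread

; Hypothetical helper: returns the value of one header field, or "" if it is absent
Procedure.s HeaderValue(Header$, Name$)
  Protected Index.i, Line$, Prefix$
  Prefix$ = LCase(Name$) + ":"
  For Index = 1 To CountString(Header$, #LF$) + 1
    ; header lines end in CR+LF, so drop the stray CR left after splitting on LF
    Line$ = RemoveString(StringField(Header$, Index, #LF$), #CR$)
    If Left(LCase(Line$), Len(Prefix$)) = Prefix$
      ProcedureReturn Trim(Mid(Line$, Len(Prefix$) + 1, Len(Line$)))
    EndIf
  Next
  ProcedureReturn ""
EndProcedure

Header$ = GetHTTPHeader("http://kcerds.dol-esa.gov/query/orgReport.do")
Debug HeaderValue(Header$, "Content-Type")
```

The field name is compared in lower case because header names are not case-sensitive.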
These guys look interesting:
http://livehttpheaders.mozdev.org/index.html
IdeasVacuum
Always Here
Posts: 6426
Joined: Fri Oct 23, 2009 2:33 am
Location: Wales, UK

Re: Retrieve header info from Webgadget

Post by IdeasVacuum »

...and maybe code from Mozilla themselves would help. Take a look at the source for Firefox:

https://developer.mozilla.org/en/Downlo ... ource_Code