
Retrieve header info from Webgadget

Posted: Sun Oct 31, 2010 2:29 am
by elevy
This is probably easier than I'm making it out to be, but it's been killing me for a while. All I'm trying to do is get the header information (specifically the 'Content-Type') from a web page that's loaded into a webgadget. I've been battling it all day long and it's driving me nuts!!!

I'm trying to do it using COMatePLUS, as I figured that would make it easy to access, but I don't know what the "command string" is supposed to be to get to it.

Can anybody help me? Here's a sample of what I'm trying to do:

Code: Select all

XIncludeFile "COMatePLUS.pbi"

OpenWindow(0, 10,10,1000,1000, "mime test")
TextGadget(1, 10, 10, 45, 20, "Address:")
StringGadget(2, 60, 10, 250, 20, "")
ButtonGadget(3, 320, 10, 50, 20, "Go ->")
ButtonGadget(4, 390, 10, 50, 20, "Exit")
WebGadget(5, 10, 40, 980, 950, "http://www.honda.com", #PB_Web_BlockPopups)
TextGadget(6, 460, 10, 150, 20, "")
TextGadget(7, 630, 10, 100, 20, "Document Type:")
TextGadget(8, 730, 10, 250, 20, "")

Define MimeType.s, x.i
Global MyBrowser.COMateObject

; Wrap the webgadget
MyBrowser = COMate_WrapCOMObject(GetWindowLong_(GadgetID(5), #GWL_USERDATA))

Repeat

    x = WaitWindowEvent()
    Select x
        
        Case #PB_Event_Gadget
        
            Select EventGadget()
            
                Case 3    ; Pushed the 'Go ->' button
                    
                    SetGadgetText(5, GetGadgetText(2))
                    
                Case 4    ; Exiting the program
                
                    End
                    
                Case 5    ; Ah yes, the webgadget
                
                    Select EventType()
                    
                        Case #PB_EventType_DownloadStart
                            SetGadgetText(6, "Webgadget is busy")
                            SetGadgetText(8, "")
                            
                        Case #PB_EventType_DownloadEnd
                            SetGadgetText(6, "Webgadget is finished")
                            SetGadgetText(2, GetGadgetText(5))

                            ; Now get the document type
                            MimeType = MyBrowser\GetStringProperty("Content-Type")
                            If MimeType <> ""
                                SetGadgetText(8, MimeType)
                            Else
                                MessageRequester("COMate Error", COMate_GetLastErrorDescription())
                            EndIf
                            
                    EndSelect
                    
            EndSelect
        
        Case #PB_Event_CloseWindow
        
            End
            
    EndSelect
    
ForEver

I've tried seemingly dozens of variations on "Content-Type" in the 'GetStringProperty()' function call. (e.g. "ResponseHeaders\Content-Type", "GetResponseHeaders\Content-Type", etc.) Maybe that's not even the correct function to use?

Or is there an easy way to do it without COMatePLUS? Or is there not even an easy way to do it?

Any help would be greatly appreciated.

Thank you.

e.levy

Re: Retrieve header info from Webgadget

Posted: Sun Oct 31, 2010 4:32 am
by IdeasVacuum
Well, collecting the info you want is not such a simple task, because every page is different. You need to be able to parse the source, and PB can grab the HTML code from the web gadget with GetGadgetItemText. In this example, which loads the PureBasic home page, each click of the button shows the next "<"-delimited fragment of the page source:

Code: Select all

#Window  = 0
#WebGadg = 1
#Btn     = 3
#Header  = 4

Global igInc = 1

Procedure ShowForm()
;-------------------

  If OpenWindow(#Window, 0, 0, 1024, 800, "PureBasic Website",  #PB_Window_SystemMenu | #PB_Window_MinimizeGadget | #PB_Window_TitleBar | #PB_Window_ScreenCentered)

                iWinColour.i = RGB(128,128,128)

              SetWindowColor(#Window, iWinColour)
                   WebGadget(#WebGadg, 0, 0, 1024, 768, "http://www.purebasic.com")
                ButtonGadget(#Btn, 10, 775, 120, 20, "View Header", #PB_Button_Default)
                StringGadget(#Header, 150, 775, 800, 20, "Header")

  EndIf

EndProcedure

ShowForm()

Repeat
         iEvent.i = WaitWindowEvent()
      iGadgetID.i = EventGadget()

         If iEvent = #PB_Event_Gadget

               If iGadgetID = #Btn

                       sPageSource.s = GetGadgetItemText(#WebGadg, #PB_Web_HtmlCode)
                           sHeader.s = StringField(sPageSource, igInc, "<")
                           igInc + 1
                       SetGadgetText(#Header, sHeader)
               EndIf
         EndIf

Until iEvent = #PB_Event_CloseWindow

End

Edit: Forgot to say that your expansion of this would be to search for "Content-Type", something like this, but with more thought about potential gotchas:

Code: Select all

#Window  = 0
#WebGadg = 1
#Btn     = 3
#Header  = 4

Global igInc = 1

Procedure ShowForm()
;-------------------

  If OpenWindow(#Window, 0, 0, 1024, 800, "PureBasic Website",  #PB_Window_SystemMenu | #PB_Window_MinimizeGadget | #PB_Window_TitleBar | #PB_Window_ScreenCentered)

                iWinColour.i = RGB(128,128,128)

              SetWindowColor(#Window, iWinColour)
                   WebGadget(#WebGadg, 0, 0, 1024, 768, "http://www.purebasic.com")
                ButtonGadget(#Btn, 10, 775, 100, 20, "View Header", #PB_Button_Default)
                StringGadget(#Header, 150, 775, 850, 20, "Header")

  EndIf

EndProcedure


ShowForm()

 sHeader.s = ""
   sChar.s = ""
   iStop.i = #False
iCharVal.i = 0
iCharInc.i = 0

Repeat
         iEvent.i = WaitWindowEvent()
      iGadgetID.i = EventGadget()

         If iEvent = #PB_Event_Gadget

               If iGadgetID = #Btn

                       sPageSource.s = GetGadgetItemText(#WebGadg, #PB_Web_HtmlCode)
                          iCharInc.i = FindString(LCase(sPageSource),"content-type",1) ; case-insensitive search

                             sHeader = ""     ; reset state so repeated clicks do not accumulate text
                               iStop = #False

                       If(iCharInc > 0)

                              Repeat
                                          sChar = Mid(sPageSource,iCharInc,1)
                                       iCharVal = Asc(sChar)

                                       iCharInc + 1

                                       ; Stop at the closing '>' (ASCII 62), or at the end of
                                       ; the source (Asc("") is 0) so the loop always terminates
                                       If(iCharVal = 62 Or iCharVal = 0)

                                              iStop = #True
                                       Else
                                              sHeader = sHeader + sChar
                                       EndIf

                              Until iStop = #True

                       Else
                                sHeader = "Content-Type not found"
                       EndIf

                       SetGadgetText(#Header, sHeader)

               EndIf
         EndIf

Until iEvent = #PB_Event_CloseWindow

End

Re: Retrieve header info from Webgadget

Posted: Sun Oct 31, 2010 6:21 am
by JackWebb
As an alternative method you may want to consider ReceiveHTTPFile. These things are never cut and dried because there is always the possibility of variation. You will have to do some parsing to get what you want.

Good Luck!
Jack

Code: Select all

InitNetwork()

FileName$ = "c:\ContentType.txt"

If ReceiveHTTPFile("http://www.honda.com", Filename$)
  If ReadFile(0, FileName$)
    While Not Eof(0)
      Temp$ = LCase(ReadString(0, #PB_Ascii))
      If FindString(Temp$, "content-type", 1)
        Debug Temp$ ;build your parsing routine here
      EndIf
    Wend
    CloseFile(0)
  EndIf
EndIf

Re: Retrieve header info from Webgadget

Posted: Sun Oct 31, 2010 10:27 am
by srod
You want the following:

Code: Select all

MimeType = MyBrowser\GetStringProperty("Document\mimeType")

However, this will only work if you ensure that the page has finished loading first. Unfortunately, the PB events in this regard do not seem to work properly.

The following works here:

Code: Select all

XIncludeFile "COMatePLUS.pbi"

OpenWindow(0, 10,10,1000,1000, "mime test")
TextGadget(1, 10, 10, 45, 20, "Address:")
StringGadget(2, 60, 10, 250, 20, "")
ButtonGadget(3, 320, 10, 50, 20, "Go ->")
ButtonGadget(4, 390, 10, 50, 20, "Exit")
WebGadget(5, 10, 40, 980, 950, "http://www.honda.com", #PB_Web_BlockPopups)
TextGadget(6, 460, 10, 150, 20, "")
TextGadget(7, 630, 10, 100, 20, "Document Type:")
TextGadget(8, 730, 10, 250, 20, "")

Define MimeType.s, x.i
Global MyBrowser.COMateObject

; Wrap the webgadget
  MyBrowser = COMate_WrapCOMObject(GetWindowLong_(GadgetID(5), #GWL_USERDATA))

While WindowEvent() : Delay(1) : Wend

MimeType = MyBrowser\GetStringProperty("Document\mimeType")
Debug MimeType

Repeat
  x = WaitWindowEvent()
Until x = #PB_Event_CloseWindow
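
For reference, the "page has finished loading" check mentioned above can be made explicit instead of relying on the event queue emptying. This is only a rough sketch under a few assumptions: WaitForPageLoad() is a hypothetical helper (it is not part of COMatePLUS), it assumes the wrapped object exposes the browser control's ReadyState property, and it assumes 4 is the READYSTATE_COMPLETE value. The window size and the 10 second timeout are arbitrary.

```purebasic
XIncludeFile "COMatePLUS.pbi"

#READYSTATE_COMPLETE = 4 ; assumed value of READYSTATE_COMPLETE for the IE browser control

; Hypothetical helper: pumps window events until the browser reports that the
; page is completely loaded, or until TimeoutMs milliseconds have passed.
Procedure.i WaitForPageLoad(*Browser.COMateObject, TimeoutMs.i)
  Protected StartTime.i = ElapsedMilliseconds()
  Repeat
    While WindowEvent() : Wend   ; keep the webgadget processing its messages
    If *Browser\GetIntegerProperty("ReadyState") = #READYSTATE_COMPLETE
      ProcedureReturn #True
    EndIf
    Delay(10)
  Until ElapsedMilliseconds() - StartTime > TimeoutMs
  ProcedureReturn #False         ; timed out before the page finished loading
EndProcedure

OpenWindow(0, 10, 10, 800, 600, "mime test")
WebGadget(5, 10, 10, 780, 580, "http://www.honda.com")

; Wrap the webgadget in the same way as the code above
Define MyBrowser.COMateObject
MyBrowser = COMate_WrapCOMObject(GetWindowLong_(GadgetID(5), #GWL_USERDATA))

If MyBrowser
  If WaitForPageLoad(MyBrowser, 10000)
    Debug MyBrowser\GetStringProperty("Document\mimeType")
  Else
    Debug "Page did not finish loading in time"
  EndIf
EndIf

Repeat
Until WaitWindowEvent() = #PB_Event_CloseWindow
```

Note that the helper discards any window events that arrive while it is waiting, just as the WindowEvent() loop above does, so it is only suitable for a test program like this one.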

Re: Retrieve header info from Webgadget

Posted: Tue Nov 02, 2010 12:42 am
by elevy
Thanks to everyone who replied. Unfortunately it's still not doing quite what I want. I think that's my fault for not being explicit enough on exactly what I need to do. Well that and the code I originally submitted is apparently not going to cut the mustard, at least not for this purpose.

So let me try it this way. What I need, specifically, is to be able to detect that the page that is loaded in the webgadget is an embedded PDF file. I originally thought that trying to read the mimetype from the page would do the trick, but it doesn't seem to. So is there some other way to get at the DOM of the page to pick that out? I've never had to programmatically dig into the DOM of a web page before, so I'm not quite sure where to start.

Here's a site that has an embedded PDF. http://www.st.com/stonline/products/lit ... /17127.pdf

Thanks again for your help.

e.levy

Re: Retrieve header info from Webgadget

Posted: Tue Nov 02, 2010 4:00 am
by IdeasVacuum
Hi elevy

Well, often you can tell that the web page is an embedded PDF simply from the URL. Your example URL reveals that it is a PDF.

That would therefore be the first test. However, not all URLs are going to reveal the content type. The PDF file could be in an iframe, for example, as this one is: http://yings-bling.co.uk/yingsblingpdf.html
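
As a rough illustration of that first test, here is a minimal sketch that checks whether the path part of a URL ends in ".pdf". UrlLooksLikePdf() is a hypothetical helper name and the example.com address is made up for the demonstration.

```purebasic
; Hypothetical helper: returns #True when the path part of Url$ ends in ".pdf"
; (any letter case), ignoring a trailing fragment or query string.
Procedure.i UrlLooksLikePdf(Url$)
  Protected Pos.i

  ; Drop any fragment ("#...") and then any query string ("?...")
  Pos = FindString(Url$, "#", 1)
  If Pos > 0 : Url$ = Left(Url$, Pos - 1) : EndIf
  Pos = FindString(Url$, "?", 1)
  If Pos > 0 : Url$ = Left(Url$, Pos - 1) : EndIf

  If LCase(Right(Url$, 4)) = ".pdf"
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure

Debug UrlLooksLikePdf("http://example.com/docs/report.PDF?download=1") ; made-up URL
Debug UrlLooksLikePdf("http://yings-bling.co.uk/yingsblingpdf.html")
```

The second URL is the iframe page, which this test misses because the PDF is referenced inside the page rather than in the address.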

However, it seems that all you need to do is search the page source for ".pdf"

Code: Select all

#Window  = 0
#WebGadg = 1
#Btn     = 3
#PDF     = 4

Procedure ShowForm()
;-------------------

  If OpenWindow(#Window, 0, 0, 1024, 800, "Check for embedded PDF",  #PB_Window_SystemMenu | #PB_Window_MinimizeGadget | #PB_Window_TitleBar | #PB_Window_ScreenCentered)

                iWinColour.i = RGB(128,128,128)

              SetWindowColor(#Window, iWinColour)
                   WebGadget(#WebGadg, 0, 0, 1024, 768, "http://yings-bling.co.uk/yingsblingpdf.html")
                ButtonGadget(#Btn, 10, 775, 120, 20, "Check For PDF", #PB_Button_Default)
                StringGadget(#PDF, 170, 775, 800, 20, "")

  EndIf

EndProcedure


ShowForm()

sPDF_File.s = ""
    sChar.s = ""
    iStop.i = #False
 iCharVal.i = 0
 iCharInc.i = 0

Repeat
         iEvent.i = WaitWindowEvent()
      iGadgetID.i = EventGadget()

         If iEvent = #PB_Event_Gadget

               If iGadgetID = #Btn

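                       sPageSource.s = GetGadgetItemText(#WebGadg, #PB_Web_HtmlCode)
                          iCharInc.i = FindString(LCase(sPageSource),".pdf",1) ; also matches ".PDF"

                           sPDF_File = ""     ; reset state so repeated clicks do not accumulate text
                               iStop = #False

                       If(iCharInc > 0)

                              ; Walk backwards from the "." until the opening speech mark
                              Repeat
                                          sChar = Mid(sPageSource,iCharInc,1)
                                       iCharVal = Asc(sChar)

                                       iCharInc - 1

                                       If(iCharVal = 34) ;Speech mark

                                              iStop = #True
                                       Else
                                              sPDF_File = sPDF_File + sChar

                                              ; Reached the start of the source without finding a speech mark
                                              If(iCharInc < 1) : iStop = #True : EndIf
                                       EndIf

                              Until iStop = #True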

                                sPDF_File = ReverseString(sPDF_File)
                                sPDF_File = sPDF_File + "pdf"

                       Else
                                sPDF_File = "No PDF detected"
                       EndIf

                       SetGadgetText(#PDF, sPDF_File)

               EndIf
         EndIf

Until iEvent = #PB_Event_CloseWindow

End

Re: Retrieve header info from Webgadget

Posted: Wed Nov 03, 2010 12:11 am
by elevy
@ideasvacuum

The problem is that these pages are being dynamically generated by a server script or program (something with a .do file extension). Any attempt to pull in the source of the page returns absolutely nothing, and the URL just contains the servlet.do name. Yet, when I inspect the page via the DOM inspector (in Firefox), it shows up as an embedded document. That's why I need to be able to get at the DOM.

As an example, try it here: http://kcerds.dol-esa.gov/query/getOrgQryResult.do and search on file number 514910. Click the union name, then on the next page click on one of the reports under the "Fiscal Year" heading. It will display a PDF file, but determining via the software that it is a PDF is eluding me.

Any ideas?

Thanks again.

e.levy

Re: Retrieve header info from Webgadget

Posted: Wed Nov 03, 2010 2:27 am
by IdeasVacuum
Hi elevy

It seems that JavaScript, Ajax, etc. have special functions to access the DOM elements, specifically the DOM Object object:

http://www.w3schools.com/jsref/dom_obj_object.asp

"The <object> tag is used to include objects such as images, audio, videos, Java applets, ActiveX, PDF, and Flash into a webpage."

Snag is, if you can't load a dynamically created page into PB's web gadget, how can you query the DOM with PB? Somewhere near the answer, I think, would be to embed the required JavaScript in the PB app:

http://www.purebasic.fr/english/viewtopic.php?t=36715

A different approach that is potentially closer to the requirement:

http://www.purebasic.fr/english/viewtop ... highlight=

Re: Retrieve header info from Webgadget

Posted: Wed Nov 03, 2010 2:50 am
by IdeasVacuum
This snippet from PB Help does get the Header Info from your dynamically created web page example, but the header does not contain any indication that a PDF is embedded, probably because of the use of iframe:

Code: Select all

InitNetwork()

  Header$ = GetHTTPHeader("http://kcerds.dol-esa.gov/query/orgReport.do")

  Repeat

          Index + 1
          Line$ = StringField(Header$, Index, #LF$)

          Debug Line$

  Until Line$ = ""

These guys look interesting:
http://livehttpheaders.mozdev.org/index.html
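
If the header text does contain a Content-Type line, it can be picked out with ordinary string functions. The following is only a sketch: ContentTypeFromHeader() is a hypothetical helper name, and the sample header is made up so that the snippet runs without a network connection. In practice the input would be the string returned by GetHTTPHeader() above.

```purebasic
; Hypothetical helper: returns the value of the first Content-Type line
; found in a raw HTTP header block, or "" if there is none.
Procedure.s ContentTypeFromHeader(Header$)
  Protected Index.i, Count.i, Pos.i, Line$

  Header$ = RemoveString(Header$, #CR$)     ; treat CRLF and LF line endings alike
  Count   = CountString(Header$, #LF$) + 1  ; number of lines in the header

  For Index = 1 To Count
    Line$ = StringField(Header$, Index, #LF$)
    Pos   = FindString(Line$, ":", 1)       ; split "Name: value" at the first colon
    If Pos > 0
      ; Header names are case-insensitive, so compare in lower case
      If LCase(Trim(Left(Line$, Pos - 1))) = "content-type"
        ProcedureReturn Trim(Mid(Line$, Pos + 1, Len(Line$)))
      EndIf
    EndIf
  Next

  ProcedureReturn ""
EndProcedure

; Made-up sample header, for illustration only
Sample$ = "HTTP/1.1 200 OK" + #CRLF$ + "Content-Type: application/pdf" + #CRLF$ + "Content-Length: 1234" + #CRLF$

Debug ContentTypeFromHeader(Sample$)
```

Comparing the returned value against "application/pdf" would then be one possible test, provided the server reports the type of the document itself and not that of a wrapper page.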

Re: Retrieve header info from Webgadget

Posted: Wed Nov 03, 2010 8:01 pm
by IdeasVacuum
...and maybe code from Mozilla themselves would help. Take a look at the source for Firefox:

https://developer.mozilla.org/en/Downlo ... ource_Code