Web Viewer: how to get image link inside an anchor

Posted: Wed Oct 08, 2014 7:01 pm
by ale870
Hello,

I want to use the web widget to navigate to a web page and then get the HTTP address of an image on it. The problem is that the image is inside an anchor. How can I get the image URL rather than the anchor URL?

Thank you for your help!

Re: Web Viewer: how to get image link inside an anchor

Posted: Thu Oct 09, 2014 3:23 am
by citystate
If I understand you correctly, you are trying to grab an image from an HTML page, one that is wrapped in an anchor.

You'll need to grab the complete text of the anchor; that is, everything between the <a> and </a>.
As most images in HTML are referenced through the src attribute of an img tag, like <img src='someimage.img'>, you'll next need to parse out the src value and then, finally, download the image using something like ReceiveHTTPFile(), remembering to combine the page's URL with the src name when the src is a relative path.

Of course, this requires saving the image to disk; if you want to download it to memory instead, you'd need to use the Network procedures rather than the HTTP ones.
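
The parsing step described above can be sketched roughly as follows. This is only a minimal illustration, not a tested solution: ExtractImgSrc() is a hypothetical helper name, and it assumes the src value is quoted, is written as src= without spaces around the equals sign, and that no ">" appears inside an attribute value.

```purebasic
; Hypothetical helper (not part of PureBasic): returns the src value of the
; first <img> tag found in a fragment of HTML, or "" if none is found.
Procedure.s ExtractImgSrc(fragment$)
  Protected tagPos, tagEnd, srcPos, valueStart, valueEnd
  Protected tag$, quote$

  ; Isolate the first <img ...> tag in the fragment
  tagPos = FindString(fragment$, "<img", 1, #PB_String_NoCase)
  If tagPos = 0
    ProcedureReturn ""
  EndIf
  tagEnd = FindString(fragment$, ">", tagPos)
  If tagEnd = 0
    ProcedureReturn ""
  EndIf
  tag$ = Mid(fragment$, tagPos, tagEnd - tagPos + 1)

  ; Locate the src attribute (naive: expects " src=" exactly)
  srcPos = FindString(tag$, " src=", 1, #PB_String_NoCase)
  If srcPos = 0
    ProcedureReturn ""
  EndIf

  ; The character right after "=" should be the opening quote (' or ")
  valueStart = srcPos + 5
  quote$ = Mid(tag$, valueStart, 1)
  If quote$ <> Chr(34) And quote$ <> "'"
    ProcedureReturn ""
  EndIf

  ; The value runs up to the matching closing quote
  valueStart + 1
  valueEnd = FindString(tag$, quote$, valueStart)
  If valueEnd = 0
    ProcedureReturn ""
  EndIf
  ProcedureReturn Mid(tag$, valueStart, valueEnd - valueStart)
EndProcedure

; Shows someimage.img in the debug window
Debug ExtractImgSrc("<a href='page.html'><img src='someimage.img'></a>")
```

The returned value could then be combined with the page's address and passed to ReceiveHTTPFile(URL$, Filename$); on PureBasic versions of this era, InitNetwork() has to be called first.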

Re: Web Viewer: how to get image link inside an anchor

Posted: Thu Oct 09, 2014 7:19 am
by ale870
citystate wrote:If I understand you correctly, you are trying to grab an image from an HTML page, one that is wrapped in an anchor.

You'll need to grab the complete text of the anchor; that is, everything between the <a> and </a>.
As most images in HTML are referenced through the src attribute of an img tag, like <img src='someimage.img'>, you'll next need to parse out the src value and then, finally, download the image using something like ReceiveHTTPFile(), remembering to combine the page's URL with the src name when the src is a relative path.

Of course, this requires saving the image to disk; if you want to download it to memory instead, you'd need to use the Network procedures rather than the HTTP ones.
Thank you for your suggestion, but the problem is that I need to get this information through the HTML widget itself.
For example: the user moves the mouse pointer over an image in the HTML widget, or right-clicks it, and I need to get the path of that image. In fact, I want to let the user save the clicked image to a local file (as you can in any modern web browser).
So I cannot simply scan the HTML tags, because I cannot tell which part of the HTML corresponds to the clicked image. I need a way to get this information from the widget, so that I get the image path and not the anchor path.

Re: Web Viewer: how to get image link inside an anchor

Posted: Thu Oct 09, 2014 2:24 pm
by TI-994A
ale870 wrote:...the user moves the mouse pointer over an image in the HTML widget, or right-clicks it, and I need to get the path of that image. In fact, I want to let the user save the clicked image to a local file (as you can in any modern web browser).

...I need a way to get this information from the widget, so that I get the image path and not the anchor path.
Not really sure what you mean, but PureBasic's WebGadget() is already able to save images selected with the right mouse button; a context menu pops up, and it has a Save picture as… option.

This is clearly not a point-and-click solution, but this example scans the web page for img tags, copies each associated src attribute into an array, and lets you step through the images with the Previous/Next buttons:

Code:

Enumeration 
  #MainWindow
  #msgBox
  #msg
  #urlSelector
  #browser
  #goKey
  #goBtn  
  #prvBtn
  #nxtBtn
EndEnumeration
Global Dim imgURL.s(0)

Procedure extractImages()
  Shared curImg
  HideWindow(#msgBox, 0)
  ReDim imgURL(0)
  imgURL(0) = ""
  curImg = 0
  SetGadgetText(#browser, GetGadgetText(#urlSelector))
  While GetGadgetAttribute(#browser, #PB_Web_Busy)
    WaitWindowEvent(1)
  Wend
  html$ = GetGadgetItemText(#browser, #PB_Web_HtmlCode)
  html$ = ReplaceString(html$, Chr(39), Chr(34))
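  ; Look for each <img tag (matching "<img" rather than any "img" substring,
  ; which would also hit class names, file names, and so on)
  For imgScan = 1 To Len(html$)
    If FindString(html$, "<img", imgScan, #PB_String_NoCase)
      imgScan = FindString(html$, "<img", imgScan, #PB_String_NoCase) + 4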
      For srcScan = imgScan To Len(html$)
        If FindString(html$, "src", srcScan, #PB_String_NoCase)
          srcScan = FindString(html$, "src", srcScan, #PB_String_NoCase) + 3
          imgScan = srcScan
          For quoteScan = srcScan To Len(html$)
            If FindString(html$, Chr(34), quoteScan)
              quoteScan = FindString(html$, Chr(34), quoteScan) + 1
              imgScan = quoteScan
              For endQuote = quoteScan To Len(html$)
                If FindString(html$, Chr(34), endQuote)
                  endQuote = FindString(html$, Chr(34), endQuote)
                  imgScan = endQuote + 1
                  imgURL_Len = endQuote - quoteScan
                  Break
                EndIf  
              Next endQuote
              ReDim imgURL(found)
              imgURL(found) = Mid(html$, quoteScan, imgURL_Len)
              If Not FindString(imgURL(found), "http")
                If Left(imgURL(found), 1) <> "/"
                  imgURL(found) = "/" + imgURL(found)
                EndIf
                imgURL(found) = RTrim(GetGadgetText(#browser), "/") + imgURL(found)
              EndIf
              found + 1
              Break 2  
            EndIf
          Next quoteScan
          Break 1
        EndIf
      Next srcScan
    EndIf
  Next imgScan
  HideWindow(#msgBox, 1)
EndProcedure

wFlags = #PB_Window_SystemMenu | #PB_Window_ScreenCentered
mbFlags = #PB_Window_BorderLess | #PB_Window_ScreenCentered
OpenWindow(#MainWindow, 0, 0, 800, 600, "Extracting Images from WebPage", wFlags)
AddKeyboardShortcut(#MainWindow, #PB_Shortcut_Return, #goKey)
ComboBoxGadget(#urlSelector, 10, 10, 740, 30, #PB_ComboBox_Editable)
WebGadget(#browser, 10, 50, 780, 500, "")
ButtonGadget(#goBtn, 760, 10, 30, 30, "GO")
ButtonGadget(#prvBtn, 10, 560, 360, 30, "Previous Image")
ButtonGadget(#nxtBtn, 430, 560, 360, 30, "Next Image")

OpenWindow(#msgBox, 0, 0, 200, 100, "", mbFlags, WindowID(#MainWindow))
TextGadget(#msg, 0, 30, 200, 100, "Loading Page & Extracting Images..." + 
                                  #CRLF$ + #CRLF$ + "Please wait.", #PB_Text_Center)
SetWindowColor(#msgBox, RGB(255, 0, 0))
SetGadgetColor(#msg, #PB_Gadget_BackColor, RGB(255, 0, 0))
SetGadgetColor(#msg, #PB_Gadget_FrontColor, RGB(255, 255,255))

AddGadgetItem(#urlSelector, -1, "http://edition.cnn.com")
AddGadgetItem(#urlSelector, -1, "http://www.shutterstock.com/")
AddGadgetItem(#urlSelector, -1, "http://www.giantbomb.com/profile/wakka/lists/the-150-original-pokemon/59579/")

SetGadgetText(#urlSelector, "http://edition.cnn.com")
extractImages()
SetGadgetText(#urlSelector, GetGadgetText(#browser))

Repeat
  Select WaitWindowEvent()
    Case #PB_Event_CloseWindow
      appQuit = 1
    Case #PB_Event_Menu
      Select EventMenu()
        Case #goKey
          If GetGadgetText(#urlSelector)
            extractImages()
          EndIf
      EndSelect
    Case #PB_Event_Gadget
      Select EventGadget()
        Case #goBtn
          If GetGadgetText(#urlSelector)
            extractImages()
          EndIf
        Case #prvBtn, #nxtBtn
          If GetGadgetText(#browser) <> GetGadgetText(#urlSelector)
            SetGadgetText(#urlSelector, GetGadgetText(#browser))
            extractImages()
          EndIf
          SetGadgetText(#browser, imgURL(curImg))
          If EventGadget() = #prvBtn
            curImg - 1
            If curImg < 0
              curImg = ArraySize(imgURL())
            EndIf
          Else
            curImg + 1
            If curImg > ArraySize(imgURL())
              curImg = 0
            EndIf
          EndIf
          While GetGadgetAttribute(#browser, #PB_Web_Busy)
            WaitWindowEvent(1)
          Wend
          SetGadgetText(#urlSelector, GetGadgetText(#browser))
      EndSelect
  EndSelect
Until appQuit = 1
It's a quick and dirty solution, and may not work for all web pages, but hopefully it could be helpful in some way. :)
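
One place where the example above may fall short is relative src values: it appends them to the full address currently shown in the browser, which only works when that address is the site root. A slightly more careful join might look like the sketch below. ResolveImageURL() is a hypothetical helper name, and the sketch assumes the page address contains no query string with slashes in it and that the src contains no "../" segments.

```purebasic
; Hypothetical helper (not part of PureBasic): turns an img src value into
; an absolute URL, using the address of the page it was found on.
Procedure.s ResolveImageURL(pageURL$, src$)
  Protected schemeEnd, hostEnd, lastSlash, pos
  Protected origin$, dir$

  ; Already absolute (http://, https://, ...): use as-is
  If FindString(src$, "://")
    ProcedureReturn src$
  EndIf

  schemeEnd = FindString(pageURL$, "://")
  If schemeEnd = 0
    ProcedureReturn src$            ; page address not understood, give up
  EndIf

  ; Protocol-relative (//host/path): borrow the page's scheme
  If Left(src$, 2) = "//"
    ProcedureReturn Left(pageURL$, schemeEnd) + src$
  EndIf

  ; Split the page address into its origin (scheme + host) and directory
  hostEnd = FindString(pageURL$, "/", schemeEnd + 3)
  If hostEnd = 0
    origin$ = pageURL$
    dir$ = pageURL$ + "/"
  Else
    origin$ = Left(pageURL$, hostEnd - 1)
    lastSlash = hostEnd
    pos = FindString(pageURL$, "/", lastSlash + 1)
    While pos
      lastSlash = pos
      pos = FindString(pageURL$, "/", lastSlash + 1)
    Wend
    dir$ = Left(pageURL$, lastSlash)
  EndIf

  ; Root-relative (/path) hangs off the origin; anything else off the directory
  If Left(src$, 1) = "/"
    ProcedureReturn origin$ + src$
  EndIf
  ProcedureReturn dir$ + src$
EndProcedure

Debug ResolveImageURL("http://www.example.com/gallery/index.html", "pics/a.jpg")
Debug ResolveImageURL("http://www.example.com/gallery/index.html", "/img/b.png")
```

In the example program, a helper like this could stand in for the block that prefixes "/" and the browser address to each extracted src.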