read links in a webgadget?

Posted: Wed Jan 23, 2008 1:34 am
by nicolaus
Hello,

How can I read all the links from a page that is shown / loaded in a WebGadget?
I want to load all the links from a page into a list, but I can't find a way to read the links.

Thanks
Nico

Posted: Wed Jan 23, 2008 4:07 am
by netmaestro
You can get all the html code in the gadget with something like:

Code: Select all

html$ = GetGadgetItemText(#webgadget, #PB_Web_HtmlCode)
and parse it for links.
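
For the parsing step, here is a minimal sketch of one possible approach. It assumes the links are written as double-quoted href="..." attributes, which is not true of every page. ExtractLinks() is a hypothetical helper name, not a PureBasic command, and the HTML string in the demo stands in for the html$ read from the gadget.

Code: Select all

; Sketch: collect the values of href="..." attributes from an HTML string.
; Assumption: attribute values are double-quoted; other styles are not handled.
EnableExplicit

Procedure ExtractLinks(html$, List links.s())
  Protected lower$ = LCase(html$)          ; search case-insensitively
  Protected pos = 1, valueStart, valueEnd
  Repeat
    pos = FindString(lower$, "href=" + Chr(34), pos)
    If pos = 0 : Break : EndIf
    valueStart = pos + 6                   ; first character after href="
    valueEnd   = FindString(html$, Chr(34), valueStart)
    If valueEnd = 0 : Break : EndIf        ; unterminated attribute, stop
    AddElement(links())
    links() = Mid(html$, valueStart, valueEnd - valueStart)
    pos = valueEnd + 1
  ForEver
EndProcedure

; Demo with a literal string in place of the gadget's HTML
NewList links.s()
ExtractLinks("<a href=" + Chr(34) + "http://www.purebasic.com" + Chr(34) + ">PB</a>", links())
ForEach links()
  Debug links()
Next
Relative links come back exactly as written in the page, so they would still need to be resolved against the page address.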

Posted: Wed Jan 23, 2008 9:46 am
by Kiffi
Hello nicolaus,

for this code you need to include PureDispHelper:

Code: Select all

; GetLinksFromWebpage

EnableExplicit

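Define.i oIE, oDoc, oLink            ; object pointers: integer-sized so they also fit on x64
Define.l Result, CountLinks, Counter ; 32-bit values read with "%d"
Define.i InnerText, HRef             ; string pointers returned by dhGetValue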

dhToggleExceptions(#True)
oIE = dhCreateObject("InternetExplorer.Application")

If oIE
 
  dhPutValue  (oIE, ".Visible = %b", #False)
  dhCallMethod(oIE, ".Navigate(%T)", @"www.google.de")
 
  Repeat
    dhGetValue("%d", @Result, oIE, ".ReadyState")
  Until Result = 4
 
  dhGetValue("%o", @oDoc, oIE, ".Document")
 
  If oDoc
   
    dhGetValue("%d", @CountLinks, oDoc, ".Links.Length")
   
    Debug Str(CountLinks) + " Link(s) available"
   
    Debug "------------"
   
    If CountLinks
     
      For Counter = 0 To CountLinks - 1
       
        dhGetValue("%o", @oLink, oDoc, ".Links(%d)", Counter)
       
        If oLink
         
          dhGetValue("%T", @InnerText, oLink, ".InnerText")
          If InnerText
            Debug "Link " + Str(Counter + 1) + " (Text): " + PeekS(InnerText)
            dhFreeString(InnerText)
          EndIf
           
          dhGetValue("%T", @HRef, oLink, ".HRef")
          If HRef
            Debug "Link " + Str(Counter + 1) + " (Address): " + PeekS(HRef)
            dhFreeString(HRef)
          EndIf
         
          Debug "------------"
         
          dhReleaseObject(oLink)
         
        EndIf
       
      Next
     
    EndIf
   
    dhReleaseObject(oDoc)
   
  EndIf
 
  dhReleaseObject(oIE)
 
EndIf
Greetings ... Kiffi

Posted: Wed Jan 23, 2008 11:15 am
by nicolaus
@Kiffi

Thanks, that is exactly what I wanted!
Where can I find out more about DispHelper? In your code you have this line:

Code: Select all

 dhCallMethod(oIE, ".Navigate(%T)", @"www.google.de") 
How do I know what the right method string (the second parameter) is?

Greetings
Nico

Posted: Sun Feb 22, 2009 1:31 pm
by PB
Thanks Kiffi, works great! :D

Posted: Mon Feb 23, 2009 11:44 am
by Kiffi
PB wrote:Thanks Kiffi, works great! :D
you're welcome! :-)

here is the code using COMate:

Code: Select all

; GetLinksFromWebpage

EnableExplicit

IncludePath #PB_Compiler_Home + "\srod\comate"
XIncludeFile "comate.pbi"

Define oIE.COMateObject, oDoc.COMateObject, oLink.COMateObject
Define CountLinks, Counter
Define InnerText.s, HRef.s


oIE = COMate_CreateObject("InternetExplorer.Application")

If oIE
  
  oIE\SetProperty("Visible = #False")
  oIE\Invoke("Navigate('http://www.purebasic.com')") 
  
  Repeat
  Until oIE\GetIntegerProperty("ReadyState") = 4 
  
  oDoc = oIE\getObjectProperty("Document")
  
  If oDoc
    
    CountLinks = oDoc\GetIntegerProperty("Links\Length")
    
    Debug Str(CountLinks) + " Link(s) available"
    
    Debug "------------"
    
    If CountLinks
      
      For Counter = 0 To CountLinks - 1
        
        oLink = oDoc\getObjectProperty("Links(" + Str(Counter) + ")")
        
        If oLink
          
          InnerText = oLink\GetStringProperty("InnerText")
          Debug "Link " + Str(Counter + 1) + " (Text): " + InnerText
          
          HRef = oLink\GetStringProperty("HRef")
          Debug "Link " + Str(Counter + 1) + " (Address): " + HRef
          
          Debug "------------"
          
          oLink\Release()
          
        EndIf
        
      Next
      
    EndIf
    
    oDoc\Release()
    
  EndIf
  
  oIE\Release()
  
EndIf

Posted: Fri Nov 05, 2010 12:31 pm
by PB
I have a problem with Kiffi's code now. :(

When I'm logged into Facebook and I parse this URL for links...

http://www.facebook.com/home.php?sk=lf

...then this block of code always returns 1 for Result (never 4):

Code: Select all

Repeat
  dhGetValue("%d", @Result, oIE, ".ReadyState")
Until Result = 4
Which, as you can tell, means an endless loop.

I did some testing and the return value from dhGetValue is 0.
If I check for that and break out of the loop, then I get this
error box:

Code: Select all

---------------------------
Error
---------------------------
Member:	  .Document
Function:	  GetValue		
Error In:	  InvokeArray
Error:	  Unspecified error
Code:	  80004005
Source:	  IDispatch Interface
---------------------------
OK
---------------------------
What to do?
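
One way to at least avoid the endless loop is to bound the wait with a timeout. The following is a minimal sketch under a few assumptions: PureDispHelper is included as in Kiffi's example, #WaitTimeout is a hypothetical value picked for illustration, and the URL is a placeholder. It does not explain why ReadyState stays at 1 on that page; it only lets the program give up cleanly instead of going on to read .Document.

Code: Select all

; Sketch only: wait for ReadyState = 4 (complete), but not forever.
; Requires PureDispHelper to be included, as in Kiffi's example.
EnableExplicit

#WaitTimeout = 15000 ; milliseconds, hypothetical value

Define.i oIE
Define.l Result
Define.q StartTime

oIE = dhCreateObject("InternetExplorer.Application")

If oIE

  dhCallMethod(oIE, ".Navigate(%T)", @"http://www.purebasic.com")

  StartTime = ElapsedMilliseconds()
  Repeat
    Delay(50) ; give the CPU back between polls
    dhGetValue("%d", @Result, oIE, ".ReadyState")
  Until Result = 4 Or ElapsedMilliseconds() - StartTime > #WaitTimeout

  If Result = 4
    Debug "Page loaded"
  Else
    Debug "Timed out, ReadyState = " + Str(Result)
  EndIf

  dhCallMethod(oIE, ".Quit") ; close the hidden browser instance
  dhReleaseObject(oIE)

EndIf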

Posted: Thu May 19, 2011 1:20 pm
by MachineCode
Does anyone know if the example above can be used to get all the image links, rather than hyperlinks, in a page?

Posted: Fri Jun 03, 2011 10:05 am
by Kiffi
MachineCode wrote:Does anyone know if the example above can be used to get all the image links, rather than hyperlinks, in a page?
see here: http://www.purebasic.fr/english/viewtop ... 12&t=36036

Greetings ... Kiffi

Posted: Fri Jun 03, 2011 12:01 pm
by MachineCode
Thanks Kiffi, checking it out now.