HTML2TEXT

Posted: Wed Oct 31, 2018 7:22 pm
by incaroad
Hi!
I need a lot of help.

Could someone with more experience translate the programs on this page for me, from VB to PB?
https://www.vbarchiv.net/tipps/tipp_200-html2text.html

Thank you very much in advance!

Re: HTML2TEXT

Posted: Wed Oct 31, 2018 9:19 pm
by Kwai chang caine

Re: HTML2TEXT

Posted: Thu Nov 01, 2018 10:06 am
by loulou2522

Re: HTML2TEXT

Posted: Fri Nov 09, 2018 7:03 pm
by incaroad
Hello!
Thanks for the comments!
I've put together some working code. How can I turn it into an HTML2TEXT.dll, and how would I call it? Thank you!

Code:

DataSection
  
  IID_IHTMLDocument2: ; {332C4425-26CB-11D0-B483-00C04FD90119}
  Data.l $332C4425
  Data.w $26CB, $11D0       
  Data.b $B4, $83, $00, $C0, $4F, $D9, $01, $19
  
  IID_IHTMLDocument3: ; {3050F485-98B5-11CF-BB82-00AA00BDCE0B}
  Data.l $3050F485
  Data.w $98B5, $11CF
  Data.b $BB, $82, $00, $AA, $00, $BD, $CE, $0B
  
  IID_NULL: ; {00000000-0000-0000-0000-000000000000}
  Data.l $00000000
  Data.w $0000, $0000
  Data.b $00, $00, $00, $00, $00, $00, $00, $00
  
EndDataSection



;----------------
Procedure WebGadget_Document(Gadget, *IID)
  Document = 0
  
  Browser.IWebBrowser2 = GetWindowLong_(GadgetID(Gadget), #GWL_USERDATA)
  If Browser
    If Browser\get_Document(@DocumentDispatch.IDispatch) = #S_OK And DocumentDispatch
      DocumentDispatch\QueryInterface(*IID, @Document)
      DocumentDispatch\Release()
    EndIf
  EndIf     
  
  ProcedureReturn Document
EndProcedure
;------------------------------

Procedure.s WebGadget_PageText(Gadget)
  Result$ = ""
  
  Document.IHTMLDocument2 = WebGadget_Document(Gadget, ?IID_IHTMLDocument2)
  If Document
    If Document\get_body(@Body.IHTMLElement) = #S_OK   
      If Body\get_innerText(@bstr_text) = #S_OK And bstr_text
        Result$ = PeekS(bstr_text, -1, #PB_Unicode)
        SysFreeString_(bstr_text)
      EndIf         
      
      Body\Release() 
    EndIf       
    Document\Release()
  EndIf         
  
  ProcedureReturn Result$
EndProcedure

Procedure.s HTML2TEXT (url.s, out.s="txt")
  
  
  html.s
  
  DeleteFile("oldal.html")
  
  InitNetwork()
  
  URL$=url
  ReceiveHTTPFile(URL$,"oldal.html")
  ReadFile(0, "oldal.html")   ; open the downloaded page (the return value is not checked here)
  While Eof(0) = 0            ; loop until the end of the file is reached
    html= html+ ReadString(0) ; append each line to the html string
    
  Wend
  CloseFile(0)               ; close the previously opened file
  
  DeleteFile("oldal.html")
  
  If out="csv"
    hRegex = CreateRegularExpression(#PB_Any, "<\/? ?td ?\/?>", #PB_RegularExpression_NoCase)
    html = ReplaceRegularExpression(hRegex, html, ";")
  EndIf
  
  
  OpenFile(1,"oldal.html")
  WriteString(1,html)
  CloseFile(1)
  
  
  
  
  
  
  
  
  If OpenWindow(0, 0, 0, 600, 300, "WebGadget", #PB_Window_SystemMenu | #PB_Window_ScreenCentered | #PB_Window_Invisible)
    
    WebGadget(0, 10, 10, 580, 280, "file://"+GetCurrentDirectory() + "oldal.html")
    myBrowser.IWebBrowser2 = GetWindowLong_(GadgetID(0), #GWL_USERDATA)
    myBrowser\put_Silent(#True)  
    ; Note: the WebGadget above already loads the local copy through a file:// URL
    Repeat
      Event = WaitWindowEvent() 
      
      
      
      If  GetGadgetAttribute(0,#PB_Web_Busy)=0
        
       szoveg.s=WebGadget_PageText(0)
        Break
      EndIf
      
    Until Event = #PB_Event_CloseWindow
  EndIf
  
  ProcedureReturn szoveg
EndProcedure

Debug HTML2TEXT ("http://bestbet.site/show.php?show=one", "csv")
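For anyone reading along: the out = "csv" branch above depends entirely on the regular expression <\/? ?td ?\/?>, which turns every opening or closing td tag into a ";". A minimal sketch that runs the same pattern on its own shows what it does and what it misses. The helper name TdTagsToSemicolons and the sample HTML strings are made up for illustration.

```purebasic
; Hypothetical helper: runs the same pattern as the "csv" branch of HTML2TEXT
Procedure.s TdTagsToSemicolons(Html$)
  Protected Result$, Regex
  Result$ = Html$
  Regex = CreateRegularExpression(#PB_Any, "<\/? ?td ?\/?>", #PB_RegularExpression_NoCase)
  If Regex
    ; every match (an opening or closing td tag) becomes a single ";"
    Result$ = ReplaceRegularExpression(Regex, Html$, ";")
    FreeRegularExpression(Regex)
  EndIf
  ProcedureReturn Result$
EndProcedure

Debug TdTagsToSemicolons("<tr><td>Team A</td><td>2</td></tr>") ; <tr>;Team A;;2;</tr>
Debug TdTagsToSemicolons("<TR><TD class='x'>3</TD></TR>")       ; <TR><TD class='x'>3;</TR>
```

Two things are visible in the output: adjacent cells produce ";;" because both the closing and the next opening tag are replaced, and a cell tag with attributes such as <TD class='x'> is not matched at all. If the target page uses attributes on its cells, a wider pattern (for example something like <\/?td[^>]*>) might be needed; that variant is an untested suggestion.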

Re: HTML2TEXT

Posted: Fri Nov 09, 2018 8:58 pm
by Kwai chang caine
1/ To create a DLL instead of an EXE, open the compiler options and set "Executable format" to "Shared DLL".

2/ To call the DLL, use the Library commands:
https://www.purebasic.com/documentation ... index.html

together with CallFunction() or CallFunctionFast():
https://www.purebasic.com/documentation ... ry.pb.html
or with prototypes:
https://www.purebasic.com/documentation ... types.html
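To illustrate point 2, the sketch below calls one DLL function both ways: with CallFunction(), and with a prototype obtained through GetFunction(). It uses lstrlenW from kernel32.dll only as a stand-in (an assumption made so the example runs on any Windows machine without building a DLL first); the helper name LengthViaDll is hypothetical.

```purebasic
; Prototype matching kernel32's lstrlenW(LPCWSTR): returns the number of characters
Prototype.i ProtoLstrlenW(Text.s)

; Hypothetical helper: returns the length of Text$ as computed by the DLL, or -1
Procedure.i LengthViaDll(Text$)
  Protected Lib, Length, lstrlen.ProtoLstrlenW
  Length = -1
  Lib = OpenLibrary(#PB_Any, "kernel32.dll")
  If Lib
    ; Variant 1: CallFunction() looks the function up by name on every call;
    ; arguments are passed as integers, so the string goes in as a pointer
    Debug CallFunction(Lib, "lstrlenW", @Text$)

    ; Variant 2: GetFunction() + prototype; the declared parameter type
    ; lets PureBasic pass the string directly
    lstrlen = GetFunction(Lib, "lstrlenW")
    If lstrlen
      Length = lstrlen(Text$)
    EndIf
    CloseLibrary(Lib)
  EndIf
  ProcedureReturn Length
EndProcedure

Debug LengthViaDll("Hello") ; 5
```

With a prototype the call looks like an ordinary PureBasic procedure call, which is why it is usually the more convenient choice for a DLL such as HTML2TEXT.dll; CallFunction() is handy for quick one-off calls.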

Re: HTML2TEXT

Posted: Fri Nov 09, 2018 9:10 pm
by incaroad
I've created a DLL file, but it does not work.
I must be doing something wrong somewhere; I'm a complete beginner.

How do I rewrite this program so that it works?

Re: HTML2TEXT

Posted: Sat Nov 10, 2018 11:47 am
by incaroad

Code:

....
ProcedureDLL.s HTML2TEXT (url.s, out.s="txt")
  
  
  html.s
  
  DeleteFile("oldal.html")
  
  InitNetwork()
  
  URL$=url
  ReceiveHTTPFile(URL$,"oldal.html")
  ReadFile(0, "oldal.html")   ; open the downloaded page (the return value is not checked here)
  While Eof(0) = 0            ; loop until the end of the file is reached
    html= html+ ReadString(0) ; append each line to the html string
    
  Wend
  CloseFile(0)               ; close the previously opened file
  
  DeleteFile("oldal.html")
  
  If out="csv"
    hRegex = CreateRegularExpression(#PB_Any, "<\/? ?td ?\/?>", #PB_RegularExpression_NoCase)
    html = ReplaceRegularExpression(hRegex, html, ";")
  EndIf
  
  
  OpenFile(1,"oldal.html")
  WriteString(1,html)
  CloseFile(1)
  
  
  
  If OpenWindow(0, 0, 0, 600, 300, "WebGadget", #PB_Window_SystemMenu | #PB_Window_ScreenCentered | #PB_Window_Invisible)
    
    WebGadget(0, 10, 10, 580, 280, "file://"+GetCurrentDirectory() + "oldal.html")
    myBrowser.IWebBrowser2 = GetWindowLong_(GadgetID(0), #GWL_USERDATA)
    myBrowser\put_Silent(#True)  
    ; Note: the WebGadget above already loads the local copy through a file:// URL
    Repeat
      Event = WaitWindowEvent() 
      
      
      
      If  GetGadgetAttribute(0,#PB_Web_Busy)=0
        
        Global  szoveg.s=WebGadget_PageText(0)
        Break
      EndIf
      
      DeleteFile("oldal.html")
      
    Until Event = #PB_Event_CloseWindow
  EndIf
  
  ProcedureReturn szoveg
EndProcedure
...
This is the previous code, rewritten so that I can build a DLL from it. And this is how I call it:

Code:

 OpenLibrary(0,"HTML2TEXT.dll")
   Prototype.s ProtoFunction(url.s, out.s="txt")
   HTML2TEXT.ProtoFunction=GetFunction(0, "HTML2TEXT")
   
   td.s = PeekS(HTML2TEXT("http://bestbet.site/show.php?show=one","csv"))

  CloseLibrary(0)
   
   Debug td.s
I keep trying to call it, but it never works. Where am I going wrong?

Re: HTML2TEXT

Posted: Sat Nov 10, 2018 1:32 pm
by infratec
Without the full code it is very difficult to help.

Some questions...

Who closes your window inside the DLL?
Is there a PostEvent()?

Is the variable szoveg Global?
(Read the help section about DLLs.)

You don't need PeekS() if you use GetFunction()
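Regarding the PostEvent() question: an invisible window has no close button anyone can click, so a Repeat ... Until Event = #PB_Event_CloseWindow loop inside the DLL only ends if the code itself posts that event (or leaves the loop with Break), and the window should then be closed before returning. A minimal sketch of that idea, assuming a plain invisible window without a WebGadget (the procedure name RunHiddenWindowLoop is hypothetical):

```purebasic
; Hypothetical sketch: an invisible window whose event loop is ended by the code itself
Procedure.i RunHiddenWindowLoop()
  Protected Event, Closed
  Closed = #False
  If OpenWindow(0, 0, 0, 200, 100, "Hidden", #PB_Window_Invisible)
    ; nobody can click the close button of an invisible window,
    ; so queue the close event ourselves (e.g. once the work is done)
    PostEvent(#PB_Event_CloseWindow, 0, 0)
    Repeat
      Event = WaitWindowEvent()
    Until Event = #PB_Event_CloseWindow
    CloseWindow(0) ; free the window before returning to the caller
    Closed = #True
  EndIf
  ProcedureReturn Closed
EndProcedure

Debug RunHiddenWindowLoop() ; 1
```

In HTML2TEXT, the equivalent point would be right after the page text has been read; leaving the loop with Break and calling CloseWindow(0) before ProcedureReturn achieves the same result.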

...

Re: HTML2TEXT

Posted: Sat Nov 10, 2018 2:38 pm
by incaroad
Thanks, infratec!

The full code is in the 4th post.

I'm trying to fix it.

Re: HTML2TEXT

Posted: Sun Nov 11, 2018 3:02 pm
by drgolf
Hello,

This works on Windows 10 x64 with PB 5.70 LTS B2:

For the DLL (html2text.dll):

Code:

EnableExplicit
InitNetwork()
;
DataSection
  IID_IHTMLDocument2: ; {332C4425-26CB-11D0-B483-00C04FD90119}
  Data.l $332C4425
  Data.w $26CB, $11D0       
  Data.b $B4, $83, $00, $C0, $4F, $D9, $01, $19
  ;
  IID_IHTMLDocument3: ; {3050F485-98B5-11CF-BB82-00AA00BDCE0B}
  Data.l $3050F485
  Data.w $98B5, $11CF
  Data.b $BB, $82, $00, $AA, $00, $BD, $CE, $0B
  ;
  IID_NULL: ; {00000000-0000-0000-0000-000000000000}
  Data.l $00000000
  Data.w $0000, $0000
  Data.b $00, $00, $00, $00, $00, $00, $00, $00 
EndDataSection
;
Global szoveg.s
;----------------
ProcedureDLL WebGadget_Document(Gadget, *IID)
  Protected Document, Browser.iwebbrowser2,DocumentDispatch.idispatch 
  Browser = GetWindowLong_(GadgetID(Gadget), #GWL_USERDATA)
  If Browser
    If Browser\get_Document(@DocumentDispatch) = #S_OK And DocumentDispatch
      DocumentDispatch\QueryInterface(*IID, @Document)
      DocumentDispatch\Release()
    EndIf
  EndIf     
   ProcedureReturn Document
EndProcedure
;------------------------------

ProcedureDLL.s WebGadget_PageText(Gadget)
  Protected Document.ihtmldocument2, bstr_text, result$, body.ihtmlelement
  ;Result$ = ""
 
  Document.IHTMLDocument2 = WebGadget_Document(Gadget, ?IID_IHTMLDocument2)
  If Document
    If Document\get_body(@Body) = #S_OK   
      If Body\get_innerText(@bstr_text) = #S_OK And bstr_text
        Result$ = PeekS(bstr_text, -1, #PB_Unicode)
        SysFreeString_(bstr_text)
      EndIf         
     
      Body\Release()
    EndIf       
    Document\Release()
  EndIf         
 
  ProcedureReturn Result$
EndProcedure

ProcedureDLL.s HTML2TEXT (url.s, out.s="txt")
  Protected html.s, myBrowser.iwebbrowser2, event
  szoveg = ""
  
  ; remove a leftover file from a previous call
  If FileSize("oldal.html") > 0
    DeleteFile("oldal.html")
  EndIf
  
  If ReceiveHTTPFile(url, "oldal.html")
    If out = "csv"
      ; read the downloaded page into a single string
      If ReadFile(0, "oldal.html")
        While Eof(0) = 0
          html + ReadString(0) + #LF$ ; keep the line break so words on adjacent lines don't merge
        Wend
        CloseFile(0)
      EndIf
      DeleteFile("oldal.html")
      
      ; replace every <td> / </td> tag with ";" so the cells end up separated by semicolons
      If CreateRegularExpression(0, "<\/? ?td ?\/?>", #PB_RegularExpression_NoCase)
        html = ReplaceRegularExpression(0, html, ";")
      EndIf
      
      ; write the modified HTML back so the WebGadget can render it
      If CreateFile(1, "oldal.html")
        WriteString(1, html)
        CloseFile(1)
      EndIf
    EndIf
    
    ; load the local file into a hidden WebGadget and let the browser extract the text
    If OpenWindow(0, 0, 0, 600, 300, "WebGadget", #PB_Window_SystemMenu | #PB_Window_ScreenCentered | #PB_Window_Invisible)
      WebGadget(0, 10, 10, 580, 280, "file://" + GetCurrentDirectory() + "oldal.html")
      myBrowser = GetWindowLong_(GadgetID(0), #GWL_USERDATA)
      myBrowser\put_Silent(#True) ; suppress dialog boxes such as script error messages
      Repeat
        event = WaitWindowEvent()
        If GetGadgetAttribute(0, #PB_Web_Busy) = 0
          szoveg = WebGadget_PageText(0) ; szoveg is Global, so the string survives the return
          DeleteFile("oldal.html")
          CloseWindow(0)                 ; close the hidden window before leaving the DLL
          Break
        EndIf
      Until event = #PB_Event_CloseWindow
    EndIf
  EndIf
  ProcedureReturn szoveg
EndProcedure

;Debug HTML2TEXT ("http://bestbet.site/show.php?show=one", "csv")

For testing the DLL:

Code:

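Prototype.i ProtoFunction(url.s, out.s="txt")
If OpenLibrary(0,"HTML2TEXT.dll")
   
   HTML2TEXT.ProtoFunction=GetFunction(0, "HTML2TEXT")
   If HTML2TEXT ; GetFunction() returns 0 if the function is not found in the DLL
     ; the DLL returns a pointer to its global string: copy it before closing the library
     td.s = PeekS(HTML2TEXT("http://bestbet.site/show.php?show=one","csv"))
   EndIf
   
  CloseLibrary(0)
EndIf

Debug td.s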


Re: HTML2TEXT

Posted: Sun Nov 11, 2018 4:40 pm
by incaroad
Hello!

Thank you very much, drgolf!
You are very professional.
:)