HTML2TEXT
Posted: Wed Oct 31, 2018 7:22 pm
by incaroad
Hi!
I need a lot of help.
Could someone experienced please translate this program from VB to PB (PureBasic) for me?
https://www.vbarchiv.net/tipps/tipp_200-html2text.html
Thank you very much in advance!
Re: HTML2TEXT
Posted: Wed Oct 31, 2018 9:19 pm
by Kwai chang caine
Re: HTML2TEXT
Posted: Thu Nov 01, 2018 10:06 am
by loulou2522
Re: HTML2TEXT
Posted: Fri Nov 09, 2018 7:03 pm
by incaroad
Hello!
Thanks for the comments!
I've written some working code. How can I turn it into HTML2TEXT.dll, and how would I call it? Thank you!
Code:
DataSection
  IID_IHTMLDocument2: ; {332C4425-26CB-11D0-B483-00C04FD90119}
  Data.l $332C4425
  Data.w $26CB, $11D0
  Data.b $B4, $83, $00, $C0, $4F, $D9, $01, $19
  IID_IHTMLDocument3: ; {3050F485-98B5-11CF-BB82-00AA00BDCE0B}
  Data.l $3050F485
  Data.w $98B5, $11CF
  Data.b $BB, $82, $00, $AA, $00, $BD, $CE, $0B
  IID_NULL: ; {00000000-0000-0000-0000-000000000000}
  Data.l $00000000
  Data.w $0000, $0000
  Data.b $00, $00, $00, $00, $00, $00, $00, $00
EndDataSection
;----------------
Procedure WebGadget_Document(Gadget, *IID)
  ; Returns the requested interface (*IID) of the page loaded in the WebGadget, or 0.
  ; The caller must Release() the returned interface.
  Document = 0
  Browser.IWebBrowser2 = GetWindowLong_(GadgetID(Gadget), #GWL_USERDATA)
  If Browser
    If Browser\get_Document(@DocumentDispatch.IDispatch) = #S_OK And DocumentDispatch
      DocumentDispatch\QueryInterface(*IID, @Document)
      DocumentDispatch\Release()
    EndIf
  EndIf
  ProcedureReturn Document
EndProcedure
;------------------------------
Procedure.s WebGadget_PageText(Gadget)
  ; Returns the visible text (innerText of <body>) of the page loaded in the WebGadget
  Result$ = ""
  Document.IHTMLDocument2 = WebGadget_Document(Gadget, ?IID_IHTMLDocument2)
  If Document
    If Document\get_body(@Body.IHTMLElement) = #S_OK
      If Body\get_innerText(@bstr_text) = #S_OK And bstr_text
        Result$ = PeekS(bstr_text, -1, #PB_Unicode)
        SysFreeString_(bstr_text) ; the BSTR returned by COM must be freed by the caller
      EndIf
      Body\Release()
    EndIf
    Document\Release()
  EndIf
  ProcedureReturn Result$
EndProcedure
Procedure.s HTML2TEXT (url.s, out.s="txt")
  html.s
  DeleteFile("oldal.html")
  InitNetwork()
  URL$=url
  ReceiveHTTPFile(URL$,"oldal.html")
  ReadFile(0, "oldal.html")   ; open the downloaded copy of the page
  While Eof(0) = 0            ; read until the end of the file
    html= html+ ReadString(0) ; append each line (line breaks are dropped)
  Wend
  CloseFile(0)                ; close the file
  DeleteFile("oldal.html")
  If out="csv"
    ; replace every <td> / </td> tag with a semicolon
    hRegex = CreateRegularExpression(#PB_Any, "<\/? ?td ?\/?>", #PB_RegularExpression_NoCase)
    html = ReplaceRegularExpression(hRegex, html, ";")
  EndIf
  OpenFile(1,"oldal.html")    ; write the (possibly modified) HTML back to disk
  WriteString(1,html)
  CloseFile(1)
  If OpenWindow(0, 0, 0, 600, 300, "WebGadget", #PB_Window_SystemMenu | #PB_Window_ScreenCentered | #PB_Window_Invisible)
    WebGadget(0, 10, 10, 580, 280, "file://"+GetCurrentDirectory() + "oldal.html")
    myBrowser.IWebBrowser2 = GetWindowLong_(GadgetID(0), #GWL_USERDATA)
    myBrowser\put_Silent(#True) ; don't show dialog boxes (e.g. script errors)
    ; the WebGadget renders the local copy of the page through a file:// URL
    Repeat
      Event = WaitWindowEvent()
      If GetGadgetAttribute(0,#PB_Web_Busy)=0
        szoveg.s=WebGadget_PageText(0)
        Break
      EndIf
    Until Event = #PB_Event_CloseWindow
  EndIf
  ProcedureReturn szoveg
EndProcedure
Debug HTML2TEXT ("http://bestbet.site/show.php?show=one", "csv")
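In `HTML2TEXT()`, the `csv` option replaces every `<td>`/`</td>` tag with a semicolon before the page is rendered. The stand-alone snippet below (the sample table row is made up for illustration) shows what that pattern does: each cell produces two semicolons (one for the opening tag, one for the closing tag), matching ignores case, and a tag with attributes such as `<td class="x">` is left untouched.

```purebasic
; Stand-alone check of the <td> regex used by HTML2TEXT() (the sample row is hypothetical)
EnableExplicit

Define html$, result$, regex

; One plain cell, one upper-case cell, one cell with an attribute
html$ = "<tr><td>1</td><TD>2</TD><td class=" + Chr(34) + "x" + Chr(34) + ">3</td></tr>"

; Same pattern and flag as in the thread: matches <td>, </td>, <td/> and variants with one space
regex = CreateRegularExpression(#PB_Any, "<\/? ?td ?\/?>", #PB_RegularExpression_NoCase)
If regex
  result$ = ReplaceRegularExpression(regex, html$, ";")
  FreeRegularExpression(regex)
EndIf

Debug result$ ; Debug output: <tr>;1;;2;<td class="x">3;</tr>
```

If the target page uses cells with attributes, the pattern would need to allow them (for example by matching any characters up to the closing `>`).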
Re: HTML2TEXT
Posted: Fri Nov 09, 2018 8:58 pm
by Kwai chang caine
1/ To create a DLL instead of an EXE, choose "Shared DLL" as the "Executable format" in the compiler options.
2/ To call the DLL, use the Library commands:
https://www.purebasic.com/documentation ... index.html
together with CallFunction() or CallFunctionFast():
https://www.purebasic.com/documentation ... ry.pb.html
or with prototypes:
https://www.purebasic.com/documentation ... types.html
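Both steps can be tried out from a single source file. The following is a hypothetical example (the `Greet()` function and the `greet.dll` name are made up): a `CompilerIf` on `#PB_Compiler_ExecutableFormat` compiles the DLL part when "Shared DLL" is selected and the calling part otherwise. The returned string is kept in a Global variable, as the PureBasic help on DLLs advises, and the caller uses the prototype approach from step 2.

```purebasic
; Hypothetical demo: one source file that builds either the DLL or the program calling it.
; 1) Compiler options > Executable format = "Shared DLL", create greet.dll next to this file.
; 2) Switch back to "Windows" and run: the program loads greet.dll and calls Greet().
EnableExplicit

CompilerIf #PB_Compiler_ExecutableFormat = #PB_Compiler_DLL

  ; A string returned from a DLL is kept in a Global variable,
  ; so it is still valid after the procedure has returned.
  Global Result$

  ProcedureDLL.s Greet(name.s)
    Result$ = "Hello, " + name + "!"
    ProcedureReturn Result$
  EndProcedure

CompilerElse

  ; Prototype matching the exported procedure; ".s" means the call returns a string directly
  Prototype.s ProtoGreet(name.s)
  Define Greet.ProtoGreet

  ; #PB_Compiler_FilePath = folder of this source file (the IDE may run the exe from a temp folder)
  If OpenLibrary(0, #PB_Compiler_FilePath + "greet.dll")
    Greet = GetFunction(0, "Greet")
    If Greet
      Debug Greet("PureBasic")   ; no PeekS() needed with Prototype.s
    Else
      Debug "Greet() is not exported by greet.dll"
    EndIf
    CloseLibrary(0)
  Else
    Debug "greet.dll could not be opened"
  EndIf

CompilerEndIf
```

Using CallFunction() instead of a prototype works too, but a prototype lets the compiler check the parameter types and handles the string return for you.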
Re: HTML2TEXT
Posted: Fri Nov 09, 2018 9:10 pm
by incaroad
I've created a DLL file, but it doesn't work.
I must be doing something wrong somewhere; I'm a complete beginner.
How do I rewrite this program so that it works?
Re: HTML2TEXT
Posted: Sat Nov 10, 2018 11:47 am
by incaroad
Code:
....
ProcedureDLL.s HTML2TEXT (url.s, out.s="txt")
  html.s
  DeleteFile("oldal.html")
  InitNetwork()
  URL$=url
  ReceiveHTTPFile(URL$,"oldal.html")
  ReadFile(0, "oldal.html")   ; open the downloaded copy of the page
  While Eof(0) = 0            ; read until the end of the file
    html= html+ ReadString(0) ; append each line (line breaks are dropped)
  Wend
  CloseFile(0)                ; close the file
  DeleteFile("oldal.html")
  If out="csv"
    ; replace every <td> / </td> tag with a semicolon
    hRegex = CreateRegularExpression(#PB_Any, "<\/? ?td ?\/?>", #PB_RegularExpression_NoCase)
    html = ReplaceRegularExpression(hRegex, html, ";")
  EndIf
  OpenFile(1,"oldal.html")    ; write the (possibly modified) HTML back to disk
  WriteString(1,html)
  CloseFile(1)
  If OpenWindow(0, 0, 0, 600, 300, "WebGadget", #PB_Window_SystemMenu | #PB_Window_ScreenCentered | #PB_Window_Invisible)
    WebGadget(0, 10, 10, 580, 280, "file://"+GetCurrentDirectory() + "oldal.html")
    myBrowser.IWebBrowser2 = GetWindowLong_(GadgetID(0), #GWL_USERDATA)
    myBrowser\put_Silent(#True) ; don't show dialog boxes (e.g. script errors)
    ; the WebGadget renders the local copy of the page through a file:// URL
    Repeat
      Event = WaitWindowEvent()
      If GetGadgetAttribute(0,#PB_Web_Busy)=0
        Global szoveg.s=WebGadget_PageText(0)
        Break
      EndIf
      DeleteFile("oldal.html")
    Until Event = #PB_Event_CloseWindow
  EndIf
  ProcedureReturn szoveg
EndProcedure
...
The code above is my previous program, rewritten so that I can build it as a DLL. This is the calling program:
Code:
OpenLibrary(0,"HTML2TEXT.dll")
Prototype.s ProtoFunction(url.s, out.s="txt")
HTML2TEXT.ProtoFunction=GetFunction(0, "HTML2TEXT")
td.s = PeekS(HTML2TEXT("http://bestbet.site/show.php?show=one","csv"))
CloseLibrary(0)
Debug td.s
I keep trying to call it, but it never works. Where am I going wrong?
Re: HTML2TEXT
Posted: Sat Nov 10, 2018 1:32 pm
by infratec
Without the full code it is very difficult to help.
A few questions:
Who closes your window inside the DLL?
Is there a PostEvent()?
Is the variable szoveg Global? (Read the help about DLLs.)
You don't need PeekS() if the function you get with GetFunction() is called through a Prototype.s.
...
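infratec's first question points at a real problem in the DLL version: it opens an invisible window and nothing ever closes it. The sketch below (the procedure name `GetPageHtml()` and the 10-second default timeout are hypothetical) keeps the whole window lifecycle inside one procedure: open the window, pump events with a timeout, read the result once the gadget reports it is no longer busy, and close the window before returning, so no PostEvent() is needed. To stay self-contained it returns the raw HTML via `GetGadgetItemText()` with `#PB_Web_HtmlCode`; in the thread's code, `WebGadget_PageText()` would be called at that point instead. Like the original code, it assumes `#PB_Web_Busy` reads 0 only once loading has finished.

```purebasic
; Sketch: load a page in an invisible WebGadget, wait until it has finished
; loading (or a timeout expires), then close the window ourselves.
; GetPageHtml() and the timeout value are hypothetical, for illustration only.
EnableExplicit

Procedure.s GetPageHtml(url.s, timeoutMs = 10000)
  Protected html.s, window, gadget, start
  window = OpenWindow(#PB_Any, 0, 0, 600, 300, "", #PB_Window_Invisible)
  If window
    gadget = WebGadget(#PB_Any, 0, 0, 600, 300, url)
    If gadget
      start = ElapsedMilliseconds()
      Repeat
        WaitWindowEvent(50)            ; keep the message loop running, wake up every 50 ms
        If GetGadgetAttribute(gadget, #PB_Web_Busy) = 0
          html = GetGadgetItemText(gadget, #PB_Web_HtmlCode)
          Break
        EndIf
      Until ElapsedMilliseconds() - start > timeoutMs
    EndIf
    CloseWindow(window)                ; nobody else will close an invisible window
  EndIf
  ProcedureReturn html
EndProcedure

Debug Left(GetPageHtml("https://www.purebasic.com"), 200)
```

The timeout matters in a DLL: without it, a page that never finishes loading would block the calling program forever.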
Re: HTML2TEXT
Posted: Sat Nov 10, 2018 2:38 pm
by incaroad
Thanks, infratec!
The full code is in the 4th post of this thread.
I'm trying to fix it.
Re: HTML2TEXT
Posted: Sun Nov 11, 2018 3:02 pm
by drgolf
Hello,
this works on Windows 10 x64 with PB 5.70 LTS B2.
For the DLL (html2text.dll):
Code:
EnableExplicit
InitNetwork()
;
DataSection
  IID_IHTMLDocument2: ; {332C4425-26CB-11D0-B483-00C04FD90119}
  Data.l $332C4425
  Data.w $26CB, $11D0
  Data.b $B4, $83, $00, $C0, $4F, $D9, $01, $19
  ;
  IID_IHTMLDocument3: ; {3050F485-98B5-11CF-BB82-00AA00BDCE0B}
  Data.l $3050F485
  Data.w $98B5, $11CF
  Data.b $BB, $82, $00, $AA, $00, $BD, $CE, $0B
  ;
  IID_NULL: ; {00000000-0000-0000-0000-000000000000}
  Data.l $00000000
  Data.w $0000, $0000
  Data.b $00, $00, $00, $00, $00, $00, $00, $00
EndDataSection
;
Global szoveg.s ; Global, so the string returned by HTML2TEXT() stays valid after the call
;----------------
ProcedureDLL WebGadget_Document(Gadget, *IID)
  Protected Document, Browser.iwebbrowser2,DocumentDispatch.idispatch
  Browser = GetWindowLong_(GadgetID(Gadget), #GWL_USERDATA)
  If Browser
    If Browser\get_Document(@DocumentDispatch) = #S_OK And DocumentDispatch
      DocumentDispatch\QueryInterface(*IID, @Document)
      DocumentDispatch\Release()
    EndIf
  EndIf
  ProcedureReturn Document
EndProcedure
;------------------------------
ProcedureDLL.s WebGadget_PageText(Gadget)
  Protected Document.ihtmldocument2, bstr_text, result$, body.ihtmlelement
  ;Result$ = ""
  Document.IHTMLDocument2 = WebGadget_Document(Gadget, ?IID_IHTMLDocument2)
  If Document
    If Document\get_body(@Body) = #S_OK
      If Body\get_innerText(@bstr_text) = #S_OK And bstr_text
        Result$ = PeekS(bstr_text, -1, #PB_Unicode)
        SysFreeString_(bstr_text)
      EndIf
      Body\Release()
    EndIf
    Document\Release()
  EndIf
  ProcedureReturn Result$
EndProcedure
ProcedureDLL.s HTML2TEXT (url.s, out.s="txt")
  Protected html.s, myBrowser.iwebbrowser2,event
  ;html.s
  szoveg=""
  ;
  If FileSize("oldal.html")>0
    DeleteFile("oldal.html")
  EndIf
  ;InitNetwork()
  ;URL$=url
  If ReceiveHTTPFile(url,"oldal.html")
    If out="csv"
      ReadFile(0, "oldal.html")  ; open the downloaded copy of the page
      While Eof(0) = 0           ; read until the end of the file
        html+ ReadString(0)      ; append each line (line breaks are dropped)
      Wend
      CloseFile(0)               ; close the file
      DeleteFile("oldal.html")
      ; replace every <td> / </td> tag with a semicolon
      If CreateRegularExpression(0, "<\/? ?td ?\/?>", #PB_RegularExpression_NoCase)
        html = ReplaceRegularExpression(0, html, ";")
      EndIf
      OpenFile(1,"oldal.html")   ; write the modified HTML back to disk
      WriteString(1,html)
      CloseFile(1)
    EndIf
    ;
    If OpenWindow(0, 0, 0, 600, 300, "WebGadget", #PB_Window_SystemMenu | #PB_Window_ScreenCentered | #PB_Window_Invisible)
      WebGadget(0, 10, 10, 580, 280, "file://"+GetCurrentDirectory() + "oldal.html")
      myBrowser = GetWindowLong_(GadgetID(0), #GWL_USERDATA)
      myBrowser\put_Silent(#True) ; don't show dialog boxes (e.g. script errors)
      ; the WebGadget renders the local copy of the page through a file:// URL
      Repeat
        Event = WaitWindowEvent()
        If GetGadgetAttribute(0,#PB_Web_Busy)=0
          szoveg=WebGadget_PageText(0)
          DeleteFile("oldal.html")
          CloseWindow(0)           ; close the invisible window before returning
          Break
        EndIf
      Until Event = #PB_Event_CloseWindow
    EndIf
  EndIf
  ProcedureReturn szoveg
EndProcedure
;Debug HTML2TEXT ("http://bestbet.site/show.php?show=one", "csv")
To test the DLL:
Code:
; Prototype.i: the call returns a pointer to the string, so PeekS() reads it
Prototype.i ProtoFunction(url.s, out.s="txt")
If OpenLibrary(0,"HTML2TEXT.dll")
  HTML2TEXT.ProtoFunction=GetFunction(0, "HTML2TEXT")
  td.s = PeekS(HTML2TEXT("http://bestbet.site/show.php?show=one","csv"))
  CloseLibrary(0)
EndIf
Debug td.s
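infratec's remark about PeekS() concerns this calling side. The test program above declares the prototype as `Prototype.i`, so the call returns a pointer and PeekS() is needed; declaring it as `Prototype.s` instead lets PureBasic return the string directly. A sketch of that variant, assuming the same HTML2TEXT.dll and the same exported name:

```purebasic
; Alternative caller for HTML2TEXT.dll: the prototype is declared with .s,
; so the call already returns a string and PeekS() is not needed.
EnableExplicit

Prototype.s ProtoHTML2TEXT(url.s, out.s = "txt")

Define HTML2TEXT.ProtoHTML2TEXT, td.s

If OpenLibrary(0, "HTML2TEXT.dll")
  HTML2TEXT = GetFunction(0, "HTML2TEXT")
  If HTML2TEXT
    ; the returned string is copied into td before the library is closed
    td = HTML2TEXT("http://bestbet.site/show.php?show=one", "csv")
  Else
    Debug "Function HTML2TEXT not found in the DLL"
  EndIf
  CloseLibrary(0)
Else
  Debug "HTML2TEXT.dll could not be opened"
EndIf

Debug td
```

Mixing the two styles, as in the earlier `Prototype.s` plus `PeekS()` attempt, passes a string where PeekS() expects a memory address.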
Re: HTML2TEXT
Posted: Sun Nov 11, 2018 4:40 pm
by incaroad
Hello!
Thank you very much, drgolf!
You are very professional.