Save full webpage in one HTML file
- Kwai chang caine
- Always Here
- Posts: 5357
- Joined: Sun Nov 05, 2006 11:42 pm
- Location: Lyon - France
Save full webpage in one HTML file
Hello at all,
I have installed the splendid extension "Save page WE" on my Firefox, and she works very fine
This extension create an HTML file with all the images, text, CSS of the webpage a little bit like the Microsoft MHT, but better, because it's a ".html" extension, reading by all the navigators
Is it possible in PB ?
Have you see this style of PB code on the forum ?
Have a good day
I have installed the splendid extension "Save page WE" on my Firefox, and she works very fine
This extension create an HTML file with all the images, text, CSS of the webpage a little bit like the Microsoft MHT, but better, because it's a ".html" extension, reading by all the navigators
Is it possible in PB ?
Have you see this style of PB code on the forum ?
Have a good day
The happiness is a road...
Not a destination
Not a destination
Re: Save full webpage in one HTML file
I suppose technically you want to convert small images in Base64 text HTML ?
- Kwai chang caine
- Always Here
- Posts: 5357
- Joined: Sun Nov 05, 2006 11:42 pm
- Location: Lyon - France
Re: Save full webpage in one HTML file
Hello OLLI
Yes but not only, i think it's necessary to download all the CSS links and include the source downloaded in place of the link, and like you say, also replace the images links by B64 text
Yes but not only, i think it's necessary to download all the CSS links and include the source downloaded in place of the link, and like you say, also replace the images links by B64 text
The happiness is a road...
Not a destination
Not a destination
Re: Save full webpage in one HTML file
You should create a recursive system which detects all the external links. You talk about CSS : it means nothing for me.
I just know :
Maybe you should start to detect these "a" markups and manage them as nodes.
There is already a problem with this, because CSS can replace such a node. Very complex.
Do you know to embed a Base64 encoded picture in a HTML file? ( if no )
I just know :
Code: Select all
[a href=http://www.femmepoilue.com/chuistropjeune.htm]Dad allowed you?[/a]
There is already a problem with this, because CSS can replace such a node. Very complex.
Do you know to embed a Base64 encoded picture in a HTML file? ( if no )
- Kwai chang caine
- Always Here
- Posts: 5357
- Joined: Sun Nov 05, 2006 11:42 pm
- Location: Lyon - France
Re: Save full webpage in one HTML file
Thanks a lot for your link, it help me very much
Thanks to you, i have a more (clean) code
Thanks to you, i have a more (clean) code
The happiness is a road...
Not a destination
Not a destination
Re: Save full webpage in one HTML file
Could you extract this text manually and embed it in a forum code markup here ?
(Gohogleux sirche)
https://123doc.net/document/2365574-kho ... am-son.htm
(Gohogleux sirche)
https://123doc.net/document/2365574-kho ... am-son.htm
- Kwai chang caine
- Always Here
- Posts: 5357
- Joined: Sun Nov 05, 2006 11:42 pm
- Location: Lyon - France
Re: Save full webpage in one HTML file
I believe it's time for you to "1Supo&OLi"(Gohogleux sirche)
https://123doc.net/document/2365574-kho ... am-son.htm
The happiness is a road...
Not a destination
Not a destination
Re: Save full webpage in one HTML file
Thank to Wikipedia
Code: Select all
<img src ="data:image/png;base64,iVBORw0KGgoAAA
ANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4
//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU
5ErkJggg==" alt= "Red dot" />
Re: Save full webpage in one HTML file
Yes, back when you first discussed it in 2007 -> viewtopic.php?p=215998#p215998Kwai chang caine wrote:Have you see this style of PB code on the forum ?
Did you forget? Hehe. Here's some other links that may help:
viewtopic.php?f=7&t=35772
viewtopic.php?f=13&t=28267
viewtopic.php?f=7&t=41271
- Kwai chang caine
- Always Here
- Posts: 5357
- Joined: Sun Nov 05, 2006 11:42 pm
- Location: Lyon - France
Re: Save full webpage in one HTML file
@OLLI
Thanks for the help, that can be usefull for include images
But i think the more hard is to include the CSS, and JavaScript links
@BarryG
Hello BarryG
First thanks for your answer
But justly, i not search a code for create MHT, but just HTML
Because, MTH is Microsoft license and all the navigators can not read this proprietary extension without extension
So thanks but the 3 first links is not really powerfull for me
On the other hand, the 4e yes, i not found it yesterday.
At the first regard, it's an begin, i read it more
Again thanks a lot
Thanks for the help, that can be usefull for include images
But i think the more hard is to include the CSS, and JavaScript links
@BarryG
Hello BarryG
First thanks for your answer
Yes i have forgotten (the "power" of old man ), but yesterday, before ask question, i have search in the forum and find it another timeYes, back when you first discussed it in 2007
Did you forget?
But justly, i not search a code for create MHT, but just HTML
Because, MTH is Microsoft license and all the navigators can not read this proprietary extension without extension
So thanks but the 3 first links is not really powerfull for me
On the other hand, the 4e yes, i not found it yesterday.
At the first regard, it's an begin, i read it more
Again thanks a lot
The happiness is a road...
Not a destination
Not a destination
-
- Addict
- Posts: 1233
- Joined: Wed Feb 28, 2007 9:13 am
- Location: London
Re: Save full webpage in one HTML file
JACK WEBB: "Coding in C is like sculpting a statue using only sandpaper. You can do it, but the result wouldn't be any better. So why bother? Just use the right tools and get the job done."
- Kwai chang caine
- Always Here
- Posts: 5357
- Joined: Sun Nov 05, 2006 11:42 pm
- Location: Lyon - France
Re: Save full webpage in one HTML file
Hello Seymour Clufley
Happy to talk with the creator of this nice code, for thank you directly to share it
I have see it, because BarryG have the kindness to give the link of it in the 4e link
I have try to convert it in 5.73 (Add "Modified by KCC")
But apparently the B64 not really execute and the <SCRIPT> converting not really works
For example in this part at random
give a empty tag
Happy to talk with the creator of this nice code, for thank you directly to share it
I have see it, because BarryG have the kindness to give the link of it in the 4e link
I have try to convert it in 5.73 (Add "Modified by KCC")
Code: Select all
Macro R(t)
MessageRequester("Report",t,0)
EndMacro
Global c10.s = Chr(10)
Global c13.s = Chr(13)
Global c32.s = Chr(32)
Global c34.s = Chr(34)
Global c39.s = Chr(39)
Macro EnsureThisNotStart(t,start)
If Left(t,Len(start)) = start
t = Mid(t,Len(start)+1,Len(t))
EndIf
EndMacro
Procedure.s EnsureNotStart(t.s,start.s)
EnsureThisNotStart(t,start)
ProcedureReturn t
EndProcedure
Macro EnsureThisEnd(t,endd)
If endd<>""
If Right(t,Len(endd)) <> endd
t+endd
EndIf
EndIf
EndMacro
Macro EnsureThisNotEnd(t,endd)
If Right(t,Len(endd)) = endd
;snipped.s = Len(t)-Len(endd)
;t = Left(t,snipped)
t = Left(t,Len(t)-Len(endd))
EndIf
EndMacro
Macro StartsWith(main,sub)
(sub<>"" And main<>"" And Left(main,Len(sub))=sub)
EndMacro
Procedure.s FileToString(filename.s)
info.s = ""
file = ReadFile(#PB_Any,filename)
If file
While Not Eof(file)
info + ReadString(file)+c13
Wend
CloseFile(file)
EndIf
ProcedureReturn info
EndProcedure
Procedure.b FileFromString(filename.s,string.s)
;Report("FILE: "+filename+c13+"STRING: "+string)
If filename = "" : ProcedureReturn #False : EndIf ; Modified by KCC
;If Not EnsureFolderPath(GetPathPart(filename)) : ProcedureReturn #False : EndIf
string = RemoveString(string,c10)
If FindString(string,c13,0)
EnsureThisNotEnd(string,c13) ; this removes final linebreak
string = ReplaceString(string,c13,c13+c10)
EndIf
file = CreateFile(#PB_Any,filename)
If IsFile(file)
WriteString(file,string)
CloseFile(file)
ProcedureReturn #True
EndIf
ProcedureReturn #False
EndProcedure
Procedure.s FileMimeType(filename.s)
mime.s
Select LCase(GetExtensionPart(filename))
Case "png"
mime = "image/png"
Case "jpg", "jpeg", "jpe"
mime = "image/jpeg"
Case "ico"
mime = "image/x-icon"
Case "gif"
mime = "image/gif"
Case "bmp"
mime = "image/bmp"
Case "tif", "tiff"
mime = "image/tiff"
EndSelect
ProcedureReturn mime
EndProcedure
Procedure.s File2Base64(filename.s)
f = OpenFile(#PB_Any,filename)
If Not f : ProcedureReturn "" : EndIf
loaf = Lof(f)
If loaf<1 : CloseFile(f) : ProcedureReturn "" : EndIf
*mem = AllocateMemory(loaf)
lengthread = ReadData(f,*mem,loaf)
CloseFile(f)
If Not lengthread
FreeMemory(*mem)
ProcedureReturn ""
EndIf
; Modified by KCC
; ***************
;bloaf = loaf*1.5
;*b64 = AllocateMemory(bloaf)
;Base64Encoder(*mem,loaf,*b64,bloaf)
;b64.s = PeekS(*b64)
;FreeMemory(*b64)
; ***************
b64.s = Base64Encoder(*mem,loaf)
FreeMemory(*mem)
ProcedureReturn b64
EndProcedure
Procedure.s OmitUnusedClasses(css.s,html.s)
css+c13
html = LCase(html)
ncss.s = ""
html = ReplaceString(html," class='"," class="+c34)
html = ReplaceString(html," id='"," id="+c34)
For a = 1 To CountString(css,c13)
ln.s = StringField(css,a,c13)
ln = Trim(ln)
If ln = "": Continue : EndIf ; Modified by KCC
ln = RemoveString(ln,Chr(9))
use.b = #False
lln.s = LCase(ln)
lln = ReplaceString(lln,"{",c32)
lln = ReplaceString(lln,".",c32)
lln = Trim(lln,c32)
name.s = StringField(lln,1,c32)
;R(name)
If Left(lln,1)="#"
; it's styling for an id
inlinename.s = EnsureNotStart(name,"#")
use = #False
If FindString(html," id="+inlinename,1)
use=#True
Else
If FindString(html," id="+c34+inlinename+c34,1)
use=#True
Else
If FindString(html," id="+c39+inlinename+c39,1)
use=#True
EndIf
EndIf
EndIf
Else
; it's a class
If FindString(html," class="+name,1)
use=#True
Else
If FindString(html," class="+c34+name+c34,1)
use=#True
EndIf
EndIf
EndIf
If use
ncss+ln+c13
EndIf
Next a
ProcedureReturn ncss
EndProcedure
Macro IsPointInsideAScriptBlock(lh,point,var,blockend)
var = #False
nextjsopen = FindString(lh,"<script ",detect)
nextjsclose = FindString(lh,"</script>",detect)
If nextjsclose>0
If nextjsclose<nextjsopen Or Not nextjsopen
;Debug "INSIDE A SCRIPT BLOCK"
var = #True
blockend=nextjsclose
EndIf
EndIf
EndMacro
Macro NewBase(filename)
basecount+1
barrsize = ArraySize(base64s(),1)
If barrsize<basecount
barrsize+5
ReDim base64s(barrsize)
EndIf
base64s(basecount) = "data:"+FileMimeType(filename)+";base64,"+File2Base64(filename)
EndMacro
Macro showtime(t)
Debug RSet(Str(ElapsedMilliseconds()-st),5)+" "+t
EndMacro
Macro B64Signal(b)
"INSERTBASE64_"+Str(b)+"_"
EndMacro
Procedure.b InternaliseHTMLFile(sourcefile.s,destfile.s,webpath.s="",path.s="",omitunusedclasses.b=#True,lookforclassesinjavascript.b=#True,lookforimagesinjavascript.b=#True,extrafile_arr.s="")
st=ElapsedMilliseconds()
; external stylesheets
; external javascripts
; images (converted to base64)
; background-images (converted to base64)
; svgs (converted to base64)
pp.s = "|"
Dim base64s.s(20)
If path
EnsureThisEnd(path,"\")
Else
path = GetPathPart(sourcefile)
EndIf
h.s = FileToString(sourcefile)
lh.s = LCase(h)
nh.s = h
badtagarr.s
importedcss.s
; find all external scripts and get their content...
detect=0
Repeat
detect = FindString(lh,"<script ",detect)
If Not detect : Break : EndIf
enddetect = FindString(lh,">",detect)
If Not enddetect : Break : EndIf
snippet.s = Mid(lh,detect,enddetect-detect)
srcstart = FindString(snippet," src=",1)
If srcstart
snippet = Mid(snippet,srcstart+5,Len(snippet))
;snippet = RemoveString(snippet,"=")
EnsureThisNotEnd(snippet,"/")
snippet = Trim(snippet)
snippet = Trim(snippet,c34)
;EnsureThisNotStart(snippet,c34)
;EnsureThisNotEnd(snippet,c34)
If FindString(snippet,c34,1)
snippet = StringField(snippet,1,c34)
EndIf
; R(snippet)
If snippet
resourcefilename.s = path+ReplaceString(snippet,"/","\")
newjs.s = FileToString(resourcefilename)
If newjs = "": detect=enddetect : Continue : EndIf ; Modified by KCC
newjs = Trim(newjs,c13)
tag.s = Mid(h,detect,enddetect-detect+1)
If newjs
ntag.s = RemoveString(tag,snippet,#PB_String_NoCase)
ntag = RemoveString(ntag,c32+"src="+c34+c34)
ntag = RemoveString(ntag,c32+"src=")
ntag+c13+newjs+c13
nh = ReplaceString(nh,tag,ntag)
Else
; empty js file
nh = RemoveString(nh,tag+"</SCRIPT>",#PB_String_NoCase)
nh = RemoveString(nh,tag+" </SCRIPT>",#PB_String_NoCase)
nh = RemoveString(nh,tag+c13+"</SCRIPT>",#PB_String_NoCase)
EndIf
EndIf
EndIf
detect=enddetect
ForEver
; find all external stylesheets and get their content...
detect=0
Repeat
; <LINK rel="stylesheet" type="text/css" href="styles.css" />
detect = FindString(lh,"<link ",detect)
If Not detect : Break : EndIf
enddetect = FindString(lh,">",detect)
If Not enddetect : Break : EndIf
snippet.s = Mid(lh,detect,enddetect-detect)
If FindString(snippet,"stylesheet",1)
srcstart = FindString(snippet," href=",1)
If srcstart
snippet = Mid(snippet,srcstart+5,Len(snippet))
snippet = RemoveString(snippet,"=")
EnsureThisNotEnd(snippet,"/")
snippet = Trim(snippet)
EnsureThisNotStart(snippet,c34)
EnsureThisNotEnd(snippet,c34)
If snippet
resourcefilename.s = path+ReplaceString(snippet,"/","\")
newcss.s = FileToString(resourcefilename)
newcss = Trim(newcss,c13)
If newcss
importedcss+c13+c13+newcss
EndIf
badtagarr+ Mid(h,detect,enddetect-detect+1) +pp
EndIf
EndIf
EndIf
detect=enddetect
ForEver
If importedcss
importedcss = Trim(importedcss,c13)
EndIf
; find all external images and convert them to base64...
detect=0
Repeat
; <IMG src="tree.jpg" />
detect = FindString(lh,"<img ",detect)
If Not detect : Break : EndIf
enddetect = FindString(lh,">",detect)
If Not enddetect : Break : EndIf
snippet.s = Mid(lh,detect,enddetect-detect)
srcstart = FindString(snippet," src=",1)
If srcstart
;snippet1.s=snippet
snippet = Mid(snippet,srcstart+4,Len(snippet))
snippet = RemoveString(snippet,"=")
EnsureThisNotEnd(snippet,"/")
snippet = Trim(snippet)
;EnsureThisNotStart(snippet,c34)
;EnsureThisNotEnd(snippet,c34)
snippet=Trim(snippet,c39) : snippet=Trim(snippet,c34)
;snippet2.s = snippet
If snippet
If FindString(snippet,c34,1) : snippet=StringField(snippet,1,c34) : EndIf
If FindString(snippet,c39,1) : snippet=StringField(snippet,1,c39) : EndIf
resourcefilename.s = path+ReplaceString(snippet,"/","\")
If FindString(resourcefilename,"?",1) : resourcefilename = StringField(resourcefilename,1,"?") : EndIf
;If FileSize(resourcefilename)<0
; R("NO FILE."+c13+c13+snippet1+c13+snippet2+c13+snippet+c13+resourcefilename)
;EndIf
NewBase(resourcefilename)
tag.s = Mid(h,detect,enddetect-detect+1)
ntag.s = ReplaceString(tag,snippet,B64Signal(basecount))
nh = ReplaceString(nh,tag,ntag)
EndIf
EndIf
detect=enddetect
ForEver
; find all external SVGs and convert them to base64...
If FindString(lh,".svg",1)
detect=0
Repeat
; <OBJECT data="arcrap.svg" type="image/svg+xml" width="100%" height="100%"> </OBJECT>
detect = FindString(lh,"<object ",detect)
If Not detect : Break : EndIf
enddetect = FindString(lh,">",detect)
If Not enddetect : Break : EndIf
snippet.s = Mid(lh,detect,enddetect-detect)
srcstart = FindString(snippet," data=",1)
If srcstart
snippet = Mid(snippet,srcstart+5,Len(snippet))
EnsureThisNotStart(snippet,"=")
EnsureThisNotEnd(snippet,"/")
EnsureThisNotStart(snippet,c34)
If Left(snippet,1)=c34
EnsureThisNotStart(snippet,c34)
snippet = StringField(snippet,1,c34)
Else
snippet = StringField(snippet,1,c32)
EndIf
snippet = Trim(snippet)
EnsureThisNotEnd(snippet,c34)
;MessageRequester("SNIPPET","**"+snippet+"**")
If snippet
resourcefilename.s = path+ReplaceString(snippet,"/","\")
;MessageRequester("BASE 64",b64)
NewBase(resourcefilename)
tag.s = Mid(h,detect,enddetect-detect+1)
ntag.s = ReplaceString(tag,snippet,B64Signal(basecount))
nh = ReplaceString(nh,tag,ntag)
EndIf
EndIf
detect=enddetect
ForEver
EndIf
If badtagarr
For a = 1 To CountString(badtagarr,pp)
tag.s = StringField(badtagarr,a,pp)
nh = RemoveString(nh,tag)
Next a
EndIf
;showtime("after badtagarr section")
swb64.s = "SwitchBase64"
If lookforimagesinjavascript
attrname.s = "src"
For cycle = 1 To 2
elsrc.s = "."+attrname+"="
nh = ReplaceString(nh,"."+attrname+" =",elsrc)
nh = ReplaceString(nh,"."+attrname+"= ",elsrc)
detect = 0
enddetect = 0
;R("ELSRC: "+elsrc)
Repeat
lh.s = LCase(nh)
detect = FindString(lh,elsrc,detect)
If Not detect : Break : EndIf
detect+Len(elsrc)
enddetect = FindString(lh,";",detect)
snippet.s = Mid(nh,detect,enddetect-detect)
If StartsWith(snippet,swb64)
detect = enddetect
Continue
EndIf
;R(snippet)
newinstrux.s = elsrc+swb64+"("+snippet+")"
;R(elsrc+snippet+c13+c13+newinstrux)
nh = ReplaceString(nh,elsrc+snippet,newinstrux,#PB_String_NoCase)
nh = ReplaceString(nh,elsrc+" "+snippet,newinstrux,#PB_String_NoCase)
detect = enddetect
snippet = Trim(snippet,c34)
snippet = Trim(snippet,c39)
;R(path+snippet)
If FileSize(path+snippet)>0 ; it's a filename (otherwise it's a variable or a js function call)
;R("FILE FOUND IN BASE FOLDER")
If Not FindString(pp+extrafile_arr,pp+snippet+pp,1) ; add it to the list if it's not already there
extrafile_arr+snippet+pp
EndIf
EndIf
ForEver
attrname = "backgroundImage"
Next cycle
EndIf
ncss.s = importedcss
; now get any internal style blocks...
lh.s = LCase(nh)
If FindString(lh,"<style",1)
nh = ReplaceString(nh,"<style","<STYLE")
nh = ReplaceString(nh,"</style>","</STYLE>")
Repeat
detect = FindString(nh,"<STYLE",0)
If Not detect : Break : EndIf
enddetect = FindString(nh,"</STYLE>",detect+6)
If Not enddetect : Break : EndIf
snippet.s = Mid(nh,detect,enddetect-detect+8)
nh = RemoveString(nh,snippet)
EnsureThisNotEnd(snippet,"</STYLE>")
detect = FindString(snippet,">",1)
snippet = Mid(snippet,detect+1,Len(snippet))
snippet = RemoveString(snippet,Chr(9))
ncss+Trim(snippet,c13)+c13
ForEver
EndIf
;showtime("after getting style blocks")
; omit unused...
If ncss
If omitunusedclasses
ncss = OmitUnusedClasses(ncss,nh)
EndIf
EndIf
Debug(Str(Len(nh)))
If webpath ; takes ages!
EnsureThisEnd(webpath,"/")
webpathl = Len(webpath)
lh.s = LCase(nh)
detect=FindString(lh,"<body",1)
;R(Mid(lh,detect,100))
Repeat
; <A id="example" href="page2.html">text</A>
detect = FindString(nh," href=",detect)
If Not detect : Break : EndIf
IsPointInsideAScriptBlock(lh,detect,inside,blockend)
If inside
;R("INSIDE SCRIPT. BLOCK ENDS @ "+Str(blockend))
detect = blockend
Continue
EndIf
offset=7
snippet = Mid(nh,detect+offset,webpathl+1)
;R(snippet)
If Left(snippet,1)=c34
offset=8
snippet = Mid(snippet,detect+offset,webpathl+1)
EndIf
If StartsWith(snippet,"javascript:") Or StartsWith(snippet,"http://") Or StartsWith(snippet,"#") Or StartsWith(snippet,webpath)
detect+webpathl
Continue
EndIf
;If Not snippet
; detect+webpathl
; Continue
;EndIf
;R(Mid(nh,detect+place-2,40))
nh = InsertString(nh,webpath,detect+offset)
detect+webpathl
ForEver
EndIf
; incorporate single css block...
If ncss
ncss = "<STYLE type="+c34+"text/css"+c34+">"+c13+c13+ncss+c13+c13+"</STYLE>"
headend = FindString(nh,"<HEAD>",1)
nh = InsertString(nh,c13+ncss,headend+7)
EndIf
; find css images and base64 them...
bi.s = "background-image:"
If FindString(lh,bi,1)
detect = 0
Repeat
; { width:50px; background-image:url('graphics/example.jpg'); display:block; }"
lh.s = LCase(nh)
detect = FindString(lh,bi,detect)
If Not detect : Break : EndIf
detect+Len(bi)
enddetect = FindString(nh,";",detect)
If Not enddetect : Break : EndIf
snippet.s = Mid(nh,detect,enddetect-detect)
entiresnippet.s = snippet
EnsureThisNotStart(snippet,"url(")
EnsureThisNotEnd(snippet,")")
snippet = Trim(snippet) : snippet=Trim(snippet,c39) : snippet=Trim(snippet,c34)
If Left(snippet,5)="data:" : detect=enddetect : Continue : EndIf
If snippet
If FindString(snippet,c34,1) : snippet=StringField(snippet,1,c34) : EndIf
If FindString(snippet,c39,1) : snippet=StringField(snippet,1,c39) : EndIf
resourcefilename.s = path+ReplaceString(snippet,"/","\")
;R(resourcefilename)
If FindString(resourcefilename,"?",1) : resourcefilename = StringField(resourcefilename,1,"?") : EndIf
;If FileSize(resourcefilename)<0
; R("NO FILE."+c13+c13+snippet1+c13+snippet2+c13+snippet+c13+resourcefilename)
;EndIf
NewBase(resourcefilename)
ntag.s = ReplaceString(entiresnippet,snippet,B64Signal(basecount))
nh = ReplaceString(nh,entiresnippet,ntag)
EndIf
detect = enddetect
ForEver
EndIf
If extrafile_arr
;R("EXTRA FILE ARR"+c13+c13+ReplaceString(extrafile_arr,pp,c13))
; we found some discernible filenames in the js, and/or some supplied to the procedure
; have to encode them and construct the js switching procedure, SwitchBase64
switcherfunc.s = "function SwitchBase64(filename) {"+c13
;switcherfunc + "alert(filename);"
switcherfunc + " switch(filename) {"+c13
For a = 1 To CountString(extrafile_arr,pp)
resfile.s = StringField(extrafile_arr,a,pp)
EnsureThisNotStart(resfile,path)
If resfile = "" : Continue : EndIf ; Modified by KCC
NewBase(path+resfile)
;R(b64)
switcherfunc + " case "+c34+ReplaceString(resfile,"\","/")+c34+":"+c13
switcherfunc + " return "+c34+B64Signal(basecount)+c34+";"+c13
Next a
switcherfunc+c13+" }"+c13+" }"
; now insert it right before the closing HEAD tag
switcherfunc.s = "<SCRIPT type="+c34+"text/javascript"+c34+">"+c13+switcherfunc+c13+"</SCRIPT>"+c13
;nh = ReplaceString(nh,"</HEAD>","</HEAD>"+switcherfunc,#PB_String_NoCase)
headend = FindString(nh,"</HEAD>",1)
nh = InsertString(nh,switcherfunc,headend-1)
EndIf
detect=0
For b = 1 To basecount
marker.s = B64Signal(b)
;nh = ReplaceString(nh,marker,base64s(b)) : Continue
ndetect = FindString(nh,marker,0)
If ndetect
detect=ndetect
nh = ReplaceString(nh,marker,base64s(b),0,detect-2)
EndIf
;detect = FindString(nh,marker,0)
;If detect
; p1.s = Left(nh,detect-1)
; p2.s = Mid(nh,detect+Len(marker),Len(nh))
; nh = p1+base64s(b)+p2
;EndIf
Next b
FileFromString(destfile,nh)
EndProcedure
; Adding by KCC
; *************
InitNetwork()
ReceiveHTTPFile("https://www.amazon.fr/s?k=purebasic&__mk_fr_FR=%C3%85M%C3%85%C5%BD%C3%95%C3%91&ref=nb_sb_noss", "AmazonSearchPB.html")
InternaliseHTMLFile("AmazonSearchPB.html","AmazonSearchPB_Internalise.html")
For example in this part at random
Code: Select all
</script>
<link rel="stylesheet" href="https://images-eu.ssl-images-amazon.com/images/I/01mI9NDJJTL._RC|41H+PEIOYHL.css,51CKwX+QTdL.css_.css?AUIClients/SearchAssets&nZMUiA5H#270238-T1.308335-T1" />
<link rel="stylesheet" href="https://images-eu.ssl-images-amazon.com/images/I/01BLqKISyaL._RC|01+neHskhqL.css,01mfj61BPYL.css,01Q4qmee9LL.css,11kdhabA0xL.css,01Y5FkF5TkL.css,0171-O+nBwL.css,2170Ev7c3lL.css,21rhNT4WPrL.css,01rZTK48+KL.css,21SHQ2IVatL.css,21nAJBNu4CL.css,01rdVnPkgmL.css,01ixfc-7StL.css,21URXAbTuFL.css,01Op2rWArIL.css,01hQwBUjIwL.css_.css?AUIClients/SearchPartnerAssets" />
<link rel="stylesheet" href="https://images-eu.ssl-images-amazon.com/images/I/31-wGuUNxVL.css?AUIClients/DetailPageAllOffersDisplayAssets&GSArOGBt#323159-T2.323160-T2" />
<script>
Code: Select all
</script>
<script>
The happiness is a road...
Not a destination
Not a destination
-
- Addict
- Posts: 1233
- Joined: Wed Feb 28, 2007 9:13 am
- Location: London
Re: Save full webpage in one HTML file
Yes, you'll need to upgrade the Base64 code to work with PB's new method.
The LINK tags might be disappearing because the CSS from them is being absorbed into the main file. That's what should be happening, after all. The idea is that the output HTML file will not rely on any external resources.
The LINK tags might be disappearing because the CSS from them is being absorbed into the main file. That's what should be happening, after all. The idea is that the output HTML file will not rely on any external resources.
JACK WEBB: "Coding in C is like sculpting a statue using only sandpaper. You can do it, but the result wouldn't be any better. So why bother? Just use the right tools and get the job done."
- Kwai chang caine
- Always Here
- Posts: 5357
- Joined: Sun Nov 05, 2006 11:42 pm
- Location: Lyon - France
Re: Save full webpage in one HTML file
Yes you talk about this part of code or there are another to modify ?Yes, you'll need to upgrade the Base64 code to work with PB's new method.
Code: Select all
Procedure.s File2Base64(filename.s)
f = OpenFile(#PB_Any,filename)
If Not f : ProcedureReturn "" : EndIf
loaf = Lof(f)
If loaf<1 : CloseFile(f) : ProcedureReturn "" : EndIf
*mem = AllocateMemory(loaf)
lengthread = ReadData(f,*mem,loaf)
CloseFile(f)
If Not lengthread
FreeMemory(*mem)
ProcedureReturn ""
EndIf
; Modified by KCC
; ***************
;bloaf = loaf*1.5
;*b64 = AllocateMemory(bloaf)
;Base64Encoder(*mem,loaf,*b64,bloaf)
;b64.s = PeekS(*b64)
;FreeMemory(*b64)
; ***************
b64.s = Base64Encoder(*mem,loaf)
FreeMemory(*mem)
ProcedureReturn b64
EndProcedure
Yes too, but the problem is that apparently the links disappears, but he is not replacing by the content of the CSS, see yourself with a text comparerThe LINK tags might be disappearing because the CSS from them is being absorbed into the main file. That's what should be happening, after all
Yes, it's exactely what i search to do, and i'm happy you do this code before meThe idea is that the output HTML file will not rely on any external resources.
But it's strange nobody or nearly are interesting by save a full page in only one file
Crosoft had this idea with MHT there are a long time ago, but like often, what microsoft do, it's for microsoft or nearly
The happiness is a road...
Not a destination
Not a destination
-
- Addict
- Posts: 1233
- Joined: Wed Feb 28, 2007 9:13 am
- Location: London
Re: Save full webpage in one HTML file
You should debug the code and find out whether it is successfully downloading these files:
<link rel="stylesheet" href="https://images-eu.ssl-images-amazon.com ... .308335-T1" />
<link rel="stylesheet" href="https://images-eu.ssl-images-amazon.com ... tnerAssets" />
<link rel="stylesheet" href="https://images-eu.ssl-images-amazon.com ... .323160-T2" />
JACK WEBB: "Coding in C is like sculpting a statue using only sandpaper. You can do it, but the result wouldn't be any better. So why bother? Just use the right tools and get the job done."