Save full webpage in one HTML file

Kwai chang caine
Always Here
Posts: 5357
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Post by Kwai chang caine »

Hello everyone,

I have installed the splendid extension "Save Page WE" in Firefox, and it works very well.
This extension creates a single HTML file containing all the images, text and CSS of the webpage :shock: a little like Microsoft's MHT, but better, because the file has an ".html" extension that every browser can read 8)

Is this possible in PB?
Have you seen this kind of PB code on the forum?

Have a good day
The happiness is a road...
Not a destination
Olli
Addict
Posts: 1071
Joined: Wed May 27, 2020 12:26 pm

Post by Olli »

I suppose that, technically, you want to convert small images into Base64 text inside the HTML?

Post by Kwai chang caine »

Hello OLLI :wink:

Yes, but not only that. I think it is also necessary to download every linked CSS file and put the downloaded source in place of the link and, as you say, to replace the image links with Base64 text :idea:
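
The CSS step described above can be sketched roughly as follows. This is only a sketch under stated assumptions: `InlineStylesheet()` is a hypothetical helper name, the URL is a placeholder, the stylesheet is assumed to be UTF-8 text, and a PureBasic 5.7x installation (where `InitNetwork()` must be called first) is assumed.

```purebasic
; Sketch only. InlineStylesheet() is a hypothetical helper, not code from this thread.
; Assumptions: the stylesheet is UTF-8 text, and linkTag is the exact text of the
; <link> tag as it appears in html.
Procedure.s InlineStylesheet(html.s, linkTag.s, cssURL.s)
  Protected *buf, css.s

  *buf = ReceiveHTTPMemory(cssURL)
  If *buf
    css = PeekS(*buf, MemorySize(*buf), #PB_UTF8 | #PB_ByteLength)
    FreeMemory(*buf)
  EndIf

  If css = ""
    ProcedureReturn html   ; download failed or empty file: keep the <link> tag
  EndIf

  ; swap the <link> tag for a <style> block holding the downloaded source
  ProcedureReturn ReplaceString(html, linkTag, "<style>" + #CRLF$ + css + #CRLF$ + "</style>")
EndProcedure

InitNetwork()   ; required before network calls in PB 5.7x

; placeholder URL and tag, for illustration only
url.s  = "https://example.com/styles.css"
tag.s  = "<link rel=" + Chr(34) + "stylesheet" + Chr(34) + " href=" + Chr(34) + url + Chr(34) + " />"
page.s = "<html><head>" + tag + "</head><body></body></html>"

Debug InlineStylesheet(page, tag, url)
```

Keeping the original tag when the download fails means the saved page still points at the stylesheet instead of silently losing it.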

Post by Olli »

You should create a recursive system that detects all the external links. You talk about CSS: that means nothing to me.
I just know:

Code: Select all

[a href=http://www.femmepoilue.com/chuistropjeune.htm]Dad allowed you?[/a]
Maybe you should start by detecting these "a" tags and managing them as nodes.

There is already a problem with this, because CSS can replace such a node. Very complex.

Do you know how to embed a Base64-encoded picture in an HTML file? (If not)

Post by Kwai chang caine »

Thanks a lot for your link, it helps me very much 8)
Thanks to you, I now have cleaner code :mrgreen:

Post by Olli »

Could you extract this text manually and post it here inside forum code tags?

(Gohogleux sirche)
https://123doc.net/document/2365574-kho ... am-son.htm

Post by Kwai chang caine »

I believe it's time for you to "1Supo&OLi" :lol:

Post by Olli »

Thanks to Wikipedia:

Code: Select all

<img src="data:image/png;base64,iVBORw0KGgoAAA
ANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4
//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU
5ErkJggg==" alt="Red dot" />
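
The same kind of data URI can be built from an image file on disk. A minimal sketch, assuming PureBasic 5.60 or later (where `Base64Encoder()` returns the encoded text as a string); `DataURI()` is a hypothetical helper name and "red-dot.png" is a placeholder file name:

```purebasic
; Sketch only. DataURI() is a hypothetical helper, not code from this thread.
; Assumes PureBasic 5.60+, where Base64Encoder(*mem, size) returns a string.
Procedure.s DataURI(filename.s, mime.s)
  Protected file, size, *mem, uri.s

  file = ReadFile(#PB_Any, filename)   ; ReadFile never creates a missing file
  If file = 0 : ProcedureReturn "" : EndIf

  size = Lof(file)
  If size > 0
    *mem = AllocateMemory(size)
    If *mem
      If ReadData(file, *mem, size) = size
        uri = "data:" + mime + ";base64," + Base64Encoder(*mem, size)
      EndIf
      FreeMemory(*mem)
    EndIf
  EndIf
  CloseFile(file)

  ProcedureReturn uri   ; "" if the file is missing, empty or unreadable
EndProcedure

; "red-dot.png" is a placeholder file name
Debug "<img src=" + Chr(34) + DataURI("red-dot.png", "image/png") + Chr(34) + " alt=" + Chr(34) + "Red dot" + Chr(34) + " />"
```

An empty return value makes it easy for the caller to leave the original `src` untouched when the image cannot be read.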
BarryG
Addict
Posts: 3331
Joined: Thu Apr 18, 2019 8:17 am

Post by BarryG »

Kwai chang caine wrote: Have you seen this kind of PB code on the forum?
Yes, back when you first discussed it in 2007 -> viewtopic.php?p=215998#p215998

Did you forget? Hehe. Here's some other links that may help:

viewtopic.php?f=7&t=35772
viewtopic.php?f=13&t=28267
viewtopic.php?f=7&t=41271

Post by Kwai chang caine »

@OLLI
Thanks for the help, that can be useful for including images 8)
But I think the hardest part is including the CSS and JavaScript links :|

@BarryG
Hello BarryG :D

First, thanks for your answer 8)
BarryG wrote: Yes, back when you first discussed it in 2007
Did you forget?
Yes, I had forgotten (the "power" of an old man :oops: ), but yesterday, before asking my question, I searched the forum and found it again.
But precisely, I am not looking for code that creates MHT, just HTML :wink:
Because MHT is a Microsoft format, and not every browser can read this proprietary format without an extension :|

So thanks, but the first 3 links are not really useful for me :wink:
The 4th one is, on the other hand; I did not find it yesterday.
At first glance it is a start; I will read it more closely :D

Again, thanks a lot 8)
Seymour Clufley
Addict
Posts: 1233
Joined: Wed Feb 28, 2007 9:13 am
Location: London

Post by Seymour Clufley »

JACK WEBB: "Coding in C is like sculpting a statue using only sandpaper. You can do it, but the result wouldn't be any better. So why bother? Just use the right tools and get the job done."

Post by Kwai chang caine »

Hello Seymour Clufley :D

Happy to talk with the creator of this nice code, so that I can thank you directly for sharing it 8)

I have seen it, because BarryG was kind enough to give the link to it (his 4th link) :wink:

I have tried to convert it to 5.73 (my changes are marked "Modified by KCC")

Code: Select all

Macro R(t)
  MessageRequester("Report",t,0)
EndMacro

Global c10.s = Chr(10)
Global c13.s = Chr(13)
Global c32.s = Chr(32)
Global c34.s = Chr(34)
Global c39.s = Chr(39)

Macro EnsureThisNotStart(t,start)
 
  If Left(t,Len(start)) = start
      t = Mid(t,Len(start)+1,Len(t))
  EndIf
 
EndMacro

Procedure.s EnsureNotStart(t.s,start.s)
 
  EnsureThisNotStart(t,start)
  ProcedureReturn t
 
EndProcedure

Macro EnsureThisEnd(t,endd)
 
  If endd<>""
      If Right(t,Len(endd)) <> endd
          t+endd
      EndIf
  EndIf
 
EndMacro

Macro EnsureThisNotEnd(t,endd)
 
  If Right(t,Len(endd)) = endd
      ;snipped.s = Len(t)-Len(endd)
      ;t = Left(t,snipped)
      t = Left(t,Len(t)-Len(endd))
  EndIf
 
EndMacro

Macro StartsWith(main,sub)
  (sub<>"" And main<>"" And Left(main,Len(sub))=sub)
EndMacro


Procedure.s FileToString(filename.s)
 
  info.s = ""
  file = ReadFile(#PB_Any,filename)
  If file
      While Not Eof(file)
          info + ReadString(file)+c13
      Wend
      CloseFile(file)
  EndIf
 
  ProcedureReturn info
 
EndProcedure

Procedure.b FileFromString(filename.s,string.s)
  ;Report("FILE: "+filename+c13+"STRING: "+string)
 
  If filename = "" : ProcedureReturn #False : EndIf   ; Modified by KCC
  ;If Not EnsureFolderPath(GetPathPart(filename)) : ProcedureReturn #False : EndIf
 
  string = RemoveString(string,c10)
  If FindString(string,c13,0)
      EnsureThisNotEnd(string,c13) ; this removes final linebreak
      string = ReplaceString(string,c13,c13+c10)
  EndIf
 
  file = CreateFile(#PB_Any,filename)
  If IsFile(file)
      WriteString(file,string)
      CloseFile(file)
      ProcedureReturn #True
  EndIf
 
  ProcedureReturn #False
 
EndProcedure


Procedure.s FileMimeType(filename.s)
 
  mime.s
  Select LCase(GetExtensionPart(filename))
      Case "png"
          mime = "image/png"
      Case "jpg", "jpeg", "jpe"
          mime = "image/jpeg"
      Case "ico"
          mime = "image/x-icon"
      Case "gif"
          mime = "image/gif"
      Case "bmp"
          mime = "image/bmp"
      Case "tif", "tiff"
          mime = "image/tiff"
  EndSelect
 
  ProcedureReturn mime
 
EndProcedure

Procedure.s File2Base64(filename.s)
 
  f = ReadFile(#PB_Any,filename)   ; ReadFile, not OpenFile: a missing resource must not be created as an empty file
  If Not f : ProcedureReturn "" : EndIf
  loaf = Lof(f)
  If loaf<1 : CloseFile(f) : ProcedureReturn "" : EndIf
  *mem = AllocateMemory(loaf)
  lengthread = ReadData(f,*mem,loaf)
  CloseFile(f)
  If Not lengthread
      FreeMemory(*mem)
      ProcedureReturn ""
  EndIf
 
  ; Modified by KCC
  ; ***************
  ;bloaf = loaf*1.5
  ;*b64 = AllocateMemory(bloaf)
  ;Base64Encoder(*mem,loaf,*b64,bloaf)
  ;b64.s = PeekS(*b64)
  ;FreeMemory(*b64)
  
  ; ***************
  
  b64.s = Base64Encoder(*mem,loaf)
  FreeMemory(*mem)
  
  ProcedureReturn b64
 
EndProcedure


Procedure.s OmitUnusedClasses(css.s,html.s)
 
  css+c13
  html = LCase(html)
  ncss.s = ""
  html = ReplaceString(html," class='"," class="+c34)
  html = ReplaceString(html," id='"," id="+c34)
 
  For a = 1 To CountString(css,c13)
      ln.s = StringField(css,a,c13)
      ln = Trim(ln)
      If ln = "": Continue : EndIf   ; Modified by KCC
      ln = RemoveString(ln,Chr(9))
      use.b = #False
      lln.s = LCase(ln)
      lln = ReplaceString(lln,"{",c32)
      lln = ReplaceString(lln,".",c32)
      lln = Trim(lln,c32)
      name.s = StringField(lln,1,c32)
      ;R(name)
      If Left(lln,1)="#"
          ; it's styling for an id
          inlinename.s = EnsureNotStart(name,"#")
          use = #False
          If FindString(html," id="+inlinename,1)
              use=#True
          Else
              If FindString(html," id="+c34+inlinename+c34,1)
                  use=#True
              Else
                  If FindString(html," id="+c39+inlinename+c39,1)
                      use=#True
                  EndIf
              EndIf
          EndIf
      Else
          ; it's a class
          If FindString(html," class="+name,1)
              use=#True
          Else
              If FindString(html," class="+c34+name+c34,1)
                  use=#True
              EndIf
          EndIf
      EndIf
     
      If use
          ncss+ln+c13
      EndIf
  Next a
 
  ProcedureReturn ncss
 
EndProcedure

Macro IsPointInsideAScriptBlock(lh,point,var,blockend)
  var = #False
  ; search from the macro's own "point" argument instead of the caller's "detect" variable
  nextjsopen = FindString(lh,"<script ",point)
  nextjsclose = FindString(lh,"</script>",point)
  If nextjsclose>0
      If nextjsclose<nextjsopen Or Not nextjsopen
          ;Debug "INSIDE A SCRIPT BLOCK"
          var = #True
          blockend=nextjsclose
      EndIf
  EndIf
EndMacro




Macro NewBase(filename)
  basecount+1
  barrsize = ArraySize(base64s(),1)
  If barrsize<basecount
      barrsize+5
      ReDim base64s(barrsize)
  EndIf
  base64s(basecount) = "data:"+FileMimeType(filename)+";base64,"+File2Base64(filename)
EndMacro

Macro showtime(t)
  Debug RSet(Str(ElapsedMilliseconds()-st),5)+"  "+t
EndMacro

Macro B64Signal(b)
  "INSERTBASE64_"+Str(b)+"_"
EndMacro

Procedure.b InternaliseHTMLFile(sourcefile.s,destfile.s,webpath.s="",path.s="",omitunusedclasses.b=#True,lookforclassesinjavascript.b=#True,lookforimagesinjavascript.b=#True,extrafile_arr.s="")
 
  st=ElapsedMilliseconds()
 
  ; external stylesheets
  ; external javascripts
  ; images (converted to base64)
  ; background-images (converted to base64)
  ; svgs (converted to base64)
 
  pp.s = "|"
  Dim base64s.s(20)
 
  If path
      EnsureThisEnd(path,"\")
  Else
      path = GetPathPart(sourcefile)
  EndIf
 
  h.s = FileToString(sourcefile)
 
  lh.s = LCase(h)
  nh.s = h
 
  badtagarr.s
 
  importedcss.s
 
 
 
  ; find all external scripts and get their content...
  detect=0
  Repeat
      detect = FindString(lh,"<script ",detect)
      If Not detect : Break : EndIf
      enddetect = FindString(lh,">",detect)
      If Not enddetect : Break : EndIf
      snippet.s = Mid(lh,detect,enddetect-detect)
      srcstart = FindString(snippet," src=",1)
      If srcstart
          snippet = Mid(snippet,srcstart+5,Len(snippet))
          ;snippet = RemoveString(snippet,"=")
          EnsureThisNotEnd(snippet,"/")
          snippet = Trim(snippet)
          snippet = Trim(snippet,c34)
          ;EnsureThisNotStart(snippet,c34)
          ;EnsureThisNotEnd(snippet,c34)
          If FindString(snippet,c34,1)
              snippet = StringField(snippet,1,c34)
          EndIf
;          R(snippet)
          If snippet
              resourcefilename.s = path+ReplaceString(snippet,"/","\")
              newjs.s = FileToString(resourcefilename)
              If newjs = "": detect=enddetect : Continue : EndIf   ; Modified by KCC
              newjs = Trim(newjs,c13)
              tag.s = Mid(h,detect,enddetect-detect+1)
              If newjs
                  ntag.s = RemoveString(tag,snippet,#PB_String_NoCase)
                  ntag = RemoveString(ntag,c32+"src="+c34+c34)
                  ntag = RemoveString(ntag,c32+"src=")
                  ntag+c13+newjs+c13
                  nh = ReplaceString(nh,tag,ntag)
              Else
                  ; empty js file
                  nh = RemoveString(nh,tag+"</SCRIPT>",#PB_String_NoCase)
                  nh = RemoveString(nh,tag+" </SCRIPT>",#PB_String_NoCase)
                  nh = RemoveString(nh,tag+c13+"</SCRIPT>",#PB_String_NoCase)
              EndIf
          EndIf
      EndIf
      detect=enddetect
  ForEver
 
 
 
  ; find all external stylesheets and get their content...
  detect=0
  Repeat
      ; <LINK rel="stylesheet" type="text/css" href="styles.css" />
      detect = FindString(lh,"<link ",detect)
      If Not detect : Break : EndIf
      enddetect = FindString(lh,">",detect)
      If Not enddetect : Break : EndIf
      snippet.s = Mid(lh,detect,enddetect-detect)
      If FindString(snippet,"stylesheet",1)
          srcstart = FindString(snippet," href=",1)
          If srcstart
              snippet = Mid(snippet,srcstart+5,Len(snippet))   ; still starts with "=", stripped on the next line
              snippet = RemoveString(snippet,"=")
              EnsureThisNotEnd(snippet,"/")
              snippet = Trim(snippet)
              EnsureThisNotStart(snippet,c34)
              EnsureThisNotEnd(snippet,c34)
              If snippet
                  ; the href is treated as a local path relative to "path"
                  resourcefilename.s = path+ReplaceString(snippet,"/","\")
                  newcss.s = FileToString(resourcefilename)   ; "" if the file cannot be read
                  newcss = Trim(newcss,c13)
                  If newcss
                      importedcss+c13+c13+newcss
                  EndIf
                  ; note: the LINK tag is queued for removal even when no CSS was read
                  badtagarr+ Mid(h,detect,enddetect-detect+1) +pp
              EndIf
          EndIf
      EndIf
      detect=enddetect
  ForEver
  If importedcss
      importedcss = Trim(importedcss,c13)
  EndIf
 
 
 
  ; find all external images and convert them to base64...
  detect=0
  Repeat
      ; <IMG src="tree.jpg" />
      detect = FindString(lh,"<img ",detect)
      If Not detect : Break : EndIf
      enddetect = FindString(lh,">",detect)
      If Not enddetect : Break : EndIf
      snippet.s = Mid(lh,detect,enddetect-detect)
      srcstart = FindString(snippet," src=",1)
      If srcstart
          ;snippet1.s=snippet
          snippet = Mid(snippet,srcstart+4,Len(snippet))
          snippet = RemoveString(snippet,"=")
          EnsureThisNotEnd(snippet,"/")
          snippet = Trim(snippet)
          ;EnsureThisNotStart(snippet,c34)
          ;EnsureThisNotEnd(snippet,c34)
          snippet=Trim(snippet,c39) : snippet=Trim(snippet,c34)
          ;snippet2.s = snippet
          If snippet
              If FindString(snippet,c34,1) : snippet=StringField(snippet,1,c34) : EndIf
              If FindString(snippet,c39,1) : snippet=StringField(snippet,1,c39) : EndIf
              resourcefilename.s = path+ReplaceString(snippet,"/","\")
              If FindString(resourcefilename,"?",1) : resourcefilename = StringField(resourcefilename,1,"?") : EndIf
              ;If FileSize(resourcefilename)<0
              ;    R("NO FILE."+c13+c13+snippet1+c13+snippet2+c13+snippet+c13+resourcefilename)
              ;EndIf
              NewBase(resourcefilename)
              tag.s = Mid(h,detect,enddetect-detect+1)
              ntag.s = ReplaceString(tag,snippet,B64Signal(basecount))
              nh = ReplaceString(nh,tag,ntag)
          EndIf
      EndIf
      detect=enddetect
  ForEver
 
 
 
  ; find all external SVGs and convert them to base64...
  If FindString(lh,".svg",1)
      detect=0
      Repeat
          ; <OBJECT data="arcrap.svg" type="image/svg+xml" width="100%" height="100%">    </OBJECT>
          detect = FindString(lh,"<object ",detect)
          If Not detect : Break : EndIf
          enddetect = FindString(lh,">",detect)
          If Not enddetect : Break : EndIf
          snippet.s = Mid(lh,detect,enddetect-detect)
          srcstart = FindString(snippet," data=",1)
          If srcstart
              snippet = Mid(snippet,srcstart+5,Len(snippet))
              EnsureThisNotStart(snippet,"=")
              EnsureThisNotEnd(snippet,"/")
              EnsureThisNotStart(snippet,c34)
              If Left(snippet,1)=c34
                  EnsureThisNotStart(snippet,c34)
                  snippet = StringField(snippet,1,c34)
              Else
                  snippet = StringField(snippet,1,c32)
              EndIf
              snippet = Trim(snippet)
              EnsureThisNotEnd(snippet,c34)
              ;MessageRequester("SNIPPET","**"+snippet+"**")
              If snippet
                  resourcefilename.s = path+ReplaceString(snippet,"/","\")
                  ;MessageRequester("BASE 64",b64)
                  NewBase(resourcefilename)
                  tag.s = Mid(h,detect,enddetect-detect+1)
                  ntag.s = ReplaceString(tag,snippet,B64Signal(basecount))
                  nh = ReplaceString(nh,tag,ntag)
              EndIf
          EndIf
          detect=enddetect
      ForEver
  EndIf
 
 
  If badtagarr
      For a = 1 To CountString(badtagarr,pp)
          tag.s = StringField(badtagarr,a,pp)
          nh = RemoveString(nh,tag)
      Next a
  EndIf
  ;showtime("after badtagarr section")
 
 
  swb64.s = "SwitchBase64"
  If lookforimagesinjavascript
      attrname.s = "src"
      For cycle = 1 To 2
          elsrc.s = "."+attrname+"="
          nh = ReplaceString(nh,"."+attrname+" =",elsrc)
          nh = ReplaceString(nh,"."+attrname+"= ",elsrc)
          detect = 0
          enddetect = 0
          ;R("ELSRC: "+elsrc)
          Repeat
              lh.s = LCase(nh)
              detect = FindString(lh,elsrc,detect)
              If Not detect : Break : EndIf
              detect+Len(elsrc)
              enddetect = FindString(lh,";",detect)
              snippet.s = Mid(nh,detect,enddetect-detect)
              If StartsWith(snippet,swb64)
                  detect = enddetect
                  Continue
              EndIf
              ;R(snippet)
              newinstrux.s = elsrc+swb64+"("+snippet+")"
              ;R(elsrc+snippet+c13+c13+newinstrux)
              nh = ReplaceString(nh,elsrc+snippet,newinstrux,#PB_String_NoCase)
              nh = ReplaceString(nh,elsrc+" "+snippet,newinstrux,#PB_String_NoCase)
              detect = enddetect
             
              snippet = Trim(snippet,c34)
              snippet = Trim(snippet,c39)
              ;R(path+snippet)
              If FileSize(path+snippet)>0 ; it's a filename (otherwise it's a variable or a js function call)
                  ;R("FILE FOUND IN BASE FOLDER")
                  If Not FindString(pp+extrafile_arr,pp+snippet+pp,1) ; add it to the list if it's not already there
                      extrafile_arr+snippet+pp
                  EndIf
              EndIf
             
          ForEver
          attrname = "backgroundImage"
      Next cycle
  EndIf
 
 
 
  ncss.s = importedcss
  ; now get any internal style blocks...
  lh.s = LCase(nh)
  If FindString(lh,"<style",1)
      nh = ReplaceString(nh,"<style","<STYLE")
      nh = ReplaceString(nh,"</style>","</STYLE>")
      Repeat
          detect = FindString(nh,"<STYLE",0)
          If Not detect : Break : EndIf
          enddetect = FindString(nh,"</STYLE>",detect+6)
          If Not enddetect : Break : EndIf
          snippet.s = Mid(nh,detect,enddetect-detect+8)
          nh = RemoveString(nh,snippet)
          EnsureThisNotEnd(snippet,"</STYLE>")
          detect = FindString(snippet,">",1)
          snippet = Mid(snippet,detect+1,Len(snippet))
          snippet = RemoveString(snippet,Chr(9))
         
          ncss+Trim(snippet,c13)+c13
      ForEver
  EndIf
  ;showtime("after getting style blocks")
 
  ; omit unused...
  If ncss
      If omitunusedclasses
          ncss = OmitUnusedClasses(ncss,nh)
      EndIf
  EndIf
 
 
 
  Debug(Str(Len(nh)))
  If webpath ; takes ages!
      EnsureThisEnd(webpath,"/")
      webpathl = Len(webpath)
      lh.s = LCase(nh)
      detect=FindString(lh,"<body",1)
      ;R(Mid(lh,detect,100))
      Repeat
          ; <A id="example" href="page2.html">text</A>
          detect = FindString(nh," href=",detect)
          If Not detect : Break : EndIf
         
          IsPointInsideAScriptBlock(lh,detect,inside,blockend)
          If inside
              ;R("INSIDE SCRIPT. BLOCK ENDS @ "+Str(blockend))
              detect = blockend
              Continue
          EndIf
         
         
          offset=7
          snippet = Mid(nh,detect+offset,webpathl+1)
          ;R(snippet)
          If Left(snippet,1)=c34
              offset=8
              snippet = Mid(nh,detect+offset,webpathl+1)   ; re-read from nh (not from snippet), one character further on
          EndIf
          If StartsWith(snippet,"javascript:") Or StartsWith(snippet,"http://") Or StartsWith(snippet,"https://") Or StartsWith(snippet,"#") Or StartsWith(snippet,webpath)
              detect+webpathl
              Continue
          EndIf
          ;If Not snippet
          ;    detect+webpathl
          ;    Continue
          ;EndIf
         
          ;R(Mid(nh,detect+place-2,40))
          nh = InsertString(nh,webpath,detect+offset)
         
          detect+webpathl
      ForEver
  EndIf

 
 
  ; incorporate single css block...
  If ncss
      ncss = "<STYLE type="+c34+"text/css"+c34+">"+c13+c13+ncss+c13+c13+"</STYLE>"
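      headend = FindString(nh,"<HEAD>",1,#PB_String_NoCase)   ; the tag may be written in lower case
      If headend
          nh = InsertString(nh,c13+ncss,headend+6)   ; insert directly after the 6 characters of "<HEAD>"
      EndIf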
  EndIf
 
 
 
  ; find css images and base64 them...
  bi.s = "background-image:"
  If FindString(lh,bi,1)
      detect = 0
      Repeat
          ; { width:50px; background-image:url('graphics/example.jpg'); display:block; }"
          lh.s = LCase(nh)
          detect = FindString(lh,bi,detect)
          If Not detect : Break : EndIf
          detect+Len(bi)
          enddetect = FindString(nh,";",detect)
          If Not enddetect : Break : EndIf
          snippet.s = Mid(nh,detect,enddetect-detect)
          entiresnippet.s = snippet
          EnsureThisNotStart(snippet,"url(")
          EnsureThisNotEnd(snippet,")")
          snippet = Trim(snippet) : snippet=Trim(snippet,c39) : snippet=Trim(snippet,c34)
          If Left(snippet,5)="data:" : detect=enddetect : Continue : EndIf
          If snippet
              If FindString(snippet,c34,1) : snippet=StringField(snippet,1,c34) : EndIf
              If FindString(snippet,c39,1) : snippet=StringField(snippet,1,c39) : EndIf
              resourcefilename.s = path+ReplaceString(snippet,"/","\")
              ;R(resourcefilename)
              If FindString(resourcefilename,"?",1) : resourcefilename = StringField(resourcefilename,1,"?") : EndIf
              ;If FileSize(resourcefilename)<0
              ;    R("NO FILE."+c13+c13+snippet1+c13+snippet2+c13+snippet+c13+resourcefilename)
              ;EndIf
              NewBase(resourcefilename)
              ntag.s = ReplaceString(entiresnippet,snippet,B64Signal(basecount))
              nh = ReplaceString(nh,entiresnippet,ntag)
          EndIf
          detect = enddetect
      ForEver
  EndIf
 
 
 
  If extrafile_arr
      ;R("EXTRA FILE ARR"+c13+c13+ReplaceString(extrafile_arr,pp,c13))
      ; we found some discernible filenames in the js, and/or some supplied to the procedure
      ; have to encode them and construct the js switching procedure, SwitchBase64
      switcherfunc.s = "function SwitchBase64(filename) {"+c13
      ;switcherfunc + "alert(filename);"
      switcherfunc + "  switch(filename) {"+c13
      For a = 1 To CountString(extrafile_arr,pp)
          resfile.s = StringField(extrafile_arr,a,pp)
          EnsureThisNotStart(resfile,path)
          If resfile = "" : Continue : EndIf   ; Modified by KCC
          NewBase(path+resfile)
          ;R(b64)
          switcherfunc + "    case "+c34+ReplaceString(resfile,"\","/")+c34+":"+c13
          switcherfunc + "      return "+c34+B64Signal(basecount)+c34+";"+c13
      Next a
      switcherfunc+c13+"    }"+c13+" }"
     
      ; now insert it right before the closing HEAD tag
      switcherfunc.s = "<SCRIPT type="+c34+"text/javascript"+c34+">"+c13+switcherfunc+c13+"</SCRIPT>"+c13
      ;nh = ReplaceString(nh,"</HEAD>","</HEAD>"+switcherfunc,#PB_String_NoCase)
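      headend = FindString(nh,"</HEAD>",1,#PB_String_NoCase)   ; the tag may be written in lower case
      If headend
          nh = InsertString(nh,switcherfunc,headend)   ; insert just before the closing HEAD tag
      EndIf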
  EndIf
 
 
 
  detect=0
  For b = 1 To basecount
      marker.s = B64Signal(b)
      ;nh = ReplaceString(nh,marker,base64s(b)) : Continue
      ndetect = FindString(nh,marker,0)
      If ndetect
          detect=ndetect
          nh = ReplaceString(nh,marker,base64s(b),0,detect-2)
      EndIf
      ;detect = FindString(nh,marker,0)
      ;If detect
      ;    p1.s = Left(nh,detect-1)
      ;    p2.s = Mid(nh,detect+Len(marker),Len(nh))
      ;    nh = p1+base64s(b)+p2
      ;EndIf
  Next b
 
 
 
  ProcedureReturn FileFromString(destfile,nh)   ; report success or failure to the caller
 
EndProcedure

; Added by KCC
; ************

InitNetwork()
If ReceiveHTTPFile("https://www.amazon.fr/s?k=purebasic&__mk_fr_FR=%C3%85M%C3%85%C5%BD%C3%95%C3%91&ref=nb_sb_noss", "AmazonSearchPB.html")
  InternaliseHTMLFile("AmazonSearchPB.html","AmazonSearchPB_Internalise.html")
Else
  Debug "Download of the test page failed"
EndIf
But apparently the Base64 conversion does not really run, and the <SCRIPT> conversion does not really work either.
For example, this part, picked at random:

Code: Select all

</script>

<link rel="stylesheet" href="https://images-eu.ssl-images-amazon.com/images/I/01mI9NDJJTL._RC|41H+PEIOYHL.css,51CKwX+QTdL.css_.css?AUIClients/SearchAssets&nZMUiA5H#270238-T1.308335-T1" />
<link rel="stylesheet" href="https://images-eu.ssl-images-amazon.com/images/I/01BLqKISyaL._RC|01+neHskhqL.css,01mfj61BPYL.css,01Q4qmee9LL.css,11kdhabA0xL.css,01Y5FkF5TkL.css,0171-O+nBwL.css,2170Ev7c3lL.css,21rhNT4WPrL.css,01rZTK48+KL.css,21SHQ2IVatL.css,21nAJBNu4CL.css,01rdVnPkgmL.css,01ixfc-7StL.css,21URXAbTuFL.css,01Op2rWArIL.css,01hQwBUjIwL.css_.css?AUIClients/SearchPartnerAssets" />
<link rel="stylesheet" href="https://images-eu.ssl-images-amazon.com/images/I/31-wGuUNxVL.css?AUIClients/DetailPageAllOffersDisplayAssets&GSArOGBt#323159-T2.323160-T2" />

<script>
gives an empty result :cry:

Code: Select all

</script>

<script>

Post by Seymour Clufley »

Yes, you'll need to upgrade the Base64 code to work with PB's new method.

The LINK tags might be disappearing because the CSS from them is being absorbed into the main file. That's what should be happening, after all. The idea is that the output HTML file will not rely on any external resources.

Post by Kwai chang caine »

Seymour Clufley wrote: Yes, you'll need to upgrade the Base64 code to work with PB's new method.
Yes, do you mean this part of the code, or is there something else to modify?

Code: Select all

Procedure.s File2Base64(filename.s)
 
  f = ReadFile(#PB_Any,filename)   ; ReadFile, not OpenFile: a missing resource must not be created as an empty file
  If Not f : ProcedureReturn "" : EndIf
  loaf = Lof(f)
  If loaf<1 : CloseFile(f) : ProcedureReturn "" : EndIf
  *mem = AllocateMemory(loaf)
  lengthread = ReadData(f,*mem,loaf)
  CloseFile(f)
  If Not lengthread
      FreeMemory(*mem)
      ProcedureReturn ""
  EndIf
 
  ; Modified by KCC
  ; ***************
  ;bloaf = loaf*1.5
  ;*b64 = AllocateMemory(bloaf)
  ;Base64Encoder(*mem,loaf,*b64,bloaf)
  ;b64.s = PeekS(*b64)
  ;FreeMemory(*b64)
  
  ; ***************
  
  b64.s = Base64Encoder(*mem,loaf)
  FreeMemory(*mem)
  
  ProcedureReturn b64
 
EndProcedure
Seymour Clufley wrote: The LINK tags might be disappearing because the CSS from them is being absorbed into the main file. That's what should be happening, after all
Yes to that too, but the problem is that the links apparently disappear without being replaced by the content of the CSS; see for yourself with a text comparison tool :|

[Image: screenshot of the text comparison]
Seymour Clufley wrote: The idea is that the output HTML file will not rely on any external resources.
Yes, that is exactly what I am trying to do, and I am happy you wrote this code before me 8)
But it is strange that nobody, or nearly nobody, is interested in saving a full page in a single file :shock:
Microsoft had this idea with MHT a long time ago but, as is often the case, what Microsoft does is for Microsoft, or nearly :|

Post by Seymour Clufley »

You should debug the code and find out whether it is successfully downloading these files:
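
One way to run that check is sketched below. In the code posted earlier in this thread, every reference is turned into a local path (`path` plus the reference with "/" replaced by "\"), so a reference that is a full https:// address is never fetched. The helper names `IsRemoteURL()` and `CheckResource()`, the temporary file name and the example URL are assumptions made for illustration only.

```purebasic
; Sketch only. IsRemoteURL() and CheckResource() are hypothetical helpers.
Procedure.b IsRemoteURL(ref.s)
  ref = LCase(Trim(ref))
  If Left(ref, 7) = "http://" Or Left(ref, 8) = "https://"
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure

Procedure CheckResource(ref.s, basepath.s)
  Protected localname.s

  If IsRemoteURL(ref)
    ; remote reference: try a real download into a temporary file
    localname = GetTemporaryDirectory() + "resource_check.tmp"
    If ReceiveHTTPFile(ref, localname)
      Debug "downloaded (" + Str(FileSize(localname)) + " bytes): " + ref
      DeleteFile(localname)
    Else
      Debug "DOWNLOAD FAILED: " + ref
    EndIf
  Else
    ; local reference: resolved the same way as in the posted code
    localname = basepath + ReplaceString(ref, "/", "\")
    If FileSize(localname) >= 0
      Debug "local file found: " + localname
    Else
      Debug "LOCAL FILE MISSING: " + localname
    EndIf
  EndIf
EndProcedure

InitNetwork()   ; required before network calls in PB 5.7x

; placeholder references, for illustration only
CheckResource("https://example.com/styles.css", "")
CheckResource("styles.css", GetCurrentDirectory())
```

Running every `href` and `src` value found in the page through such a check shows at a glance which resources would end up empty in the output file.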