All times are UTC + 1 hour




 Post subject: Save full webpage in one HTML file
PostPosted: Sat Jan 09, 2021 7:45 pm 

Joined: Sun Nov 05, 2006 11:42 pm
Posts: 4844
Location: Lyon - France
Hello everyone,

I have installed the splendid "Save Page WE" extension in Firefox, and it works very well.
This extension creates a single HTML file containing all the images, text and CSS of the webpage :shock: a little like Microsoft's MHT format, but better, because it is a plain ".html" file that every browser can read 8)

Is this possible in PB?
Have you seen this kind of PB code on the forum?

Have a good day

_________________
The happiness is a road...
Not a destination


 Post subject: Re: Save full webpage in one HTML file
PostPosted: Sat Jan 09, 2021 8:06 pm 

Joined: Wed May 27, 2020 12:26 pm
Posts: 245
I suppose that, technically, you want to convert small images into Base64 text inside the HTML?


 Post subject: Re: Save full webpage in one HTML file
PostPosted: Sat Jan 09, 2021 8:19 pm 
Hello OLLI :wink:

Yes, but not only that. I think it is also necessary to download every linked CSS file and put the downloaded source in place of the link and, as you say, to replace the image links with Base64 text :idea:
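
As a rough illustration of that idea (not code from this thread): once the text of a stylesheet is available as a string, putting it in place of the link is a plain string replacement. The sample markup, the CSS text and the variable names below are all assumptions made for the example.

```purebasic
; Sketch only: swap one <link> tag for an inline <style> block.
; html, tag and css are made-up sample values, not data from a real page.
EnableExplicit

Define q.s = Chr(34)
Define tag.s = "<link rel=" + q + "stylesheet" + q + " href=" + q + "styles.css" + q + ">"
Define html.s = "<html><head>" + tag + "</head><body>Hello</body></html>"
Define css.s = "body { color: red; }"   ; would normally be read from styles.css

; Replace the link with the stylesheet text wrapped in a <style> element
html = ReplaceString(html, tag, "<style>" + css + "</style>")
Debug html
```

In a real page the tag would first have to be located, and the replacement should probably be skipped when the stylesheet could not be read, so that the link is not lost.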

 Post subject: Re: Save full webpage in one HTML file
PostPosted: Sat Jan 09, 2021 9:24 pm 
You should create a recursive system that detects all the external links. You mention CSS: it means nothing to me.
I only know this:
Code:
[a href=http://www.femmepoilue.com/chuistropjeune.htm]Dad allowed you?[/a]


Maybe you should start by detecting these "a" tags and managing them as nodes.

There is already a problem with this, because CSS can replace such a node. Very complex.

Do you know how to embed a Base64-encoded picture in an HTML file? (if not)
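
A possible starting point for detecting the references, sketched with PureBasic's regular expression library. This is only an illustration: the sample HTML and the pattern are assumptions, and the pattern only handles quoted href and src values.

```purebasic
; Sketch only: list the values of href= and src= attributes in an HTML string.
; The sample HTML and the pattern are assumptions made for this illustration.
EnableExplicit

Define q.s = Chr(34)
Define html.s = "<link rel='stylesheet' href='styles.css'>" +
                "<img src=" + q + "tree.jpg" + q + ">" +
                "<a href=" + q + "page2.html" + q + ">next</a>"

; Group 1 = attribute name, group 2 = value between single or double quotes
Define pattern.s = "(href|src)\s*=\s*[" + q + "']([^" + q + "']+)[" + q + "']"

Define re = CreateRegularExpression(#PB_Any, pattern, #PB_RegularExpression_NoCase)
If re
  If ExamineRegularExpression(re, html)
    While NextRegularExpressionMatch(re)
      Debug RegularExpressionGroup(re, 1) + " -> " + RegularExpressionGroup(re, 2)
    Wend
  EndIf
  FreeRegularExpression(re)
Else
  Debug RegularExpressionError()
EndIf
```

For the sample string this should list styles.css, tree.jpg and page2.html. Each value found could then be treated as a node to fetch and embed.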


 Post subject: Re: Save full webpage in one HTML file
PostPosted: Sat Jan 09, 2021 9:39 pm 
Thanks a lot for your link, it helps me very much 8)
Thanks to you, I have cleaner code :mrgreen:

 Post subject: Re: Save full webpage in one HTML file
PostPosted: Sat Jan 09, 2021 9:46 pm 
Could you extract this text manually and embed it in a forum code block here?

(Gohogleux sirche)
https://123doc.net/document/2365574-khoi-nghia-lam-son.htm


 Post subject: Re: Save full webpage in one HTML file
PostPosted: Sat Jan 09, 2021 10:02 pm 
Quote:
I believe it's time for you to "1Supo&OLi" :lol:

 Post subject: Re: Save full webpage in one HTML file
PostPosted: Sat Jan 09, 2021 10:14 pm 
Thanks to Wikipedia
Code:
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />
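
To produce such a tag from PB, something along these lines might work. It is a sketch, not tested against a real page: the 5x5 red image and the output name "reddot.html" are assumptions for the demo, and it relies on the string-returning Base64Encoder() of recent PureBasic versions such as the 5.73 used later in this thread.

```purebasic
; Sketch only: build a data: URI from PNG bytes and write a one-file HTML page.
; The 5x5 red image and the file name "reddot.html" are assumptions for the demo.
EnableExplicit
UsePNGImageEncoder()

Define q.s = Chr(34)
Define img = CreateImage(#PB_Any, 5, 5, 24, RGB(255, 0, 0))
If img
  Define *png = EncodeImage(img, #PB_ImagePlugin_PNG)   ; PNG file bytes in memory
  If *png
    ; Base64Encoder() returns the encoded text as a string here
    Define uri.s = "data:image/png;base64," + Base64Encoder(*png, MemorySize(*png))
    FreeMemory(*png)
    Define tag.s = "<img src=" + q + uri + q + " alt=" + q + "Red dot" + q + " />"
    Define file = CreateFile(#PB_Any, "reddot.html")
    If file
      WriteString(file, "<html><body>" + tag + "</body></html>")
      CloseFile(file)
    EndIf
  EndIf
  FreeImage(img)
EndIf
```

Opening reddot.html in a browser should show the small red square without any external image file.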


 Post subject: Re: Save full webpage in one HTML file
PostPosted: Sat Jan 09, 2021 11:31 pm 

Joined: Thu Apr 18, 2019 8:17 am
Posts: 1208
Kwai chang caine wrote:
Have you see this style of PB code on the forum ?

Yes, back when you first discussed it in 2007 -> viewtopic.php?p=215998#p215998

Did you forget? Hehe. Here are some other links that may help:

viewtopic.php?f=7&t=35772
viewtopic.php?f=13&t=28267
viewtopic.php?f=7&t=41271


 Post subject: Re: Save full webpage in one HTML file
PostPosted: Sun Jan 10, 2021 11:04 am 
@OLLI
Thanks for the help, that can be useful for including images 8)
But I think the hardest part is including the linked CSS and JavaScript :|

@BarryG
Hello BarryG :D

First, thanks for your answer 8)

Quote:
Yes, back when you first discussed it in 2007
Did you forget?
Yes, I had forgotten (the "power" of an old man :oops: ), but yesterday, before asking my question, I searched the forum and found it again.
But precisely, I am not looking for code to create MHT, just HTML :wink:
Because MHT is a Microsoft format, and not every browser can read this proprietary format without an extension :|

So thanks, but the first three links are not really useful for me :wink:
The fourth one, on the other hand, is; I did not find it yesterday.
At first glance it is a start; I will read it in more detail :D

Again, thanks a lot 8)

 Post subject: Re: Save full webpage in one HTML file
PostPosted: Mon Jan 11, 2021 4:17 pm 

Joined: Wed Feb 28, 2007 9:13 am
Posts: 1084
Location: London
I did this.

_________________
JACK WEBB: "Coding in C is like sculpting a statue using only sandpaper. You can do it, but the result wouldn't be any better. So why bother? Just use the right tools and get the job done."


 Post subject: Re: Save full webpage in one HTML file
PostPosted: Tue Jan 12, 2021 5:54 pm 
Hello Seymour Clufley :D

Happy to talk with the creator of this nice code, and to thank you directly for sharing it 8)

I have seen it, because BarryG was kind enough to give the link to it (his fourth link) :wink:

I have tried to convert it to 5.73 (my changes are marked "Modified by KCC")

Code:
Macro R(t)
  MessageRequester("Report",t,0)
EndMacro

Global c10.s = Chr(10)
Global c13.s = Chr(13)
Global c32.s = Chr(32)
Global c34.s = Chr(34)
Global c39.s = Chr(39)

Macro EnsureThisNotStart(t,start)
 
  If Left(t,Len(start)) = start
      t = Mid(t,Len(start)+1,Len(t))
  EndIf
 
EndMacro

Procedure.s EnsureNotStart(t.s,start.s)
 
  EnsureThisNotStart(t,start)
  ProcedureReturn t
 
EndProcedure

Macro EnsureThisEnd(t,endd)
 
  If endd<>""
      If Right(t,Len(endd)) <> endd
          t+endd
      EndIf
  EndIf
 
EndMacro

Macro EnsureThisNotEnd(t,endd)
 
  If Right(t,Len(endd)) = endd
      ;snipped.s = Len(t)-Len(endd)
      ;t = Left(t,snipped)
      t = Left(t,Len(t)-Len(endd))
  EndIf
 
EndMacro

Macro StartsWith(main,sub)
  (sub<>"" And main<>"" And Left(main,Len(sub))=sub)
EndMacro


Procedure.s FileToString(filename.s)
 
  info.s = ""
  file = ReadFile(#PB_Any,filename)
  If file
      While Not Eof(file)
          info + ReadString(file)+c13
      Wend
      CloseFile(file)
  EndIf
 
  ProcedureReturn info
 
EndProcedure

Procedure.b FileFromString(filename.s,string.s)
  ;Report("FILE: "+filename+c13+"STRING: "+string)
 
  If filename = "" : ProcedureReturn #False : EndIf   ; Modified by KCC
  ;If Not EnsureFolderPath(GetPathPart(filename)) : ProcedureReturn #False : EndIf
 
  string = RemoveString(string,c10)
  If FindString(string,c13,0)
      EnsureThisNotEnd(string,c13) ; this removes final linebreak
      string = ReplaceString(string,c13,c13+c10)
  EndIf
 
  file = CreateFile(#PB_Any,filename)
  If IsFile(file)
      WriteString(file,string)
      CloseFile(file)
      ProcedureReturn #True
  EndIf
 
  ProcedureReturn #False
 
EndProcedure


Procedure.s FileMimeType(filename.s)
 
  mime.s
  Select LCase(GetExtensionPart(filename))
      Case "png"
          mime = "image/png"
      Case "jpg", "jpeg", "jpe"
          mime = "image/jpeg"
      Case "ico"
          mime = "image/x-icon"
      Case "gif"
          mime = "image/gif"
      Case "bmp"
          mime = "image/bmp"
      Case "tif", "tiff"
          mime = "image/tiff"
      Case "svg"   ; needed by the <OBJECT data="*.svg"> pass further down
          mime = "image/svg+xml"
  EndSelect
 
  ProcedureReturn mime
 
EndProcedure

Procedure.s File2Base64(filename.s)
 
  f = ReadFile(#PB_Any,filename)   ; read-only: OpenFile() would create the file if it is missing
  If Not f : ProcedureReturn "" : EndIf
  loaf = Lof(f)
  If loaf<1 : CloseFile(f) : ProcedureReturn "" : EndIf
  *mem = AllocateMemory(loaf)
  lengthread = ReadData(f,*mem,loaf)
  CloseFile(f)
  If Not lengthread
      FreeMemory(*mem)
      ProcedureReturn ""
  EndIf
 
  ; Modified by KCC
  ; ***************
  ;bloaf = loaf*1.5
  ;*b64 = AllocateMemory(bloaf)
  ;Base64Encoder(*mem,loaf,*b64,bloaf)
  ;b64.s = PeekS(*b64)
  ;FreeMemory(*b64)
 
  ; ***************
 
  b64.s = Base64Encoder(*mem,loaf)
  FreeMemory(*mem)
 
  ProcedureReturn b64
 
EndProcedure


Procedure.s OmitUnusedClasses(css.s,html.s)
 
  css+c13
  html = LCase(html)
  ncss.s = ""
  html = ReplaceString(html," class='"," class="+c34)
  html = ReplaceString(html," id='"," id="+c34)
 
  For a = 1 To CountString(css,c13)
      ln.s = StringField(css,a,c13)
      ln = Trim(ln)
      If ln = "": Continue : EndIf   ; Modified by KCC
      ln = RemoveString(ln,Chr(9))
      use.b = #False
      lln.s = LCase(ln)
      lln = ReplaceString(lln,"{",c32)
      lln = ReplaceString(lln,".",c32)
      lln = Trim(lln,c32)
      name.s = StringField(lln,1,c32)
      ;R(name)
      If Left(lln,1)="#"
          ; it's styling for an id
          inlinename.s = EnsureNotStart(name,"#")
          use = #False
          If FindString(html," id="+inlinename,1)
              use=#True
          Else
              If FindString(html," id="+c34+inlinename+c34,1)
                  use=#True
              Else
                  If FindString(html," id="+c39+inlinename+c39,1)
                      use=#True
                  EndIf
              EndIf
          EndIf
      Else
          ; it's a class
          If FindString(html," class="+name,1)
              use=#True
          Else
              If FindString(html," class="+c34+name+c34,1)
                  use=#True
              EndIf
          EndIf
      EndIf
     
      If use
          ncss+ln+c13
      EndIf
  Next a
 
  ProcedureReturn ncss
 
EndProcedure

Macro IsPointInsideAScriptBlock(lh,point,var,blockend)
  var = #False
  nextjsopen = FindString(lh,"<script ",point)   ; use the macro parameter instead of the caller's variable name
  nextjsclose = FindString(lh,"</script>",point)
  If nextjsclose>0
      If nextjsclose<nextjsopen Or Not nextjsopen
          ;Debug "INSIDE A SCRIPT BLOCK"
          var = #True
          blockend=nextjsclose
      EndIf
  EndIf
EndMacro




Macro NewBase(filename)
  basecount+1
  barrsize = ArraySize(base64s(),1)
  If barrsize<basecount
      barrsize+5
      ReDim base64s(barrsize)
  EndIf
  base64s(basecount) = "data:"+FileMimeType(filename)+";base64,"+File2Base64(filename)
EndMacro

Macro showtime(t)
  Debug RSet(Str(ElapsedMilliseconds()-st),5)+"  "+t
EndMacro

Macro B64Signal(b)
  "INSERTBASE64_"+Str(b)+"_"
EndMacro

Procedure.b InternaliseHTMLFile(sourcefile.s,destfile.s,webpath.s="",path.s="",omitunusedclasses.b=#True,lookforclassesinjavascript.b=#True,lookforimagesinjavascript.b=#True,extrafile_arr.s="")
 
  st=ElapsedMilliseconds()
 
  ; external stylesheets
  ; external javascripts
  ; images (converted to base64)
  ; background-images (converted to base64)
  ; svgs (converted to base64)
 
  pp.s = "|"
  Dim base64s.s(20)
 
  If path
      EnsureThisEnd(path,"\")
  Else
      path = GetPathPart(sourcefile)
  EndIf
 
  h.s = FileToString(sourcefile)
 
  lh.s = LCase(h)
  nh.s = h
 
  badtagarr.s
 
  importedcss.s
 
 
 
  ; find all external scripts and get their content...
  detect=0
  Repeat
      detect = FindString(lh,"<script ",detect)
      If Not detect : Break : EndIf
      enddetect = FindString(lh,">",detect)
      If Not enddetect : Break : EndIf
      snippet.s = Mid(lh,detect,enddetect-detect)
      srcstart = FindString(snippet," src=",1)
      If srcstart
          snippet = Mid(snippet,srcstart+5,Len(snippet))
          ;snippet = RemoveString(snippet,"=")
          EnsureThisNotEnd(snippet,"/")
          snippet = Trim(snippet)
          snippet = Trim(snippet,c34)
          ;EnsureThisNotStart(snippet,c34)
          ;EnsureThisNotEnd(snippet,c34)
          If FindString(snippet,c34,1)
              snippet = StringField(snippet,1,c34)
          EndIf
;          R(snippet)
          If snippet
              resourcefilename.s = path+ReplaceString(snippet,"/","\")
              newjs.s = FileToString(resourcefilename)
              If newjs = "": detect=enddetect : Continue : EndIf   ; Modified by KCC
              newjs = Trim(newjs,c13)
              tag.s = Mid(h,detect,enddetect-detect+1)
              If newjs
                  ntag.s = RemoveString(tag,snippet,#PB_String_NoCase)
                  ntag = RemoveString(ntag,c32+"src="+c34+c34)
                  ntag = RemoveString(ntag,c32+"src=")
                  ntag+c13+newjs+c13
                  nh = ReplaceString(nh,tag,ntag)
              Else
                  ; empty js file
                  nh = RemoveString(nh,tag+"</SCRIPT>",#PB_String_NoCase)
                  nh = RemoveString(nh,tag+" </SCRIPT>",#PB_String_NoCase)
                  nh = RemoveString(nh,tag+c13+"</SCRIPT>",#PB_String_NoCase)
              EndIf
          EndIf
      EndIf
      detect=enddetect
  ForEver
 
 
 
  ; find all external stylesheets and get their content...
  detect=0
  Repeat
      ; <LINK rel="stylesheet" type="text/css" href="styles.css" />
      detect = FindString(lh,"<link ",detect)
      If Not detect : Break : EndIf
      enddetect = FindString(lh,">",detect)
      If Not enddetect : Break : EndIf
      snippet.s = Mid(lh,detect,enddetect-detect)
      If FindString(snippet,"stylesheet",1)
      srcstart = FindString(snippet," href=",1)
      If srcstart
          snippet = Mid(snippet,srcstart+5,Len(snippet))
          snippet = RemoveString(snippet,"=")
          EnsureThisNotEnd(snippet,"/")
          snippet = Trim(snippet)
          EnsureThisNotStart(snippet,c34)
          EnsureThisNotEnd(snippet,c34)
          If snippet
              resourcefilename.s = path+ReplaceString(snippet,"/","\")
              newcss.s = FileToString(resourcefilename)
              newcss = Trim(newcss,c13)
              If newcss
                  importedcss+c13+c13+newcss
              EndIf
              badtagarr+ Mid(h,detect,enddetect-detect+1) +pp
          EndIf
      EndIf
      EndIf
      detect=enddetect
  ForEver
  If importedcss
      importedcss = Trim(importedcss,c13)
  EndIf
 
 
 
  ; find all external images and convert them to base64...
  detect=0
  Repeat
      ; <IMG src="tree.jpg" />
      detect = FindString(lh,"<img ",detect)
      If Not detect : Break : EndIf
      enddetect = FindString(lh,">",detect)
      If Not enddetect : Break : EndIf
      snippet.s = Mid(lh,detect,enddetect-detect)
      srcstart = FindString(snippet," src=",1)
      If srcstart
          ;snippet1.s=snippet
          snippet = Mid(snippet,srcstart+4,Len(snippet))
          snippet = RemoveString(snippet,"=")
          EnsureThisNotEnd(snippet,"/")
          snippet = Trim(snippet)
          ;EnsureThisNotStart(snippet,c34)
          ;EnsureThisNotEnd(snippet,c34)
          snippet=Trim(snippet,c39) : snippet=Trim(snippet,c34)
          ;snippet2.s = snippet
          If snippet
              If FindString(snippet,c34,1) : snippet=StringField(snippet,1,c34) : EndIf
              If FindString(snippet,c39,1) : snippet=StringField(snippet,1,c39) : EndIf
              resourcefilename.s = path+ReplaceString(snippet,"/","\")
              If FindString(resourcefilename,"?",1) : resourcefilename = StringField(resourcefilename,1,"?") : EndIf
              ;If FileSize(resourcefilename)<0
              ;    R("NO FILE."+c13+c13+snippet1+c13+snippet2+c13+snippet+c13+resourcefilename)
              ;EndIf
              NewBase(resourcefilename)
              tag.s = Mid(h,detect,enddetect-detect+1)
              ntag.s = ReplaceString(tag,snippet,B64Signal(basecount))
              nh = ReplaceString(nh,tag,ntag)
          EndIf
      EndIf
      detect=enddetect
  ForEver
 
 
 
  ; find all external SVGs and convert them to base64...
  If FindString(lh,".svg",1)
      detect=0
      Repeat
          ; <OBJECT data="arcrap.svg" type="image/svg+xml" width="100%" height="100%">    </OBJECT>
          detect = FindString(lh,"<object ",detect)
          If Not detect : Break : EndIf
          enddetect = FindString(lh,">",detect)
          If Not enddetect : Break : EndIf
          snippet.s = Mid(lh,detect,enddetect-detect)
          srcstart = FindString(snippet," data=",1)
          If srcstart
              snippet = Mid(snippet,srcstart+5,Len(snippet))
              EnsureThisNotStart(snippet,"=")
              EnsureThisNotEnd(snippet,"/")
              EnsureThisNotStart(snippet,c34)
              If Left(snippet,1)=c34
                  EnsureThisNotStart(snippet,c34)
                  snippet = StringField(snippet,1,c34)
              Else
                  snippet = StringField(snippet,1,c32)
              EndIf
              snippet = Trim(snippet)
              EnsureThisNotEnd(snippet,c34)
              ;MessageRequester("SNIPPET","**"+snippet+"**")
              If snippet
                  resourcefilename.s = path+ReplaceString(snippet,"/","\")
                  ;MessageRequester("BASE 64",b64)
                  NewBase(resourcefilename)
                  tag.s = Mid(h,detect,enddetect-detect+1)
                  ntag.s = ReplaceString(tag,snippet,B64Signal(basecount))
                  nh = ReplaceString(nh,tag,ntag)
              EndIf
          EndIf
          detect=enddetect
      ForEver
  EndIf
 
 
  If badtagarr
      For a = 1 To CountString(badtagarr,pp)
          tag.s = StringField(badtagarr,a,pp)
          nh = RemoveString(nh,tag)
      Next a
  EndIf
  ;showtime("after badtagarr section")
 
 
  swb64.s = "SwitchBase64"
  If lookforimagesinjavascript
      attrname.s = "src"
      For cycle = 1 To 2
          elsrc.s = "."+attrname+"="
          nh = ReplaceString(nh,"."+attrname+" =",elsrc)
          nh = ReplaceString(nh,"."+attrname+"= ",elsrc)
          detect = 0
          enddetect = 0
          ;R("ELSRC: "+elsrc)
          Repeat
              lh.s = LCase(nh)
              detect = FindString(lh,elsrc,detect)
              If Not detect : Break : EndIf
              detect+Len(elsrc)
              enddetect = FindString(lh,";",detect)
              snippet.s = Mid(nh,detect,enddetect-detect)
              If StartsWith(snippet,swb64)
                  detect = enddetect
                  Continue
              EndIf
              ;R(snippet)
              newinstrux.s = elsrc+swb64+"("+snippet+")"
              ;R(elsrc+snippet+c13+c13+newinstrux)
              nh = ReplaceString(nh,elsrc+snippet,newinstrux,#PB_String_NoCase)
              nh = ReplaceString(nh,elsrc+" "+snippet,newinstrux,#PB_String_NoCase)
              detect = enddetect
             
              snippet = Trim(snippet,c34)
              snippet = Trim(snippet,c39)
              ;R(path+snippet)
              If FileSize(path+snippet)>0 ; it's a filename (otherwise it's a variable or a js function call)
                  ;R("FILE FOUND IN BASE FOLDER")
                  If Not FindString(pp+extrafile_arr,pp+snippet+pp,1) ; add it to the list if it's not already there
                      extrafile_arr+snippet+pp
                  EndIf
              EndIf
             
          ForEver
          attrname = "backgroundImage"
      Next cycle
  EndIf
 
 
 
  ncss.s = importedcss
  ; now get any internal style blocks...
  lh.s = LCase(nh)
  If FindString(lh,"<style",1)
      nh = ReplaceString(nh,"<style","<STYLE")
      nh = ReplaceString(nh,"</style>","</STYLE>")
      Repeat
          detect = FindString(nh,"<STYLE",0)
          If Not detect : Break : EndIf
          enddetect = FindString(nh,"</STYLE>",detect+6)
          If Not enddetect : Break : EndIf
          snippet.s = Mid(nh,detect,enddetect-detect+8)
          nh = RemoveString(nh,snippet)
          EnsureThisNotEnd(snippet,"</STYLE>")
          detect = FindString(snippet,">",1)
          snippet = Mid(snippet,detect+1,Len(snippet))
          snippet = RemoveString(snippet,Chr(9))
         
          ncss+Trim(snippet,c13)+c13
      ForEver
  EndIf
  ;showtime("after getting style blocks")
 
  ; omit unused...
  If ncss
      If omitunusedclasses
          ncss = OmitUnusedClasses(ncss,nh)
      EndIf
  EndIf
 
 
 
  Debug(Str(Len(nh)))
  If webpath ; takes ages!
      EnsureThisEnd(webpath,"/")
      webpathl = Len(webpath)
      lh.s = LCase(nh)
      detect=FindString(lh,"<body",1)
      ;R(Mid(lh,detect,100))
      Repeat
          ; <A id="example" href="page2.html">text</A>
          detect = FindString(nh," href=",detect)
          If Not detect : Break : EndIf
         
          IsPointInsideAScriptBlock(lh,detect,inside,blockend)
          If inside
              ;R("INSIDE SCRIPT. BLOCK ENDS @ "+Str(blockend))
              detect = blockend
              Continue
          EndIf
         
         
          offset=7
          snippet = Mid(nh,detect+offset,webpathl+1)
          ;R(snippet)
          If Left(snippet,1)=c34
              offset=8
              snippet = Mid(nh,detect+offset,webpathl+1)   ; read from the page text (nh), not from the snippet itself
          EndIf
          If StartsWith(snippet,"javascript:") Or StartsWith(snippet,"http://") Or StartsWith(snippet,"https://") Or StartsWith(snippet,"#") Or StartsWith(snippet,webpath)
              detect+webpathl
              Continue
          EndIf
          ;If Not snippet
          ;    detect+webpathl
          ;    Continue
          ;EndIf
         
          ;R(Mid(nh,detect+place-2,40))
          nh = InsertString(nh,webpath,detect+offset)
         
          detect+webpathl
      ForEver
  EndIf

 
 
  ; incorporate single css block...
  If ncss
      ncss = "<STYLE type="+c34+"text/css"+c34+">"+c13+c13+ncss+c13+c13+"</STYLE>"
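      headend = FindString(nh,"<HEAD>",1,#PB_String_NoCase)   ; also match <head>
      If headend   ; without a <HEAD> tag there is no safe place to insert
          nh = InsertString(nh,c13+ncss,headend+6)   ; right after the 6 characters of <HEAD>
      EndIf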
  EndIf
 
 
 
  ; find css images and base64 them...
  bi.s = "background-image:"
  If FindString(lh,bi,1)
      detect = 0
      Repeat
          ; { width:50px; background-image:url('graphics/example.jpg'); display:block; }"
          lh.s = LCase(nh)
          detect = FindString(lh,bi,detect)
          If Not detect : Break : EndIf
          detect+Len(bi)
          enddetect = FindString(nh,";",detect)
          If Not enddetect : Break : EndIf
          snippet.s = Mid(nh,detect,enddetect-detect)
          entiresnippet.s = snippet
          EnsureThisNotStart(snippet,"url(")
          EnsureThisNotEnd(snippet,")")
          snippet = Trim(snippet) : snippet=Trim(snippet,c39) : snippet=Trim(snippet,c34)
          If Left(snippet,5)="data:" : detect=enddetect : Continue : EndIf
          If snippet
              If FindString(snippet,c34,1) : snippet=StringField(snippet,1,c34) : EndIf
              If FindString(snippet,c39,1) : snippet=StringField(snippet,1,c39) : EndIf
              resourcefilename.s = path+ReplaceString(snippet,"/","\")
              ;R(resourcefilename)
              If FindString(resourcefilename,"?",1) : resourcefilename = StringField(resourcefilename,1,"?") : EndIf
              ;If FileSize(resourcefilename)<0
              ;    R("NO FILE."+c13+c13+snippet1+c13+snippet2+c13+snippet+c13+resourcefilename)
              ;EndIf
              NewBase(resourcefilename)
              ntag.s = ReplaceString(entiresnippet,snippet,B64Signal(basecount))
              nh = ReplaceString(nh,entiresnippet,ntag)
          EndIf
          detect = enddetect
      ForEver
  EndIf
 
 
 
  If extrafile_arr
      ;R("EXTRA FILE ARR"+c13+c13+ReplaceString(extrafile_arr,pp,c13))
      ; we found some discernible filenames in the js, and/or some supplied to the procedure
      ; have to encode them and construct the js switching procedure, SwitchBase64
      switcherfunc.s = "function SwitchBase64(filename) {"+c13
      ;switcherfunc + "alert(filename);"
      switcherfunc + "  switch(filename) {"+c13
      For a = 1 To CountString(extrafile_arr,pp)
          resfile.s = StringField(extrafile_arr,a,pp)
          EnsureThisNotStart(resfile,path)
          If resfile = "" : Continue : EndIf   ; Modified by KCC
          NewBase(path+resfile)
          ;R(b64)
          switcherfunc + "    case "+c34+ReplaceString(resfile,"\","/")+c34+":"+c13
          switcherfunc + "      return "+c34+B64Signal(basecount)+c34+";"+c13
      Next a
      switcherfunc+c13+"    }"+c13+" }"
     
      ; now insert it right before the closing HEAD tag
      switcherfunc.s = "<SCRIPT type="+c34+"text/javascript"+c34+">"+c13+switcherfunc+c13+"</SCRIPT>"+c13
      ;nh = ReplaceString(nh,"</HEAD>","</HEAD>"+switcherfunc,#PB_String_NoCase)
      headend = FindString(nh,"</HEAD>",1,#PB_String_NoCase)   ; also match </head>
      If headend
          nh = InsertString(nh,switcherfunc,headend)   ; insert directly before the closing tag
      EndIf
  EndIf
 
 
 
  detect=0
  For b = 1 To basecount
      marker.s = B64Signal(b)
      ;nh = ReplaceString(nh,marker,base64s(b)) : Continue
      ndetect = FindString(nh,marker,0)
      If ndetect
          detect=ndetect
          nh = ReplaceString(nh,marker,base64s(b),0,detect-2)
      EndIf
      ;detect = FindString(nh,marker,0)
      ;If detect
      ;    p1.s = Left(nh,detect-1)
      ;    p2.s = Mid(nh,detect+Len(marker),Len(nh))
      ;    nh = p1+base64s(b)+p2
      ;EndIf
  Next b
 
 
 
  ProcedureReturn FileFromString(destfile,nh)   ; report success or failure to the caller
 
EndProcedure

; Added by KCC
; ************

InitNetwork()
; Only run the conversion if the page was actually downloaded
If ReceiveHTTPFile("https://www.amazon.fr/s?k=purebasic&__mk_fr_FR=%C3%85M%C3%85%C5%BD%C3%95%C3%91&ref=nb_sb_noss", "AmazonSearchPB.html")
  InternaliseHTMLFile("AmazonSearchPB.html","AmazonSearchPB_Internalise.html")
Else
  Debug "Download failed"
EndIf


But apparently the Base64 conversion is not really executed, and the <SCRIPT> conversion does not really work.
For example, this part, picked at random:

Code:
</script>

<link rel="stylesheet" href="https://images-eu.ssl-images-amazon.com/images/I/01mI9NDJJTL._RC|41H+PEIOYHL.css,51CKwX+QTdL.css_.css?AUIClients/SearchAssets&nZMUiA5H#270238-T1.308335-T1" />
<link rel="stylesheet" href="https://images-eu.ssl-images-amazon.com/images/I/01BLqKISyaL._RC|01+neHskhqL.css,01mfj61BPYL.css,01Q4qmee9LL.css,11kdhabA0xL.css,01Y5FkF5TkL.css,0171-O+nBwL.css,2170Ev7c3lL.css,21rhNT4WPrL.css,01rZTK48+KL.css,21SHQ2IVatL.css,21nAJBNu4CL.css,01rdVnPkgmL.css,01ixfc-7StL.css,21URXAbTuFL.css,01Op2rWArIL.css,01hQwBUjIwL.css_.css?AUIClients/SearchPartnerAssets" />
<link rel="stylesheet" href="https://images-eu.ssl-images-amazon.com/images/I/31-wGuUNxVL.css?AUIClients/DetailPageAllOffersDisplayAssets&GSArOGBt#323159-T2.323160-T2" />

<script>
gives an empty result :cry:
Code:
</script>

<script>

 Post subject: Re: Save full webpage in one HTML file
PostPosted: Wed Jan 13, 2021 11:35 am 
Yes, you'll need to upgrade the Base64 code to work with PB's new method.

The LINK tags might be disappearing because the CSS from them is being absorbed into the main file. That's what should be happening, after all. The idea is that the output HTML file will not rely on any external resources.
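
For reference, a minimal sketch of what the newer Base64 calls look like, assuming "PB's new method" means the string-based Base64Encoder() and Base64Decoder() of recent PureBasic versions. The sample text and the variable names are only for the demo.

```purebasic
; Sketch only: encode a small buffer to a Base64 string and decode it again.
; The text "Hello" and all variable names are assumptions for this demo.
EnableExplicit

Define text.s = "Hello"
Define size = StringByteLength(text, #PB_UTF8)
Define *in = AllocateMemory(size + 1)      ; +1 for the terminating null written by PokeS
Define *out = AllocateMemory(size + 8)     ; comfortably large enough for the decoded bytes

If *in And *out
  PokeS(*in, text, -1, #PB_UTF8)
  Define b64.s = Base64Encoder(*in, size)  ; the result comes back as a string
  Debug b64                                ; expected: SGVsbG8=
  Define n = Base64Decoder(b64, *out, MemorySize(*out))   ; number of decoded bytes
  Debug PeekS(*out, n, #PB_UTF8)           ; expected: Hello
EndIf

If *in : FreeMemory(*in) : EndIf
If *out : FreeMemory(*out) : EndIf
```

With this form there is no need to allocate an output buffer for the encoded text or to read it back with PeekS().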

 Post subject: Re: Save full webpage in one HTML file
PostPosted: Wed Jan 13, 2021 7:00 pm 
Quote:
Yes, you'll need to upgrade the Base64 code to work with PB's new method.
Yes. Are you talking about this part of the code, or is there something else to modify?
Code:
Procedure.s File2Base64(filename.s)
 
  f = ReadFile(#PB_Any,filename)   ; read-only: OpenFile() would create the file if it is missing
  If Not f : ProcedureReturn "" : EndIf
  loaf = Lof(f)
  If loaf<1 : CloseFile(f) : ProcedureReturn "" : EndIf
  *mem = AllocateMemory(loaf)
  lengthread = ReadData(f,*mem,loaf)
  CloseFile(f)
  If Not lengthread
      FreeMemory(*mem)
      ProcedureReturn ""
  EndIf
 
  ; Modified by KCC
  ; ***************
  ;bloaf = loaf*1.5
  ;*b64 = AllocateMemory(bloaf)
  ;Base64Encoder(*mem,loaf,*b64,bloaf)
  ;b64.s = PeekS(*b64)
  ;FreeMemory(*b64)
 
  ; ***************
 
  b64.s = Base64Encoder(*mem,loaf)
  FreeMemory(*mem)
 
  ProcedureReturn b64
 
EndProcedure


Quote:
The LINK tags might be disappearing because the CSS from them is being absorbed into the main file. That's what should be happening, after all
Yes as well, but the problem is that the links apparently disappear without being replaced by the content of the CSS. See for yourself with a text comparison tool :|


Quote:
The idea is that the output HTML file will not rely on any external resources.
Yes, that is exactly what I am trying to do, and I am happy you wrote this code before me 8)
But it is strange that nobody, or nearly nobody, is interested in saving a full page in a single file :shock:
Microsoft had this idea with MHT a long time ago but, as often, what Microsoft does is for Microsoft, or nearly :|

 Post subject: Re: Save full webpage in one HTML file
PostPosted: Wed Jan 13, 2021 10:29 pm 
You should debug the code and find out whether it is successfully downloading these files:
Quote:
<link rel="stylesheet" href="https://images-eu.ssl-images-amazon.com/images/I/01mI9NDJJTL._RC|41H+PEIOYHL.css,51CKwX+QTdL.css_.css?AUIClients/SearchAssets&nZMUiA5H#270238-T1.308335-T1" />
<link rel="stylesheet" href="https://images-eu.ssl-images-amazon.com/images/I/01BLqKISyaL._RC|01+neHskhqL.css,01mfj61BPYL.css,01Q4qmee9LL.css,11kdhabA0xL.css,01Y5FkF5TkL.css,0171-O+nBwL.css,2170Ev7c3lL.css,21rhNT4WPrL.css,01rZTK48+KL.css,21SHQ2IVatL.css,21nAJBNu4CL.css,01rdVnPkgmL.css,01ixfc-7StL.css,21URXAbTuFL.css,01Op2rWArIL.css,01hQwBUjIwL.css_.css?AUIClients/SearchPartnerAssets" />
<link rel="stylesheet" href="https://images-eu.ssl-images-amazon.com/images/I/31-wGuUNxVL.css?AUIClients/DetailPageAllOffersDisplayAssets&GSArOGBt#323159-T2.323160-T2" />
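
One thing worth checking while debugging: as posted, the code builds resourcefilename from path plus the href text and reads it with FileToString(), so as far as I can tell an https:// reference is never downloaded and comes back empty. A possible way to handle both cases is sketched below. FetchResource() and its temp-file naming are hypothetical, and the URL at the bottom is just an example address.

```purebasic
; Sketch only: FetchResource() and its temp-file naming are hypothetical,
; and the URL at the bottom is just an example address.
EnableExplicit
InitNetwork()   ; as in the code posted above for PB 5.73

; Returns the name of a local file holding the resource, or "" on failure
Procedure.s FetchResource(ref.s, localpath.s)
  Static counter
  Protected lref.s = LCase(ref)
  Protected target.s

  If Left(lref, 7) = "http://" Or Left(lref, 8) = "https://"
    ; Remote reference: download it to a temporary file first
    counter + 1
    target = GetTemporaryDirectory() + "resource_" + Str(counter) + ".tmp"
    If ReceiveHTTPFile(ref, target)
      ProcedureReturn target
    EndIf
    ProcedureReturn ""
  EndIf

  ; Local reference: resolve it against the folder of the HTML file
  target = localpath + ref
  If FileSize(target) >= 0
    ProcedureReturn target
  EndIf
  ProcedureReturn ""
EndProcedure

Define file.s = FetchResource("https://www.purebasic.com/", "")
If file
  Debug "Fetched " + Str(FileSize(file)) + " bytes into " + file
Else
  Debug "Fetch failed: better to keep the original <link> tag"
EndIf
```

The returned file name could then be passed to FileToString() or File2Base64(), and the tag left untouched whenever the result is empty.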
