"Internalise" an HTML file (external resources) [pleez help]

Seymour Clufley
Addict
Posts: 1233
Joined: Wed Feb 28, 2007 9:13 am
Location: London

Post by Seymour Clufley »

Here's some code someone might find a use for. I bashed it together in a half hour so it could probably be optimised, but it's good enough for my needs.

It takes an HTML file, examines it for linked stylesheets, JS files and images, and then rewrites the file with those things included.

Image inclusion is done using Base64 encoding. An image will fail to display in IE if the data exceeds 32 KB, because IE imposes that as an arbitrary limit (another great feature of IE), but all other browsers seem to display such images without problems.
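
As a rough illustration of what "included using Base64" means here, the sketch below assembles a data URI from a MIME type and text that has already been Base64-encoded, then checks it against the IE limit. BuildDataURI(), ExceedsIELimit() and the constant are hypothetical names, not part of the posted code, and it is only an assumption that the limit applies to the length of the whole URI string.

```purebasic
; Hypothetical helper: joins a MIME type and already-encoded Base64 text into a data URI.
Procedure.s BuildDataURI(mime.s, b64.s)
  ProcedureReturn "data:" + mime + ";base64," + b64
EndProcedure

; Assumption: the IE limit is counted in characters of the whole URI.
#IE_DataURI_Limit = 32 * 1024

Procedure.b ExceedsIELimit(uri.s)
  If Len(uri) > #IE_DataURI_Limit
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure

uri.s = BuildDataURI("image/png", "AAAA") ; "AAAA" is placeholder text, not a real image
Debug uri
If ExceedsIELimit(uri)
  Debug "This image would probably not display in IE"
EndIf
```

The resulting string goes wherever the file name used to be, for example in the src attribute of an IMG tag.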

When including an external stylesheet, this code omits class definitions that are not actually used in the HTML, provided omitunusedclasses is set to #True.

NOTE: This code doesn't work on online webpages. It can only handle HTML files and resources stored on your computer.

Hope someone can use it. :)

Code:

find updated code below
Last edited by Seymour Clufley on Mon Jan 11, 2021 4:18 pm, edited 2 times in total.
JACK WEBB: "Coding in C is like sculpting a statue using only sandpaper. You can do it, but the result wouldn't be any better. So why bother? Just use the right tools and get the job done."
idle
Always Here
Posts: 5097
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: "Internalise" an HTML file (external resources etc.) [code]

Post by idle »

nice idea
rsts
Addict
Posts: 2736
Joined: Wed Aug 24, 2005 8:39 am
Location: Southwest OH - USA

Post by rsts »

I include some HTML now via a series of string+ statements :oops:

This may come in very handy. Thanks.

cheers

Post by Seymour Clufley »

idle wrote:nice idea
Thanks. It's just occurred to me that this way of "internalising" an HTML file is an alternative to MHT, with the advantage that any browser can read it! That's pretty cool. :)
rsts wrote:I include some HTML now via a series of string+ statements :oops:

This may come in very handy.
Yes, I'm also assembling webpages using the string commands. I wrote this code because my website has a page which I'm including in download packages (zips), so this code converts the "live" version of the page into a self-contained, portable version.

Apparently it's also possible to include SWFs and SVGs using Base64 encoding. Perhaps with HTML5's video object the same will become true of video (and audio?) files, but the Base64 strings would be huge! Anyway, when I have time I'll test Base64-encoded SWFs and SVGs and see what happens. If it works I'll update the code.

In the meantime, can anyone think of other external resources that could be incorporated? It'd be good to make this code as "all-encompassing" as possible.

Post by idle »

I'm not sure about using Base64 encoding for large resources. There may be an alternative, such as pipes; I'll look into it a bit later.

Thinking beyond the file creation: if you use this in a web GUI, you will probably want a GET method so that you can fetch resources from the web. I figure that if you make a web GUI, chances are you'll want to fetch stuff from a server at some stage too (logging in to a service, viewing your account, ...), so that the data comes from the web while the formatting resides in the app. I've posted code for that.
I've also got a framework for a web GUI around somewhere, using JavaScript, if you want a look.
utopiomania
Addict
Posts: 1655
Joined: Tue May 10, 2005 10:00 pm
Location: Norway

Post by utopiomania »

Since it can't handle online web pages, what is the point of rewriting local files into one file this way?

Why not just put the files in a folder with an index.htm?
UserOfPure
Enthusiast
Posts: 469
Joined: Sun Mar 16, 2008 9:18 am

Post by UserOfPure »

utopiomania wrote:Since it can't handle online web-pages, what is the point in rewriting local files into one file this way??
Because it creates a single file with all local files embedded into it, so everything is one rather than many. Some people like that.

But it should also be easy to download the external files, encode them to Base64, and embed them into the single file too.
Coolman
Enthusiast
Posts: 103
Joined: Sat Sep 03, 2005 4:07 pm

Post by Coolman »

Good job and good idea, but the images are not always embedded. I changed the image section and it now seems to work very well. There are also quite a few errors to be corrected in the other sections (scripts...).

Here is the amended section:

Code:

  ; find all external images and convert them to base64...
  detect=0
  Repeat
      ; <IMG src="tree.jpg" />
      detect = FindString(lh,"<img ",detect)
      If Not detect : Break : EndIf
      enddetect = FindString(lh,">",detect)
      If Not enddetect : Break : EndIf
      snippet.s = Mid(lh,detect,enddetect-detect)
      srcstart = FindString(snippet," src=",1)
      If srcstart
          snippet = Mid(snippet,srcstart+4,Len(snippet))
          snippet = RemoveString(snippet,"=")
          EnsureThisNotEnd(snippet,"/")
          snippet = Trim(snippet)
          EnsureThisNotStart(snippet,c34)
          EnsureThisNotEnd(snippet,c34)
          If snippet
              resourcefilename.s = path+ReplaceString(snippet,"/","\") 
              If FindString(resourcefilename,Chr(34),1) : resourcefilename=Left(resourcefilename,FindString(resourcefilename,Chr(34),1)-1) : EndIf 
              mime.s = FileMimeType(resourcefilename)
              b64.s = ImageFile2Base64(resourcefilename)
              tag.s = Mid(h,detect,enddetect-detect+1)
              deb=FindString(tag,"src="+Chr(34),1) : fin=FindString(tag,Chr(34),deb+6) : tag=Mid(tag,deb+5,fin-deb-5) 
              nh = ReplaceString(nh,tag,"Data:"+mime+";base64,"+b64)       
          EndIf
      EndIf
      detect=enddetect
  ForEver
*** Excuse my bad English, I uses has translating program ***

Post by Seymour Clufley »

I've tested base64-encoded SVG files and they work fine.

SWF seems to be different: data URIs are handled by the browser, whereas SWFs are handled by the Flash player, which doesn't know about data URI, so it doesn't work. It can only load external SWF files. Or at least, that's what I understand.

So the only thing we can add to this code is for including SVGs. Maybe VML too (if anyone's interested in doing that!). If the Flash player is updated to handle data URIs some time, that could be added in the future.
Coolman wrote:there is a lot of error was corrected in the other sections (scripts...)
Do you mean there are errors in the section for getting JavaScript files? Can you elaborate?

Post by Coolman »

To simplify...

There's a problem with the filename. For example, save this page (as test.htm):

viewtopic.php?f=14&t=40602

Run the test, and the first image filename comes out wrong:

---- \test_fichiers\icon_user_offline.gif" alt"offline" title"offline

instead of

---- \test_fichiers\icon_user_offline.gif

That is why I added:

If FindString(resourcefilename,Chr(34),1) : resourcefilename=Left(resourcefilename,FindString(resourcefilename,Chr(34),1)-1) : EndIf

The problem also applies to the other sections:

---- \test_fichiers\style.css" type"text\css

instead of

---- \test_fichiers\style.css

In this case the file is simply not embedded, since the name is incorrect...

Add 'Debug resourcefilename':

resourcefilename.s = path+ReplaceString(snippet,"/","\") : Debug resourcefilename

and run in debug mode to see the problem...
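
One way to avoid picking up the attributes that follow the file name is to read the value up to its own closing quote. This is only a sketch: AttributeValue() is a hypothetical helper, not something from the posted code, and it assumes the tag has already been isolated and that the attribute is written without spaces around the "=".

```purebasic
; Hypothetical helper: returns the value of one attribute from a single tag.
; Caveat: searching for "src=" would also match a longer name such as "data-src=".
Procedure.s AttributeValue(tag.s, attr.s)
  Protected start, stop
  Protected value.s, quote.s
  start = FindString(LCase(tag), LCase(attr) + "=", 1)
  If Not start : ProcedureReturn "" : EndIf
  value = Mid(tag, start + Len(attr) + 1, Len(tag)) ; everything after attr=
  quote = Left(value, 1)
  If quote = Chr(34) Or quote = Chr(39)
    stop = FindString(value, quote, 2)              ; the matching closing quote
    If stop
      ProcedureReturn Mid(value, 2, stop - 2)
    EndIf
    ProcedureReturn Mid(value, 2, Len(value))       ; no closing quote: take the rest
  EndIf
  ; unquoted value: it ends at the first space or ">"
  stop = FindString(value, " ", 1)
  If stop : value = Left(value, stop - 1) : EndIf
  stop = FindString(value, ">", 1)
  If stop : value = Left(value, stop - 1) : EndIf
  ProcedureReturn value
EndProcedure

q.s = Chr(34)
tag.s = "<img src=" + q + "test_fichiers/icon_user_offline.gif" + q + " alt=" + q + "offline" + q + " />"
Debug AttributeValue(tag, "src")
```

Because the value is cut at the quote that opened it, single-quoted and double-quoted attributes are handled the same way.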

8)

Post by Seymour Clufley »

Here's a new version of the code.

Improvements include:
  • optional parameter for manually setting base path
  • now handles SVG images
  • now internalises background images called by CSS and inline styling
  • now treats all CSS together (so all unused classes can be removed)
  • better parsing of file paths

Code:

find updated code below
There are two issues - not bugs - that I'd like to resolve. If an image is called more than once in the page, it is encoded for each instance, adding to the processing time and to the resulting file size. I wonder if there is a way to include a block of URI data once, and have it called by multiple tags? SVG has the USE element which I think would solve this problem there, but I don't know about HTML.
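
The processing-time half of this could perhaps be handled by remembering which files have already been encoded. The sketch below is an assumption about how that might look, not part of the posted code: MarkerFor() and the map are hypothetical names, and it hands out one marker number per distinct file so that the file only needs encoding once. It does nothing for the file size, since the data is still written out at every place the marker appears.

```purebasic
; Hypothetical sketch: one marker number per distinct file name.
Global NewMap markerForFile.i()
Global markerCount

Procedure MarkerFor(filename.s)
  Protected key.s = LCase(filename) ; assumption: Windows paths, so case is ignored
  If FindMapElement(markerForFile(), key)
    ProcedureReturn markerForFile(key) ; seen before: reuse the marker, no new encoding
  EndIf
  markerCount + 1
  markerForFile(key) = markerCount     ; first sighting: this is where the file would be encoded
  ProcedureReturn markerCount
EndProcedure

Debug MarkerFor("graphics\tree.jpg") ; 1
Debug MarkerFor("graphics\logo.png") ; 2
Debug MarkerFor("graphics\Tree.jpg") ; 1 again, same file
```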

The second issue is that images called by JavaScript are not included. This means that if an image's src is changed by JS, the replacement image will not be available. To an extent this could be solved by simply scanning the JS for file paths (I'll probably implement this), but if an image's path is created by joining multiple variables then we've got a real problem.

I've come across this issue because, ironically, the pages I now want to "internalise" have exactly this characteristic! OnMouseOver actions on IMG elements trigger JS which changes the IMG src.

I've thought of two possible solutions:
  • parse the JS looking for variables that are eventually used to form the file path for a src attribute. This would be extremely difficult because it would involve programming PB to understand JavaScript.
  • add a custom JS function which switches file paths for Base64 data, and have all el.src="*" calls refer to that function. A list of additional image files to encode could be passed to the InternaliseHTMLFile() procedure and this list would be used to form the JS switching function. Again, messy, but it may work. Perhaps the trickiest bit would be finding and changing el.src="*" calls in the JS. FWIW, it is possible to change an img's src to Base64 data inside JS.
Can anyone think of any other solutions?
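
To make the second idea a little more concrete, here is a minimal sketch of PB code that writes out such a JavaScript switching function. BuildSwitcher() is a hypothetical name, the file list format ("|" after every name) is an assumption, and the data URIs are placeholders rather than real encoded images.

```purebasic
; Hypothetical sketch: builds the text of a JavaScript function that swaps
; known file names for data URIs.
Procedure.s BuildSwitcher(files.s)
  Protected i
  Protected js.s, name.s
  Protected q.s = Chr(34)
  Protected nl.s = Chr(13) + Chr(10)
  js = "function SwitchBase64(filename) {" + nl
  js + "  switch (filename) {" + nl
  For i = 1 To CountString(files, "|")
    name = StringField(files, i, "|")
    If name = "" : Continue : EndIf
    js + "    case " + q + name + q + ":" + nl
    js + "      return " + q + "data:image/png;base64,PLACEHOLDER" + Str(i) + q + ";" + nl
  Next
  js + "    default:" + nl
  js + "      return filename;" + nl ; unknown names pass through unchanged
  js + "  }" + nl
  js + "}" + nl
  ProcedureReturn js
EndProcedure

Debug BuildSwitcher("over.png|out.png|")
```

The default branch is a design choice: a path the function doesn't know about is returned as it was, rather than leaving the image's src undefined.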
Last edited by Seymour Clufley on Mon Jan 11, 2021 4:18 pm, edited 1 time in total.

Post by Seymour Clufley »

I've managed it!

If anyone wants the code, let me know and I'll post it when I can.

Post by Coolman »

advice :

detect = FindString(lh,"<img ",detect)
srcstart = FindString(snippet," src=",1)

Some images are not detected and thus not included ...

Correction :

detect = FindString(lh,"<img",detect)
srcstart = FindString(snippet,"src=",1)

I haven't looked at the other sections; I don't really have time right now...

8)

Post by Seymour Clufley »

Thanks for your advice, Coolman.

Here is a new version of the code which partly deals with the problem I described.

Code:

Macro R(t)
  MessageRequester("Report",t,0)
EndMacro

Global c10.s = Chr(10)
Global c13.s = Chr(13)
Global c32.s = Chr(32)
Global c34.s = Chr(34)
Global c39.s = Chr(39)

Macro EnsureThisNotStart(t,start)
  
  If Left(t,Len(start)) = start
      t = Mid(t,Len(start)+1,Len(t))
  EndIf
  
EndMacro

Procedure.s EnsureNotStart(t.s,start.s)
  
  EnsureThisNotStart(t,start)
  ProcedureReturn t
  
EndProcedure

Macro EnsureThisEnd(t,endd)
  
  If endd<>""
      If Right(t,Len(endd)) <> endd
          t+endd
      EndIf
  EndIf
  
EndMacro

Macro EnsureThisNotEnd(t,endd)
  
  If Right(t,Len(endd)) = endd
      ;snipped.s = Len(t)-Len(endd)
      ;t = Left(t,snipped)
      t = Left(t,Len(t)-Len(endd))
  EndIf
  
EndMacro

Macro StartsWith(main,sub)
  (sub<>"" And main<>"" And Left(main,Len(sub))=sub)
EndMacro


Procedure.s FileToString(filename.s)
  
  info.s = ""
  file = ReadFile(#PB_Any,filename)
  If file
      While Not Eof(file)
          info + ReadString(file)+c13
      Wend
      CloseFile(file)
  EndIf
  
  ProcedureReturn info
  
EndProcedure

Procedure.b FileFromString(filename.s,string.s)
  ;Report("FILE: "+filename+c13+"STRING: "+string)
  
  If Not filename : ProcedureReturn #False : EndIf
  ;If Not EnsureFolderPath(GetPathPart(filename)) : ProcedureReturn #False : EndIf
  
  string = RemoveString(string,c10)
  If FindString(string,c13,0)
      EnsureThisNotEnd(string,c13) ; this removes final linebreak
      string = ReplaceString(string,c13,c13+c10)
  EndIf
  
  file = CreateFile(#PB_Any,filename)
  If IsFile(file)
      WriteString(file,string)
      CloseFile(file)
      ProcedureReturn #True
  EndIf
  
  ProcedureReturn #False
  
EndProcedure


Procedure.s FileMimeType(filename.s)
  
  mime.s
  Select LCase(GetExtensionPart(filename))
      Case "png"
          mime = "image/png"
      Case "jpg", "jpeg", "jpe"
          mime = "image/jpeg"
      Case "ico"
          mime = "image/x-icon"
      Case "gif"
          mime = "image/gif"
      Case "bmp"
          mime = "image/bmp"
      Case "tif", "tiff"
          mime = "image/tiff"
      Case "svg"
          mime = "image/svg+xml" ; the SVG section also builds its data URI through NewBase()
  EndSelect
  
  ProcedureReturn mime
  
EndProcedure

Procedure.s File2Base64(filename.s)
  
  f = ReadFile(#PB_Any,filename) ; ReadFile, because OpenFile would create the file if it is missing
  If Not f : ProcedureReturn "" : EndIf
  loaf = Lof(f)
  If loaf<1 : CloseFile(f) : ProcedureReturn "" : EndIf
  *mem = AllocateMemory(loaf)
  lengthread = ReadData(f,*mem,loaf)
  CloseFile(f)
  If Not lengthread
      FreeMemory(*mem)
      ProcedureReturn ""
  EndIf
  
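  bloaf = loaf*1.5
  If bloaf<64 : bloaf=64 : EndIf ; the encoder wants an output buffer of at least 64 bytes
  *b64 = AllocateMemory(bloaf)
  ; note: newer PureBasic versions name this four-argument form Base64EncoderBuffer()
  Base64Encoder(*mem,loaf,*b64,bloaf)
  
  b64.s = PeekS(*b64,-1,#PB_Ascii) ; the encoded data is ASCII, also in a Unicode build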
  FreeMemory(*mem)
  FreeMemory(*b64)
  
  ProcedureReturn b64
  
EndProcedure


Procedure.s OmitUnusedClasses(css.s,html.s)
  
  css+c13
  html = LCase(html)
  ncss.s = ""
  html = ReplaceString(html," class='"," class="+c34)
  html = ReplaceString(html," id='"," id="+c34)
  
  For a = 1 To CountString(css,c13)
      ln.s = StringField(css,a,c13)
      ln = Trim(ln)
      If Not ln : Continue : EndIf
      ln = RemoveString(ln,Chr(9))
      use.b = #False
      lln.s = LCase(ln)
      lln = ReplaceString(lln,"{",c32)
      lln = ReplaceString(lln,".",c32)
      lln = Trim(lln,c32)
      name.s = StringField(lln,1,c32)
      ;R(name)
      If Left(lln,1)="#"
          ; it's styling for an id
          inlinename.s = EnsureNotStart(name,"#")
          use = #False
          If FindString(html," id="+inlinename,1)
              use=#True
          Else
              If FindString(html," id="+c34+inlinename+c34,1)
                  use=#True
              Else
                  If FindString(html," id="+c39+inlinename+c39,1)
                      use=#True
                  EndIf
              EndIf
          EndIf
      Else
          ; it's a class
          If FindString(html," class="+name,1)
              use=#True
          Else
              If FindString(html," class="+c34+name+c34,1)
                  use=#True
              EndIf
          EndIf
      EndIf
      
      If use
          ncss+ln+c13
      EndIf
  Next a
  
  ProcedureReturn ncss
  
EndProcedure

Macro IsPointInsideAScriptBlock(lh,point,var,blockend)
  var = #False
  nextjsopen = FindString(lh,"<script ",point) ; search from the macro's own parameter
  nextjsclose = FindString(lh,"</script>",point)
  If nextjsclose>0
      If nextjsclose<nextjsopen Or Not nextjsopen
          ;Debug "INSIDE A SCRIPT BLOCK"
          var = #True
          blockend=nextjsclose
      EndIf
  EndIf
EndMacro




Macro NewBase(filename)
  basecount+1
  barrsize = ArraySize(base64s(),1)
  If barrsize<basecount
      barrsize+5
      ReDim base64s(barrsize)
  EndIf
  base64s(basecount) = "data:"+FileMimeType(filename)+";base64,"+File2Base64(filename)
EndMacro

Macro showtime(t)
  Debug RSet(Str(ElapsedMilliseconds()-st),5)+"  "+t
EndMacro

Macro B64Signal(b)
  "INSERTBASE64_"+Str(b)+"_"
EndMacro

Procedure.b InternaliseHTMLFile(sourcefile.s,destfile.s,webpath.s="",path.s="",omitunusedclasses.b=#True,lookforclassesinjavascript.b=#True,lookforimagesinjavascript.b=#True,extrafile_arr.s="")
  
  st=ElapsedMilliseconds()
  
  ; external stylesheets
  ; external javascripts
  ; images (converted to base64)
  ; background-images (converted to base64)
  ; svgs (converted to base64)
  
  pp.s = "|"
  Dim base64s.s(20)
  
  If path
      EnsureThisEnd(path,"\")
  Else
      path = GetPathPart(sourcefile)
  EndIf
  
  h.s = FileToString(sourcefile)
  
  lh.s = LCase(h)
  nh.s = h
  
  badtagarr.s
  
  importedcss.s
  
  
  
  ; find all external scripts and get their content...
  detect=0
  Repeat
      detect = FindString(lh,"<script ",detect)
      If Not detect : Break : EndIf
      enddetect = FindString(lh,">",detect)
      If Not enddetect : Break : EndIf
      snippet.s = Mid(lh,detect,enddetect-detect)
      srcstart = FindString(snippet," src=",1)
      If srcstart
          snippet = Mid(snippet,srcstart+5,Len(snippet))
          ;snippet = RemoveString(snippet,"=")
          EnsureThisNotEnd(snippet,"/")
          snippet = Trim(snippet)
          snippet = Trim(snippet,c34)
          ;EnsureThisNotStart(snippet,c34)
          ;EnsureThisNotEnd(snippet,c34)
          If FindString(snippet,c34,1)
              snippet = StringField(snippet,1,c34)
          EndIf
;          R(snippet)
          If snippet
              resourcefilename.s = path+ReplaceString(snippet,"/","\")
              newjs.s = FileToString(resourcefilename)
              If Not newjs : detect=enddetect : Continue : EndIf
              newjs = Trim(newjs,c13)
              tag.s = Mid(h,detect,enddetect-detect+1)
              If newjs
                  ntag.s = RemoveString(tag,snippet,#PB_String_NoCase)
                  ntag = RemoveString(ntag,c32+"src="+c34+c34)
                  ntag = RemoveString(ntag,c32+"src=")
                  ntag+c13+newjs+c13
                  nh = ReplaceString(nh,tag,ntag)
              Else
                  ; empty js file
                  nh = RemoveString(nh,tag+"</SCRIPT>",#PB_String_NoCase)
                  nh = RemoveString(nh,tag+" </SCRIPT>",#PB_String_NoCase)
                  nh = RemoveString(nh,tag+c13+"</SCRIPT>",#PB_String_NoCase)
              EndIf
          EndIf
      EndIf
      detect=enddetect
  ForEver
  
  
  
  ; find all external stylesheets and get their content...
  detect=0
  Repeat
      ; <LINK rel="stylesheet" type="text/css" href="styles.css" />
      detect = FindString(lh,"<link ",detect)
      If Not detect : Break : EndIf
      enddetect = FindString(lh,">",detect)
      If Not enddetect : Break : EndIf
      snippet.s = Mid(lh,detect,enddetect-detect)
      If FindString(snippet,"stylesheet",1)
      srcstart = FindString(snippet," href=",1)
      If srcstart
          snippet = Mid(snippet,srcstart+5,Len(snippet))
          snippet = RemoveString(snippet,"=")
          EnsureThisNotEnd(snippet,"/")
          snippet = Trim(snippet)
          EnsureThisNotStart(snippet,c34)
          EnsureThisNotEnd(snippet,c34)
          If snippet
              resourcefilename.s = path+ReplaceString(snippet,"/","\")
              newcss.s = FileToString(resourcefilename)
              newcss = Trim(newcss,c13)
              If newcss
                  importedcss+c13+c13+newcss
              EndIf
              badtagarr+ Mid(h,detect,enddetect-detect+1) +pp
          EndIf
      EndIf
      EndIf
      detect=enddetect
  ForEver
  If importedcss
      importedcss = Trim(importedcss,c13)
  EndIf
  
  
  
  ; find all external images and convert them to base64...
  detect=0
  Repeat
      ; <IMG src="tree.jpg" />
      detect = FindString(lh,"<img ",detect)
      If Not detect : Break : EndIf
      enddetect = FindString(lh,">",detect)
      If Not enddetect : Break : EndIf
      snippet.s = Mid(lh,detect,enddetect-detect)
      srcstart = FindString(snippet," src=",1)
      If srcstart
          ;snippet1.s=snippet
          snippet = Mid(snippet,srcstart+5,Len(snippet)) ; skip past " src=" (5 characters), as the script section does
          EnsureThisNotEnd(snippet,"/")
          snippet = Trim(snippet)
          ;EnsureThisNotStart(snippet,c34)
          ;EnsureThisNotEnd(snippet,c34)
          snippet=Trim(snippet,c39) : snippet=Trim(snippet,c34)
          ;snippet2.s = snippet
          If snippet
              If FindString(snippet,c34,1) : snippet=StringField(snippet,1,c34) : EndIf
              If FindString(snippet,c39,1) : snippet=StringField(snippet,1,c39) : EndIf
              resourcefilename.s = path+ReplaceString(snippet,"/","\")
              If FindString(resourcefilename,"?",1) : resourcefilename = StringField(resourcefilename,1,"?") : EndIf
              ;If FileSize(resourcefilename)<0
              ;    R("NO FILE."+c13+c13+snippet1+c13+snippet2+c13+snippet+c13+resourcefilename)
              ;EndIf
              NewBase(resourcefilename)
              tag.s = Mid(h,detect,enddetect-detect+1)
              ntag.s = ReplaceString(tag,snippet,B64Signal(basecount),#PB_String_NoCase) ; snippet comes from the lowercased copy
              nh = ReplaceString(nh,tag,ntag)
          EndIf
      EndIf
      detect=enddetect
  ForEver
  
  
  
  ; find all external SVGs and convert them to base64...
  If FindString(lh,".svg",1)
      detect=0
      Repeat
          ; <OBJECT data="arcrap.svg" type="image/svg+xml" width="100%" height="100%">    </OBJECT>
          detect = FindString(lh,"<object ",detect)
          If Not detect : Break : EndIf
          enddetect = FindString(lh,">",detect)
          If Not enddetect : Break : EndIf
          snippet.s = Mid(lh,detect,enddetect-detect)
          srcstart = FindString(snippet," data=",1)
          If srcstart
              snippet = Mid(snippet,srcstart+5,Len(snippet))
              EnsureThisNotStart(snippet,"=")
              EnsureThisNotEnd(snippet,"/")
              EnsureThisNotStart(snippet,c34)
              If Left(snippet,1)=c34
                  EnsureThisNotStart(snippet,c34)
                  snippet = StringField(snippet,1,c34)
              Else
                  snippet = StringField(snippet,1,c32)
              EndIf
              snippet = Trim(snippet)
              EnsureThisNotEnd(snippet,c34)
              ;MessageRequester("SNIPPET","**"+snippet+"**")
              If snippet
                  resourcefilename.s = path+ReplaceString(snippet,"/","\")
                  ;MessageRequester("BASE 64",b64)
                  NewBase(resourcefilename)
                  tag.s = Mid(h,detect,enddetect-detect+1)
                  ntag.s = ReplaceString(tag,snippet,B64Signal(basecount),#PB_String_NoCase) ; snippet comes from the lowercased copy
                  nh = ReplaceString(nh,tag,ntag)
              EndIf
          EndIf
          detect=enddetect
      ForEver
  EndIf
  
  
  If badtagarr
      For a = 1 To CountString(badtagarr,pp)
          tag.s = StringField(badtagarr,a,pp)
          nh = RemoveString(nh,tag)
      Next a
  EndIf
  ;showtime("after badtagarr section")
  
  
  swb64.s = "SwitchBase64"
  If lookforimagesinjavascript
      attrname.s = "src"
      For cycle = 1 To 2
          elsrc.s = "."+attrname+"="
          nh = ReplaceString(nh,"."+attrname+" =",elsrc)
          nh = ReplaceString(nh,"."+attrname+"= ",elsrc)
          detect = 0
          enddetect = 0
          ;R("ELSRC: "+elsrc)
          Repeat
              lh.s = LCase(nh)
              detect = FindString(lh,elsrc,detect)
              If Not detect : Break : EndIf
              detect+Len(elsrc)
              enddetect = FindString(lh,";",detect)
              snippet.s = Mid(nh,detect,enddetect-detect)
              If StartsWith(snippet,swb64)
                  detect = enddetect
                  Continue
              EndIf
              ;R(snippet)
              newinstrux.s = elsrc+swb64+"("+snippet+")"
              ;R(elsrc+snippet+c13+c13+newinstrux)
              nh = ReplaceString(nh,elsrc+snippet,newinstrux,#PB_String_NoCase)
              nh = ReplaceString(nh,elsrc+" "+snippet,newinstrux,#PB_String_NoCase)
              detect = enddetect
              
              snippet = Trim(snippet,c34)
              snippet = Trim(snippet,c39)
              ;R(path+snippet)
              If FileSize(path+snippet)>0 ; it's a filename (otherwise it's a variable or a js function call)
                  ;R("FILE FOUND IN BASE FOLDER")
                  If Not FindString(pp+extrafile_arr,pp+snippet+pp,1) ; add it to the list if it's not already there
                      extrafile_arr+snippet+pp
                  EndIf
              EndIf
              
          ForEver
          attrname = "backgroundImage"
      Next cycle
  EndIf
  
  
  
  ncss.s = importedcss
  ; now get any internal style blocks...
  lh.s = LCase(nh)
  If FindString(lh,"<style",1)
      nh = ReplaceString(nh,"<style","<STYLE")
      nh = ReplaceString(nh,"</style>","</STYLE>")
      Repeat
          detect = FindString(nh,"<STYLE",0)
          If Not detect : Break : EndIf
          enddetect = FindString(nh,"</STYLE>",detect+6)
          If Not enddetect : Break : EndIf
          snippet.s = Mid(nh,detect,enddetect-detect+8)
          nh = RemoveString(nh,snippet)
          EnsureThisNotEnd(snippet,"</STYLE>")
          detect = FindString(snippet,">",1)
          snippet = Mid(snippet,detect+1,Len(snippet))
          snippet = RemoveString(snippet,Chr(9))
          
          ncss+Trim(snippet,c13)+c13
      ForEver
  EndIf
  ;showtime("after getting style blocks")
  
  ; omit unused...
  If ncss
      If omitunusedclasses
          ncss = OmitUnusedClasses(ncss,nh)
      EndIf
  EndIf
  
  
  
  Debug(Str(Len(nh)))
  If webpath ; takes ages!
      EnsureThisEnd(webpath,"/")
      webpathl = Len(webpath)
      lh.s = LCase(nh)
      detect=FindString(lh,"<body",1)
      ;R(Mid(lh,detect,100))
      Repeat
          ; <A id="example" href="page2.html">text</A>
          detect = FindString(nh," href=",detect)
          If Not detect : Break : EndIf
          
          IsPointInsideAScriptBlock(lh,detect,inside,blockend)
          If inside
              ;R("INSIDE SCRIPT. BLOCK ENDS @ "+Str(blockend))
              detect = blockend
              Continue
          EndIf
          
          
          offset=7
          snippet = Mid(nh,detect+offset,webpathl+1)
          ;R(snippet)
          If Left(snippet,1)=c34
              offset=8
              snippet = Mid(nh,detect+offset,webpathl+1) ; read from the page again, not from the short snippet
          EndIf
          If StartsWith(snippet,"javascript:") Or StartsWith(snippet,"http://") Or StartsWith(snippet,"#") Or StartsWith(snippet,webpath)
              detect+webpathl
              Continue
          EndIf
          ;If Not snippet
          ;    detect+webpathl
          ;    Continue
          ;EndIf
          
          ;R(Mid(nh,detect+place-2,40))
          nh = InsertString(nh,webpath,detect+offset)
          
          detect+webpathl
      ForEver
  EndIf

  
  
  ; incorporate single css block...
  If ncss
      ncss = "<STYLE type="+c34+"text/css"+c34+">"+c13+c13+ncss+c13+c13+"</STYLE>"
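      headend = FindString(LCase(nh),"<head>",1) ; case-insensitive: the tag may be written <head>
      If headend
          nh = InsertString(nh,c13+ncss,headend+6) ; directly after the 6-character HEAD tag
      EndIf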
  EndIf
  
  
  
  ; find css images and base64 them...
  bi.s = "background-image:"
  lh = LCase(nh) ; refresh: nh now also contains the imported CSS
  If FindString(lh,bi,1)
      detect = 0
      Repeat
          ; { width:50px; background-image:url('graphics/example.jpg'); display:block; }"
          lh.s = LCase(nh)
          detect = FindString(lh,bi,detect)
          If Not detect : Break : EndIf
          detect+Len(bi)
          enddetect = FindString(nh,";",detect)
          If Not enddetect : Break : EndIf
          snippet.s = Mid(nh,detect,enddetect-detect)
          entiresnippet.s = snippet
          EnsureThisNotStart(snippet,"url(")
          EnsureThisNotEnd(snippet,")")
          snippet = Trim(snippet) : snippet=Trim(snippet,c39) : snippet=Trim(snippet,c34)
          If Left(snippet,5)="data:" : detect=enddetect : Continue : EndIf
          If snippet
              If FindString(snippet,c34,1) : snippet=StringField(snippet,1,c34) : EndIf
              If FindString(snippet,c39,1) : snippet=StringField(snippet,1,c39) : EndIf
              resourcefilename.s = path+ReplaceString(snippet,"/","\")
              ;R(resourcefilename)
              If FindString(resourcefilename,"?",1) : resourcefilename = StringField(resourcefilename,1,"?") : EndIf
              ;If FileSize(resourcefilename)<0
              ;    R("NO FILE."+c13+c13+snippet1+c13+snippet2+c13+snippet+c13+resourcefilename)
              ;EndIf
              NewBase(resourcefilename)
              ntag.s = ReplaceString(entiresnippet,snippet,B64Signal(basecount))
              nh = ReplaceString(nh,entiresnippet,ntag)
          EndIf
          detect = enddetect
      ForEver
  EndIf
  
  
  
  If extrafile_arr
      ;R("EXTRA FILE ARR"+c13+c13+ReplaceString(extrafile_arr,pp,c13))
      ; we found some discernible filenames in the js, and/or some supplied to the procedure
      ; have to encode them and construct the js switching procedure, SwitchBase64
      switcherfunc.s = "function SwitchBase64(filename) {"+c13
      ;switcherfunc + "alert(filename);"
      switcherfunc + "  switch(filename) {"+c13
      For a = 1 To CountString(extrafile_arr,pp)
          resfile.s = StringField(extrafile_arr,a,pp)
          EnsureThisNotStart(resfile,path)
          If Not resfile : Continue : EndIf
          NewBase(path+resfile)
          ;R(b64)
          switcherfunc + "    case "+c34+ReplaceString(resfile,"\","/")+c34+":"+c13
          switcherfunc + "      return "+c34+B64Signal(basecount)+c34+";"+c13
      Next a
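      switcherfunc + "    default:"+c13
      switcherfunc + "      return filename;"+c13 ; unknown names pass through unchanged
      switcherfunc+"    }"+c13+" }"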
      
      ; now insert it right before the closing HEAD tag
      switcherfunc.s = "<SCRIPT type="+c34+"text/javascript"+c34+">"+c13+switcherfunc+c13+"</SCRIPT>"+c13
      ;nh = ReplaceString(nh,"</HEAD>","</HEAD>"+switcherfunc,#PB_String_NoCase)
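      headend = FindString(LCase(nh),"</head>",1) ; case-insensitive search for the closing tag
      If headend
          nh = InsertString(nh,switcherfunc,headend) ; immediately before the closing HEAD tag
      EndIf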
  EndIf
  
  
  
  detect=0
  For b = 1 To basecount
      marker.s = B64Signal(b)
      ;nh = ReplaceString(nh,marker,base64s(b)) : Continue
      ndetect = FindString(nh,marker,0)
      If ndetect
          detect=ndetect
          nh = ReplaceString(nh,marker,base64s(b),0,detect-2)
      EndIf
      ;detect = FindString(nh,marker,0)
      ;If detect
      ;    p1.s = Left(nh,detect-1)
      ;    p2.s = Mid(nh,detect+Len(marker),Len(nh))
      ;    nh = p1+base64s(b)+p2
      ;EndIf
  Next b
  
  
  
  ProcedureReturn FileFromString(destfile,nh) ; report whether the file was written
  
EndProcedure
Last edited by Seymour Clufley on Tue Dec 10, 2019 12:45 am, edited 1 time in total.

Post by Seymour Clufley »

I've just realised that the code doesn't handle @font-face in CSS. That's another thing to be added!
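
A possible starting point for that, sketched under assumptions: the two procedures below are hypothetical and not part of the posted code. NextCssUrl() pulls the target out of the next url(...) in a piece of CSS, and FontMimeType() maps a few font extensions to the MIME types registered for them, in the same style as FileMimeType().

```purebasic
; Hypothetical helper: returns the target of the first url(...) at or after startpos, or "".
Procedure.s NextCssUrl(css.s, startpos)
  Protected p1, p2
  Protected target.s
  p1 = FindString(LCase(css), "url(", startpos)
  If Not p1 : ProcedureReturn "" : EndIf
  p2 = FindString(css, ")", p1)
  If Not p2 : ProcedureReturn "" : EndIf
  target = Mid(css, p1 + 4, p2 - p1 - 4) ; the text between "url(" and ")"
  target = Trim(target)
  target = Trim(target, Chr(39))         ; strip single quotes
  target = Trim(target, Chr(34))         ; strip double quotes
  ProcedureReturn target
EndProcedure

; Hypothetical helper: MIME types for a few common font formats.
Procedure.s FontMimeType(filename.s)
  Protected mime.s
  Select LCase(GetExtensionPart(filename))
    Case "woff"
      mime = "font/woff"
    Case "woff2"
      mime = "font/woff2"
    Case "ttf"
      mime = "font/ttf"
    Case "otf"
      mime = "font/otf"
  EndSelect
  ProcedureReturn mime
EndProcedure

css.s = "@font-face { font-family: MyFont; src: url('fonts/myfont.woff'); }"
fontfile.s = NextCssUrl(css, 1)
Debug fontfile
Debug FontMimeType(fontfile)
```

From there the font file could go through the same Base64 step as the images, with the url(...) target replaced by the data URI.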