how to view a CHM file?

AZJIO
Addict
Posts: 2141
Joined: Sun May 14, 2017 1:48 am

how to view a CHM file?

Post by AZJIO »

Is it possible to write your own CHM file viewer? In 7zip's archive properties, the CHM file is shown as LZX. I tried to open it with the UnLZX module, but it didn't work. The example archive from that module's topic opens fine with UnLZX, but 7zip cannot open that one.
If this worked, I would be able to read the file tree, extract individual files and open them in the WebGadget.
SMaag
Enthusiast
Posts: 302
Joined: Sat Jan 14, 2023 6:55 pm
Location: Bavaria/Germany

Re: how to view a CHM file?

Post by SMaag »

Generally, yes!

CHM is compiled HTML.
There is a tool from Microsoft to decompile a CHM; it gives you the HTML folder/file structure.
I did this with the PB help to reconstruct all the files, so I could automatically search for all PB commands.

Search Google for "decompile help files".

Here is a link explaining how to decompile:
https://zeropage.io/howto-decompile-win ... elp-files/

The required hh.exe is part of the Microsoft HTML Help 1.4 SDK.
SMaag

Re: how to view a CHM file?

Post by SMaag »

Now I've found my file.

The command to decompile a help file is:
hh.exe -decompile outputfolder input.chm
One thing to note is that the decompile/recompile process isn't a "round-trip" process. Certain features that the help author added to the original help file can't be recovered when you decompile it, so these may no longer work properly after you've recompiled. This is especially true in the area of context-sensitive help, which may be broken in the new version of the file.
AZJIO

Re: how to view a CHM file?

Post by AZJIO »

You can see for yourself that I can unpack and repack with different programs. But the task is to write a program for Linux that eliminates the problems of the programs I use to view help files:
1. I don't like the serif font.
2. It does not remember the zoom level.
3. The side mouse button does not work for returning to the previous page.
These problems must be solved.
viewtopic.php?t=68549
SMaag

Re: how to view a CHM file?

Post by SMaag »

You can see for yourself that I can unpack and repack with different programs. But the task is to write a program for Linux that eliminates the problems of the programs I use to view help files:
1. I don't like the serif font.
2. It does not remember the zoom level.
3. The side mouse button does not work for returning to the previous page.
These problems must be solved.
I know you are one of the best programmers here! A question like "is it possible to write a program in PB" is not a question at your level of experience. So my conclusion was that you have trouble unpacking the CHM file, because you wrote that it isn't possible with 7zip.

I'm sure there's no need to explain to you how to search for the font section in an HTML file and change the font name.
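For the serif-font complaint specifically, a minimal sketch, assuming it is acceptable to patch each extracted page before handing it to the WebGadget: the hypothetical helper `InjectSansSerif()` inserts a CSS override right before `</head>` (or prepends it when the page has no head), so the page's own serif declarations are overridden without editing them.

```purebasic
EnableExplicit

; Hypothetical helper: force a sans-serif font by injecting a <style> block
; right before </head>. If no </head> is found, the rule is prepended instead.
Procedure.s InjectSansSerif(Html$)
  Protected Style$ = "<style>body{font-family:sans-serif !important;}</style>"
  Protected Pos.i = FindString(Html$, "</head>", 1, #PB_String_NoCase)
  If Pos
    ; InsertString() places Style$ in front of the character at Pos
    ProcedureReturn InsertString(Html$, Style$, Pos)
  EndIf
  ProcedureReturn Style$ + Html$
EndProcedure

Debug InjectSansSerif("<html><head><title>Help</title></head><body>Text</body></html>")
```

The `!important` is there because CHM pages often set fonts on inner elements; whether it is enough for a given help file is an assumption worth checking.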

I think the first problem is unpacking .chm files on Linux with PB code.
- For unpacking .chm on Linux, I remember there are some open-source projects, but at the moment I don't remember their names.

Can you describe your problem in more detail?
AZJIO

Re: how to view a CHM file?

Post by AZJIO »

SMaag wrote: Mon Dec 30, 2024 12:14 pm because you described that is not possible with 7zip.
7zip is unable to open the file from this forum post. This suggests that the LZX inside CHM is different from the LZX in the forum post. The CHM file opens in 7zip. Opening CHM in the 7zip program, I saw the archive type LZX.

I would rather not unpack files by running an external program on the command line. I would like to use the UnLZX module to extract a single file, find the local href=file links in it, extract the missing CSS and PNG files into a temporary folder, and open the HTML file in the WebGadget. When the program closes, the unpacked files would be deleted.
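The link-scanning step could look like the following sketch. `CollectLocalLinks()` is a hypothetical helper; the regular expression only catches quoted `href`/`src` attributes, and treating any target that contains `:` as non-local (http:, mailto:, ms-its:) is an assumption, not a complete HTML parser.

```purebasic
EnableExplicit

; Collect local href/src targets (css, png, htm ...) from one HTML page so they
; can be extracted next to it. Returns the number of links found.
Procedure.i CollectLocalLinks(Html$, List Links.s())
  Protected Regex.i, Target$
  ; (?:href|src) = "value"  or  'value'; fragments starting with # are skipped
  Regex = CreateRegularExpression(#PB_Any, ~"(?:href|src)\\s*=\\s*[\"']([^\"'#]+)", #PB_RegularExpression_NoCase)
  If Regex = 0
    ProcedureReturn 0
  EndIf
  If ExamineRegularExpression(Regex, Html$)
    While NextRegularExpressionMatch(Regex)
      Target$ = RegularExpressionGroup(Regex, 1)
      ; anything with a scheme (http:, mailto:, ms-its: ...) is treated as not local
      If FindString(Target$, ":") = 0
        AddElement(Links())
        Links() = Target$
      EndIf
    Wend
  EndIf
  FreeRegularExpression(Regex)
  ProcedureReturn ListSize(Links())
EndProcedure
```

Each collected path would then be resolved relative to the page's folder inside the CHM before extraction.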

dcr3
Enthusiast
Posts: 181
Joined: Fri Aug 04, 2017 11:03 pm

Re: how to view a CHM file?

Post by dcr3 »

AZJIO wrote: Mon Dec 30, 2024 2:14 am but the 7zip program cannot open it.
7zip can decompile CHM files.
AZJIO

Re: how to view a CHM file?

Post by AZJIO »

dcr3 wrote: Mon Dec 30, 2024 7:25 pm 7zip can decompile CHM files.
Your answers indicate that you do not understand what I write. I've been using 7zip to extract CHM for over 15 years.
AZJIO wrote: Mon Dec 30, 2024 4:54 pm The CHM file opens in 7zip. Opening CHM in the 7zip program, I saw the archive type LZX.
Did you really read something different in this sentence? I'm already spelling it out as if for children, repeating the words more than twice so that it cannot be read two ways, and you still read the opposite.
dcr3

Re: how to view a CHM file?

Post by dcr3 »

AZJIO wrote: Mon Dec 30, 2024 7:34 pm Your answers indicate that you do not understand what I write. I've been using 7zip to extract CHM for over 15 years.
Right then,

https://github.com/Bioruebe/UniExtract2

https://github.com/Bioruebe/UniExtract2 ... actRC3.zip

The sources are available in AutoIt, one of your fav languages.
I am sure you can play around with it and build the tool as needed. :?:
Jan2004
Enthusiast
Posts: 163
Joined: Fri Jan 07, 2005 7:17 pm

Re: how to view a CHM file?

Post by Jan2004 »

A few words about the subject on stackoverflow.com:
https://stackoverflow.com/questions/692 ... -view-them
AZJIO

Re: how to view a CHM file?

Post by AZJIO »

dcr3 wrote: Mon Dec 30, 2024 7:58 pm UniExtract2
When the author wrote this program, he communicated with me a lot. This program does not contain unpacking modules; more precisely, it bundles executable files and calls them to unpack archives. I originally wanted a built-in module, but it looks like I'll have to add a dependency on the 7zip binary and extract the files with it.
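A minimal sketch of that fallback, assuming a `7z` executable is reachable (on Linux it may be named differently, e.g. `7zz`; pass a full path if needed). `x` extracts with folder structure, `-o` sets the output folder (no space after it) and `-y` answers prompts; 7z exits with 0 on success. `Build7zArgs()` and `ExtractWith7z()` are hypothetical names, and how `RunProgram()` splits quoted parameters on Linux is an assumption to verify.

```purebasic
EnableExplicit

; Build the parameter string: x "<chm>" -o"<dir>" "<member>" -y
Procedure.s Build7zArgs(Chm$, OutDir$, Member$)
  ProcedureReturn "x " + Chr(34) + Chm$ + Chr(34) + " -o" + Chr(34) + OutDir$ + Chr(34) + " " + Chr(34) + Member$ + Chr(34) + " -y"
EndProcedure

; Extract one member of the CHM into OutDir$.
; Returns 7z's exit code (0 = success) or -1 if 7z could not be started.
Procedure.i ExtractWith7z(Chm$, OutDir$, Member$)
  Protected Program.i, ExitCode.i = -1
  Program = RunProgram("7z", Build7zArgs(Chm$, OutDir$, Member$), "", #PB_Program_Open | #PB_Program_Hide)
  If Program
    WaitProgram(Program)
    ExitCode = ProgramExitCode(Program)
    CloseProgram(Program)
  EndIf
  ProcedureReturn ExitCode
EndProcedure
```

Extracting into a temporary folder and deleting it on exit would keep the rest of the plan (WebGadget, cleanup) unchanged.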
Jan2004 wrote: Mon Dec 30, 2024 8:41 pm A few words about the subject on stackoverflow.com:
Yes, I'm aware of this information, but I only need the unpacking module; I'm not interested in the binaries inside the CHM.
Jan2004

Re: how to view a CHM file?

Post by Jan2004 »

7zxa.dll - library for extracting from 7z archives:
https://sourceforge.net/projects/sevenz ... z/download
infratec
Always Here
Posts: 7577
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: how to view a CHM file?

Post by infratec »

From Apache Tika:
;The header
;0000: char[4] 'ITSF'
;0004: DWORD 3 (version number)
;0008: DWORD Total header length, including header section table and following data.
;000C: DWORD 1 (unknown)
;0010: DWORD a timestamp
;0014: DWORD Windows language ID
;0018: GUID {7C01FD10-7BAA-11D0-9E0C-00A0C922E6EC}
;0028: GUID {7C01FD11-7BAA-11D0-9E0C-00A0C922E6EC}
;Note: a GUID is $10 bytes, arranged as 1 DWORD, 2 WORDs, and 8 BYTEs.
;Header section table (starts at $0038, two entries), each entry:
;0000: QWORD Offset of section from beginning of file
;0008: QWORD Length of section
;Following the header section table is 8 bytes of additional header data. In version 2 files, this data is not there and the content section starts immediately after the directory.
;//translated.by/you/microsoft-s-html-help-chm-format-incomplete/original/?show-translation-form=1
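A sketch of reading these header fields from a buffer, assuming (as chmlib does, to my understanding) that table entry 1 at $48 describes the directory and that the extra QWORD at $58 in version 3 files is the file offset of content section 0. `ITSF_Info` and `ReadITSF()` are hypothetical names.

```purebasic
EnableExplicit

Structure ITSF_Info
  Version.l
  HeaderLength.l
  LanguageID.l
  DirOffset.q      ; header section 1: the ITSP directory
  DirLength.q
  ContentOffset.q  ; file offset of content section 0
EndStructure

Procedure.i ReadITSF(*Buf, Size.i, *Info.ITSF_Info)
  If Size < $58
    ProcedureReturn #False
  EndIf
  If PeekS(*Buf, 4, #PB_Ascii) <> "ITSF"
    ProcedureReturn #False
  EndIf
  *Info\Version      = PeekL(*Buf + $04)
  *Info\HeaderLength = PeekL(*Buf + $08)
  *Info\LanguageID   = PeekL(*Buf + $14)
  ; header section table at $38: entry 0 at $38, entry 1 (directory) at $48
  *Info\DirOffset    = PeekQ(*Buf + $48)
  *Info\DirLength    = PeekQ(*Buf + $50)
  If *Info\Version >= 3 And Size >= $60
    ; version 3: the 8 extra bytes after the table hold the content offset
    *Info\ContentOffset = PeekQ(*Buf + $58)
  Else
    ; version 2: content section 0 starts right after the directory
    *Info\ContentOffset = *Info\DirOffset + *Info\DirLength
  EndIf
  ProcedureReturn #True
EndProcedure
```

The posted code instead adds offset and length of table entry 0; both land on the ITSP header as long as section 0 is directly followed by the directory.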


;Directory header: the directory starts with a header; its format is as follows:
;0000: char[4] 'ITSP'
;0004: DWORD Version number 1
;0008: DWORD Length of the directory header
;000C: DWORD $0a (unknown)
;0010: DWORD $1000 Directory chunk size
;0014: DWORD "Density" of quickref section, usually 2
;0018: DWORD Depth of the index tree: 1 if there is no index, 2 if there is one level of PMGI chunks
;001C: DWORD Chunk number of root index chunk, -1 if there is none (though at least one file has 0 despite there being no index chunk, probably a bug)
;0020: DWORD Chunk number of first PMGL (listing) chunk
;0024: DWORD Chunk number of last PMGL (listing) chunk
;0028: DWORD -1 (unknown)
;002C: DWORD Number of directory chunks (total)
;0030: DWORD Windows language ID
;0034: GUID {5D02926A-212E-11D0-9DF9-00A0C922E6EC}
;0044: DWORD $54 (this is the length again)
;0048: DWORD -1 (unknown)
;004C: DWORD -1 (unknown)
;0050: DWORD -1 (unknown)
;//translated.by/you/microsoft-s-html-help-chm-format-incomplete/original/?show-translation-form=1
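A sketch of reading the ITSP fields and locating directory chunk n, assuming the chunks follow the directory header back to back, each `DirChunkSize` bytes long. `ITSP_Info`, `ReadITSP()`, `ChunkAddress()` and `EndOfDirectory()` are hypothetical names.

```purebasic
EnableExplicit

Structure ITSP_Info
  HeaderLength.l
  ChunkSize.l
  Density.l
  IndexDepth.l
  RootIndexChunk.l
  FirstPMGL.l
  LastPMGL.l
  ChunkCount.l
EndStructure

Procedure.i ReadITSP(*Dir, *Info.ITSP_Info)
  If PeekS(*Dir, 4, #PB_Ascii) <> "ITSP"
    ProcedureReturn #False
  EndIf
  *Info\HeaderLength   = PeekL(*Dir + $08)
  *Info\ChunkSize      = PeekL(*Dir + $10)
  *Info\Density        = PeekL(*Dir + $14)
  *Info\IndexDepth     = PeekL(*Dir + $18)
  *Info\RootIndexChunk = PeekL(*Dir + $1C)   ; -1 when there is no index
  *Info\FirstPMGL      = PeekL(*Dir + $20)
  *Info\LastPMGL       = PeekL(*Dir + $24)
  *Info\ChunkCount     = PeekL(*Dir + $2C)
  ProcedureReturn #True
EndProcedure

; Start address of directory chunk n (0-based)
Procedure.i ChunkAddress(*Dir, *Info.ITSP_Info, n.i)
  Protected Size.i = *Info\ChunkSize
  ProcedureReturn *Dir + *Info\HeaderLength + n * Size
EndProcedure

; First byte after the last chunk (start of content section 0 in version 2 files)
Procedure.i EndOfDirectory(*Dir, *Info.ITSP_Info)
  ProcedureReturn ChunkAddress(*Dir, *Info, *Info\ChunkCount)
EndProcedure
```

Addressing chunks by number, rather than by walking a pointer forward, keeps the PMGI root chunk and section 0 independent of how many listing chunks were visited.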


;Description: there are two types of directory chunks: index chunks and listing chunks.
;The index chunk will be omitted if there is only one listing chunk.
;A listing chunk has the following format:
;0000: char[4] 'PMGL'
;0004: DWORD Length of free space and/or quickref area at end of directory chunk
;0008: DWORD Always 0
;000C: DWORD Chunk number of previous listing chunk when reading directory in sequence (-1 if this is the first listing chunk)
;0010: DWORD Chunk number of next listing chunk when reading directory in sequence (-1 if this is the last listing chunk)
;0014: Directory listing entries (to quickref area), sorted by filename; the sort is case-insensitive.
;The quickref area is written backwards from the end of the chunk. One quickref entry exists for every n entries in the file, where n is calculated as 1 + (1 << quickref density). So for density = 2, n = 5.
;Chunklen-0002: WORD Number of entries in the chunk
;Chunklen-0004: WORD Offset of entry n from entry 0
;Chunklen-0008: WORD Offset of entry 2n from entry 0
;Chunklen-000C: WORD Offset of entry 3n from entry 0
;...
;The format of a directory listing entry is as follows:
;BYTE: length of name
;BYTEs: name (UTF-8 encoded)
;ENCINT: content section
;ENCINT: offset
;ENCINT: length
;The offset is from the beginning of the content section the file is in, after the section has been decompressed (if appropriate). The length also refers to the length of the file in the section after decompression.
;There are two kinds of file represented in the directory: user data and format-related files. The files which are format-related have names which begin with '::', the user data files have names which begin with "/".
;//translated.by/you/microsoft-s-html-help-chm-format-incomplete/original/?show-translation-form=1
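A sketch of walking one listing chunk. Like the posted code's `EncInt()`, it reads the name length as an ENCINT rather than a plain BYTE; for names shorter than 128 bytes both encodings are the same single byte. `DirEntry`, `ReadEncInt()` and `ReadListingChunk()` are hypothetical names.

```purebasic
EnableExplicit

Structure DirEntry
  Name.s
  Section.q
  Offset.q
  Length.q
EndStructure

; Decode one ENCINT at address *Cursor\i and move the cursor past it
Procedure.q ReadEncInt(*Cursor.Integer)
  Protected Result.q, b.a
  Repeat
    b = PeekA(*Cursor\i)
    *Cursor\i = *Cursor\i + 1
    Result = (Result << 7) | (b & $7F)   ; high bit set = more bytes follow
  Until b < $80
  ProcedureReturn Result
EndProcedure

; Walk all entries of one PMGL chunk; returns the number of entries read
Procedure.i ReadListingChunk(*Chunk, ChunkSize.i, List Entries.DirEntry())
  Protected Pos.i, Count.i, k.i, NameLen.q
  If PeekS(*Chunk, 4, #PB_Ascii) <> "PMGL"
    ProcedureReturn 0
  EndIf
  Count = PeekU(*Chunk + ChunkSize - 2)   ; entry count: last WORD of the chunk
  Pos = *Chunk + $14                      ; entries start after the $14-byte header
  For k = 1 To Count
    NameLen = ReadEncInt(@Pos)
    AddElement(Entries())
    Entries()\Name = PeekS(Pos, NameLen, #PB_UTF8 | #PB_ByteLength)
    Pos + NameLen
    Entries()\Section = ReadEncInt(@Pos)
    Entries()\Offset  = ReadEncInt(@Pos)
    Entries()\Length  = ReadEncInt(@Pos)
  Next
  ProcedureReturn Count
EndProcedure
```

An entry with `Section = 0` can be read directly at content section 0 plus `Offset`; section 1 entries first need the LZX stream decompressed.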


;Description (note: not always present): an index chunk has the following format:
;0000: char[4] 'PMGI'
;0004: DWORD Length of quickref/free area at end of directory chunk
;0008: Directory index entries (to quickref/free area)
;The quickref area in a PMGI is the same as in a PMGL.
;The format of a directory index entry is as follows:
;BYTE: length of name
;BYTEs: name (UTF-8 encoded)
;ENCINT: directory listing chunk which starts with name
;Encoded integers, aka ENCINT: an ENCINT is a variable-length integer. The high bit of each byte indicates "continued to the next byte". Bytes are stored most significant to least significant. So, for example, $EA $15 is (((0xEA&0x7F)<<7)|0x15) = 0x3515.
;Note: this class is not in use http://translated.by/you/microsoft-s-ht ... ion-form=1
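A sketch of using an index chunk to pick the listing chunk for a name, modelled on chmlib's PMGI search as far as I understand it: entries are sorted case-insensitively, so the answer is the chunk number of the last entry whose name is not greater than the wanted name. The ENCINT decoder reproduces the $EA $15 = $3515 example above. `ReadEncInt()` and `FindListingChunk()` are hypothetical names.

```purebasic
EnableExplicit

; Decode one ENCINT at address *Cursor\i and move the cursor past it
Procedure.q ReadEncInt(*Cursor.Integer)
  Protected Result.q, b.a
  Repeat
    b = PeekA(*Cursor\i)
    *Cursor\i = *Cursor\i + 1
    Result = (Result << 7) | (b & $7F)   ; high bit set = more bytes follow
  Until b < $80
  ProcedureReturn Result
EndProcedure

; Listing chunk that should contain Wanted$, or -1 if it sorts before every entry.
; Entries run from offset 8 up to the quickref/free area at the end of the chunk.
Procedure.i FindListingChunk(*Chunk, ChunkSize.i, Wanted$)
  Protected Pos.i, Limit.i, NameLen.q, Name$
  Protected Page.i = -1
  If PeekS(*Chunk, 4, #PB_Ascii) <> "PMGI"
    ProcedureReturn -1
  EndIf
  Limit = *Chunk + ChunkSize - PeekL(*Chunk + 4)
  Pos = *Chunk + 8
  While Pos < Limit
    NameLen = ReadEncInt(@Pos)
    Name$ = PeekS(Pos, NameLen, #PB_UTF8 | #PB_ByteLength)
    Pos + NameLen
    ; stop at the first entry that sorts after Wanted$
    If CompareMemoryString(@Name$, @Wanted$, #PB_String_NoCase) = #PB_String_Greater
      Break
    EndIf
    Page = ReadEncInt(@Pos)
  Wend
  ProcedureReturn Page
EndProcedure
```

With one index level this replaces the linear scan over all PMGL chunks by reading one PMGI chunk and one PMGL chunk.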



;DataSpace/Storage/<SectionName>/ControlData: this file contains $20 bytes of information on the compression.
;The information is partially known:
;0000: DWORD 6 (unknown)
;0004: ASCII 'LZXC' Compression type identifier
;0008: DWORD 2 (possibly numeric code for LZX)
;000C: DWORD The Huffman reset interval in $8000-byte blocks
;0010: DWORD The window size in $8000-byte blocks
;0014: DWORD unknown (sometimes 2, sometimes 1, sometimes 0)
;0018: DWORD 0 (unknown)
;001C: DWORD 0 (unknown)
;//translated.by/you/microsoft-s-html-help-chm-format-incomplete/original/?page=2
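A sketch of reading ControlData, assuming (as chmlib does, to my understanding) that only version 2 stores the reset interval and window size in $8000-byte blocks, so they are scaled to bytes here. `LZXC_Info` and `ReadControlData()` are hypothetical names.

```purebasic
EnableExplicit

Structure LZXC_Info
  Version.l
  ResetInterval.q  ; bytes, after scaling
  WindowSize.q     ; bytes, after scaling
EndStructure

Procedure.i ReadControlData(*Ctl, *Info.LZXC_Info)
  If PeekS(*Ctl + 4, 4, #PB_Ascii) <> "LZXC"
    ProcedureReturn #False
  EndIf
  *Info\Version       = PeekL(*Ctl + $08)
  *Info\ResetInterval = PeekL(*Ctl + $0C)
  *Info\WindowSize    = PeekL(*Ctl + $10)
  If *Info\Version = 2
    ; version 2: both values are counted in $8000-byte blocks
    *Info\ResetInterval = *Info\ResetInterval * $8000
    *Info\WindowSize    = *Info\WindowSize * $8000
  EndIf
  ProcedureReturn #True
EndProcedure
```

An LZX decoder needs the window size to set up its sliding window, and the reset interval tells it how often the decoder state is reset.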
infratec

Re: how to view a CHM file?

Post by infratec »

infratec

Re: how to view a CHM file?

Post by infratec »

What I have so far (tested with PureBasic.chm):



;
; http://www.russotto.net/chm/chmformat.html
;
EnableExplicit


Structure ITSF_Header_Structure
  Magic.a[4]
  Version.l
  TotalLength.l
  Unknown1.l
  Timestamp.l
  WindowsLanguageID.l
  GUID1.GUID
  GUID2.GUID
  OffsetOfSection.q
  LengthOfSection.q
EndStructure

Structure ITSP_Header_Structure
  Magic.a[4]
  Version.l
  Length.l
  Unknown1.l
  DirChunkSize.l
  Density.l
  DepthsIndexTree.l
  ChunkNumberOfRootIndexChunk.l
  ChunkNumberOfFirstPMGL.l
  ChunkNumberOfLastPMGL.l
  Unknown2.l
  NumberOfDirectoryChunks.l
  WindowsLanguageID.l
  GUID1.GUID
  Length2.l
  Unknown3.l
  Unknown4.l
  Unknown5.l
EndStructure

Structure PMGL_Header_Structure
  Magic.a[4]
  QuickRefLength.l
  Unknown1.l
  ChunkNumberOfPreviousListingChunk.l
  ChunkNumberOfNextListingChunk.l
EndStructure

Structure PMGI_Header_Structure
  Magic.a[4]
  QuickRefLength.l
EndStructure

Structure NameList_Structure
  Length.w
  NumberOfEntries.w
EndStructure


Structure ControlData_Structure
  NumberOfDWordsFollowingMagic.l
  Magic.a[4]
  Version.l
  LZXResetInterval.l
  WindowSize.l
  CacheSize.l
  Unknown1.l
EndStructure

Structure SpanInfo_Structure
  UncompressedLength.q
EndStructure

Structure ResetTable_Structure
  Unknown1.l
  NumberOfEntries.l
  SizeOfTableEntry.l
  TableHeaderLength.l
  UncompressedLength.q
  CompressedLength.q
  BlockSize.q
  EntryNumber.q
  LocationFirstBlockBoundaryInUncompressedData.q
EndStructure


Procedure.i EncInt(*Ptr.Ascii, *Value.Quad)
  
  Protected Bytes.i, Result.q
  
  While *Ptr\a > $7F
    Bytes + 1
    Result << 7
    Result | (*Ptr\a & $7F)
    *Ptr + 1
  Wend
  Bytes + 1
  Result << 7
  Result | *Ptr\a
  
  *Value\q = Result
  
  ProcedureReturn Bytes
  
EndProcedure


Define.i File, NumberOfChunkEntries, i, Entries
Define.q QuadValue, Section, Offset, Length
Define Filename$, Name$
Define *FileBuffer, *Out, *Section0, *Content
Define *ITSF_Header.ITSF_Header_Structure
Define *ITSP_Header.ITSP_Header_Structure
Define *PMGL_Header.PMGL_Header_Structure
Define *PMGI_Header.PMGI_Header_Structure
Define *QuickRef, *Name, *Entry
Define *NameList.NameList_Structure

Define *ControlData.ControlData_Structure
Define *SpanInfo.SpanInfo_Structure
Define *ResetTable.ResetTable_Structure

Define.q ContentOffset, ContentLength, ControlDataOffset, ControlDataLength, SpanInfoOffset, SpanInfoLength, ResetTableOffset, ResetTableLength

UseLZMAPacker()



Filename$ = OpenFileRequester("Open a CHM file", "", "CHM|*.chm", 0)
If Filename$
  File = ReadFile(#PB_Any, Filename$)
  If File
    *FileBuffer = AllocateMemory(Lof(File), #PB_Memory_NoClear)
    If *FileBuffer
      If ReadData(File, *FileBuffer, MemorySize(*FileBuffer)) = MemorySize(*FileBuffer)
        *ITSF_Header = *FileBuffer
        
        Debug PeekS(@*ITSF_Header\Magic[0], 4, #PB_Ascii)
        
        If *ITSF_Header\Magic[0] = 'I' And *ITSF_Header\Magic[1] = 'T' And *ITSF_Header\Magic[2] = 'S' And *ITSF_Header\Magic[3] = 'F'
          Debug "ITSF"
          Debug "Version: " + Str(*ITSF_Header\Version)
          Debug "TotalLength: " + Str(*ITSF_Header\TotalLength)
          Debug "Timestamp: " + FormatDate("%yyyy-%mm-%dd", *ITSF_Header\Timestamp)
          Debug "LanguageID: " + Str(*ITSF_Header\WindowsLanguageID)
          
          *ITSP_Header = *FileBuffer + *ITSF_Header\OffsetOfSection + *ITSF_Header\LengthOfSection
          If *ITSP_Header\Magic[0] = 'I' And *ITSP_Header\Magic[1] = 'T' And *ITSP_Header\Magic[2] = 'S' And *ITSP_Header\Magic[3] = 'P'
            Debug "ITSP"
            Debug "DirChunkSize: " + Str(*ITSP_Header\DirChunkSize)
            Debug "Density: " + Str(*ITSP_Header\Density)
            Debug "ChunkNumberOfFirstPMGL: " + Str(*ITSP_Header\ChunkNumberOfFirstPMGL)
            Debug "ChunkNumberOfLastPMGL: " + Str(*ITSP_Header\ChunkNumberOfLastPMGL)
            
            
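            ; first listing chunk; the chunks follow the ITSP header back to back
            *PMGL_Header = *ITSP_Header + *ITSP_Header\Length + *ITSP_Header\ChunkNumberOfFirstPMGL * *ITSP_Header\DirChunkSize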
            *PMGI_Header = *PMGL_Header
            
            If *PMGL_Header\Magic[0] = 'P' And *PMGL_Header\Magic[1] = 'M' And *PMGL_Header\Magic[2] = 'G' And *PMGL_Header\Magic[3] = 'L'
              
              For Entries = *ITSP_Header\ChunkNumberOfFirstPMGL To *ITSP_Header\ChunkNumberOfLastPMGL
                
                If *PMGL_Header\Magic[0] = 'P' And *PMGL_Header\Magic[1] = 'M' And *PMGL_Header\Magic[2] = 'G' And *PMGL_Header\Magic[3] = 'L'
                  Debug "PMGL"
                  Debug "PMGL QuickRefLength: " + Str(*PMGL_Header\QuickRefLength)
                  
                  *QuickRef = *PMGL_Header + *ITSP_Header\DirChunkSize
                  NumberOfChunkEntries = PeekW(*QuickRef - 2)
                  Debug "NumberOfChunkEntries: " + Str(NumberOfChunkEntries)
                  
                  *Entry = *PMGL_Header + SizeOf(PMGL_Header_Structure)
                  For i = 1 To NumberOfChunkEntries
                    *Entry + EncInt(*Entry, @QuadValue)
                    Name$ = PeekS(*Entry, QuadValue, #PB_UTF8|#PB_ByteLength)
                    Debug Name$
                    *Entry + QuadValue
                    
                    *Entry + EncInt(*Entry, @QuadValue)
                    Section = QuadValue
                    Debug "content section: " + Str(Section)
                    *Entry + EncInt(*Entry, @QuadValue)
                    Offset = QuadValue
                    Debug "offset: " + Str(Offset)
                    *Entry + EncInt(*Entry, @QuadValue)
                    Length = QuadValue
                    Debug "length: " + Str(Length)
                    
                    If Section = 0 And Left(Name$, 20) = "::DataSpace/Storage/"
                      
                      If FindString(Name$, "/Content", 20)
                        ContentOffset = Offset
                        ContentLength = Length
                      EndIf
                      
                      If FindString(Name$, "/ControlData", 20)
                        ControlDataOffset = Offset
                        ControlDataLength = Length
                      EndIf
                      
                      If FindString(Name$, "/SpanInfo", 20)
                        SpanInfoOffset = Offset
                        SpanInfoLength = Length
                      EndIf
                      
                      If FindString(Name$, "/ResetTable", 20)
                        ResetTableOffset = Offset
                        ResetTableLength = Length
                      EndIf
                      
                    EndIf
                    
                  Next i
                  
                EndIf
                *PMGL_Header + *ITSP_Header\DirChunkSize
                
              Next
              ;Until *PMGL_Header\ChunkNumberOfNextListingChunk = -1
              
              ; locate the root index chunk by its chunk number; content section 0
              ; starts right after the last directory chunk
              *PMGI_Header = *ITSP_Header + *ITSP_Header\Length + *ITSP_Header\ChunkNumberOfRootIndexChunk * *ITSP_Header\DirChunkSize
              *NameList = *ITSP_Header + *ITSP_Header\Length + *ITSP_Header\NumberOfDirectoryChunks * *ITSP_Header\DirChunkSize
            EndIf
            
            If *ITSP_Header\DepthsIndexTree > 1
              
              If *PMGI_Header\Magic[0] = 'P' And *PMGI_Header\Magic[1] = 'M' And *PMGI_Header\Magic[2] = 'G' And *PMGI_Header\Magic[3] = 'I'
                Debug "PMGI QuickRefLength: " + Str(*PMGI_Header\QuickRefLength)
                
                *QuickRef = *PMGI_Header + *ITSP_Header\DirChunkSize
                NumberOfChunkEntries = PeekW(*QuickRef - 2)
                Debug "NumberOfChunkEntries: " + Str(NumberOfChunkEntries)
                
                *Entry = *PMGI_Header + SizeOf(PMGI_Header_Structure)
                For i = 1 To NumberOfChunkEntries
                  *Entry + EncInt(*Entry, @QuadValue)
                  Debug PeekS(*Entry, QuadValue, #PB_UTF8|#PB_ByteLength)
                  *Entry + QuadValue
                  
                  *Entry + EncInt(*Entry, @QuadValue)
                  Debug "Starts with name: " + Str(QuadValue)
                  
                Next i
                
                
              EndIf
              
              
            EndIf
            
            *Section0 = *NameList
            *Content = *Section0 + ContentOffset
            *ControlData = *Section0 + ControlDataOffset
            *SpanInfo = *Section0 + SpanInfoOffset
            *ResetTable = *Section0 + ResetTableOffset
            
            If *NameList\Length > 0 And *NameList\NumberOfEntries = 2
              
              *Name = *NameList + SizeOf(NameList_Structure)
              For i = 1 To *NameList\NumberOfEntries
                *Name + 2                                  ; skip the WORD name length
                Select PeekS(*Name, -1, #PB_Unicode)       ; section names are UTF-16
                  Case "Uncompressed"
                    Debug "Uncompressed"
                  Case "MSCompressed"
                    Debug "MSCompressed"
                EndSelect
                *Name + (PeekW(*Name - 2) + 1) * 2         ; skip name + terminating zero
              Next i
              ;               If *LZXC_Header\Magic[0] = 'L' And *LZXC_Header\Magic[1] = 'Z' And *LZXC_Header\Magic[2] = 'X' And *LZXC_Header\Magic[3] = 'C'
              ;                 Debug "LZXC"
              ;               EndIf
              
            EndIf
            
          EndIf
          
          If PeekS(@*ControlData\Magic[0], 4, #PB_Ascii) = "LZXC"
            
            Debug "SpanInfo\UncompressedLength: " + Str(*SpanInfo\UncompressedLength)
            
            Debug "ResetTable\BlockSize: " + Str(*ResetTable\BlockSize)
            
            Debug "ResetTable\CompressedLength: " + Str(*ResetTable\CompressedLength)
            Debug "ResetTable\UncompressedLength: " + Str(*ResetTable\UncompressedLength)
            
            ShowMemoryViewer(*Content, 100)
            
            
            ; LZX decoding !!!
            ; http://www.jedrea.com/chmlib/
            ; PureBasic's packer plugins (Zip, LZMA, ...) have no LZX decoder, so
            ; UncompressMemory() with #PB_PackerPlugin_Lzma cannot unpack *Content.
            ; An LZX decoder (for example a port of chmlib's lzx.c) is needed here.
            
          EndIf
          
        EndIf
      EndIf
      FreeMemory(*FileBuffer)
    EndIf
    CloseFile(File)
  EndIf
EndIf
Last edited by infratec on Thu Jan 02, 2025 7:10 pm, edited 1 time in total.
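Once an LZX decoder exists, seeking to one file inside section 1 usually goes through the ResetTable read above. A sketch, assuming the layout of `ResetTable_Structure` in the posted code (entry count at $04, entry size at $08, table header length at $0C, block size at $20) and that each entry is the compressed offset, inside the MSCompressed Content stream, of one uncompressed block. `CompressedBlockOffset()` is a hypothetical name; decoding would still have to restart at an LZX reset point at or before that block.

```purebasic
EnableExplicit

; Compressed offset of the block that holds UncompressedOffset, or -1 if the
; offset lies outside the table.
Procedure.q CompressedBlockOffset(*Table, UncompressedOffset.q)
  Protected EntryCount.l = PeekL(*Table + $04)
  Protected EntrySize.l  = PeekL(*Table + $08)
  Protected HeaderLen.l  = PeekL(*Table + $0C)
  Protected BlockLen.q   = PeekQ(*Table + $20)
  Protected Block.q
  If BlockLen <= 0 Or UncompressedOffset < 0
    ProcedureReturn -1
  EndIf
  Block = UncompressedOffset / BlockLen   ; index of the uncompressed block
  If Block >= EntryCount
    ProcedureReturn -1
  EndIf
  ProcedureReturn PeekQ(*Table + HeaderLen + Block * EntrySize)
EndProcedure
```

For a directory entry in section 1, `Offset` from the listing chunk is the uncompressed offset to pass in.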