how to view a CHM file?

AZJIO · Post by **AZJIO** » Mon Dec 30, 2024 2:14 am

Is it possible to write your own CHM file viewer? In the 7zip archiver's properties window, the CHM file is shown as LZX. I tried to open it with the UnLZX module, but it didn't work. The example archive in the UnLZX topic opens normally with that module, but 7zip cannot open it.
If this worked, I could read the file tree, extract the files I need, and open them in the WebGadget.

SMaag · Post by **SMaag** » Mon Dec 30, 2024 9:20 am

generally YES!

CHM is compiled HTML.
There is a tool from Microsoft to decompile a CHM; you then get the HTML folder/file structure.
I did this with the PB help to reconstruct all the files so I could automatically search all PB commands.

Use Google and search for "decompile help files".

Here is a link on how to decompile:
https://zeropage.io/howto-decompile-win ... elp-files/

The required hh.exe is part of the Microsoft HTML Help 1.4 SDK.

SMaag · Post by **SMaag** » Mon Dec 30, 2024 9:57 am

Now I found my file.

The command to decompile a help file is:
hh.exe -decompile outputfolder input.chm

One thing to note is that the decompile/recompile process isn't a "round-trip" process. Certain features that the help author added to the original help file can't be recovered when you decompile it, so these may no longer work properly after you've recompiled. This is especially true in the area of context-sensitive help, which may be broken in the new version of the file.

AZJIO · Post by **AZJIO** » Mon Dec 30, 2024 10:49 am

You can be sure that I know how to unpack and repack with various programs. But the task is to write a program for Linux that eliminates the problems of the programs I use to view the help file:
1. I don't like the serif font.
2. Does not remember the zoom.
3. The side mouse button does not work to return to the previous page.
These problems must be solved.
viewtopic.php?t=68549

SMaag · Post by **SMaag** » Mon Dec 30, 2024 12:14 pm

AZJIO wrote: Mon Dec 30, 2024 10:49 am You can make sure that I can unpack and pack with different programs. But the task is to write a program for linux to eliminate the problems of the programs that I use to view the help file.
1. I don't like the serif font.
2. Does not remember the zoom.
3. The side mouse button does not work to return to the previous page.
These problems must be solved.

I know you are one of the best programmers here! A question like "is it possible to write a program in PB" is not at your level of experience. So my conclusion was that you have problems unpacking the CHM file, because you described that it is not possible with 7zip.

I'm sure there is no need to explain to you how to search for the font section in an HTML file and change the font name.
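For the serif-font complaint, a page could also be patched right after extraction instead of editing each file by hand. A minimal sketch: ForceFont() is a hypothetical helper, and it assumes that one CSS rule marked `!important`, placed before the first `</head>`, is enough to override the help file's own styles.

```purebasic
EnableExplicit

; Hypothetical helper: force a font by injecting a CSS rule into an HTML page.
; The rule goes right before the first </head> (case-insensitive search);
; pages without a head section get it prepended instead.
Procedure.s ForceFont(Html$, Font$)
  Protected Style$ = "<style>body, p, td, li, div { font-family: " + Font$ + " !important; }</style>"
  Protected Pos = FindString(Html$, "</head>", 1, #PB_String_NoCase)
  If Pos
    ProcedureReturn Left(Html$, Pos - 1) + Style$ + Mid(Html$, Pos)
  EndIf
  ProcedureReturn Style$ + Html$
EndProcedure

Debug ForceFont("<html><HEAD><title>Test</title></HEAD><body>Hello</body></html>", "sans-serif")
```

Injecting a style rule keeps the original files untouched; only the temporary copy shown in the viewer changes.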

I think the first problem is unpacking .chm files on Linux with PB code.
- For unpacking .chm on Linux, I remember there are some open-source projects, but at the moment I don't remember their names.

Can you describe your problem in more detail?

AZJIO · Post by **AZJIO** » Mon Dec 30, 2024 4:54 pm

SMaag wrote: Mon Dec 30, 2024 12:14 pm because you described that is not possible with 7zip.

7zip is unable to open the file in that forum post. This suggests that the LZX inside CHM differs from the LZX in that post. The CHM file itself opens in 7zip, and when I opened it there I saw the archive type LZX.

I would rather not unpack files with an external command-line program. I would like to use the UnLZX module to extract one file, find the local href links in it, extract the missing CSS and PNG files into a temporary folder, and open the HTML file in the WebGadget. When the program closes, the unpacked files are deleted.
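The "find local links" step could be sketched with PureBasic's RegularExpression library. ExtractLocalLinks() is a hypothetical name; it assumes the page is already extracted into a string, treats every href/src value without a ":" as a local file (which skips http:, mailto:, ms-its: and javascript: links), and cuts off #fragments. Links inside CSS files (url(...)) are not covered.

```purebasic
EnableExplicit

; Hypothetical helper: collect the local targets of href="..." and src='...'
; attributes, i.e. the files (css, png, ...) that still have to be extracted.
Procedure ExtractLocalLinks(Html$, List Links.s())
  Protected Regex, Link$
  ; (?i) = case-insensitive; group 1 stops at a quote or at a #fragment
  Regex = CreateRegularExpression(#PB_Any, "(?i)(?:href|src)\s*=\s*[" + #DQUOTE$ + "']([^" + #DQUOTE$ + "'#]+)")
  If Regex = 0
    ProcedureReturn
  EndIf
  If ExamineRegularExpression(Regex, Html$)
    While NextRegularExpressionMatch(Regex)
      Link$ = RegularExpressionGroup(Regex, 1)
      ; anything with a scheme (http:, mailto:, ms-its:, javascript:) is not a local file
      If FindString(Link$, ":") = 0
        AddElement(Links())
        Links() = Link$
      EndIf
    Wend
  EndIf
  FreeRegularExpression(Regex)
EndProcedure

NewList Links.s()
ExtractLocalLinks("<link href='style.css'><img src='img/logo.png'><a href='http://x.org'>x</a><a href='#top'>t</a>", Links())
ForEach Links()
  Debug Links()   ; style.css, then img/logo.png
Next
```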


dcr3 · Post by **dcr3** » Mon Dec 30, 2024 7:25 pm

AZJIO wrote: Mon Dec 30, 2024 2:14 am but the 7zip program cannot open it.

7zip can decompile CHM files.

AZJIO · Post by **AZJIO** » Mon Dec 30, 2024 7:34 pm

dcr3 wrote: Mon Dec 30, 2024 7:25 pm 7zip can decompile CHM files.

Your answers indicate that you do not understand what I write. I've been using 7zip to extract CHM for over 15 years.

AZJIO wrote: Mon Dec 30, 2024 4:54 pm The CHM file opens in 7zip. Opening CHM in the 7zip program, I saw the archive type LZX.

Did you really read something different in this phrase? I'm already explaining it as if to children, repeating the words more than twice so that it cannot be interpreted two ways, and you still read the opposite.

dcr3 · Post by **dcr3** » Mon Dec 30, 2024 7:58 pm

AZJIO wrote: Mon Dec 30, 2024 7:34 pm Your answers indicate that you do not understand what I write. I've been using 7zip to extract CHM for over 15 years.

Right then,

https://github.com/Bioruebe/UniExtract2

https://github.com/Bioruebe/UniExtract2 ... actRC3.zip

The sources are available in AutoIt, one of your favorite languages.
I'm sure you can play around with it and build the tool as needed.

Jan2004 · Post by **Jan2004** » Mon Dec 30, 2024 8:41 pm

A few words about the subject on stackoverflow.com:
https://stackoverflow.com/questions/692 ... -view-them

AZJIO · Post by **AZJIO** » Tue Dec 31, 2024 9:22 am

dcr3 wrote: Mon Dec 30, 2024 7:58 pm UniExtract2

When the author wrote this program, he actively communicated with me. This program does not contain unpacking modules; rather, it ships executable files and uses them to unpack archives. I initially wanted a built-in module, but it looks like I'll have to add a dependency on the 7zip binary and extract the files with it.
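If the 7zip binary does become a dependency, a single file can be extracted with the `x` command and the `-o` output switch. A hedged sketch: the helper names are hypothetical, the binary is assumed to be called `7z` and to be on the PATH (as with the p7zip package on many Linux distributions), and the output folder is assumed to contain no spaces.

```purebasic
EnableExplicit

; Hypothetical helper: command line for "7z x" that extracts one member of a
; CHM into OutDir$ (keeping its folder structure); -y answers yes to all prompts.
; OutDir$ is assumed to contain no spaces, since it is glued to the -o switch.
Procedure.s Build7zArgs(Chm$, OutDir$, Member$)
  ProcedureReturn "x -y " + #DQUOTE$ + Chm$ + #DQUOTE$ + " -o" + OutDir$ + " " + #DQUOTE$ + Member$ + #DQUOTE$
EndProcedure

; Runs 7z (assumed to be on the PATH) and waits for it.
; Returns #True when 7z exits with code 0.
Procedure Extract7z(Chm$, OutDir$, Member$)
  Protected Program, ExitCode = -1
  Program = RunProgram("7z", Build7zArgs(Chm$, OutDir$, Member$), "", #PB_Program_Open | #PB_Program_Hide)
  If Program
    WaitProgram(Program)
    ExitCode = ProgramExitCode(Program)
    CloseProgram(Program)
  EndIf
  ProcedureReturn Bool(ExitCode = 0)
EndProcedure

; Placeholder paths, for illustration only
Debug Build7zArgs("/path/to/help.chm", GetTemporaryDirectory() + "chmview", "page.html")
```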

Jan2004 wrote: Mon Dec 30, 2024 8:41 pm A few words about the subject on stackoverflow.com:

Yes, I'm aware of that information, but I only need the unpacking module; I'm not interested in the binaries inside CHM.

Jan2004 · Post by **Jan2004** » Tue Dec 31, 2024 11:08 am

7zxa.dll - library for extracting from 7z archives:
https://sourceforge.net/projects/sevenz ... z/download

infratec · Post by **infratec** » Tue Dec 31, 2024 12:34 pm

From Apache Tika:

;The Header
;0000: char[4] 'ITSF'
;0004: DWORD 3 (Version number)
;0008: DWORD Total header length, including header section table and following data.
;000C: DWORD 1 (unknown)
;0010: DWORD a timestamp
;0014: DWORD Windows Language ID
;0018: GUID {7C01FD10-7BAA-11D0-9E0C-00A0C922E6EC}
;0028: GUID {7C01FD11-7BAA-11D0-9E0C-00A0C922E6EC}
;Note: a GUID is $10 bytes, arranged as 1 DWORD, 2 WORDs, and 8 BYTEs.
;
;The header is followed by the header section table: 2 entries of $10 bytes each, starting at $0038:
;0000: QWORD Offset of section from beginning of file
;0008: QWORD Length of section
;Following the header section table is 8 bytes of additional header data.
;In Version 2 files, this data is not there and the content section starts immediately after the directory.
;0000: QWORD Offset within file of content section 0
;http://translated.by/you/microsoft-s-html-help-chm-format-incomplete/original/?show-translation-form=1
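The header layout above translates directly into Peek calls on a buffer holding the start of the file. A sketch, not a full parser: the helper names are hypothetical, and it relies on the header section table starting at $38 (entry 1 is the directory) and, for version 3 files only, on the 8 extra bytes at $58 holding the offset of content section 0; for version 2 it falls back to "right after the directory". The demo values are made up.

```purebasic
EnableExplicit

; Hypothetical helper: check the 'ITSF' signature and return the file offset of
; the directory (header section table entry 1 at $48/$50), or -1 if invalid.
Procedure.q ITSFDirectoryOffset(*Buffer, Size)
  If Size < $58
    ProcedureReturn -1
  EndIf
  If PeekS(*Buffer, 4, #PB_Ascii) <> "ITSF"
    ProcedureReturn -1
  EndIf
  Debug "Version: " + Str(PeekL(*Buffer + $04))
  Debug "Header length: $" + Hex(PeekL(*Buffer + $08))
  ProcedureReturn PeekQ(*Buffer + $48)
EndProcedure

; Offset of content section 0: the extra QWORD at $58 in version 3 headers,
; otherwise the first byte after the directory (offset + length of entry 1).
Procedure.q ITSFContentOffset(*Buffer)
  If PeekL(*Buffer + $04) >= 3
    ProcedureReturn PeekQ(*Buffer + $58)
  EndIf
  ProcedureReturn PeekQ(*Buffer + $48) + PeekQ(*Buffer + $50)
EndProcedure

; Demo on a synthetic header (made-up values)
Define *Hdr = AllocateMemory($60)
PokeS(*Hdr, "ITSF", 4, #PB_Ascii | #PB_String_NoZero)
PokeL(*Hdr + $04, 3)
PokeL(*Hdr + $08, $60)
PokeQ(*Hdr + $48, $78)     ; directory offset
PokeQ(*Hdr + $50, $1054)   ; directory length
PokeQ(*Hdr + $58, $10CC)   ; content section 0
Debug ITSFDirectoryOffset(*Hdr, MemorySize(*Hdr))
Debug ITSFContentOffset(*Hdr)
FreeMemory(*Hdr)
```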

;Directory header
;The directory starts with a header; its format is as follows:
;0000: char[4] 'ITSP'
;0004: DWORD Version number 1
;0008: DWORD Length of the directory header
;000C: DWORD $0a (unknown)
;0010: DWORD $1000 Directory chunk size
;0014: DWORD "Density" of quickref section, usually 2
;0018: DWORD Depth of the index tree: 1 if there is no index, 2 if there is one level of PMGI chunks
;001C: DWORD Chunk number of root index chunk, -1 if there is none (though at least one file has 0 despite there being no index chunk, probably a bug)
;0020: DWORD Chunk number of first PMGL (listing) chunk
;0024: DWORD Chunk number of last PMGL (listing) chunk
;0028: DWORD -1 (unknown)
;002C: DWORD Number of directory chunks (total)
;0030: DWORD Windows language ID
;0034: GUID {5D02926A-212E-11D0-9DF9-00A0C922E6EC}
;0044: DWORD $54 (This is the length again)
;0048: DWORD -1 (unknown)
;004C: DWORD -1 (unknown)
;0050: DWORD -1 (unknown)
;http://translated.by/you/microsoft-s-html-help-chm-format-incomplete/original/?show-translation-form=1
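Because all directory chunks have the same size (the DWORD at $10) and follow the ITSP header back to back, chunk number n can be found by plain arithmetic, and the listing chunks can be walked through their "next" links (PMGL offset $10) instead of assuming consecutive numbering. DirectoryChunk() and CountPMGLChunks() are hypothetical names; the total chunk count at $2C serves as an upper bound in case the chain is corrupt.

```purebasic
EnableExplicit

; Hypothetical helper: address of directory chunk number Chunk, or 0 if out of range.
; *ITSP points at the 'ITSP' header; the chunks follow it back to back.
Procedure DirectoryChunk(*ITSP, Chunk)
  Protected HeaderLength = PeekL(*ITSP + $08)
  Protected ChunkSize = PeekL(*ITSP + $10)
  If Chunk < 0 Or Chunk >= PeekL(*ITSP + $2C)   ; $2C = total number of directory chunks
    ProcedureReturn 0
  EndIf
  ProcedureReturn *ITSP + HeaderLength + Chunk * ChunkSize
EndProcedure

; Walks the listing chunks from the first one ($20) along the PMGL "next" links ($10)
; and returns how many were visited.
Procedure CountPMGLChunks(*ITSP)
  Protected n = PeekL(*ITSP + $20)
  Protected Guard = PeekL(*ITSP + $2C)   ; upper bound, in case the chain is corrupt
  Protected *Chunk, Count
  While n <> -1 And Guard > 0
    *Chunk = DirectoryChunk(*ITSP, n)
    If *Chunk = 0
      Break
    EndIf
    If PeekS(*Chunk, 4, #PB_Ascii) <> "PMGL"
      Break
    EndIf
    Count + 1
    n = PeekL(*Chunk + $10)               ; -1 after the last listing chunk
    Guard - 1
  Wend
  ProcedureReturn Count
EndProcedure
```

For a file whose listing chunks are stored in order, the count should equal last - first + 1 (the DWORDs at $24 and $20).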

;Directory chunks
;There are two types of directory chunks: index chunks and listing chunks.
;The index chunk will be omitted if there is only one listing chunk.
;A listing chunk has the following format:
;0000: char[4] 'PMGL'
;0004: DWORD Length of free space and/or quickref area at end of directory chunk
;0008: DWORD Always 0
;000C: DWORD Chunk number of previous listing chunk when reading directory in sequence (-1 if this is the first listing chunk)
;0010: DWORD Chunk number of next listing chunk when reading directory in sequence (-1 if this is the last listing chunk)
;0014: Directory listing entries (to quickref area), sorted by filename; the sort is case-insensitive
;
;The quickref area is written backwards from the end of the chunk. One quickref entry exists for every n entries
;in the file, where n is calculated as 1 + (1 << quickref density). So for density = 2, n = 5.
;Chunklen-0002: WORD Number of entries in the chunk
;Chunklen-0004: WORD Offset of entry n from entry 0
;Chunklen-0008: WORD Offset of entry 2n from entry 0
;Chunklen-000C: WORD Offset of entry 3n from entry 0
;...
;
;The format of a directory listing entry is as follows:
;BYTE: length of name (an ENCINT in practice; for names shorter than 128 bytes the two are identical)
;BYTEs: name (UTF-8 encoded)
;ENCINT: content section
;ENCINT: offset
;ENCINT: length
;The offset is from the beginning of the content section the file is in, after the section has been decompressed (if appropriate).
;The length also refers to the length of the file in the section after decompression.
;There are two kinds of files represented in the directory: user data and format-related files.
;The format-related files have names which begin with '::'; the user data files have names which begin with '/'.
;http://translated.by/you/microsoft-s-html-help-chm-format-incomplete/original/?show-translation-form=1
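The quickref spacing formula and the entry count stored at the end of each chunk take only a few lines. Both helpers are hypothetical; ChunkEntryCount() assumes the caller passes the chunk size from the ITSP header (the DWORD at $10).

```purebasic
EnableExplicit

; One quickref entry exists for every n directory entries, n = 1 + (1 << density)
Procedure QuickRefSpacing(Density)
  ProcedureReturn 1 + (1 << Density)
EndProcedure

; Hypothetical helper: the WORD at chunk end - 2 holds the number of entries in a
; PMGL/PMGI chunk; ChunkSize is the DWORD at $10 of the ITSP header.
Procedure ChunkEntryCount(*Chunk, ChunkSize)
  ProcedureReturn PeekU(*Chunk + ChunkSize - 2)
EndProcedure

Debug QuickRefSpacing(2)   ; 5, as in the notes
```

PeekU() reads the count as unsigned, so it stays correct even above 32767 entries.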

;Index chunk (does not always exist)
;An index chunk has the following format:
;0000: char[4] 'PMGI'
;0004: DWORD Length of quickref/free area at end of directory chunk
;0008: Directory index entries (to quickref/free area)
;The quickref area in a PMGI is the same as in a PMGL.
;The format of a directory index entry is as follows:
;BYTE: length of name
;BYTEs: name (UTF-8 encoded)
;ENCINT: directory listing chunk which starts with name
;
;Encoded Integers aka ENCINT
;An ENCINT is a variable-length integer. The high bit of each byte indicates "continued to the next byte".
;Bytes are stored most significant to least significant.
;So, for example, $EA $15 is ((($EA & $7F) << 7) | $15) = $3515.
;Note: This class is not in use http://translated.by/you/microsoft-s-ht ... ion-form=1
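The ENCINT rule can be checked against the worked example ($EA $15 gives $3515). In this sketch, DecodeEncInt() (a hypothetical name) returns the decoded value and reports through *Used how many bytes were consumed, so a caller can advance its pointer past the field.

```purebasic
EnableExplicit

; Decodes the ENCINT at *Ptr; the number of bytes consumed is stored in *Used.
; Each byte contributes its low 7 bits, most significant group first;
; a set high bit ($80) means another byte follows.
Procedure.q DecodeEncInt(*Ptr.Ascii, *Used.Integer)
  Protected Result.q, Count, Current
  Repeat
    Current = *Ptr\a
    Result = (Result << 7) | (Current & $7F)
    Count + 1
    *Ptr + 1
  Until Current < $80
  *Used\i = Count
  ProcedureReturn Result
EndProcedure

; The example from the notes: (($EA & $7F) << 7) | $15 = $3515
Define *Buf = AllocateMemory(2), Used, Value.q
PokeA(*Buf, $EA)
PokeA(*Buf + 1, $15)
Value = DecodeEncInt(*Buf, @Used)
Debug "$" + Hex(Value) + " in " + Str(Used) + " bytes"   ; $3515 in 2 bytes
FreeMemory(*Buf)
```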

;::DataSpace/Storage/<SectionName>/ControlData
;This file contains $20 bytes of information on the compression.
;The information is partially known:
;0000: DWORD 6 (unknown)
;0004: ASCII 'LZXC' Compression type identifier
;0008: DWORD 2 (Possibly numeric code for LZX)
;000C: DWORD The Huffman reset interval in $8000-byte blocks
;0010: DWORD The window size in $8000-byte blocks
;0014: DWORD unknown (sometimes 2, sometimes 1, sometimes 0)
;0018: DWORD 0 (unknown)
;001C: DWORD 0 (unknown)
;http://translated.by/you/microsoft-s-html-help-chm-format-incomplete/original/?page=2
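As a worked example of the two size fields: a window-size value of 2 means 2 × $8000 = $10000 bytes (64 KB). The sketch below converts both fields to bytes; the helper names are hypothetical, and it only accepts data matching the notes above ('LZXC' at $04, value 2 at $08), returning 0 for anything else.

```purebasic
EnableExplicit

; Hypothetical check: 'LZXC' at $04 and the value 2 at $08, as in the notes
Procedure IsLZXC(*ControlData)
  If PeekS(*ControlData + $04, 4, #PB_Ascii) <> "LZXC"
    ProcedureReturn #False
  EndIf
  ProcedureReturn Bool(PeekL(*ControlData + $08) = 2)
EndProcedure

; Window size ($10) in bytes; the field counts $8000-byte blocks
Procedure LZXWindowBytes(*ControlData)
  If Not IsLZXC(*ControlData)
    ProcedureReturn 0
  EndIf
  ProcedureReturn PeekL(*ControlData + $10) * $8000
EndProcedure

; Huffman reset interval ($0C) in bytes, same unit
Procedure LZXResetBytes(*ControlData)
  If Not IsLZXC(*ControlData)
    ProcedureReturn 0
  EndIf
  ProcedureReturn PeekL(*ControlData + $0C) * $8000
EndProcedure

; Demo with made-up values: reset interval 2 and window size 2 blocks
Define *CD = AllocateMemory($20)
PokeL(*CD, 6)
PokeS(*CD + $04, "LZXC", 4, #PB_Ascii | #PB_String_NoZero)
PokeL(*CD + $08, 2)
PokeL(*CD + $0C, 2)
PokeL(*CD + $10, 2)
Debug "Window: $" + Hex(LZXWindowBytes(*CD))   ; Window: $10000
FreeMemory(*CD)
```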

infratec · Post by **infratec** » Tue Dec 31, 2024 1:33 pm

Better:
http://www.russotto.net/chm/chmformat.html

infratec · Post by **infratec** » Tue Dec 31, 2024 5:57 pm

So far (tested with PureBasic.chm):



;
; http://www.russotto.net/chm/chmformat.html
;
EnableExplicit


Structure ITSF_Header_Structure
  Magic.a[4]
  Version.l
  TotalLength.l
  Unknown1.l
  Timestamp.l
  WindowsLanguageID.l
  GUID1.GUID
  GUID2.GUID
  OffsetOfSection.q
  LengthOfSection.q
EndStructure

Structure ITSP_Header_Structure
  Magic.a[4]
  Version.l
  Length.l
  Unknown1.l
  DirChunkSize.l
  Density.l
  DepthsIndexTree.l
  ChunkNumberOfRootIndexChunk.l
  ChunkNumberOfFirstPMGL.l
  ChunkNumberOfLastPMGL.l
  Unknown2.l
  NumberOfDirectoryChunks.l
  WindowsLanguageID.l
  GUID1.GUID
  Length2.l
  Unknown3.l
  Unknown4.l
  Unknown5.l
EndStructure

Structure PMGL_Header_Structure
  Magic.a[4]
  QuickRefLength.l
  Unknown1.l
  ChunkNumberOfPreviousListingChunk.l
  ChunkNumberOfNextListingChunk.l
EndStructure

Structure PMGI_Header_Structure
  Magic.a[4]
  QuickRefLength.l
EndStructure

Structure NameList_Structure
  Length.w
  NumberOfEntries.w
EndStructure


Structure ControlData_Structure
  NumberOfDWordsFollowingMagic.l
  Magic.a[4]
  Version.l
  LZXResetInterval.l
  WindowSize.l
  CacheSize.l
  Unknown1.l
EndStructure

Structure SpanInfo_Structure
  UncompressedLength.q
EndStructure

Structure ResetTable_Structure
  Unknown1.l
  NumberOfEntries.l
  SizeOfTableEntry.l
  TableHeaderLength.l
  UncompressedLength.q
  CompressedLength.q
  BlockSize.q
  EntryNumber.q
  LocationFirstBlockBoundaryInUncompressedData.q
EndStructure


Procedure.i EncInt(*Ptr.Ascii, *Value.Quad)
  
  Protected Bytes.i, Result.q
  
  While *Ptr\a > $7F
    Bytes + 1
    Result << 7
    Result | (*Ptr\a & $7F)
    *Ptr + 1
  Wend
  Bytes + 1
  Result << 7
  Result | *Ptr\a
  
  *Value\q = Result
  
  ProcedureReturn Bytes
  
EndProcedure


Define.i File, NumberOfChunkEntries, i, Entries
Define.q QuadValue, Section, Offset, Length
Define Filename$, Name$
Define *FileBuffer, *Out, *Section0, *Content
Define *ITSF_Header.ITSF_Header_Structure
Define *ITSP_Header.ITSP_Header_Structure
Define *PMGL_Header.PMGL_Header_Structure
Define *PMGI_Header.PMGI_Header_Structure
Define *QuickRef, *Name, *Entry
Define *NameList.NameList_Structure

Define *ControlData.ControlData_Structure
Define *SpanInfo.SpanInfo_Structure
Define *ResetTable.ResetTable_Structure

Define.q ContentOffset, ContentLength, ControlDataOffset, ControlDataLength, SpanInfoOffset, SpanInfoLength, ResetTableOffset, ResetTableLength

UseLZMAPacker()



Filename$ = OpenFileRequester("Open a CHM file", "", "CHM|*.chm", 0)
If Filename$
  File = ReadFile(#PB_Any, Filename$)
  If File
    *FileBuffer = AllocateMemory(Lof(File), #PB_Memory_NoClear)
    If *FileBuffer
      If ReadData(File, *FileBuffer, MemorySize(*FileBuffer)) = MemorySize(*FileBuffer)
        *ITSF_Header = *FileBuffer
        
        Debug PeekS(@*ITSF_Header\Magic[0], 4, #PB_Ascii) ; the signature, expected "ITSF"
        
        If *ITSF_Header\Magic[0] = 'I' And *ITSF_Header\Magic[1] = 'T' And *ITSF_Header\Magic[2] = 'S' And *ITSF_Header\Magic[3] = 'F'
          Debug "ITSF"
          Debug "Version: " + Str(*ITSF_Header\Version)
          Debug "TotalLength: " + Str(*ITSF_Header\TotalLength)
          Debug "Timestamp: " + FormatDate("%yyyy-%mm-%dd", *ITSF_Header\Timestamp)
          Debug "LanguageID: " + Str(*ITSF_Header\WindowsLanguageID)
          
          *ITSP_Header = *FileBuffer + *ITSF_Header\OffsetOfSection + *ITSF_Header\LengthOfSection
          If *ITSP_Header\Magic[0] = 'I' And *ITSP_Header\Magic[1] = 'T' And *ITSP_Header\Magic[2] = 'S' And *ITSP_Header\Magic[3] = 'P'
            Debug "ITSP"
            Debug "DirChunkSize: " + Str(*ITSP_Header\DirChunkSize)
            Debug "Density: " + Str(*ITSP_Header\Density)
            Debug "ChunkNumberOfFirstPMGL: " + Str(*ITSP_Header\ChunkNumberOfFirstPMGL)
            Debug "ChunkNumberOfLastPMGL: " + Str(*ITSP_Header\ChunkNumberOfLastPMGL)
            
            
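            ; directory chunks follow the ITSP header back to back; start at the first listing chunk
            *PMGL_Header = *ITSP_Header + *ITSP_Header\Length + *ITSP_Header\ChunkNumberOfFirstPMGL * *ITSP_Header\DirChunkSize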
            *PMGI_Header = *PMGL_Header
            
            If *PMGL_Header\Magic[0] = 'P' And *PMGL_Header\Magic[1] = 'M' And *PMGL_Header\Magic[2] = 'G' And *PMGL_Header\Magic[3] = 'L'
              
              For Entries = *ITSP_Header\ChunkNumberOfFirstPMGL To *ITSP_Header\ChunkNumberOfLastPMGL
                
                If *PMGL_Header\Magic[0] = 'P' And *PMGL_Header\Magic[1] = 'M' And *PMGL_Header\Magic[2] = 'G' And *PMGL_Header\Magic[3] = 'L'
                  Debug "PMGL"
                  Debug "PMGL QuickRefLength: " + Str(*PMGL_Header\QuickRefLength)
                  
                  *QuickRef = *PMGL_Header + *ITSP_Header\DirChunkSize
                  NumberOfChunkEntries = PeekW(*QuickRef - 2)
                  Debug "NumberOfChunkEntries: " + Str(NumberOfChunkEntries)
                  
                  *Entry = *PMGL_Header + SizeOf(PMGL_Header_Structure)
                  For i = 1 To NumberOfChunkEntries
                    *Entry + EncInt(*Entry, @QuadValue)
                    Name$ = PeekS(*Entry, QuadValue, #PB_UTF8|#PB_ByteLength)
                    Debug Name$
                    *Entry + QuadValue
                    
                    *Entry + EncInt(*Entry, @QuadValue)
                    Section = QuadValue
                    Debug "content section: " + Str(Section)
                    *Entry + EncInt(*Entry, @QuadValue)
                    Offset = QuadValue
                    Debug "offset: " + Str(Offset)
                    *Entry + EncInt(*Entry, @QuadValue)
                    Length = QuadValue
                    Debug "length: " + Str(Length)
                    
                    If Section = 0 And Left(Name$, 20) = "::DataSpace/Storage/"
                      
                      If FindString(Name$, "/Content", 20)
                        ContentOffset = Offset
                        ContentLength = Length
                      EndIf
                      
                      If FindString(Name$, "/ControlData", 20)
                        ControlDataOffset = Offset
                        ControlDataLength = Length
                      EndIf
                      
                      If FindString(Name$, "/SpanInfo", 20)
                        SpanInfoOffset = Offset
                        SpanInfoLength = Length
                      EndIf
                      
                      If FindString(Name$, "/ResetTable", 20)
                        ResetTableOffset = Offset
                        ResetTableLength = Length
                      EndIf
                      
                    EndIf
                    
                  Next i
                  
                EndIf
                *PMGL_Header + *ITSP_Header\DirChunkSize
                
              Next
              ;Until *PMGL_Header\ChunkNumberOfNextListingChunk = -1
              
              *PMGI_Header = *PMGL_Header ; the loop above already moved *PMGL_Header past the last listing chunk
              *NameList = *PMGI_Header
            EndIf
            
            If *ITSP_Header\DepthsIndexTree > 1
              
              If *PMGI_Header\Magic[0] = 'P' And *PMGI_Header\Magic[1] = 'M' And *PMGI_Header\Magic[2] = 'G' And *PMGI_Header\Magic[3] = 'I'
                Debug "PMGI QuickRefLength: " + Str(*PMGI_Header\QuickRefLength)
                
                *QuickRef = *PMGI_Header + *ITSP_Header\DirChunkSize
                NumberOfChunkEntries = PeekW(*QuickRef - 2)
                Debug "NumberOfChunkEntries: " + Str(NumberOfChunkEntries)
                
                *Entry = *PMGI_Header + SizeOf(PMGI_Header_Structure)
                For i = 1 To NumberOfChunkEntries
                  *Entry + EncInt(*Entry, @QuadValue)
                  Debug PeekS(*Entry, QuadValue, #PB_UTF8|#PB_ByteLength)
                  *Entry + QuadValue
                  
                  *Entry + EncInt(*Entry, @QuadValue)
                  Debug "Starts with name: " + Str(QuadValue)
                  
                Next i
                
                *NameList = *PMGI_Header + *ITSP_Header\DirChunkSize
                
              EndIf
              
              
            EndIf
            
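            ; content section 0 normally starts right after the last directory chunk
            *Section0 = *ITSP_Header + *ITSP_Header\Length + *ITSP_Header\NumberOfDirectoryChunks * *ITSP_Header\DirChunkSize
            *NameList = *Section0 ; assumes ::DataSpace/NameList is stored at offset 0 of section 0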
            *Content = *Section0 + ContentOffset
            *ControlData = *Section0 + ControlDataOffset
            *SpanInfo = *Section0 + SpanInfoOffset
            *ResetTable = *Section0 + ResetTableOffset
            
            If *NameList\Length > 0 And *NameList\NumberOfEntries = 2
              
              *Name = *NameList + SizeOf(NameList_Structure)
              For i = 1 To *NameList\NumberOfEntries
                *Name + 2                                   ; skip the WORD holding the name length (in characters)
                Select PeekS(*Name, PeekW(*Name - 2), #PB_Unicode)
                  Case "Uncompressed"
                    Debug "Uncompressed"
                  Case "MSCompressed"
                    Debug "MSCompressed"
                EndSelect
                *Name + (PeekW(*Name - 2) + 1) * 2          ; skip the UTF-16 name and its terminating null
              Next i
              ;               If *LZXC_Header\Magic[0] = 'L' And *LZXC_Header\Magic[1] = 'Z' And *LZXC_Header\Magic[2] = 'X' And *LZXC_Header\Magic[3] = 'C'
              ;                 Debug "LZXC"
              ;               EndIf
              
            EndIf
            
          EndIf
          
          If *ControlData And PeekS(@*ControlData\Magic[0], 4, #PB_Ascii) = "LZXC"
            
            Debug "SpanInfo\UncompressedLength: " + Str(*SpanInfo\UncompressedLength)
            
            Debug "ResetTable\BlockSize: " + Str(*ResetTable\BlockSize)
            
            Debug "ResetTable\CompressedLength: " + Str(*ResetTable\CompressedLength)
            Debug "ResetTable\UncompressedLength: " + Str(*ResetTable\UncompressedLength)
            
            ShowMemoryViewer(*Content, 100)
            
            
            ; LZX decoding !!!
            ; http://www.jedrea.com/chmlib/
            
            ; PureBasic's packer plugins (Zip, LZMA, BriefLZ, ...) cannot decode LZX, so
            ; UncompressMemory() with #PB_PackerPlugin_Lzma does not work on *Content.
            ; An LZX decoder (for example a port of chmlib's lzx.c) is still needed here.
            
          EndIf
          
        EndIf
      EndIf
      FreeMemory(*FileBuffer)
    EndIf
    CloseFile(File)
  EndIf
EndIf

PureBasic Forums - English
