how to view a CHM file?
Is it possible to write your own CHM file viewer? In the properties shown by the 7zip archiver, the type of the CHM file is displayed as LZX. I tried to open it using the UnLZX module, but it didn't work. The example archive in the same topic opens normally with the module, but the 7zip program cannot open that one.
If this worked, I would be able to read the file tree and extract the files I need in order to open them in the web gadget.
Re: how to view a CHM file?
Generally, YES!
CHM is compiled HTML.
There is a tool from Microsoft to decompile a CHM. You then get the HTML folder/file structure.
I did this with the PB help to reconstruct all the files so that I could automatically search all PB commands.
Use Google and search for "decompile help files".
Here is a link on how to decompile:
https://zeropage.io/howto-decompile-win ... elp-files/
The required hh.exe is part of the Microsoft HTML Help 1.4 SDK.
Re: how to view a CHM file?
Now I found my file.
The command to decompile a help file is:
hh.exe -decompile outputfolder input.chm
One thing to note is that the decompile/recompile process isn't a "round-trip" process. Certain features that the help author added to the original help file can't be recovered when you decompile it, so these may no longer work properly after you've recompiled. This is especially true in the area of context-sensitive help, which may be broken in the new version of the file.
Re: how to view a CHM file?
Rest assured, I can unpack and pack with various programs. The task, however, is to write a program for Linux that eliminates the problems of the programs I currently use to view help files.
1. I don't like the serif font.
2. Does not remember the zoom.
3. The side mouse button does not work to return to the previous page.
These problems must be solved.
viewtopic.php?t=68549
Re: how to view a CHM file?
I know you are one of the best programmers here! A question like "is it possible to write a program in PB" is not a question at your level of experience. So my conclusion was that you have problems unpacking the CHM file, because you described that it is not possible with 7zip.
AZJIO wrote: You can make sure that I can unpack and pack with different programs. But the task is to write a program for linux to eliminate the problems of the programs that I use to view the help file.
1. I don't like the serif font.
2. Does not remember the zoom.
3. The side mouse button does not work to return to the previous page.
These problems must be solved.
I'm sure there is no point in describing to you how to search for a font section in an HTML file and change the font name.
I think the first problem is unpacking .chm files on Linux with PB code.
For unpacking .chm on Linux, I remember there are some open source projects, but at the moment I don't remember their names.
Can you describe your problem in more detail?
Re: how to view a CHM file?
I am unable to open the file using the module from this forum post. This suggests that the LZX inside CHM is different from the LZX in the forum post. The CHM file opens in 7zip. Opening the CHM in the 7zip program, I saw the archive type LZX.
I would rather not unpack files with an external program via the command line. I would like to use the UnLZX module to extract one file, find the local href=file links in it, extract the missing css and png files into a temporary folder, and open the html file in the web gadget. When the program closes, it should delete the unpacked files.
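The "find local href=file links" step from the workflow above could be sketched in PureBasic roughly as follows. This is only an illustration: the procedure name FindLocalLinks, the regular expression, and the sample HTML are assumptions of this sketch, not part of any existing module. It treats every href/src value without a ':' as a local file and cuts off '#' anchors and '?' queries.

```purebasic
EnableExplicit

; Collect local href/src targets from an HTML string into Links$().
; Hypothetical helper: values containing ':' (http:, mailto:, ...) are skipped.
Procedure.i FindLocalLinks(Html$, List Links$())
  Protected Regex.i
  ; Group 1 = the target up to the closing quote, a '#' anchor or a '?' query
  Regex = CreateRegularExpression(#PB_Any, ~"(?:href|src)\\s*=\\s*[\"']([^\"'#?:]+)(?=[\"'#?])", #PB_RegularExpression_NoCase)
  If Regex
    If ExamineRegularExpression(Regex, Html$)
      While NextRegularExpressionMatch(Regex)
        AddElement(Links$())
        Links$() = RegularExpressionGroup(Regex, 1)
      Wend
    EndIf
    FreeRegularExpression(Regex)
  EndIf
  ProcedureReturn ListSize(Links$())
EndProcedure

; Usage with a made-up HTML fragment
NewList Found$()
Define Html$ = ~"<link href=\"style.css\"><img src='img/logo.png'><a href=\"http://example.com\">x</a><a href=\"page2.html#top\">y</a>"

FindLocalLinks(Html$, Found$())
ForEach Found$()
  Debug Found$()
Next
```

Each entry in the list is a path relative to the HTML file, so it can be used as the next member name to extract into the temporary folder. A real help file may also contain ms-its: style links, which this sketch deliberately ignores.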
Re: how to view a CHM file?
7zip can decompile CHM files.
Re: how to view a CHM file?
Your answers indicate that you do not understand what I write. I've been using 7zip to extract CHM files for over 15 years.
Did you really read something different in this phrase? I am already explaining it as if to children, repeating the words more than twice so that they cannot be interpreted in two ways, but you still see the opposite.
AZJIO wrote: Mon Dec 30, 2024 4:54 pm The CHM file opens in 7zip. Opening CHM in the 7zip program, I saw the archive type LZX.
Re: how to view a CHM file?
Right then,
AZJIO wrote: Mon Dec 30, 2024 7:34 pm Your answers indicate that you do not understand what I write. I've been using 7zip to extract CHM for over 15 years.
https://github.com/Bioruebe/UniExtract2
https://github.com/Bioruebe/UniExtract2 ... actRC3.zip
The sources are available in AutoIt, one of your fav languages.
I am sure you can play around with it, and build the tool as per need.

Re: how to view a CHM file?
A few words about the subject on stackoverflow.com:
https://stackoverflow.com/questions/692 ... -view-them
Re: how to view a CHM file?
When the author wrote this program, he actively communicated with me. This program does not contain unpacking modules; more precisely, it contains executable files and uses them to unpack archives. I initially wanted to have a built-in module. But it looks like I'll have to add a dependency to the 7zip binary in order to extract the files using it.
Yes, I'm aware of this help information, although I just need the unpacking module, and I'm not interested in the binaries inside CHM.
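If the 7zip binary does become a dependency, calling it from PureBasic could look roughly like the sketch below. It assumes that a `7z` executable can be found by RunProgram(), that the parameter string is split with the quotes respected, and that the member name is written the way 7z lists it. The procedure name ExtractFromCHM is made up for this example.

```purebasic
EnableExplicit

; Extract one member of a CHM into OutDir$ by running the external 7z binary.
; Returns #True when 7z could be started and exited with code 0.
Procedure.i ExtractFromCHM(Chm$, Member$, OutDir$)
  Protected Program.i, ExitCode.i = -1
  Protected Q$ = Chr(34)
  ; "x" keeps the folder structure, -o sets the output folder, -y answers prompts with yes
  Protected Args$ = "x -y -o" + Q$ + OutDir$ + Q$ + " " + Q$ + Chm$ + Q$ + " " + Q$ + Member$ + Q$

  Program = RunProgram("7z", Args$, "", #PB_Program_Open | #PB_Program_Hide)
  If Program
    WaitProgram(Program)
    ExitCode = ProgramExitCode(Program)
    CloseProgram(Program)
  EndIf
  ProcedureReturn Bool(ExitCode = 0)
EndProcedure

; Usage (the paths are placeholders)
If ExtractFromCHM("/path/to/help.chm", "index.html", GetTemporaryDirectory())
  Debug "extracted"
Else
  Debug "7z not found or extraction failed"
EndIf
```

Using "x" rather than "e" keeps the relative folder layout, so links from the HTML page to css or png files in subfolders still resolve once those files are extracted too.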
Re: how to view a CHM file?
7zxa.dll - library for extracting from 7z archives:
https://sourceforge.net/projects/sevenz ... z/download
Re: how to view a CHM file?
From Apache tika:
;The Header
;0000: char[4] 'ITSF'
;0004: DWORD 3 (Version number)
;0008: DWORD Total header length, including header section table and following data.
;000C: DWORD 1 (unknown)
;0010: DWORD a timestamp
;0014: DWORD Windows Language ID
;0018: GUID {7C01FD10-7BAA-11D0-9E0C-00A0C922E6EC}
;0028: GUID {7C01FD11-7BAA-11D0-9E0C-00A0C922E6EC}
;Note: a GUID is $10 bytes, arranged as 1 DWORD, 2 WORDs, and 8 BYTEs.
;Header section table entry:
;0000: QWORD Offset of section from beginning of file
;0008: QWORD Length of section
;Following the header section table is 8 bytes of additional header data. In version 2 files, this data is not there and the content section starts immediately after the directory.
;//translated.by/you/microsoft-s-html-help-chm-format-incomplete/original /?show-translation-form=1
;Directory header
;The directory starts with a header; its format is as follows:
;0000: char[4] 'ITSP'
;0004: DWORD Version number 1
;0008: DWORD Length of the directory header
;000C: DWORD $0a (unknown)
;0010: DWORD $1000 Directory chunk size
;0014: DWORD "Density" of quickref section, usually 2
;0018: DWORD Depth of the index tree: 1 if there is no index, 2 if there is one level of PMGI chunks
;001C: DWORD Chunk number of root index chunk, -1 if there is none (though at least one file has 0 despite there being no index chunk, probably a bug)
;0020: DWORD Chunk number of first PMGL (listing) chunk
;0024: DWORD Chunk number of last PMGL (listing) chunk
;0028: DWORD -1 (unknown)
;002C: DWORD Number of directory chunks (total)
;0030: DWORD Windows language ID
;0034: GUID {5D02926A-212E-11D0-9DF9-00A0C922E6EC}
;0044: DWORD $54 (This is the length again)
;0048: DWORD -1 (unknown)
;004C: DWORD -1 (unknown)
;0050: DWORD -1 (unknown)
;//translated.by/you/microsoft-s-html-help-chm-format-incomplete/original /?show-translation-form=1
;There are two types of directory chunks: index chunks and listing chunks.
;The index chunk will be omitted if there is only one listing chunk.
;A listing chunk has the following format:
;0000: char[4] 'PMGL'
;0004: DWORD Length of free space and/or quickref area at end of directory chunk
;0008: DWORD Always 0
;000C: DWORD Chunk number of previous listing chunk when reading directory in sequence (-1 if this is the first listing chunk)
;0010: DWORD Chunk number of next listing chunk when reading directory in sequence (-1 if this is the last listing chunk)
;0014: Directory listing entries (to quickref area), sorted by filename; the sort is case-insensitive
;The quickref area is written backwards from the end of the chunk. One quickref entry exists for every n entries in the file, where n is calculated as 1 + (1 << quickref density). So for density = 2, n = 5.
;Chunklen-0002: WORD Number of entries in the chunk
;Chunklen-0004: WORD Offset of entry n from entry 0
;Chunklen-0008: WORD Offset of entry 2n from entry 0
;Chunklen-000C: WORD Offset of entry 3n from entry 0
;...
;The format of a directory listing entry is as follows:
;BYTE: length of name
;BYTEs: name (UTF-8 encoded)
;ENCINT: content section
;ENCINT: offset
;ENCINT: length
;The offset is from the beginning of the content section the file is in, after the section has been decompressed (if appropriate). The length also refers to the length of the file in the section after decompression.
;There are two kinds of file represented in the directory: user data and format-related files. The format-related files have names which begin with '::', the user data files have names which begin with "/".
;//translated.by/you/microsoft-s-html-help-chm-format-incomplete/original /?show-translation-form=1
;Index chunk (note: does not always exist)
;An index chunk has the following format:
;0000: char[4] 'PMGI'
;0004: DWORD Length of quickref/free area at end of directory chunk
;0008: Directory index entries (to quickref/free area)
;The quickref area in a PMGI is the same as in a PMGL.
;The format of a directory index entry is as follows:
;BYTE: length of name
;BYTEs: name (UTF-8 encoded)
;ENCINT: directory listing chunk which starts with name
;Encoded integers, aka ENCINT
;An ENCINT is a variable-length integer. The high bit of each byte indicates "continued to the next byte". Bytes are stored most significant to least significant. So, for example, $EA $15 is (((0xEA&0x7F)<<7)|0x15) = 0x3515.
;Note: This class is Not in use http://translated.by/you/microsoft-s-ht ... ion-form=1
;::DataSpace/Storage/<SectionName>/ControlData
;This file contains $20 bytes of information on the compression.
;The information is partially known:
;0000: DWORD 6 (unknown)
;0004: ASCII 'LZXC' Compression type identifier
;0008: DWORD 2 (Possibly numeric code for LZX)
;000C: DWORD The Huffman reset interval in $8000-byte blocks
;0010: DWORD The window size in $8000-byte blocks
;0014: DWORD unknown (sometimes 2, sometimes 1, sometimes 0)
;0018: DWORD 0 (unknown)
;001C: DWORD 0 (unknown)
;//translated.by/you/microsoft-s-html-help-chm-format-incomplete/original /?page=2
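As a quick check of the ENCINT rule described above, here is a minimal PureBasic sketch that decodes the two bytes from the worked example ($EA $15). The procedure name DecodeEncInt and the Sample label are made up for this illustration. It assumes a well-formed ENCINT and does no bounds checking.

```purebasic
EnableExplicit

; Decode one ENCINT starting at *Ptr, store the value in *Value and
; return the number of bytes consumed.
Procedure.i DecodeEncInt(*Ptr.Ascii, *Value.Quad)
  Protected Bytes.i, Result.q
  Repeat
    ; Each byte contributes its lower 7 bits, most significant group first
    Result = (Result << 7) | (*Ptr\a & $7F)
    Bytes + 1
    ; A cleared high bit marks the last byte of the integer
    If (*Ptr\a & $80) = 0
      Break
    EndIf
    *Ptr + 1
  ForEver
  *Value\q = Result
  ProcedureReturn Bytes
EndProcedure

Define Value.q
Define Used.i = DecodeEncInt(?Sample, @Value)
Debug "bytes: " + Str(Used) + ", value: $" + Hex(Value)

DataSection
  Sample:
  Data.a $EA, $15
EndDataSection
```

It should report 2 bytes consumed and the value $3515, which matches the example in the description.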
Re: how to view a CHM file?
What I have so far (tested with PureBasic.chm):
;
; http://www.russotto.net/chm/chmformat.html
;
EnableExplicit

Structure ITSF_Header_Structure
  Magic.a[4]
  Version.l
  TotalLength.l
  Unknown1.l
  Timestamp.l
  WindowsLanguageID.l
  GUID1.GUID
  GUID2.GUID
  OffsetOfSection.q   ; first entry of the header section table
  LengthOfSection.q
EndStructure

Structure ITSP_Header_Structure
  Magic.a[4]
  Version.l
  Length.l
  Unknown1.l
  DirChunkSize.l
  Density.l
  DepthsIndexTree.l
  ChunkNumberOfRootIndexChunk.l
  ChunkNumberOfFirstPMGL.l
  ChunkNumberOfLastPMGL.l
  Unknown2.l
  NumberOfDirectoryChunks.l
  WindowsLanguageID.l
  GUID1.GUID
  Length2.l
  Unknown3.l
  Unknown4.l
  Unknown5.l
EndStructure

Structure PMGL_Header_Structure
  Magic.a[4]
  QuickRefLength.l
  Unknown1.l
  ChunkNumberOfPreviousListingChunk.l
  ChunkNumberOfNextListingChunk.l
EndStructure

Structure PMGI_Header_Structure
  Magic.a[4]
  QuickRefLength.l
EndStructure

Structure NameList_Structure
  Length.w
  NumberOfEntries.w
EndStructure

; $20 bytes, see the ControlData description
Structure ControlData_Structure
  NumberOfDWordsFollowingMagic.l
  Magic.a[4]
  Version.l
  LZXResetInterval.l
  WindowSize.l
  CacheSize.l
  Unknown1.l
  Unknown2.l
EndStructure

Structure SpanInfo_Structure
  UncompressedLength.q
EndStructure

Structure ResetTable_Structure
  Unknown1.l
  NumberOfEntries.l
  SizeOfTableEntry.l
  TableHeaderLength.l
  UncompressedLength.q
  CompressedLength.q
  BlockSize.q
  EntryNumber.q
  LocationFirstBlockBoundaryInUncompressedData.q
EndStructure
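
; Decode an ENCINT at *Ptr into *Value and return the number of bytes consumed.
Procedure.i EncInt(*Ptr.Ascii, *Value.Quad)
  Protected Bytes.i, Result.q
  ; Bytes with the high bit set are continued in the next byte
  While *Ptr\a > $7F
    Bytes + 1
    Result << 7
    Result | (*Ptr\a & $7F)
    *Ptr + 1
  Wend
  ; The last byte has the high bit cleared
  Bytes + 1
  Result << 7
  Result | *Ptr\a
  *Value\q = Result
  ProcedureReturn Bytes
EndProcedure
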
Define.i File, NumberOfChunkEntries, i, Entries
Define.q QuadValue, Section, Offset, Length
Define Filename$, Name$
Define *FileBuffer, *Out, *Section0, *Content
Define *ITSF_Header.ITSF_Header_Structure
Define *ITSP_Header.ITSP_Header_Structure
Define *PMGL_Header.PMGL_Header_Structure
Define *PMGI_Header.PMGI_Header_Structure
Define *QuickRef, *Name, *Entry
Define *NameList.NameList_Structure
Define *ControlData.ControlData_Structure
Define *SpanInfo.SpanInfo_Structure
Define *ResetTable.ResetTable_Structure
Define.q ContentOffset, ContentLength, ControlDataOffset, ControlDataLength, SpanInfoOffset, SpanInfoLength, ResetTableOffset, ResetTableLength
UseLZMAPacker()
Filename$ = OpenFileRequester("Open a CHM file", "", "CHM|*.chm", 0)
If Filename$
  File = ReadFile(#PB_Any, Filename$)
  If File
    *FileBuffer = AllocateMemory(Lof(File), #PB_Memory_NoClear)
    If *FileBuffer
      If ReadData(File, *FileBuffer, MemorySize(*FileBuffer)) = MemorySize(*FileBuffer)
        *ITSF_Header = *FileBuffer
        Debug *ITSF_Header\Magic
        If *ITSF_Header\Magic[0] = 'I' And *ITSF_Header\Magic[1] = 'T' And *ITSF_Header\Magic[2] = 'S' And *ITSF_Header\Magic[3] = 'F'
          Debug "ITSF"
          Debug "Version: " + Str(*ITSF_Header\Version)
          Debug "TotalLength: " + Str(*ITSF_Header\TotalLength)
          Debug "Timestamp: " + FormatDate("%yyyy-%mm-%dd", *ITSF_Header\Timestamp)
          Debug "LanguageID: " + Str(*ITSF_Header\WindowsLanguageID)
          ; The directory (ITSP) is expected directly after header section 0
          *ITSP_Header = *FileBuffer + *ITSF_Header\OffsetOfSection + *ITSF_Header\LengthOfSection
          If *ITSP_Header\Magic[0] = 'I' And *ITSP_Header\Magic[1] = 'T' And *ITSP_Header\Magic[2] = 'S' And *ITSP_Header\Magic[3] = 'P'
            Debug "ITSP"
            Debug "DirChunkSize: " + Str(*ITSP_Header\DirChunkSize)
            Debug "Density: " + Str(*ITSP_Header\Density)
            Debug "ChunkNumberOfFirstPMGL: " + Str(*ITSP_Header\ChunkNumberOfFirstPMGL)
            Debug "ChunkNumberOfLastPMGL: " + Str(*ITSP_Header\ChunkNumberOfLastPMGL)
            *PMGL_Header = *ITSP_Header + *ITSP_Header\Length
            *PMGI_Header = *PMGL_Header
            If *PMGL_Header\Magic[0] = 'P' And *PMGL_Header\Magic[1] = 'M' And *PMGL_Header\Magic[2] = 'G' And *PMGL_Header\Magic[3] = 'L'
              ; Walk all listing chunks and print every directory entry
              For Entries = *ITSP_Header\ChunkNumberOfFirstPMGL To *ITSP_Header\ChunkNumberOfLastPMGL
                If *PMGL_Header\Magic[0] = 'P' And *PMGL_Header\Magic[1] = 'M' And *PMGL_Header\Magic[2] = 'G' And *PMGL_Header\Magic[3] = 'L'
                  Debug "PMGL"
                  Debug "PMGL QuickRefLength: " + Str(*PMGL_Header\QuickRefLength)
                  ; The entry count is the last WORD of the chunk
                  *QuickRef = *PMGL_Header + *ITSP_Header\DirChunkSize
                  NumberOfChunkEntries = PeekW(*QuickRef - 2)
                  Debug "NumberOfChunkEntries: " + Str(NumberOfChunkEntries)
                  *Entry = *PMGL_Header + SizeOf(PMGL_Header_Structure)
                  For i = 1 To NumberOfChunkEntries
                    ; Entry layout: name length, name, content section, offset, length
                    *Entry + EncInt(*Entry, @QuadValue)
                    Name$ = PeekS(*Entry, QuadValue, #PB_UTF8|#PB_ByteLength)
                    Debug Name$
                    *Entry + QuadValue
                    *Entry + EncInt(*Entry, @QuadValue)
                    Section = QuadValue
                    Debug "content section: " + Str(Section)
                    *Entry + EncInt(*Entry, @QuadValue)
                    Offset = QuadValue
                    Debug "offset: " + Str(Offset)
                    *Entry + EncInt(*Entry, @QuadValue)
                    Length = QuadValue
                    Debug "length: " + Str(Length)
                    ; Remember where the compression-related files are stored
                    If Section = 0 And Left(Name$, 20) = "::DataSpace/Storage/"
                      If FindString(Name$, "/Content", 20)
                        ContentOffset = Offset
                        ContentLength = Length
                      EndIf
                      If FindString(Name$, "/ControlData", 20)
                        ControlDataOffset = Offset
                        ControlDataLength = Length
                      EndIf
                      If FindString(Name$, "/SpanInfo", 20)
                        SpanInfoOffset = Offset
                        SpanInfoLength = Length
                      EndIf
                      If FindString(Name$, "/ResetTable", 20)
                        ResetTableOffset = Offset
                        ResetTableLength = Length
                      EndIf
                    EndIf
                  Next i
                EndIf
                *PMGL_Header + *ITSP_Header\DirChunkSize
              Next
              ;Until *PMGL_Header\ChunkNumberOfNextListingChunk = -1
              *PMGI_Header = *PMGL_Header + *ITSP_Header\DirChunkSize
              *NameList = *PMGI_Header
            EndIf
            If *ITSP_Header\DepthsIndexTree > 1
              If *PMGI_Header\Magic[0] = 'P' And *PMGI_Header\Magic[1] = 'M' And *PMGI_Header\Magic[2] = 'G' And *PMGI_Header\Magic[3] = 'I'
                Debug "PMGI QuickRefLength: " + Str(*PMGI_Header\QuickRefLength)
                *QuickRef = *PMGI_Header + *ITSP_Header\DirChunkSize
                NumberOfChunkEntries = PeekW(*QuickRef - 2)
                Debug "NumberOfChunkEntries: " + Str(NumberOfChunkEntries)
                *Entry = *PMGI_Header + SizeOf(PMGI_Header_Structure)
                For i = 1 To NumberOfChunkEntries
                  ; Index entry layout: name length, name, listing chunk number
                  *Entry + EncInt(*Entry, @QuadValue)
                  Debug PeekS(*Entry, QuadValue, #PB_UTF8|#PB_ByteLength)
                  *Entry + QuadValue
                  *Entry + EncInt(*Entry, @QuadValue)
                  Debug "Starts with name: " + Str(QuadValue)
                Next i
                *NameList = *PMGI_Header + *ITSP_Header\DirChunkSize
              EndIf
            EndIf
            ; Content section 0 is assumed to start at the name list
            *Section0 = *NameList
            *Content = *Section0 + ContentOffset
            *ControlData = *Section0 + ControlDataOffset
            *SpanInfo = *Section0 + SpanInfoOffset
            *ResetTable = *Section0 + ResetTableOffset
            If *NameList\Length > 0 And *NameList\NumberOfEntries = 2
              *Name = *NameList + SizeOf(NameList_Structure)
              For i = 1 To *NameList\NumberOfEntries
                ; Each name is a WORD length followed by a null-terminated UTF-16 string
                *Name + 2
                Select PeekS(*Name)
                  Case "Uncompressed"
                    Debug "Uncompressed"
                    *Name + (PeekW(*Name - 2) + 1) * 2
                  Case "MSCompressed"
                    Debug "MSCompressed"
                    *Name + (PeekW(*Name - 2) + 1) * 2
                EndSelect
              Next i
              ; If *LZXC_Header\Magic[0] = 'L' And *LZXC_Header\Magic[1] = 'Z' And *LZXC_Header\Magic[2] = 'X' And *LZXC_Header\Magic[3] = 'C'
              ; Debug "LZXC"
              ; EndIf
            EndIf
          EndIf
          If PeekS(@*ControlData\Magic[0], 4, #PB_Ascii) = "LZXC"
            Debug "SpanInfo\UncompressedLength: " + Str(*SpanInfo\UncompressedLength)
            Debug "ResetTable\BlockSize: " + Str(*ResetTable\BlockSize)
            Debug "ResetTable\CompressedLength: " + Str(*ResetTable\CompressedLength)
            Debug "ResetTable\UncompressedLength: " + Str(*ResetTable\UncompressedLength)
            ShowMemoryViewer(*Content, 100)
            ; LZX decoding !!!
            ; http://www.jedrea.com/chmlib/
            ; NOTE: LZX is a different algorithm from LZMA, so the LZMA packer call
            ; below is only a placeholder experiment. A real LZX decoder is still missing.
            *Out = AllocateMemory($40000, #PB_Memory_NoClear)
            i = 1
            Repeat
              Length = UncompressMemory(*Content, $FFFF, *Out, MemorySize(*Out), #PB_PackerPlugin_Lzma)
              If Length > 0
                Debug "i: " + Str(i)
                ShowMemoryViewer(*Out, 1000)
              EndIf
              i + 1
            Until i = $8001
          EndIf
        EndIf
      EndIf
      FreeMemory(*FileBuffer)
    EndIf
    CloseFile(File)
  EndIf
EndIf
Last edited by infratec on Thu Jan 02, 2025 7:10 pm, edited 1 time in total.