Reading TIFF tags

Posted: Sun Mar 04, 2007 7:30 am
by r_hyde
I recently needed to pull some info (color depth, width, height, etc.) from a large number of TIFF images, but didn't want to load the pixel data just to get it. The obvious solution was to read the tag data directly, so I dug out my copy of the TIFF 6.0 spec and got comfy...

Anyway, I wanted to share what I learned, so I wrote and commented a function to read all tags in a TIFF file and return them as formatted text. It reads the tags for all pages of multi-page TIFFs, and handles endian conversion. I could not fully test endian conversion, though, so I'd be happy to hear about any unusual results on platforms other than Windows.

NOTE: This code is free for any use at all, and is not released under any license in particular. I don't care if you copy it verbatim and pretend that you wrote it :P

Code:

#LITTLE_ENDIAN = $4949
#BIG_ENDIAN = $4D4D

Macro SwapW(val)
  (((val >> 8) & $00ff) | ((val << 8) & $ff00))
EndMacro

Macro SwapL(val)
  (((val>>24) & $000000ff)|((val>>8) & $0000ff00)|((val<<8) & $00ff0000)|((val<<24) & $ff000000))
EndMacro

Procedure.s GetTIFFTags(filename$)
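  ;tiffile has no type suffix so it uses the native integer size (#PB_Any handles can be large)
  Protected result$, tiffile, ifd.l, tag.w, type.w
  Protected nval.l, offset.l, fpos.l, byteorder.l
  Protected numtags, t, n
  tiffile = ReadFile(#PB_Any, filename$)
  If tiffile = 0 : ProcedureReturn "" : EndIf ;the file could not be opened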
  
  byteorder = ReadWord(tiffile) ;check the endianness of the file
  
  ;bail out if the file does not start with a TIFF byte order mark
  If byteorder <> #LITTLE_ENDIAN And byteorder <> #BIG_ENDIAN
    CloseFile(tiffile)
    ProcedureReturn ""
  EndIf
  
  ;prepare to handle reverse endianness. The byte order of the host CPU is
  ;probed at runtime rather than guessed from the OS, since Linux and MacOS
  ;also run on little-endian x86 hardware
  SwapByteOrder.b = #False
  hostorder.w = 1
  If PeekB(@hostorder) = 1
    ;low byte is stored first: little-endian host
    If byteorder = #BIG_ENDIAN : SwapByteOrder = #True : EndIf
  Else
    ;big-endian host
    If byteorder = #LITTLE_ENDIAN : SwapByteOrder = #True : EndIf
  EndIf
  
  ;position of 1st IFD is stored at offset 4 bytes
  FileSeek(tiffile, 4)
  ifd = ReadLong(tiffile)
  If SwapByteOrder : ifd = SwapL(ifd) : EndIf
  
  ;we will support multi-page TIFF by looping through each IFD found in the file
  While ifd
    result$ = result$ + "TIFF directory at offset " + Str(ifd) + ":" + #CRLF$
  
    ;go to IFD
    FileSeek(tiffile, ifd)
  
    ;1st 2 bytes of IFD indicate the # of tags stored
    numtags = ReadWord(tiffile)
    If SwapByteOrder : numtags = SwapW(numtags) : EndIf
    
    ;Enumerating the tags. The tag structure is simple:
    ;   tag number:  2 bytes; unique identifier for each TIFF tag
    ;   tag type:    2 bytes; indicates the type of value stored in the tag
    ;   num values:  4 bytes; how many values of type 'tag type' are stored
    ;   data offset: 4 bytes; offset (starting from 0) where the tag data is located.
    ;                         If the data is 4 bytes or less, the value is stored
    ;                         directly here instead of elsewhere to save time/space.
    For t = 1 To numtags
      tag = ReadWord(tiffile)
      If SwapByteOrder : tag = SwapW(tag) : EndIf
      ;tag numbers are unsigned 16-bit; mask so that IDs above 32767 do not print as negative
      result$ = result$ + "   TAG " + Str(tag & $FFFF) + #TAB$
      
      type = ReadWord(tiffile)
      If SwapByteOrder : type = SwapW(type) : EndIf
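      Select type
        Case 1 : result$ = result$ + "BYTE" + #TAB$
        Case 2 : result$ = result$ + "CHAR" + #TAB$
        Case 3 : result$ = result$ + "WORD" + #TAB$
        Case 4 : result$ = result$ + "LONG" + #TAB$
        Case 5 : result$ = result$ + "RATIONAL" + #TAB$ ;two longs: numerator and denominator
        Default : result$ = result$ + "TYPE " + Str(type) + #TAB$ ;types 6-12 are not decoded here
      EndSelect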
      
      nval = ReadLong(tiffile)
      If SwapByteOrder : nval = SwapL(nval) : EndIf
            
      offset = ReadLong(tiffile) ;do not swap yet if different endianness - we will do it later

      result$ = result$ + "value:  "
      
      fpos = Loc(tiffile) ;we will be visiting other offsets - we must store the current
                          ;location so we can return here before moving on to the next tag.
      Select type
        Case 1  ;tag type is byte (unsigned, 1 byte)
          If nval <= 4
            ;value takes 4 bytes or less, so it is stored directly in offset.
            ;offset has not been swapped, so it still holds the bytes in file
            ;order and we can unpack each used byte as is
            For n = 0 To nval-1
              result$ = result$ + Str(PeekB(@offset+n) & $FF) + " "
            Next
          Else
            ;value consumes more than 4 bytes; it is stored elsewhere
            If SwapByteOrder : offset = SwapL(offset) : EndIf
            FileSeek(tiffile, offset)
            For n = 0 To nval-1
              result$ = result$ + Str(ReadByte(tiffile) & $FF) + " "
            Next
            FileSeek(tiffile, fpos)
          EndIf
        Case 2  ;type is char (1 byte, ASCII)
          If nval <= 4
            ;value is stored in offset, still in file order (not swapped).
            ;unpack each used byte as char; PeekB is used because a character
            ;is 2 bytes wide in a unicode executable
            For n = 0 To nval-1
              result$ = result$ + Chr(PeekB(@offset+n) & $FF)
            Next
          Else
            ;value consumes more than 4 bytes; it is stored elsewhere
            If SwapByteOrder : offset = SwapL(offset) : EndIf
            FileSeek(tiffile, offset)
            For n = 0 To nval-1
              result$ = result$ + Chr(ReadByte(tiffile) & $FF)
            Next
            FileSeek(tiffile, fpos)
          EndIf
        Case 3  ;type is word (unsigned, 2 bytes)
          If nval <= 2
            ;value is stored in offset; unpack each used word
            For n = 0 To nval-1
              valw = PeekW(@offset+n*2) & $FFFF
              If SwapByteOrder : valw = SwapW(valw) : EndIf
              result$ = result$ + Str(valw) + " "
            Next
          Else
            ;value consumes more than 4 bytes; it is stored elsewhere
            If SwapByteOrder : offset = SwapL(offset) : EndIf
            FileSeek(tiffile, offset)
            For n = 0 To nval-1
              valw = ReadWord(tiffile) & $FFFF
              If SwapByteOrder : valw = SwapW(valw) : EndIf
              result$ = result$ + Str(valw) + " "
            Next
            FileSeek(tiffile, fpos)
          EndIf
        Case 4  ;type is long (4 bytes)
          If SwapByteOrder : offset = SwapL(offset) : EndIf
          If nval = 1
            ;value can be taken directly from offset
            result$ = result$ + Str(offset)
          Else
            FileSeek(tiffile, offset)
            For n = 0 To nval-1
              vall.l = ReadLong(tiffile)
              If SwapByteOrder : vall = SwapL(vall) : EndIf
              result$ = result$ + Str(vall) + " "
            Next
            FileSeek(tiffile, fpos)
          EndIf
        Case 5 ;type is rational (8 bytes: two longs)
          ;it is always more than 4 bytes, so seek to offset & read
          If SwapByteOrder : offset = SwapL(offset) : EndIf
          FileSeek(tiffile, offset)
          For n = 0 To nval-1
            ;we will read the value in as two longs - the first is the
            ;numerator and the second is the denominator.  the final result
            ;is the division of the numerator by the denominator; a float.
            numer = ReadLong(tiffile)
            If SwapByteOrder : numer = SwapL(numer) : EndIf
            denom = ReadLong(tiffile)
            If SwapByteOrder : denom = SwapL(denom) : EndIf
            If denom
              ;copy to a float first so the division is done in floating point
              ratio.f = numer
              result$ = result$ + StrF(ratio / denom, 2) + " "
            Else
              ;avoid dividing by zero on a malformed value
              result$ = result$ + Str(numer) + "/0 "
            EndIf
          Next
          FileSeek(tiffile, fpos)
      EndSelect
      result$ + #CRLF$
    Next
    result$ + #CRLF$  ;add some whitespace between IFDs for beauty;)
    ifd = ReadLong(tiffile) ;0 if there are no more IFDs, else offset to next IFD
    If SwapByteOrder : ifd = SwapL(ifd) : EndIf
  Wend
  CloseFile(tiffile)
  ProcedureReturn result$
EndProcedure
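
For anyone wondering how to call it, here is a minimal usage sketch. It assumes GetTIFFTags() from the listing above is in the same source file; the requester title, the file pattern and the choice of the debug window for output are only illustrative.

```purebasic
;ask the user for a file and dump its tags to the debug window
filename$ = OpenFileRequester("Choose a TIFF file", "", "TIFF (*.tif;*.tiff)|*.tif;*.tiff|All files (*.*)|*.*", 0)
If filename$ <> ""
  ;FileSize() returns a negative value for a missing file or a directory
  If FileSize(filename$) >= 0
    Debug GetTIFFTags(filename$)
  EndIf
EndIf
```

The result is one string with a block per IFD, so it can just as well be written to a text file or shown in an editor gadget.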

Posted: Mon Mar 05, 2007 4:46 pm
by Clutch
I will have use for this. Thank you for sharing. :)