Packing bytes: a little help from my friends

wilbert · Post by **wilbert** » Fri Nov 15, 2013 5:01 pm

After some thinking, maybe it's wiser to look first for speed enhancements that don't require asm.
I think there's quite some speed to gain just optimizing the basic code and that keeps everything working even on x64.

Post by **idle** » Fri Nov 15, 2013 8:18 pm

netmaestro wrote: still looking for Linux.

Works on Linux!

wilbert · Post by **wilbert** » Sat Nov 16, 2013 7:14 am

Here's my attempt to improve the speed a bit without using asm.

Code: Select all

;========================================================================================
;{                        NEUQUANT COLORTABLE GENERATION SECTION
;========================================================================================
;
; NeuQuant Neural-Net Quantization Algorithm 
; Copyright (c) 1994 Anthony Dekker
;
; NEUQUANT Neural-Net quantization algorithm by Anthony Dekker, 1994.
; See "Kohonen neural networks for optimal colour quantization"
; in "Network: Computation in Neural Systems" Vol. 5 (1994) pp 351-367.
; for a discussion of the algorithm.
; See also  http://members.ozemail.com.au/~dekker/NEUQUANT.HTML
;
; Any party obtaining a copy of these files from the author, directly or
; indirectly, is granted, free of charge, a full and unrestricted irrevocable,
; world-wide, paid up, royalty-free, nonexclusive right and license to deal
; in this software and documentation files (the "Software"), including without
; limitation the rights to use, copy, modify, merge, publish, distribute, sublicense,
; and/or sell copies of the Software, and to permit persons who receive
; copies from any such party to do so, with the only requirement being
; that this copyright notice remain intact.
;
; Ported to PureBasic 4.31-4.40 by Luis on Oct 2009 
; http://luis.no-ip.net
; The code in this include *should* be cross / platform. Tested on Windows 32/64 bit.
; Pfew! A lot or pointers' arithmetic to convert !
;
; Pseudo code usage example for PB:
;
; [ obtain a pointer to the color data bitmap in memory ]
;
; // please note the color data must be a sequence of BGRA quadruplets (32 bit/pixel) and that the A channel is ignored
; // allocated space would be = (4 * ImageWidth * ImageHeight) 
;
; NQ_InitNetwork() ; initialize network
; NQ_Learn (*ImageBuf, MemorySize(*ImageBuf), [1..30]) ; learning 
; NQ_UnbiasNetwork() ; network unbias
;
; // at this point the optimized palette has been calculated and it's in g_NQ_Network() 
; // g_NQ_Network(0..255)\iBGRC[0] ; BLUE 
; // g_NQ_Network(0..255)\iBGRC[1] ; GREEN 
; // g_NQ_Network(0..255)\iBGRC[2] ; RED 
; 
; NQ_SortNetwork() ; optimize the palette order for the lookup phase
;
; // now you loop through every pixel in the original 24/32 bit image and retrieve the palette index to be used in the 8 bit image 
; 
; [read B, G, R data from source image ]
; iIndex = NQ_LookupPalette(B, G, R) 
; [ write iIndex to the destination image ]

CompilerIf Defined(MAXLONG, #PB_Constant)=0
  #MAXLONG = $7FFFFFFF
CompilerEndIf

CompilerIf Defined(RGBQUAD, #PB_Structure)=0
  Structure RGBQUAD
    rgbBlue.b
    rgbGreen.b
    rgbRed.b
    rgbReserved.b
  EndStructure
CompilerEndIf

Structure T_RGBTRIPLE 
  rgbtBlue.a
  rgbtGreen.a
  rgbtRed.a
EndStructure

Structure T_PIXEL
  iBGRC.i[4]
EndStructure

Declare.i  _NQ_Contest (b, g, r)
Declare    _NQ_AlterSingle (alpha, i, b, g, r)
Declare    _NQ_AlterNeigh (rad, i, b, g, r)

Declare    NQ_InitNetwork ()
Declare    NQ_UnbiasNetwork()
Declare    NQ_SortNetwork()
Declare.i  NQ_LookupPalette (b,g,r)
Declare.i  NQ_Learn (*ImageBuffer.T_RGBTRIPLE, iBufferLen, iSample)

#NQ_netsize = 256  ; number of colours used

; four primes near 500 - assume no image has a length so large
; that it is divisible by all four primes 
#NQ_prime1  = 499
#NQ_prime2  = 491
#NQ_prime3  = 487
#NQ_prime4  = 503

#NQ_minpicturebytes     = 3 * #NQ_prime4 ; minimum size for input image 

; Network Definitions   
#NQ_maxnetpos           = #NQ_netsize - 1
#NQ_netbiasshift        = 4     ; bias for colour values 
#NQ_ncycles             = 100 ; no. of learning cycles 

; Defs for freq and bias
#NQ_intbiasshift        = 16 ; bias for fractions 
#NQ_intbias             = 1 << #NQ_intbiasshift
#NQ_gammashift          = 10 ; gamma = 1024 
#NQ_gamma               = 1 << #NQ_gammashift
#NQ_betashift           = 10
#NQ_beta                = #NQ_intbias >> #NQ_betashift ; beta = 1/1024 
#NQ_betagamma           = #NQ_intbias << (#NQ_gammashift - #NQ_betashift)

; defs for decreasing radius factor
#NQ_initrad             = #NQ_netsize >> 3  ; for 256 cols, radius starts 
#NQ_radiusbiasshift     = 6 ; at 32.0 biased by 6 bits
#NQ_radiusbias          = 1 << #NQ_radiusbiasshift
#NQ_initradius          = #NQ_initrad * #NQ_radiusbias 
#NQ_radiusdec           = 15 ; and decreases by a factor of 1/#NQ_radiusdec each cycle (originally was 30)

; defs for decreasing alpha factor 
#NQ_alphabiasshift      = 10 ; alpha starts at 1.0 
#NQ_initalpha           = 1 << #NQ_alphabiasshift

; radbias and alpharadbias used for radpower calculation 
#NQ_radbiasshift        = 8
#NQ_radbias             = 1 << #NQ_radbiasshift
#NQ_alpharadbshift      = #NQ_alphabiasshift + #NQ_radbiasshift
#NQ_alpharadbias        = 1 << #NQ_alpharadbshift


Global Dim g_NQ_Network.T_PIXEL (#NQ_netsize - 1)  ; the network itself 
Global Dim g_NQ_NetIndex(256 - 1)                  ; for network lookup - really 256 
Global Dim g_NQ_Bias (#NQ_netsize - 1)             ; bias and freq arrays for learning 
Global Dim g_NQ_Freq (#NQ_netsize - 1)             ; bias and freq arrays for learning 
Global Dim g_NQ_RadPower (#NQ_initrad - 1)         ; radpower for precomputation 


Procedure.i _NQ_Contest (b, g, r)
  Protected i, dist, a, biasdist, betafreq
  Protected bestpos, bestbiaspos, bestd, bestbiasd
  Protected *p.Integer, *f.Integer, *n.T_PIXEL
  
  ; finds closest neuron (min dist) and updates freq 
  ; finds best neuron (min dist-bias) and returns position 
  ; for frequently chosen neurons, g_NQ_Freq(i) is high and g_NQ_Bias(i) is negative
  ; g_NQ_Bias(i) = gamma*((1/#NQ_netsize)-g_NQ_Freq(i)) 
  
  bestd = #MAXLONG
  bestbiasd = bestd
  bestpos = -1
  bestbiaspos = bestpos
  *p = @g_NQ_Bias()
  *f = @g_NQ_Freq()
  
  For i = 0 To #NQ_netsize - 1
    *n = @g_NQ_Network(i)
    dist = *n\iBGRC[0] - b : If dist < 0 : dist = -dist : EndIf
    a = *n\iBGRC[1] - g :  If a < 0 : a = -a : EndIf
    dist + a
    a = *n\iBGRC[2] - r : If a < 0 : a = -a : EndIf  
    dist + a
    If dist < bestd : bestd = dist : bestpos = i : EndIf
    biasdist = dist - (*p\i >> (#NQ_intbiasshift - #NQ_netbiasshift))
    If biasdist < bestbiasd : bestbiasd = biasdist: bestbiaspos = i : EndIf
    betafreq = *f\i >> #NQ_betashift
    *f\i - betafreq : *f + SizeOf(Integer)
    *p\i + (betafreq << #NQ_gammashift) : *p + SizeOf(Integer)
  Next   
  
  g_NQ_Freq(bestpos) + #NQ_beta
  g_NQ_Bias(bestpos) - #NQ_betagamma
  
  ProcedureReturn bestbiaspos   
EndProcedure

; Move neuron i towards biased (b,g,r) by factor alpha
Procedure _NQ_AlterSingle (alpha, i, b, g, r)
  Protected *n.T_PIXEL = @g_NQ_Network(i) ; alter hit neuron 
  
  *n\iBGRC[0] - (alpha * (*n\iBGRC[0] - b)) / #NQ_initalpha 
  *n\iBGRC[1] - (alpha * (*n\iBGRC[1] - g)) / #NQ_initalpha
  *n\iBGRC[2] - (alpha * (*n\iBGRC[2] - r)) / #NQ_initalpha
EndProcedure

; Move adjacent neurons by precomputed alpha*(1-((i-j)^2/[r]^2)) in g_NQ_RadPower[|i-j|]
Procedure _NQ_AlterNeigh (rad, i, b, g, r)
  Protected j, k, lo, hi, a
  Protected *p.T_PIXEL, *q.Integer 
  
  lo = i - rad : If lo < -1 : lo = -1 : EndIf
  hi = i + rad : If hi > #NQ_netsize : hi = #NQ_netsize : EndIf
  
  j = i + 1 
  k = i - 1
  *q = @g_NQ_RadPower()
  
  While (j < hi) Or (k > lo) 
    *q + SizeOf(Integer) : a = *q\i
    
    If j < hi
      *p = @g_NQ_Network(j)
      *p\iBGRC[0] - (a * (*p\iBGRC[0] - b)) / #NQ_alpharadbias
      *p\iBGRC[1] - (a * (*p\iBGRC[1] - g)) / #NQ_alpharadbias
      *p\iBGRC[2] - (a * (*p\iBGRC[2] - r)) / #NQ_alpharadbias
      j + 1
    EndIf
    If k > lo
      *p = @g_NQ_Network(k)
      *p\iBGRC[0] - (a * (*p\iBGRC[0] - b)) / #NQ_alpharadbias
      *p\iBGRC[1] - (a * (*p\iBGRC[1] - g)) / #NQ_alpharadbias
      *p\iBGRC[2] - (a * (*p\iBGRC[2] - r)) / #NQ_alpharadbias
      k - 1        
    EndIf
  Wend
EndProcedure

; Initialise network in range (0,0,0) To (255,255,255) and set parameters
Procedure NQ_InitNetwork ()   
  Protected i, *p.T_PIXEL
  
  For i = 0 To #NQ_netsize - 1
    *p = @g_NQ_Network(i)
    *p\iBGRC[0] = (i << (#NQ_netbiasshift + 8)) / #NQ_netsize
    *p\iBGRC[1] = *p\iBGRC[0]
    *p\iBGRC[2] = *p\iBGRC[0]
    g_NQ_Freq(i) = #NQ_intbias/#NQ_netsize ; 1/netsize 
    g_NQ_Bias(i) = 0
  Next
EndProcedure   

; Unbias network to give byte values 0..255 and record position i to prepare for sort
Procedure NQ_UnbiasNetwork()
  Protected i, j, t
  
  For i = 0 To #NQ_netsize - 1
    For j = 0 To 2
      t = (g_NQ_Network(i)\iBGRC[j] + (1 << (#NQ_netbiasshift - 1))) >> #NQ_netbiasshift
      If (t > 255) : t = 255 : EndIf
      g_NQ_Network(i)\iBGRC[j] = t
    Next
    g_NQ_Network(i)\iBGRC[3]  = i ; record colour no
  Next
EndProcedure

; Insertion sort of network and building of g_NQ_NetIndex(0..255) (to do after unbias)
Procedure NQ_SortNetwork()
  
  Protected i, j, smallpos, smallval 
  Protected *p.T_PIXEL, *q.T_PIXEL
  Protected previouscol, startpos
  
  previouscol = 0
  startpos = 0
  
  For i = 0 To #NQ_netsize - 1
    
    *p = @g_NQ_Network(i)
    
    smallpos = i   
    smallval = *p\iBGRC[1]  ; index on g 
    
    ; find smallest in i..#NQ_netsize - 1 
    For j = i+1 To #NQ_netsize - 1
      *q = @g_NQ_Network(j)
      If *q\iBGRC[1] < smallval ; index on g 
        smallpos = j
        smallval = *q\iBGRC[1] ; index on g
      EndIf
    Next
    
    *q = @g_NQ_Network(smallpos)
    
    ; swap p (i) and q (smallpos) entries 
    If i <> smallpos       
      Swap *q\iBGRC[0], *p\iBGRC[0]
      Swap *q\iBGRC[1], *p\iBGRC[1]
      Swap *q\iBGRC[2], *p\iBGRC[2]
      Swap *q\iBGRC[3], *p\iBGRC[3]      
    EndIf
    
    ; smallval entry is now in position i 
    If smallval <> previouscol
      g_NQ_NetIndex(previouscol) = (startpos+i) >> 1
      
      For j = previouscol + 1 To smallval - 1
        g_NQ_NetIndex(j) = i
      Next
      
      previouscol = smallval        
      startpos = i        
    EndIf        
  Next
  
  g_NQ_NetIndex(previouscol) = (startpos + #NQ_maxnetpos) >> 1
  
  For j = previouscol + 1 To 255
    g_NQ_NetIndex(j) = #NQ_maxnetpos ; really 256
  Next
  
EndProcedure

; Search for BGR values 0..255 (after net is unbiased) and return colour index
Procedure.i NQ_LookupPalette (b, g, r)
  Protected i, j, dist, a, best, bestd
  Protected *p.T_PIXEL
  
  bestd = 1000 ; biggest possible dist is 256*3
  best = -1
  
  i = g_NQ_NetIndex(g) ; index on g 
  j = i - 1            ; start at g_NQ_NetIndex(g) and work outwards 
  
  While i < #NQ_netsize Or j >= 0
    
    If i < #NQ_netsize    
      *p = @g_NQ_Network(i)
      
      dist = *p\iBGRC[1] - g ; inx key 
      
      If dist >= bestd
        i = #NQ_netsize ; stop iter
      Else
        i + 1
        If dist < 0: dist = -dist : EndIf
        a = *p\iBGRC[0] - b : If a < 0 : a = -a : EndIf
        dist + a
        If dist < bestd
          a = *p\iBGRC[2] - r : If a < 0 : a = -a : EndIf
          dist + a
          If dist < bestd : bestd = dist : best = *p\iBGRC[3] : EndIf
        EndIf    
      EndIf
    EndIf
    
    If j >= 0    
      *p = @g_NQ_Network(j)
      
      dist = g - *p\iBGRC[1] ; inx key - reverse dif
      
      If dist >= bestd 
        j = -1 ; stop iter
      Else
        j - 1
        If dist < 0 : dist = -dist : EndIf
        a = *p\iBGRC[0] - b : If a < 0 : a = -a : EndIf
        dist + a
        If dist < bestd 
          a = *p\iBGRC[2] - r : If  a < 0 : a = -a : EndIf
          dist + a
          If dist < bestd : bestd = dist : best = *p\iBGRC[3] : EndIf
        EndIf
      EndIf
    EndIf
    
  Wend
  
  ProcedureReturn best
EndProcedure

; Main Learning Loop
Procedure.i NQ_Learn (*ImageBuffer.T_RGBTRIPLE, iBufferLen, iSample)
  
  Protected i, j, b, g, r
  Protected radius, rad, alpha, iStep, delta, samplepixels, alphadec 
  Protected *lim
  
  If iBufferLen < #NQ_minpicturebytes
    ProcedureReturn 0
  EndIf
  
  alphadec = 30 + ((iSample - 1) / 3)
  
  *lim = *ImageBuffer + iBufferLen
  samplepixels = iBufferLen / (4 * iSample)
  delta = samplepixels / #NQ_ncycles
  alpha = #NQ_initalpha
  radius = #NQ_initradius
  
  If samplepixels = 0 Or delta = 0 
    ProcedureReturn 0
  EndIf
  
  rad = radius >> #NQ_radiusbiasshift
  
  If rad <= 1 : rad = 0 : EndIf
  
  For i = 0 To rad - 1
    g_NQ_RadPower(i) = alpha * (((rad * rad - i * i) * #NQ_radbias) / (rad*rad))
  Next
  
  If ((iBufferLen % #NQ_prime1) <> 0) 
    iStep = 4 * #NQ_prime1 
  ElseIf ((iBufferLen % #NQ_prime2) <> 0) 
    iStep = 4 * #NQ_prime2
  ElseIf ((iBufferLen % #NQ_prime3) <> 0)
    iStep = 4 * #NQ_prime3
  Else 
    iStep = 4 * #NQ_prime4
  EndIf
  
  i = 0
  
  While (i < samplepixels) 
    b = (*ImageBuffer\rgbtBlue &$FF) << #NQ_netbiasshift
    g = (*ImageBuffer\rgbtGreen &$FF) << #NQ_netbiasshift
    r = (*ImageBuffer\rgbtRed &$FF) << #NQ_netbiasshift
    j = _NQ_Contest(b,g,r)
    
    _NQ_AlterSingle (alpha,j,b,g,r)
    
    If (rad) : _NQ_AlterNeigh(rad,j,b,g,r) : EndIf ; alter neighbours
    
    *ImageBuffer + iStep
    
    If *ImageBuffer >= *lim : *ImageBuffer - iBufferLen : EndIf
    
    i + 1
    
    If i % delta = 0
      alpha - (alpha / alphadec)
      radius - (radius / #NQ_radiusdec)
      rad = radius >> #NQ_radiusbiasshift
      
      If rad <= 1 : rad = 0 : EndIf
      
      For j = 0 To rad - 1
        g_NQ_RadPower(j) = alpha * (((rad * rad - j * j) * #NQ_radbias) / (rad*rad))
      Next        
    EndIf   
  Wend
  
  ProcedureReturn 1
  
EndProcedure

;========================================================================================
;{                           FLOYD-STEINBERG DITHERING RESIDENTS
;========================================================================================

; Copyright (c) 2009 the authors listed at the following URL, and/or
; the authors of referenced articles or incorporated external code:
; http://en.literateprograms.org/Floyd-Steinberg_dithering_(C)?action=history&offset=20080916082812
;
; Permission is hereby granted, free of charge, to any person obtaining
; a copy of this software and associated documentation files (the
; "Software"), to deal in the Software without restriction, including
; without limitation the rights to use, copy, modify, merge, publish,
; distribute, sublicense, and/or sell copies of the Software, and to
; permit persons to whom the Software is furnished to do so, subject to
; the following conditions:
;
; The above copyright notice and this permission notice shall be
; included in all copies or substantial portions of the Software.
;
; THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
; EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
; MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
; IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
; CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
; TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
; SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
;
; Retrieved from: http://en.literateprograms.org/Floyd-Steinberg_dithering_(C)?oldid=14630
;
; Ported to PureBasic code by Lloyd Gallant (netmaestro) on October 1, 2009
;
Structure RGBQUAD_U                ; Unsigned char RGBQUAD
  rgbBlue.a
  rgbGreen.a
  rgbRed.a
  rgbReserved.a
EndStructure

Structure PALETTE                  ; Used to index the colortable
  index.RGBQUAD_U[256]
EndStructure

Structure COLORBITS_RGBQUAD        ; Used to index the source colorbits 
  pixels.RGBQUAD_U[0]
EndStructure

Structure COLORBITS_1BYTE_PERPIXEL ; Used to index the target colorbits
  pixels.a[0]
EndStructure

Macro plus_truncate_uchar(a, b) 
  If a+b<0 : a=0 : ElseIf  a+b>255 : a=255 : Else : a+b : EndIf
EndMacro

Macro compute_disperse(channel)
  error = (*colorbits\pixels[x + y*w]\channel - *palette\index[index]\channel ) >> 1 ; shift to soften
  If (x + 1) < w : plus_truncate_uchar(*colorbits\pixels[(x+1) + (y+0)*w]\channel, (error*7) >> 4) : EndIf
  If (y + 1) < h 
    If (x - 1) > 0
      plus_truncate_uchar(*colorbits\pixels[(x-1) + (y+1)*w]\channel, (error*3) >> 4)
      plus_truncate_uchar(*colorbits\pixels[(x+0) + (y+1)*w]\channel, (error*5) >> 4)
      If (x + 1 < w) : plus_truncate_uchar(*colorbits\pixels[(x+1)+(y+1)*w]\channel,(error*1) >> 4) : EndIf
    EndIf
  EndIf
EndMacro
;}

Procedure.i CreatePalette_NeuQuant (*ImageBuf, iQuality=10)  
  ; By luis Oct 2009
  ; Modified by netmaestro to accept buffer as
  ; parameter instead of image handle Oct 2013
  
  Protected *Palette, k
  Protected Dim tPaletteTable.RGBQUAD(256)
  
  If *ImageBuf
    NQ_InitNetwork()
    
    If NQ_Learn(*ImageBuf, MemorySize(*ImageBuf), iQuality)
      
      NQ_UnbiasNetwork()
      
      For k = 0 To 255
        With tPaletteTable(k)
          \rgbBlue  = g_NQ_Network(k)\iBGRC[0] 
          \rgbGreen = g_NQ_Network(k)\iBGRC[1] 
          \rgbRed   = g_NQ_Network(k)\iBGRC[2] 
          \rgbReserved = 0
        EndWith       
      Next
      
      *Palette = AllocateMemory (256 * SizeOf(RGBQUAD))
      CopyMemory(@tPaletteTable(), *Palette, MemorySize(*Palette))
    EndIf
  EndIf
  
  ProcedureReturn *Palette 
EndProcedure

;=============================================================
;{            Save as GIF Section
;=============================================================

Structure RGB 
  Blue.a
  Green.a
  Red.a
EndStructure

Structure COLORMAPINDEX
  color.l
  newindex.a
EndStructure

Structure GIF_HEADER
  bytes.a[6]
EndStructure

Structure LOGICAL_SCREEN_DESCRIPTOR
  Width.w
  Height.w
  PackedByte.a
  BackgroundColorIndex.a
  PixelAspectRatio.a
EndStructure

Structure GRAPHICS_CONTROL_EXTENSION
  Sntroducer.a
  Label.a
  BlockSize.a
  PackedByte.a
  DelayTime.w
  TransparentColorIndex.a
  BlockTerminator.a
EndStructure

Structure IMAGE_DESCRIPTOR
  Separator.a
  ImageLeft.w
  ImagTop.w
  ImageWidth.w
  ImageHeight.w
  PackedByte.a
EndStructure

Structure code
  code.l
  size.l
EndStructure

DataSection
  gifheader:
  Data.a $47, $49, $46, $38, $39, $61
  graphicscontrolextension:
  Data.a $21, $F9, $04, $00, $00, $00, $00, $00
  applicationextension:
  Data.b $21, $FF, $0B, $4E, $45, $54, $53, $43, $41, $50, $45, $32, $2E, $30, $03, $01, $00, $00, $00
EndDataSection

UsePNGImageDecoder()
UseJPEGImageDecoder() 
UseTIFFImageDecoder()

dither = 1

pattern$ = "PNG, BMP, JPEG, TIFF|*.png;*.bmp;*.jpg;*.jpeg;*.tiff|PNG (*.png)|*.png|BMP (*.bmp)|*.bmp|JPEG (*.jpg)|*.jpg|TIFF (*.tif)|*.tif"
inpath$ = OpenFileRequester("Choose an image to convert to GIF:","",pattern$, 0)
prompt$ = RemoveString(GetFilePart(inpath$),"."+GetExtensionPart(inpath$))
prompt$ +".gif"
If FileSize(inpath$) < 1
  MessageRequester("Info:","No file selected. Ending...")
  End
EndIf

If inpath$
  hImageIn = LoadImage(#PB_Any, inpath$) 
Else
  MessageRequester("Error:", "Cannot open this image. Ending...")
  End
EndIf

w=ImageWidth(hImageIn)
h=ImageHeight(hImageIn)

StartDrawing(ImageOutput(hImageIn))
  If w*h<1600
    *colorbits.COLORBITS_RGBQUAD = AllocateMemory(1600*4)
  Else
    *colorbits.COLORBITS_RGBQUAD = AllocateMemory(w*h*4)
  EndIf
  cc=0
  For j=0 To h-1
    For i=0 To w-1
      With *colorbits\pixels[cc]
        \rgbRed   = Red(Point(i,j))
        \rgbGreen = Green(Point(i,j))
        \rgbBlue  = Blue(Point(i,j))
      EndWith
      cc+1
    Next
  Next
StopDrawing()

*palette.PALETTE = CreatePalette_NeuQuant(*colorbits)
*bits8.COLORBITS_1BYTE_PERPIXEL = AllocateMemory(w*h)

NQ_SortNetwork() ; for NQ_LookupPalette to work properly

; Set and optionally dither the colorbits
For y = 0 To h-1
  For x = 0 To w-1
    index.a = NQ_LookupPalette (*colorbits\pixels[x + y*w]\rgbBlue, *colorbits\pixels[x + y*w]\rgbGreen, *colorbits\pixels[x + y*w]\rgbRed)
    *bits8\pixels[x + y*w] = index
    If dither
      compute_disperse(rgbRed)
      compute_disperse(rgbGreen)
      compute_disperse(rgbBlue)
    EndIf
  Next
Next

Dim colors.l(255)
Numcolors = 256
CopyMemory(*palette, @colors(), MemorySize(*palette))
FreeMemory(*palette)
sz_colorbits = MemorySize(*bits8)

;Count actual colors used in the image
NewMap colormap.COLORMAPINDEX()
For i=*bits8 To *bits8+sz_colorbits-1
  If AddMapElement(colormap(), RSet(Hex(PeekA(i),#PB_Byte),2,"0"), #PB_Map_ElementCheck)
    colormap(RSet(Hex(PeekA(i),#PB_Byte),2,"0"))\color = colors(PeekA(i))
  EndIf
Next
NumColors = MapSize(colormap())

;If needed, shrink colortable To actual colors used And remap the colorbits To the new table
If NumColors < 256
  Dim colors2.l(MapSize(colormap())-1)
  cc=0
  ForEach colormap()
    colors2(cc) = colormap()\color
    colormap()\newindex = cc
    cc+1
  Next
  For i=*bits8 To *bits8+sz_colorbits-1
    PokeA(i, colormap(RSet(Hex(PeekA(i),#PB_Byte),2,"0"))\newindex)
  Next
Else
  Dim colors2.l(ArraySize(colors()))
  CopyArray(colors(), colors2())
EndIf
FreeArray(colors())

Select NumColors
  Case 1 To 3
    bytes_colortable = 12
    sz_colortable    = 1
    min_codesize     = 2
    
  Case 4 To 7
    bytes_colortable = 24
    sz_colortable    = 2
    min_codesize     = 3
    
  Case 8 To 15
    bytes_colortable = 48
    sz_colortable    = 3
    min_codesize     = 4
    
  Case 16 To 31
    bytes_colortable = 96
    sz_colortable    = 4
    min_codesize     = 5
    
  Case 32 To 63
    bytes_colortable = 192
    sz_colortable    = 5
    min_codesize     = 6
    
  Case 64 To 127
    bytes_colortable = 384
    sz_colortable    = 6
    min_codesize     = 7
    
  Case 128 To 256
    bytes_colortable = 768
    sz_colortable    = 7
    min_codesize     = 8
    
EndSelect

pattern$ = "GIF (*.gif)|*.gif;"
outpath$ = SaveFileRequester("Choose a path to save the .gif file:",prompt$,pattern$, 0)
outpath$ = RemoveString(outpath$, ".gif")
If outpath$
  outpath$ + ".gif"
  If Not CreateFile(0, outpath$)
    MessageRequester("Error:","Problem creating image file... ending.")
    End
  EndIf
Else
  MessageRequester("","No save location chosen... ending.")
  End
EndIf

*write_header.GIF_HEADER = AllocateMemory(SizeOf(GIF_HEADER))
CopyMemory(?gifheader, *write_header, SizeOf(GIF_HEADER))

WriteData(0, *write_header, MemorySize(*write_header))

*write_logical_screen_descriptor.LOGICAL_SCREEN_DESCRIPTOR = AllocateMemory(SizeOf(LOGICAL_SCREEN_DESCRIPTOR))
With *write_logical_screen_descriptor
  \width      = w
  \height     = h
  \PackedByte = $80|(sz_colortable<<4)|sz_colortable
EndWith

WriteData(0, *write_logical_screen_descriptor, MemorySize(*write_logical_screen_descriptor))

*colortable = AllocateMemory(bytes_colortable)
*writeptr.RGB = *colortable
For i=0 To ArraySize(colors2())
  *writeptr\Red   = Red(colors2(i))
  *writeptr\Green = Green(colors2(i))
  *writeptr\Blue  = Blue(colors2(i))
  *writeptr+SizeOf(RGB)
Next

WriteData(0, *colortable, MemorySize(*colortable))

*write_graphics_control_extension.GRAPHICS_CONTROL_EXTENSION = AllocateMemory(SizeOf(GRAPHICS_CONTROL_EXTENSION))
CopyMemory(?graphicscontrolextension, *write_graphics_control_extension, SizeOf(GRAPHICS_CONTROL_EXTENSION))

WriteData(0, *write_graphics_control_extension, MemorySize(*write_graphics_control_extension))

*write_application_extension = AllocateMemory(27)
CopyMemory(?applicationextension, *write_application_extension, 19)
;WriteData(0, *write_application_extension, MemorySize(*write_application_extension))

*write_image_descriptor.IMAGE_DESCRIPTOR = AllocateMemory(SizeOf(IMAGE_DESCRIPTOR))
With *write_image_descriptor
  \Separator   = $2C
  \ImageWidth  = w
  \ImageHeight = h
EndWith

WriteData(0, *write_image_descriptor, MemorySize(*write_image_descriptor))

; Encode the colorbits

Procedure EncodeBits(*buf.Long, *bits8.Ascii, min_codesize, num_colors)
  
  #SizeOf2Char = SizeOf(character) << 1
  
  Structure TwoChar
    StructureUnion
      s.s{2}
      CompilerIf #PB_Compiler_Unicode
        cc.l  
      CompilerElse
        cc.w
      CompilerEndIf
    EndStructureUnion
  EndStructure
  
  Static Dim hex_bytes.TwoChar(255)
  
  NewMap CodeTable.l(4096)
  
  Protected ib.s{8192}
  Protected *buf_.Long, *ib.TwoChar = @ib, *bits8end = *bits8 + MemorySize(*bits8)
  Protected.l i, l, bitpos, codesize, clrcode = 1 << min_codesize
  Protected.l endcode = clrcode + 1, nextcode = clrcode + 2
  
  If hex_bytes(0)\cc = 0
    For i = 0 To 255
      hex_bytes(i)\s = Right(Hex(i|256), 2)
    Next
  EndIf
  
  CodeTable("00") = $1000 : codesize = min_codesize + 1
  i = 1 : While i < num_colors
    CodeTable(hex_bytes(i)\s) = i : i + 1  
  Wend
  
  *buf\l = clrcode
  bitpos = codesize
  
  While *bits8 < *bits8end
    *ib\cc = hex_bytes(*bits8\a)\cc : *ib + #SizeOf2Char : *ib\cc = 0
    i = CodeTable(ib)
    If i = 0
      *buf_ = *buf + bitpos >> 3 : *buf_\l | l << (bitpos & 7) : bitpos + codesize       
      CodeTable(ib) = nextcode
      codesize + nextcode >> codesize : nextcode + 1
      If codesize > 12
        *buf_ = *buf + bitpos >> 3 : *buf_\l | clrcode << (bitpos & 7) : bitpos + 12       
        nextcode = endcode + 1
          
        ClearMap(CodeTable())
        CodeTable("00") = $1000 : codesize = min_codesize + 1
        i = 1 : While i < num_colors
          CodeTable(hex_bytes(i)\s) = i : i + 1  
        Wend
          
      EndIf
      *ib = @ib : i = *bits8\a : *ib\cc = hex_bytes(i)\cc : *ib + #SizeOf2Char : *ib\cc = 0
    EndIf
    l = i & $fff
    *bits8 + 1    
  Wend
  
  *buf_ = *buf + bitpos >> 3 : *buf_\l | l << (bitpos & 7) : bitpos + codesize       
  *buf_ = *buf + bitpos >> 3 : *buf_\l | endcode << (bitpos & 7) : bitpos + codesize
  
  ProcedureReturn ((bitpos + 7) & -8) >> 3
EndProcedure

*buf = AllocateMemory(1024*1024*10)
*readbuf = *buf
cctotal = EncodeBits(*buf, *bits8, min_codesize, NumColors)

WriteByte(0, min_codesize)
While cctotal > 255
  WriteByte(0, 255)
  WriteData(0, *readbuf, 255)
  *readbuf + 255
  cctotal - 255
Wend
WriteByte(0, cctotal)
WriteData(0, *readbuf, cctotal)

WriteByte(0, $00)
WriteByte(0, $3B)

CloseFile(0)

;========================================================================
;
;========================================================================

netmaestro · Post by **netmaestro** » Sat Nov 16, 2013 7:27 am

Thanks for working on it! A significant portion of the processing time is taken up by the neural network and that would be a major undertaking to optimize. However, the version you posted speeds up the encoding part quite a bit. In some larger more complex files the speed gain is quite marked. One file that takes around 810 ms to process with the original code takes 275 with this version. Now to take it through the full spectrum of testing and make sure it's solid. (but it seems so.)

wilbert · Post by **wilbert** » Sat Nov 16, 2013 7:37 am

netmaestro wrote:Now to take it through the full spectrum of testing and make sure it's solid. (but it seems so.)

You should especially check the larger image that didn't work when you tried switching the hex strings from 4 to 2 bytes.
The updated code I posted also uses 2 bytes but builds the total string used for the map in a different way.

wilbert · Post by **wilbert** » Sat Nov 16, 2013 8:08 am

This might even further reduce the speed but needs some testing.

Code: Select all

Procedure EncodeBits(*buf.Long, *bits8.Ascii, min_codesize, num_colors)
  
  #SizeOf2Char = SizeOf(character) << 1
  
  Structure TwoChar
    StructureUnion
      s.s{2}
      CompilerIf #PB_Compiler_Unicode
        cc.l  
      CompilerElse
        cc.w
      CompilerEndIf
    EndStructureUnion
  EndStructure
  
  Static Dim hex_bytes.TwoChar(255)
  
  NewMap CodeTable.l(4096)
  
  Protected ib.s{8192}
  Protected *buf_.Long, *ib.TwoChar = @ib, *bits8end = *bits8 + MemorySize(*bits8)
  Protected.l i, l, bitpos, codesize = min_codesize + 1, clrcode = 1 << min_codesize
  Protected.l endcode = clrcode + 1, nextcode = clrcode + 2
  
  If hex_bytes(0)\cc = 0
    For i = 0 To 255
      hex_bytes(i)\s = Right(Hex(i|256), 2)
    Next
  EndIf
  
  *buf\l = clrcode
  bitpos = codesize
  
  If *bits8 <> *bits8end
    l = *bits8\a: *ib\cc = hex_bytes(l)\cc : *ib + #SizeOf2Char : *bits8 + 1
    While *bits8 < *bits8end
      *ib\cc = hex_bytes(*bits8\a)\cc : *ib + #SizeOf2Char: *ib\cc = 0
      i = CodeTable(ib)
      If i = 0
        *buf_ = *buf + bitpos >> 3 : *buf_\l | l << (bitpos & 7) : bitpos + codesize       
        CodeTable(ib) = nextcode
        codesize + nextcode >> codesize : nextcode + 1
        If codesize > 12
          *buf_ = *buf + bitpos >> 3 : *buf_\l | clrcode << (bitpos & 7) : bitpos + 12       
          nextcode = endcode + 1: codesize = min_codesize + 1
          ClearMap(CodeTable())
        EndIf
        *ib = @ib : i = *bits8\a : *ib\cc = hex_bytes(i)\cc : *ib + #SizeOf2Char
      EndIf
      l = i
      *bits8 + 1    
    Wend
    *buf_ = *buf + bitpos >> 3 : *buf_\l | l << (bitpos & 7) : bitpos + codesize       
  EndIf
  
  *buf_ = *buf + bitpos >> 3 : *buf_\l | endcode << (bitpos & 7) : bitpos + codesize
  
  ProcedureReturn ((bitpos + 7) & -8) >> 3
EndProcedure

netmaestro · Post by **netmaestro** » Sat Nov 16, 2013 8:25 am

Still testing the first version, so far no failures. I briefly tested the last proc you posted for speed against its predecessor and found no significant improvement. As I'm just starting to understand what's going on with the first one, I'll stick with that. Your procedure-izing the EncodeBits routine is a help, I was actually engaged in doing just that when you posted your faster version. It's necessary for structure as I move to animation. It's going to be:

Code: Select all

write header
write logical screen descriptor
write graphics control extension
foreach image
   write image descriptor
   write local color table
   write image data
next
write application extension
write comment extension
write trailer byte

wilbert · Post by **wilbert** » Sat Nov 16, 2013 9:26 am

netmaestro wrote:Still testing the first version, so far no failures. I briefly tested the last proc you posted for speed against its predecessor and found no significant improvement.

My last proc takes advantage of the fact that a single pixel lookup always returned the initial value.

Map("01") = 1
...
Map("09") = 9
...

Since the outcome is known, the initial colors don't have to be stored in the code table, only the ones that represent a pattern of multiple pixels.
It makes clearing the code table easier and while num_colors is still in the procedure declaration, it could be removed since it's no longer used in that version.
Speedwise a small improvement is to be expected in complex images where the code table is renewed a lot. Also it would be a bit easier to port to asm if needed but you already mentioned that other parts of the code consume more time.

netmaestro · Post by **netmaestro** » Sat Nov 23, 2013 12:23 am

Pored over this for a while now and came to a couple of conclusions.
One, I have to write my own decoder as the one from Localmotion34 uses windows imaging and also doesn't work quite right in some cases.
Two, in order to do that I have to do the encoding process backwards which means I have to understand the process fully. This is a problem with Wilbert's faster version because I haven't had any luck going backwards with it. As the faster version is only running 1.4 x faster than the SetBits version from Idle (and not 5 times faster or such) that's an acceptable compromise for me.

To that end I've written and tested a GetBits macro to read the bits that Idle's SetBits macro wrote. This is what I have:

Code: Select all

Macro Setbits(buf,value,bitcount) ; By Idle
  *tb.long 
  *tb = buf + (pos >> 3) 
  *tb\l | (value << (pos & 7)) 
  pos + bitcount 
EndMacro

Macro GetBits(buf, var, bitcount) ; By Idle & netmaestro
  *tb.Long
  *tb = buf + (pos >> 3)
  var = ( *tb\l & ( ( 1 << bitcount )-1 ) << ( pos & 7 ) ) >> ( pos & 7 )
  pos + bitcount
EndMacro

*mem = AllocateMemory(100)
SetBits(*mem, 1019, 10)
SetBits(*mem, 1020, 10)
SetBits(*mem, 1021, 10)
SetBits(*mem, 1022, 10)
SetBits(*mem, 1023, 10)
SetBits(*mem, 2043, 11)
SetBits(*mem, 2044, 11)
SetBits(*mem, 2045, 11)
SetBits(*mem, 2046, 11)
SetBits(*mem, 2047, 11)
SetBits(*mem, 4091, 12)
SetBits(*mem, 4092, 12)
SetBits(*mem, 4093, 12)
SetBits(*mem, 4094, 12)
SetBits(*mem, 4095, 12)
SetBits(*mem, 7, 3)

pos=0

result.l
For i=1 To 5
  GetBits(*mem, result, 10) : Debug result
Next
For i=1 To 5
  GetBits(*mem, result, 11) : Debug result
Next
For i=1 To 5
  GetBits(*mem, result, 12) : Debug result
Next
GetBits(*mem, result, 3) : Debug result

My question is, is this as streamlined (fast) as it can be written? Or could it be optimized. I've been looking so hard and long at it I can't tell anymore. I consider myself lucky to have gotten it working at all

Post by **idle** » Sat Nov 23, 2013 1:49 am

This one from my bitvector class may be better

Code: Select all

Macro GetBits(buf, pos, bitcount)
  *tb.long    
 mask = ($FFFFFFFF >> (32-bitcount))
 If *buf
     *ti = *buf + ((pos)>>3) 
      shift = (index & $07)
    ProcedureReturn  (*tb\l & (mask << shift)) >> shift   
 EndIf 
EndMacro

netmaestro · Post by **netmaestro** » Sat Nov 23, 2013 1:56 am

It looks quite similar to one of my earlier tries, though not equal because mine didn't work. I guess it's meant to be called from inside a procedure? (there's a ProcedureReturn in it) And what is 'index'?

Post by **idle** » Sat Nov 23, 2013 2:39 am

sorry wasn't awake there!

Code: Select all

Macro GetBits(buf,value,bitcount,pos)
 *tb.long    
   mask = ($FFFFFFFF >> (32-bitcount))
 If buf
     *ti = buf + ((pos)>>3) 
      shift = (pos & $07)
      value= (*tb\l & (mask << shift)) >> shift   
 EndIf 
EndMacro

wilbert · Post by **wilbert** » Sat Nov 23, 2013 8:09 am

netmaestro wrote:This is a problem with Wilbert's faster version because I haven't had any luck going backwards with it.

Just like with the encoding routine, it could probably be optimized after you create a working version of a decoding routine.

netmaestro · Post by **netmaestro** » Sat Nov 23, 2013 8:15 am

Makes sense. When I have a working version using my GetBits routine I'll turn it over for "turbocharging"

Kwai chang caine · Post by **Kwai chang caine** » Sat Nov 23, 2013 8:15 pm

NetMaestro wrote:Once this is finished KCC can have his Gif animator

Never i can thanks you, in all my life, if a day you can create a GIF Encoder/Decoder.
If this day come, even if i'm a french "Frog eater"....I try to give them their thighs now

Excuse me to not have testing before, but currently i not really programming

NetMaestro wrote:Could you please test this code and see how crossplatform it is? And let me know of any bugs/failures.

so i have tested it in an old PC with XP PRO SP3 and v5.20
After a long time, i have nice GIF.

She is nearly perfect, except the front of the nice woman, now have a paint like a sqaw

See yourself :

The original JPG french singer

And the result

And the code of WILBERT do the same thing.
Thank you to all of you masters, trying to give PB, this decoder that lacks

PureBasic Forums - English

Packing bytes: a little help from my friends

Re: Packing bytes: a little help from my friends

Re: Packing bytes: a little help from my friends

Re: Packing bytes: a little help from my friends

Re: Packing bytes: a little help from my friends

Re: Packing bytes: a little help from my friends

Re: Packing bytes: a little help from my friends

Re: Packing bytes: a little help from my friends

Re: Packing bytes: a little help from my friends

Re: Packing bytes: a little help from my friends

Re: Packing bytes: a little help from my friends

Re: Packing bytes: a little help from my friends

Re: Packing bytes: a little help from my friends

Re: Packing bytes: a little help from my friends

Re: Packing bytes: a little help from my friends

Re: Packing bytes: a little help from my friends