HSL/HSV to RGB

wilbert
PureBasic Expert
Posts: 3944
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: HSL/HSV to RGB

Post by wilbert »

Instead of floats for the multiplication factors, I'm using 16-bit signed integer values which in the end are divided by 16384 (16384 is considered 1).
The signed integer range is [-32768, 32767], so divided by 16384 it covers [-2, 1.99994] in steps of 0.000061, which I believe is accurate enough for working with colors.
The PMADDWD opcode is what makes the multiplication fast. :)
It also helps that there's no conversion between integers and floats, and the pack instructions already clip the output values to the range [0, 255].
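The fixed-point scheme described above can be modelled in a few lines of portable C. This is a sketch, not wilbert's code: `q14_from_double` and `q14_channel` are hypothetical names, and rounding the factor to nearest is an assumption about how the PureBasic float-to-word assignment behaves.

```c
#include <stdint.h>

/* Q14 fixed point: 16384 represents 1.0, so an int16_t holds factors in
   [-2, 32767/16384] with a step of 1/16384 (about 0.000061). */
#define Q14_ONE 16384

/* Float factor -> Q14 word, rounded to nearest (assumed rounding mode). */
static int16_t q14_from_double(double f)
{
    double scaled = f * Q14_ONE;
    return (int16_t)(scaled < 0 ? scaled - 0.5 : scaled + 0.5);
}

/* One output channel: dot product of 4 input bytes with 4 Q14 factors,
   plus 8192 (0.5 in Q14) so that the shift rounds instead of truncating. */
static uint8_t q14_channel(const uint8_t in[4], const int16_t f[4])
{
    int32_t acc = 8192;
    for (int i = 0; i < 4; i++)
        acc += (int32_t)in[i] * f[i];         /* the products PMADDWD forms */
    if (acc < 0)
        return 0;                             /* negative results clip to 0 */
    acc >>= 14;                               /* divide by 16384, like PSRAD 14 */
    return acc > 255 ? 255 : (uint8_t)acc;    /* PACKSSDW + PACKUSWB clip to 0..255 */
}
```

Because the clipping falls out of the saturating pack instructions, the SIMD version needs no explicit compare-and-clamp step; the scalar model has to spell it out.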

Edit:
See my post below for an updated version which is a bit faster
viewtopic.php?p=607263#p607263


Code:

Structure ColorMatrix
  m.w[16]
  a.l[04]
EndStructure

Procedure SetTransformHSV(*Matrix.ColorMatrix, h.f, s.f, v.f, ChannelOrder = 0)
  
  ; ChannelOrder 0 => RGBA (MacOS)
  ; ChannelOrder 1 => BGRA (Windows)
  
  Protected.f vsu, vsw
  s * 0.01
  v * 0.01
  vsu = v*s*Cos(h*#PI/180)
  vsw = v*s*Sin(h*#PI/180)
  
  ClearStructure(*Matrix, ColorMatrix)
  *Matrix\m[15] = 16384 ; alpha in -> out multiplier
  *Matrix\a[00] = 8192  ; constants to add for rounding
  *Matrix\a[01] = 8192
  *Matrix\a[02] = 8192
  
  If ChannelOrder = 1
    ; BGRA
    *Matrix\m[12] = (0.299*v + 0.701*vsu + 0.168*vsw)*16384 ; red in    red out
    *Matrix\m[05] = (0.587*v - 0.587*vsu + 0.330*vsw)*16384 ; green in  red out
    *Matrix\m[04] = (0.114*v - 0.114*vsu - 0.497*vsw)*16384 ; blue in   red out
    *Matrix\m[10] = (0.299*v - 0.299*vsu - 0.328*vsw)*16384 ; red in    green out
    *Matrix\m[03] = (0.587*v + 0.413*vsu + 0.035*vsw)*16384 ; green in  green out
    *Matrix\m[02] = (0.114*v - 0.114*vsu + 0.292*vsw)*16384 ; blue in   green out
    *Matrix\m[08] = (0.299*v - 0.300*vsu +  1.25*vsw)*16384 ; red in    blue out
    *Matrix\m[01] = (0.587*v - 0.588*vsu -  1.05*vsw)*16384 ; green in  blue out
    *Matrix\m[00] = (0.114*v + 0.886*vsu - 0.203*vsw)*16384 ; blue in   blue out
  Else
    ; RGBA
    *Matrix\m[00] = (0.299*v + 0.701*vsu + 0.168*vsw)*16384 ; red in    red out
    *Matrix\m[01] = (0.587*v - 0.587*vsu + 0.330*vsw)*16384 ; green in  red out
    *Matrix\m[08] = (0.114*v - 0.114*vsu - 0.497*vsw)*16384 ; blue in   red out
    *Matrix\m[02] = (0.299*v - 0.299*vsu - 0.328*vsw)*16384 ; red in    green out
    *Matrix\m[03] = (0.587*v + 0.413*vsu + 0.035*vsw)*16384 ; green in  green out
    *Matrix\m[10] = (0.114*v - 0.114*vsu + 0.292*vsw)*16384 ; blue in   green out
    *Matrix\m[04] = (0.299*v - 0.300*vsu +  1.25*vsw)*16384 ; red in    blue out
    *Matrix\m[05] = (0.587*v - 0.588*vsu -  1.05*vsw)*16384 ; green in  blue out
    *Matrix\m[12] = (0.114*v + 0.886*vsu - 0.203*vsw)*16384 ; blue in   blue out
  EndIf
  
EndProcedure

Procedure ApplyTransform(*Matrix.ColorMatrix, *InPixels, *OutPixels, NumPixels)
  !mov ecx, [p.v_NumPixels]
  !sub ecx, 1
  !jc .l1
  !pxor xmm5, xmm5
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !mov rax, [p.p_Matrix]
    !movdqu xmm2, [rax]
    !movdqu xmm3, [rax + 16]
    !movdqu xmm4, [rax + 32]
    !mov rax, [p.p_InPixels]
    !mov rdx, [p.p_OutPixels]
    !.l0:
    !movd xmm1, [rax + rcx*4] ; load pixel
  CompilerElse
    !mov eax, [p.p_Matrix]
    !movdqu xmm2, [eax]
    !movdqu xmm3, [eax + 16]
    !movdqu xmm4, [eax + 32]
    !mov eax, [p.p_InPixels]
    !mov edx, [p.p_OutPixels]
    !.l0:
    !movd xmm1, [eax + ecx*4] ; load pixel
  CompilerEndIf
  !punpcklbw xmm1, xmm5       ; zero extend bytes to words (xmm5 = 0)
  !pshufd xmm0, xmm1, 0       ; xmm0 [c1c0 c1c0 c1c0 c1c0]
  !pshufd xmm1, xmm1, 85      ; xmm1 [c3c2 c3c2 c3c2 c3c2]
  !pmaddwd xmm0, xmm2         ; multiply and add
  !pmaddwd xmm1, xmm3         ; multiply and add
  !paddd xmm0, xmm1           ; add together
  !paddd xmm0, xmm4           ; add constant
  !psrad xmm0, 14             ; reduce to byte range
  !packssdw xmm0, xmm0        ; convert 32s > 16s
  !packuswb xmm0, xmm0        ; convert 16s > 8u
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !movd [rdx + rcx*4], xmm0 ; store pixel
  CompilerElse
    !movd [edx + ecx*4], xmm0 ; store pixel
  CompilerEndIf
  !sub ecx, 1
  !jnc .l0
  !.l1:
EndProcedure


; Create application window

Define m.ColorMatrix

UseJPEGImageDecoder()
UsePNGImageDecoder()

If OpenWindow(0, 0, 0, 910, 530, "HSV Color Transform", #PB_Window_SystemMenu|#PB_Window_ScreenCentered)
  If CreateStatusBar(0, WindowID(0))
    AddStatusBarField(#PB_Ignore)
    StatusBarText(0, 0, "Nothing processed yet")
  EndIf
  
  ScrollAreaGadget(0, 10, 10, 440, 400, 0, 0, 10, #PB_ScrollArea_Flat|#PB_ScrollArea_Center)
  ImageGadget(1, 0, 0, 0, 0, 0)
  CloseGadgetList()
  
  ScrollAreaGadget(2, 460, 10, 440, 400, 10, 0, 10, #PB_ScrollArea_Flat|#PB_ScrollArea_Center)
  ImageGadget(3, 0, 0, 0, 0, 0)
  CloseGadgetList()
  
  TextGadget(4, 12, 430, 98, 30, "Hue rotation")
  SpinGadget(5, 10, 450, 100, 30, -180, 360, #PB_Spin_Numeric)
  TextGadget(6, 132, 430, 98, 30, "Saturation")
  SpinGadget(7, 130, 450, 100, 30, 0, 100, #PB_Spin_Numeric)
  TextGadget(8, 252, 430, 98, 30, "Value")
  SpinGadget(9, 250, 450, 100, 30, 0, 100, #PB_Spin_Numeric)
  SetGadgetState(5, 0)
  SetGadgetState(7, 100)
  SetGadgetState(9, 100)

  ButtonGadget(10, 380, 450, 120, 30, "Apply transform")
  ButtonGadget(11, 780, 450, 120, 30, "Load image")
  
  Repeat
    Event = WaitWindowEvent()
    If Event = #PB_Event_Gadget
      Select EventGadget()
        Case 10:
          If IsImage(0)
            If CreateImage(1, ImageWidth(0), ImageHeight(0), 32) And StartDrawing(ImageOutput(1))
              ; make a 32 bit copy of the loaded image
              DrawingMode(#PB_2DDrawing_AllChannels)
              DrawImage(ImageID(0), 0, 0)
              ; get the buffer address
              *PixelBuffer = DrawingBuffer()
              PixelCount = OutputHeight()*DrawingBufferPitch() >> 2
              ; set the transform matrix
              If DrawingBufferPixelFormat() = #PB_PixelFormat_32Bits_RGB 
                SetTransformHSV(@m, GetGadgetState(5), GetGadgetState(7), GetGadgetState(9), 0)
              Else
                SetTransformHSV(@m, GetGadgetState(5), GetGadgetState(7), GetGadgetState(9), 1)
              EndIf
              ; apply the transform matrix
              t1 = ElapsedMilliseconds()
              ApplyTransform(@m, *PixelBuffer, *PixelBuffer, PixelCount)              
              t2 = ElapsedMilliseconds()
              StopDrawing()
              SetGadgetState(3, ImageID(1))
              SetGadgetAttribute(2, #PB_ScrollArea_InnerWidth, ImageWidth(1))
              SetGadgetAttribute(2, #PB_ScrollArea_InnerHeight, ImageHeight(1))
              StatusBarText(0, 0, "Processed "+Str(PixelCount)+" pixels in "+Str(t2-t1)+" ms")
            EndIf
          EndIf
        Case 11:
          File.s = OpenFileRequester("Select image file", "", "Image file | *.png;*.jpg;*.jpeg", 0)
          If File And LoadImage(0, File)
            SetGadgetState(1, ImageID(0))
            SetGadgetAttribute(0, #PB_ScrollArea_InnerWidth, ImageWidth(0))
            SetGadgetAttribute(0, #PB_ScrollArea_InnerHeight, ImageHeight(0))
          EndIf
      EndSelect
    EndIf
  Until Event = #PB_Event_CloseWindow
  
EndIf
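For readers who want to check the coefficient layout without an x86 assembler, here is a hypothetical C port of `SetTransformHSV`. It assumes the same `ColorMatrix` layout (16 Q14 words followed by 4 longs); `set_transform_hsv` and `to_q14` are illustrative names, and the caller passes cos(h) and sin(h) so the sketch needs no math library.

```c
#include <stdint.h>
#include <string.h>

/* Same layout as the PureBasic structure: 16 Q14 words, then 4 longs. */
typedef struct {
    int16_t m[16];   /* m[0..7] and m[8..15] are the two PMADDWD operands */
    int32_t a[4];    /* per-channel constants added before the shift */
} ColorMatrix;

/* Float factor -> Q14 word, rounded to nearest (assumed rounding mode). */
static int16_t to_q14(double f)
{
    double s = f * 16384.0;
    return (int16_t)(s < 0 ? s - 0.5 : s + 0.5);
}

/* s and v are percentages as in the original; cos_h and sin_h are cos(h)
   and sin(h) of the hue angle. bgra = 0 for RGBA byte order, 1 for BGRA,
   matching ChannelOrder. */
static void set_transform_hsv(ColorMatrix *mx, double cos_h, double sin_h,
                              double s, double v, int bgra)
{
    s *= 0.01;
    v *= 0.01;
    double vsu = v * s * cos_h, vsw = v * s * sin_h;

    /* k[out][in], channels in R, G, B order, same formulas as the original */
    double k[3][3] = {
        {0.299*v + 0.701*vsu + 0.168*vsw, 0.587*v - 0.587*vsu + 0.330*vsw, 0.114*v - 0.114*vsu - 0.497*vsw},
        {0.299*v - 0.299*vsu - 0.328*vsw, 0.587*v + 0.413*vsu + 0.035*vsw, 0.114*v - 0.114*vsu + 0.292*vsw},
        {0.299*v - 0.300*vsu + 1.25*vsw,  0.587*v - 0.588*vsu - 1.05*vsw,  0.114*v + 0.886*vsu - 0.203*vsw},
    };

    memset(mx, 0, sizeof *mx);
    for (int out = 0; out < 3; out++) {
        int po = bgra ? 2 - out : out;          /* byte position of output channel */
        for (int in = 0; in < 3; in++) {
            int pi = bgra ? 2 - in : in;        /* byte position of input channel */
            /* input bytes 0,1 use m[2*po], m[2*po+1]; bytes 2,3 use m[8+2*po], m[9+2*po] */
            int slot = pi < 2 ? 2 * po + pi : 8 + 2 * po + (pi - 2);
            mx->m[slot] = to_q14(k[out][in]);
        }
        mx->a[out] = 8192;                      /* 0.5 in Q14, for rounding */
    }
    mx->m[15] = 16384;                          /* alpha in -> alpha out = 1.0 */
}
```

With h = 0 and s = v = 100 this gives (almost) the identity: 16384 on the diagonal, plus -16 for the two small terms in the blue-output row that come from the 0.299/0.300 and 0.587/0.588 pairs. A real caller would pass `cos(h*M_PI/180)` and `sin(h*M_PI/180)` from `<math.h>`.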
Last edited by wilbert on Sun Sep 17, 2023 7:16 am, edited 1 time in total.
Windows (x64)
Raspberry Pi OS (Arm64)
SMaag
Enthusiast
Posts: 352
Joined: Sat Jan 14, 2023 6:55 pm
Location: Bavaria/Germany

Re: HSL/HSV to RGB

Post by SMaag »

Wow! Extraterrestrial!!! I couldn't have imagined that!

On my PC it takes 1.45 ms for Full HD! It's 45x faster than my code with floats and SSE!

I think I need some time to analyse why!

Thanks!!!

Re: HSL/HSV to RGB

Post by SMaag »

wilbert's code needs 5.5 million CPU ticks for 2.07 million pixels; that's 2.66 ticks per pixel.

The code is 24 assembler instructions long.
The loop is only 14 instructions.

So one loop iteration has to complete in 2.66 ticks. That's 5.3 instructions per CPU tick (reciprocal throughput ≈ 0.19)!
How is this possible? Most of these instructions have a reciprocal throughput of 0.5.

The memory load, movd xmm1, [rax + rcx*4],
has a latency of 3 and a throughput of 1 instruction per cycle on a Zen 3 Ryzen 7. In my opinion, this instruction alone needs 4 ticks.
Same issue for the pixel store, !movd [rdx + rcx*4], xmm0: another 4 ticks! So in total we need 8 ticks for these 2 instructions, but the code runs in 2.66 ticks!!!

Where is the bug??? Or is it executed super-parallel?

Here is the code with the reciprocal throughputs; even without any latency, the loop should need 8.75 CPU ticks!

Code:

  !.loop:                     ; reciprocal throughput (AMD Zen3, Ryzen 7)
  !movd xmm1, [rax + rcx *4]  ; 1;    load pixel
  !punpcklbw xmm1, xmm5       ; 1;    zero extend bytes to words (xmm5 = 0)
  !pshufd xmm0, xmm1, 0       ; 1;    xmm0 [c1c0 c1c0 c1c0 c1c0]
  !pshufd xmm1, xmm1, 85      ; 1;    xmm1 [c3c2 c3c2 c3c2 c3c2]
  !pmaddwd xmm0, xmm2         ; 0.5;  multiply and add
  !pmaddwd xmm1, xmm3         ; 0.5;  multiply and add
  !paddd xmm0, xmm1           ; 0.25; add together
  !paddd xmm0, xmm4           ; 0.25; add constant
  !psrad xmm0, 14             ; 0.5;  reduce to byte range
  !packssdw xmm0, xmm0        ; 0.5;  convert 32s > 16s
  !packuswb xmm0, xmm0        ; 0.5;  convert 16s > 8u
  !movd [rdx + rcx *4], xmm0  ; 1;    store pixel
  
  !sub rcx, 1                 ; 0.25; NumPixels - 1
  !jnc .loop                  ; 0.5
  !.Endif:                    ; Sum=8.75 ticks; Avg=8.75/14 = 0.625 ticks per command
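One possible reading of the numbers above: an out-of-order core overlaps instructions that use different execution units, and because every pixel is independent it also overlaps successive iterations, so a steady-state loop is bounded by its busiest resource rather than by the sum of reciprocal throughputs. The sketch below does that arithmetic. The unit counts used in the usage note (4 vector pipes, 3 loads and 2 stores per cycle) are assumptions for illustration, not figures from this thread; check AMD's Zen 3 optimization guide before relying on them. `resource_bound` is a hypothetical helper.

```c
/* Measured ticks per pixel. */
static double ticks_per_pixel(double ticks, double pixels)
{
    return ticks / pixels;
}

/* Lower bound (cycles) for one iteration of a loop with independent
   iterations: the busiest resource decides, since different units work
   in parallel. ops[i] = instructions per iteration needing resource i,
   units[i] = how many of them that resource can start per cycle. */
static double resource_bound(const int ops[], const int units[], int n)
{
    double worst = 0.0;
    for (int i = 0; i < n; i++) {
        double cycles = (double)ops[i] / units[i];
        if (cycles > worst)
            worst = cycles;
    }
    return worst;
}
```

Counting the loop body gives 10 vector ALU instructions (PUNPCKLBW, 2x PSHUFD, 2x PMADDWD, 2x PADDD, PSRAD, PACKSSDW, PACKUSWB), 1 load and 1 store. With the assumed unit counts, `resource_bound` returns 2.5 cycles per pixel, which is close to the measured 2.66 ticks, whereas summing reciprocal throughputs gives 8.75.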

Piero
Addict
Posts: 1163
Joined: Sat Apr 29, 2023 6:04 pm
Location: Italy

Re: HSL/HSV to RGB

Post by Piero »

boddhi wrote: Fri Sep 08, 2023 9:22 pm
Hue to RGB
Wow, I'm getting interesting trippy results with that, and much faster... (the luminosity seems a little bit lower, but that's not a problem)
Thanks!

Re: HSL/HSV to RGB

Post by wilbert »

SMaag wrote: Mon Sep 11, 2023 7:32 pm
Where is the bug??? Or is it executed super-parallel?

Here is the code with the reciprocal throughputs; even without any latency, the loop should need 8.75 CPU ticks!
A lot of instructions in my code depend on the result of the previous instruction, so you also have latency.
The simple answer is that apparently using rdtsc isn't reliable for this purpose.
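The rdtsc caveat can be made concrete. On CPUs with an invariant TSC (recent Intel and AMD parts), the counter ticks at a fixed reference rate whatever the core clock actually is. SMaag's two measurements, 5.5 million ticks and 1.45 ms, imply a TSC rate of roughly 3.8 GHz; if the core was boosting higher, the real number of core cycles per pixel is larger than 2.66. `tsc_ghz` and `core_cycles` are hypothetical helpers, and 4.7 GHz in the usage note is only an example clock.

```c
/* TSC rate (GHz) implied by a tick count measured over a wall-clock time in ms. */
static double tsc_ghz(double ticks, double ms)
{
    return ticks / (ms * 1e6);
}

/* Convert TSC ticks to core clock cycles, assuming the core ran at core_ghz
   while the invariant TSC advanced at tsc_rate_ghz. */
static double core_cycles(double ticks, double tsc_rate_ghz, double core_ghz)
{
    return ticks * core_ghz / tsc_rate_ghz;
}
```

For example, `core_cycles(2.66, 3.8, 4.7)` is about 3.3, so at a 4.7 GHz core clock each pixel would take about 3.3 real cycles, not 2.66.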
Fred
Administrator
Posts: 18428
Joined: Fri May 17, 2002 4:39 pm
Location: France

Re: HSL/HSV to RGB

Post by Fred »

Another wilbert magic :)
Olli
Addict
Posts: 1272
Joined: Wed May 27, 2020 12:26 pm

Re: HSL/HSV to RGB

Post by Olli »

SMaag wrote:
Where is the bug??? Or is it executed super-parallel?
Also, your I/O is not aligned and not parallelized.

You shouldn't work pixel by pixel, but 4 pixels at a time, minimum. 16 pixels at a time would be better, 16 times better than your algorithm...

You are eating tiramisu with a construction shovel. :?


[Edit] Oh... God... Wilbert!
I apologize! My latency in going back to the source...
Could we use YMMn or ZMMn?
AZJIO
Addict
Posts: 2245
Joined: Sun May 14, 2017 1:48 am

Re: HSL/HSV to RGB

Post by AZJIO »

Perhaps an adjustment slider (TrackBarGadget) will interest Piero.

Added TrackBarGadget() to wilbert's code

Code:


EnableExplicit

Define *PixelBuffer, PixelCount, t1, t2

Structure ColorMatrix
  m.w[16]
  a.l[04]
EndStructure

; wilbert
; https://www.purebasic.fr/english/viewtopic.php?p=607153#p607153
Procedure SetTransformHSV(*Matrix.ColorMatrix, h.f, s.f, v.f, ChannelOrder = 0)
  
  ; ChannelOrder 0 => RGBA (MacOS)
  ; ChannelOrder 1 => BGRA (Windows)
  
  Protected.f vsu, vsw
  s * 0.01
  v * 0.01
  vsu = v*s*Cos(h*#PI/180)
  vsw = v*s*Sin(h*#PI/180)
  
  ClearStructure(*Matrix, ColorMatrix)
  *Matrix\m[15] = 16384 ; alpha in -> out multiplier
  *Matrix\a[00] = 8192  ; constants to add for rounding
  *Matrix\a[01] = 8192
  *Matrix\a[02] = 8192
  
  If ChannelOrder = 1
    ; BGRA
    *Matrix\m[12] = (0.299*v + 0.701*vsu + 0.168*vsw)*16384 ; red in    red out
    *Matrix\m[05] = (0.587*v - 0.587*vsu + 0.330*vsw)*16384 ; green in  red out
    *Matrix\m[04] = (0.114*v - 0.114*vsu - 0.497*vsw)*16384 ; blue in   red out
    *Matrix\m[10] = (0.299*v - 0.299*vsu - 0.328*vsw)*16384 ; red in    green out
    *Matrix\m[03] = (0.587*v + 0.413*vsu + 0.035*vsw)*16384 ; green in  green out
    *Matrix\m[02] = (0.114*v - 0.114*vsu + 0.292*vsw)*16384 ; blue in   green out
    *Matrix\m[08] = (0.299*v - 0.300*vsu +  1.25*vsw)*16384 ; red in    blue out
    *Matrix\m[01] = (0.587*v - 0.588*vsu -  1.05*vsw)*16384 ; green in  blue out
    *Matrix\m[00] = (0.114*v + 0.886*vsu - 0.203*vsw)*16384 ; blue in   blue out
  Else
    ; RGBA
    *Matrix\m[00] = (0.299*v + 0.701*vsu + 0.168*vsw)*16384 ; red in    red out
    *Matrix\m[01] = (0.587*v - 0.587*vsu + 0.330*vsw)*16384 ; green in  red out
    *Matrix\m[08] = (0.114*v - 0.114*vsu - 0.497*vsw)*16384 ; blue in   red out
    *Matrix\m[02] = (0.299*v - 0.299*vsu - 0.328*vsw)*16384 ; red in    green out
    *Matrix\m[03] = (0.587*v + 0.413*vsu + 0.035*vsw)*16384 ; green in  green out
    *Matrix\m[10] = (0.114*v - 0.114*vsu + 0.292*vsw)*16384 ; blue in   green out
    *Matrix\m[04] = (0.299*v - 0.300*vsu +  1.25*vsw)*16384 ; red in    blue out
    *Matrix\m[05] = (0.587*v - 0.588*vsu -  1.05*vsw)*16384 ; green in  blue out
    *Matrix\m[12] = (0.114*v + 0.886*vsu - 0.203*vsw)*16384 ; blue in   blue out
  EndIf
  
EndProcedure

; wilbert
; https://www.purebasic.fr/english/viewtopic.php?p=607153#p607153
Procedure ApplyTransform(*Matrix.ColorMatrix, *InPixels, *OutPixels, NumPixels)
  !mov ecx, [p.v_NumPixels]
  !sub ecx, 1
  !jc .l1
  !pxor xmm5, xmm5
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !mov rax, [p.p_Matrix]
    !movdqu xmm2, [rax]
    !movdqu xmm3, [rax + 16]
    !movdqu xmm4, [rax + 32]
    !mov rax, [p.p_InPixels]
    !mov rdx, [p.p_OutPixels]
    !.l0:
    !movd xmm1, [rax + rcx*4] ; load pixel
  CompilerElse
    !mov eax, [p.p_Matrix]
    !movdqu xmm2, [eax]
    !movdqu xmm3, [eax + 16]
    !movdqu xmm4, [eax + 32]
    !mov eax, [p.p_InPixels]
    !mov edx, [p.p_OutPixels]
    !.l0:
    !movd xmm1, [eax + ecx*4] ; load pixel
  CompilerEndIf
  !punpcklbw xmm1, xmm5       ; zero extend bytes to words (xmm5 = 0)
  !pshufd xmm0, xmm1, 0       ; xmm0 [c1c0 c1c0 c1c0 c1c0]
  !pshufd xmm1, xmm1, 85      ; xmm1 [c3c2 c3c2 c3c2 c3c2]
  !pmaddwd xmm0, xmm2         ; multiply and add
  !pmaddwd xmm1, xmm3         ; multiply and add
  !paddd xmm0, xmm1           ; add together
  !paddd xmm0, xmm4           ; add constant
  !psrad xmm0, 14             ; reduce to byte range
  !packssdw xmm0, xmm0        ; convert 32s > 16s
  !packuswb xmm0, xmm0        ; convert 16s > 8u
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !movd [rdx + rcx*4], xmm0 ; store pixel
  CompilerElse
    !movd [edx + ecx*4], xmm0 ; store pixel
  CompilerEndIf
  !sub ecx, 1
  !jnc .l0
  !.l1:
EndProcedure


; Create application window

Define m.ColorMatrix

UseJPEGImageDecoder()
UsePNGImageDecoder()


Procedure Callback()
	Protected *PixelBuffer, PixelCount, Value, m.ColorMatrix;, t1, t2
	If IsImage(0)
		; 		Debug 1
		If CreateImage(1, ImageWidth(0), ImageHeight(0), 32) And StartDrawing(ImageOutput(1))
			; make a 32 bit copy of the loaded image
			DrawingMode(#PB_2DDrawing_AllChannels)
			DrawImage(ImageID(0), 0, 0)
			; get the buffer address
			*PixelBuffer = DrawingBuffer()
			PixelCount = OutputHeight()*DrawingBufferPitch() >> 2
			; set the transform matrix
			Value = GetGadgetState(12)
			If DrawingBufferPixelFormat() = #PB_PixelFormat_32Bits_RGB 
				SetTransformHSV(@m, Value, GetGadgetState(7), GetGadgetState(9), 0)
			Else
				SetTransformHSV(@m, Value, GetGadgetState(7), GetGadgetState(9), 1)
			EndIf
			SetGadgetState(5, Value)
			; apply the transform matrix
; 			t1 = ElapsedMilliseconds()
			ApplyTransform(@m, *PixelBuffer, *PixelBuffer, PixelCount)              
; 			t2 = ElapsedMilliseconds()
			StopDrawing()
			SetGadgetState(3, ImageID(1))
			SetGadgetAttribute(2, #PB_ScrollArea_InnerWidth, ImageWidth(1))
			SetGadgetAttribute(2, #PB_ScrollArea_InnerHeight, ImageHeight(1))
; 			StatusBarText(0, 0, "Processed "+Str(PixelCount)+" pixels in "+Str(t2-t1)+" ms")
		EndIf
	EndIf
	
    ProcedureReturn 1
EndProcedure

Define Event, File.s

If OpenWindow(0, 0, 0, 910, 530, "HSV Color Transform", #PB_Window_SystemMenu|#PB_Window_ScreenCentered)
  If CreateStatusBar(0, WindowID(0))
    AddStatusBarField(#PB_Ignore)
    StatusBarText(0, 0, "Nothing processed yet")
  EndIf
  
  ScrollAreaGadget(0, 10, 10, 440, 400, 0, 0, 10, #PB_ScrollArea_Flat|#PB_ScrollArea_Center)
  ImageGadget(1, 0, 0, 0, 0, 0)
  CloseGadgetList()
  
  ScrollAreaGadget(2, 460, 10, 440, 400, 10, 0, 10, #PB_ScrollArea_Flat|#PB_ScrollArea_Center)
  ImageGadget(3, 0, 0, 0, 0, 0)
  CloseGadgetList()
  
;   TextGadget(4, 12, 430, 98, 30, "Тон")            ; Russian label: "Hue"
;   SpinGadget(5, 10, 450, 100, 30, -180, 360, #PB_Spin_Numeric)
;   TextGadget(6, 132, 430, 98, 30, "Насыщенность")  ; Russian label: "Saturation"
;   SpinGadget(7, 130, 450, 100, 30, 0, 100, #PB_Spin_Numeric)
;   TextGadget(8, 252, 430, 98, 30, "Яркость")       ; Russian label: "Brightness"
;   SpinGadget(9, 250, 450, 100, 30, 0, 100, #PB_Spin_Numeric)
  TextGadget(4, 12, 430, 98, 30, "Hue rotation")
  SpinGadget(5, 10, 450, 100, 30, -180, 360, #PB_Spin_Numeric)
  TextGadget(6, 132, 430, 98, 30, "Saturation")
  SpinGadget(7, 130, 450, 100, 30, 0, 100, #PB_Spin_Numeric)
  TextGadget(8, 252, 430, 98, 30, "Value")
  SpinGadget(9, 250, 450, 100, 30, 0, 100, #PB_Spin_Numeric)
  SetGadgetState(5, 0)
  SetGadgetState(7, 100)
  SetGadgetState(9, 100)

;   ButtonGadget(10, 380, 450, 120, 30, "Применить")            ; Russian label: "Apply"
;   ButtonGadget(11, 680, 450, 220, 30, "Открыть изображение")  ; Russian label: "Open image"
  ButtonGadget(10, 380, 450, 120, 30, "Apply transform")
  ButtonGadget(11, 780, 450, 120, 30, "Load image")
  
  TrackBarGadget(12, 12, 480, 250, 27, 0, 360, #PB_TrackBar_Ticks)
;   BindGadgetEvent(12 , @Callback(), #PB_EventType_LeftClick) 
  
  Repeat
    Event = WaitWindowEvent()
    If Event = #PB_Event_Gadget
      Select EventGadget()
      	Case 12:
      		Callback()
        Case 10:
          If IsImage(0)
            If CreateImage(1, ImageWidth(0), ImageHeight(0), 32) And StartDrawing(ImageOutput(1))
              ; make a 32 bit copy of the loaded image
              DrawingMode(#PB_2DDrawing_AllChannels)
              DrawImage(ImageID(0), 0, 0)
              ; get the buffer address
              *PixelBuffer = DrawingBuffer()
              PixelCount = OutputHeight()*DrawingBufferPitch() >> 2
              ; set the transform matrix
              If DrawingBufferPixelFormat() = #PB_PixelFormat_32Bits_RGB 
                SetTransformHSV(@m, GetGadgetState(5), GetGadgetState(7), GetGadgetState(9), 0)
              Else
                SetTransformHSV(@m, GetGadgetState(5), GetGadgetState(7), GetGadgetState(9), 1)
              EndIf
              ; apply the transform matrix
              t1 = ElapsedMilliseconds()
              ApplyTransform(@m, *PixelBuffer, *PixelBuffer, PixelCount)              
              t2 = ElapsedMilliseconds()
              StopDrawing()
              SetGadgetState(3, ImageID(1))
              SetGadgetAttribute(2, #PB_ScrollArea_InnerWidth, ImageWidth(1))
              SetGadgetAttribute(2, #PB_ScrollArea_InnerHeight, ImageHeight(1))
              StatusBarText(0, 0, "Processed "+Str(PixelCount)+" pixels in "+Str(t2-t1)+" ms")
            EndIf
          EndIf
        Case 11:
          File.s = OpenFileRequester("Select image file", "", "Image file | *.png;*.jpg;*.jpeg", 0)
          If File And LoadImage(0, File)
            SetGadgetState(1, ImageID(0))
            SetGadgetAttribute(0, #PB_ScrollArea_InnerWidth, ImageWidth(0))
            SetGadgetAttribute(0, #PB_ScrollArea_InnerHeight, ImageHeight(0))
          EndIf
      EndSelect
    EndIf
  Until Event = #PB_Event_CloseWindow
  
EndIf

Re: HSL/HSV to RGB

Post by wilbert »

Olli wrote: Tue Sep 12, 2023 8:59 am Could we use YMMn Or ZMMn ?
As you have probably already noticed, I need the full 128 bits to process a single pixel (2x PMADDWD to do the 16 multiplications required for the matrix multiplication).
If the CPU supports AVX2 or AVX-512, you could indeed adapt the source to process 2 or 4 pixels in parallel, making it even faster. :)
It shouldn't be that hard. My CPU doesn't support AVX-512, but I can add AVX2 support if you wish (to process 2 pixels at once).
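The 16 multiplications in two PMADDWD instructions can be modelled in scalar C. The sketch follows ApplyTransform's steps for one pixel: the pixel bytes become words c0..c3, PSHUFD broadcasts the (c0,c1) and (c2,c3) pairs into all four lanes, and each PMADDWD multiplies eight word pairs and adds adjacent products. `pmaddwd` and `transform_pixel` are illustrative names, not an existing API.

```c
#include <stdint.h>

/* Scalar model of PMADDWD: eight signed 16-bit products, adjacent pairs
   summed into four signed 32-bit lanes. */
static void pmaddwd(int32_t dst[4], const int16_t a[8], const int16_t b[8])
{
    for (int j = 0; j < 4; j++)
        dst[j] = (int32_t)a[2 * j] * b[2 * j] + (int32_t)a[2 * j + 1] * b[2 * j + 1];
}

/* One pixel through ApplyTransform's steps.
   m: the 16 Q14 words of ColorMatrix; add: its 4 longs. */
static void transform_pixel(const int16_t m[16], const int32_t add[4],
                            const uint8_t in[4], uint8_t out[4])
{
    int16_t lo[8], hi[8];
    for (int j = 0; j < 4; j++) {
        lo[2 * j] = in[0]; lo[2 * j + 1] = in[1];   /* pshufd xmm0, xmm1, 0  */
        hi[2 * j] = in[2]; hi[2 * j + 1] = in[3];   /* pshufd xmm1, xmm1, 85 */
    }
    int32_t p0[4], p1[4];
    pmaddwd(p0, lo, m);        /* 8 multiplications with m[0..7]  */
    pmaddwd(p1, hi, m + 8);    /* 8 multiplications with m[8..15] */
    for (int j = 0; j < 4; j++) {
        /* byte j = c0*m[2j] + c1*m[2j+1] + c2*m[8+2j] + c3*m[9+2j] + add[j] */
        int32_t acc = p0[j] + p1[j] + add[j];
        if (acc < 0)
            out[j] = 0;                                   /* packs clip to 0 */
        else
            out[j] = (acc >> 14) > 255 ? 255 : (uint8_t)(acc >> 14);  /* psrad 14, clip */
    }
}
```

Each output byte needs four products, so a whole pixel needs 16, exactly the 8 + 8 lanes of the two 128-bit PMADDWDs; wider registers would let one instruction cover more pixels.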

Edit:
I assumed AVX2 would always be faster, but apparently on my rather old computer it is much slower than normal SSE2 instructions, so I'll stick to SSE2 for now.
Last edited by wilbert on Tue Sep 12, 2023 1:12 pm, edited 1 time in total.

Re: HSL/HSV to RGB

Post by Olli »

I hadn't even noticed that PMADDWD uses the full 128 bits... :o

Using integers to solve a floating-point problem is a very good tip. Plus, your choice of conversion leaves a bit of headroom to prevent overflow. Clever.

I checked whether each instruction has a wider equivalent: your whole algorithm is compatible with 512 bits.

What about aligned I/O? (Is MOVAPD not usable?) If it is, that would be a time gain without too much change. I don't know whether the video memory is aligned...
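The headroom Olli mentions can be checked. For s = v = 100%, each coefficient of SetTransformHSV has the form a + b·cos(h) + c·sin(h), whose largest magnitude over all hues is |a| + sqrt(b² + c²); smaller s or v only shrinks it. The sketch below is my own check, not code from the thread: the helper names are hypothetical, and the square root uses Newton iteration so no math library is needed.

```c
/* The nine coefficient rows of SetTransformHSV as (a, b, c), meaning
   a*v + b*vsu + c*vsw, in the order the PureBasic source lists them. */
static const double ROWS[9][3] = {
    {0.299,  0.701,  0.168}, {0.587, -0.587,  0.330}, {0.114, -0.114, -0.497},
    {0.299, -0.299, -0.328}, {0.587,  0.413,  0.035}, {0.114, -0.114,  0.292},
    {0.299, -0.300,  1.25 }, {0.587, -0.588, -1.05 }, {0.114,  0.886, -0.203},
};

/* Square root by Newton's method (avoids linking the math library). */
static double newton_sqrt(double x)
{
    double r = x > 1.0 ? x : 1.0;    /* start at or above the root */
    for (int i = 0; i < 60; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/* Largest |a + b*cos(h) + c*sin(h)| over all h. */
static double row_max(const double k[3])
{
    double a = k[0] < 0 ? -k[0] : k[0];
    return a + newton_sqrt(k[1] * k[1] + k[2] * k[2]);
}

/* Worst case over all nine rows. */
static double max_factor(void)
{
    double worst = 0.0;
    for (int i = 0; i < 9; i++) {
        double m = row_max(ROWS[i]);
        if (m > worst)
            worst = m;
    }
    return worst;
}
```

The worst case is the green-in/blue-out factor, about 1.79, i.e. roughly 29334 in Q14, which fits under 32767. The 32-bit accumulators are in no danger either: four products of at most 255 × 32768 plus the addend stay far below 2³¹.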

Re: HSL/HSV to RGB

Post by SMaag »

I checked the code too. Generally it should be compatible with 256 or 512 bits, which means 2 or 4 pixels simultaneously.

But maybe on Ryzen, AMD does this automatically with branch prediction and code-flow optimization. The speed of the Ryzen looks like it's doing 4 pixels simultaneously with 512-bit AVX!
I tested on 3 other CPUs; only the Ryzen is much faster than expected. It's 4 times faster than the individual instructions should allow!
All the other CPUs I tested are at least 4 times slower than the Ryzen. That is really good too, because it means the code is optimized to zero latency on all CPUs!

CPUs:
Ryzen 5800X : 1.45 ms
Intel i7 8565U : 6.4 ms
Intel i7 from 2016 : 7.8 ms
AMD from 2011 : 7.4 ms

It looks like AMD found a way with Zen to parallelize on the fly!
If we write code that calculates several pixels in parallel, we will see what happens. If the Ryzen computes that another 4 times faster,
then it doesn't parallelize the pixel calculation internally, and it is only a code-flow optimization.

Re: HSL/HSV to RGB

Post by wilbert »

SMaag wrote: Tue Sep 12, 2023 12:19 pm
CPUs:
Ryzen 5800X : 1.45 ms
Intel i7 8565U : 6.4 ms
Intel i7 from 2016 : 7.8 ms
AMD from 2011 : 7.4 ms
I still don't understand why your Ryzen is so fast.

I tried AVX2 but am not very familiar with it.
My experiments were much slower than the SSE2 code.

I was able to modify the SSE2 code to handle two pixels in one loop.
Is this also faster on your Ryzen ?

Edit:
Added HSL mode
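The HSL mode in `SetTransformHSL` reuses the HSV matrix: for lightness L up to 50 it scales by 2L%, and above 50 it scales by (200 - 2L)% and adds (2L/100 - 1) × 255 to R, G and B, which blends toward white. `$3fc000` is 255 × 16384, i.e. 255 in Q14. A small C model of that mapping; the function names are hypothetical, and `hsl_channel` makes the simplifying assumption h = 0 and s = 100, treating the HSV part as a plain scale.

```c
#include <stdint.h>

/* Value (percent) that SetTransformHSL hands on to SetTransformHSV. */
static double hsl_value_percent(double L)
{
    return L <= 50 ? 2 * L : 200 - 2 * L;
}

/* Q14 addend for R, G and B: 8192 (0.5, for rounding) plus, above L = 50,
   (0.02*L - 1) * $3fc000, where $3fc000 = 255 * 16384 is 255 in Q14. */
static int32_t hsl_addend_q14(double L)
{
    if (L <= 50)
        return 8192;
    double offset = (0.02 * L - 1) * 0x3fc000;   /* >= 0 here */
    return (int32_t)(offset + 0.5) + 8192;       /* round to nearest */
}

/* One channel, ASSUMING the HSV part is a plain scale by the value
   percentage (h = 0, s = 100, small cross terms ignored). */
static uint8_t hsl_channel(uint8_t c, double L)
{
    int32_t scale = (int32_t)(hsl_value_percent(L) * 0.01 * 16384 + 0.5);
    int32_t acc = c * scale + hsl_addend_q14(L);   /* never negative here */
    return (acc >> 14) > 255 ? 255 : (uint8_t)(acc >> 14);
}
```

L = 50 leaves a channel unchanged, L = 25 halves it, L = 75 maps 100 to 0.5 × 100 + 127.5 = 177.5 (rounded to 178), and L = 100 gives 255 regardless of the input.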


Code:

; HSL/HSV Transform
; Last update: 2023-09-13

Structure ColorMatrix
  m.w[16]
  a.l[04]
EndStructure

Procedure SetTransformHSV(*Matrix.ColorMatrix, h.f, s.f, v.f, ChannelOrder = 0)
  
  ; ChannelOrder 0 => RGBA (MacOS)
  ; ChannelOrder 1 => BGRA (Windows)
  
  Protected.f vsu, vsw
  s * 0.01
  v * 0.01
  vsu = v*s*Cos(h*#PI/180)
  vsw = v*s*Sin(h*#PI/180)
  
  ClearStructure(*Matrix, ColorMatrix)
  *Matrix\m[15] = 16384 ; alpha in -> out multiplier
  *Matrix\a[00] = 8192  ; constants to add for rounding
  *Matrix\a[01] = 8192
  *Matrix\a[02] = 8192
  
  If ChannelOrder = 1
    ; BGRA
    *Matrix\m[12] = (0.299*v + 0.701*vsu + 0.168*vsw)*16384 ; red in    red out
    *Matrix\m[05] = (0.587*v - 0.587*vsu + 0.330*vsw)*16384 ; green in  red out
    *Matrix\m[04] = (0.114*v - 0.114*vsu - 0.497*vsw)*16384 ; blue in   red out
    *Matrix\m[10] = (0.299*v - 0.299*vsu - 0.328*vsw)*16384 ; red in    green out
    *Matrix\m[03] = (0.587*v + 0.413*vsu + 0.035*vsw)*16384 ; green in  green out
    *Matrix\m[02] = (0.114*v - 0.114*vsu + 0.292*vsw)*16384 ; blue in   green out
    *Matrix\m[08] = (0.299*v - 0.300*vsu +  1.25*vsw)*16384 ; red in    blue out
    *Matrix\m[01] = (0.587*v - 0.588*vsu -  1.05*vsw)*16384 ; green in  blue out
    *Matrix\m[00] = (0.114*v + 0.886*vsu - 0.203*vsw)*16384 ; blue in   blue out
  Else
    ; RGBA
    *Matrix\m[00] = (0.299*v + 0.701*vsu + 0.168*vsw)*16384 ; red in    red out
    *Matrix\m[01] = (0.587*v - 0.587*vsu + 0.330*vsw)*16384 ; green in  red out
    *Matrix\m[08] = (0.114*v - 0.114*vsu - 0.497*vsw)*16384 ; blue in   red out
    *Matrix\m[02] = (0.299*v - 0.299*vsu - 0.328*vsw)*16384 ; red in    green out
    *Matrix\m[03] = (0.587*v + 0.413*vsu + 0.035*vsw)*16384 ; green in  green out
    *Matrix\m[10] = (0.114*v - 0.114*vsu + 0.292*vsw)*16384 ; blue in   green out
    *Matrix\m[04] = (0.299*v - 0.300*vsu +  1.25*vsw)*16384 ; red in    blue out
    *Matrix\m[05] = (0.587*v - 0.588*vsu -  1.05*vsw)*16384 ; green in  blue out
    *Matrix\m[12] = (0.114*v + 0.886*vsu - 0.203*vsw)*16384 ; blue in   blue out
  EndIf
  
EndProcedure

Procedure SetTransformHSL(*Matrix.ColorMatrix, h.f, s.f, v.f, ChannelOrder = 0)
  If v <= 50
    SetTransformHSV(*Matrix, h, s, 2*v, ChannelOrder)  
  Else
    SetTransformHSV(*Matrix, h, s, 200-2*v, ChannelOrder)
    *Matrix\a[00] = (0.02*v-1)*$3fc000 + 8192 ; constants to add for lightness and rounding
    *Matrix\a[01] = (0.02*v-1)*$3fc000 + 8192
    *Matrix\a[02] = (0.02*v-1)*$3fc000 + 8192
  EndIf
EndProcedure


Procedure ApplyTransform(*Matrix.ColorMatrix, *InPixels, *OutPixels, NumPixels)
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !mov rax, [p.p_Matrix]
    !movdqu xmm2, [rax]
    !movdqu xmm3, [rax + 16]
    !movdqu xmm4, [rax + 32]
    !mov rax, [p.p_InPixels]
    !mov rdx, [p.p_OutPixels]
  CompilerElse
    !mov eax, [p.p_Matrix]
    !movdqu xmm2, [eax]
    !movdqu xmm3, [eax + 16]
    !movdqu xmm4, [eax + 32]
    !mov eax, [p.p_InPixels]
    !mov edx, [p.p_OutPixels]
  CompilerEndIf
  !pxor xmm5, xmm5
  !mov ecx, [p.v_NumPixels]
  !btr ecx, 0
  !jnc .l0
  ; single pixel
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !movd xmm0, [rax + rcx*4] ; load pixel
  CompilerElse
    !movd xmm0, [eax + ecx*4] ; load pixel
  CompilerEndIf
  !punpcklbw xmm0, xmm5       ; zero extend bytes to words (xmm5 = 0)
  !pshufd xmm1, xmm0, 85      ; xmm1 [c3c2 c3c2 c3c2 c3c2]
  !pshufd xmm0, xmm0, 0       ; xmm0 [c1c0 c1c0 c1c0 c1c0]
  !pmaddwd xmm1, xmm3         ; multiply and add
  !pmaddwd xmm0, xmm2         ; multiply and add
  !paddd xmm0, xmm1           ; add together
  !paddd xmm0, xmm4           ; add constant
  !psrad xmm0, 14             ; reduce to byte range
  !packssdw xmm0, xmm0        ; convert 32s > 16s
  !packuswb xmm0, xmm0        ; convert 16s > 8u
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !movd [rdx + rcx*4], xmm0 ; store pixel
  CompilerElse
    !movd [edx + ecx*4], xmm0 ; store pixel
  CompilerEndIf
  !.l0:
  !sub ecx, 2
  !jc .l2
  ; two pixel loop
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !sub rsp, 32
    !movdqu [rsp], xmm6
    !movdqu [rsp+16], xmm7
    !.l1:
    !movq xmm0, [rax + rcx*4] ; load two pixels
  CompilerElse
    !.l1:
    !movq xmm0, [eax + ecx*4] ; load two pixels
  CompilerEndIf
  !punpcklbw xmm0, xmm5       ; zero extend bytes to words (xmm5 = 0)
  !pshufd xmm7, xmm0, 255     ; xmm7 [c3c2 c3c2 c3c2 c3c2] of pixel 2
  !pshufd xmm6, xmm0, 170     ; xmm6 [c1c0 c1c0 c1c0 c1c0] of pixel 2
  !pshufd xmm1, xmm0, 85      ; xmm1 [c3c2 c3c2 c3c2 c3c2] of pixel 1
  !pshufd xmm0, xmm0, 0       ; xmm0 [c1c0 c1c0 c1c0 c1c0] of pixel 1
  !pmaddwd xmm7, xmm3         ; multiply and add
  !pmaddwd xmm6, xmm2         ; multiply and add
  !pmaddwd xmm1, xmm3         ; multiply and add
  !pmaddwd xmm0, xmm2         ; multiply and add
  !paddd xmm6, xmm7           ; add together
  !paddd xmm0, xmm1           ; add together
  !paddd xmm6, xmm4           ; add constant
  !paddd xmm0, xmm4           ; add constant
  !psrad xmm6, 14             ; reduce to byte range
  !psrad xmm0, 14             ; reduce to byte range
  !packssdw xmm0, xmm6        ; convert 32s > 16s
  !packuswb xmm0, xmm0        ; convert 16s > 8u
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !movq [rdx + rcx*4], xmm0 ; store two pixels
  CompilerElse
    !movq [edx + ecx*4], xmm0 ; store two pixels
  CompilerEndIf
  !sub ecx, 2
  !jnc .l1
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !movdqu xmm6, [rsp]
    !movdqu xmm7, [rsp+16]
    !add rsp, 32
  CompilerEndIf
  !.l2:
EndProcedure


; Create application window

Define m.ColorMatrix

UseJPEGImageDecoder()
UsePNGImageDecoder()

If OpenWindow(0, 0, 0, 910, 530, "HSL/HSV Color Transform", #PB_Window_SystemMenu|#PB_Window_ScreenCentered)
  If CreateStatusBar(0, WindowID(0))
    AddStatusBarField(#PB_Ignore)
    StatusBarText(0, 0, "Nothing processed yet")
  EndIf
  
  ScrollAreaGadget(0, 10, 10, 440, 400, 0, 0, 10, #PB_ScrollArea_Flat|#PB_ScrollArea_Center)
  ImageGadget(1, 0, 0, 0, 0, 0)
  CloseGadgetList()
  
  ScrollAreaGadget(2, 460, 10, 440, 400, 10, 0, 10, #PB_ScrollArea_Flat|#PB_ScrollArea_Center)
  ImageGadget(3, 0, 0, 0, 0, 0)
  CloseGadgetList()
  
  TextGadget(4, 12, 428, 98, 22, "Hue rotation")
  SpinGadget(5, 10, 450, 100, 30, -180, 360, #PB_Spin_Numeric)
  TextGadget(6, 132, 428, 98, 22, "Saturation")
  SpinGadget(7, 130, 450, 100, 30, 0, 100, #PB_Spin_Numeric)
  TextGadget(8, 252, 428, 98, 22, "Value")
  SpinGadget(9, 250, 450, 100, 30, 0, 100, #PB_Spin_Numeric)
  SetGadgetState(5, 0)
  SetGadgetState(7, 100)
  SetGadgetState(9, 100)
  ComboBoxGadget(10, 370, 450, 80, 30)
  AddGadgetItem(10, -1, "HSL")
  AddGadgetItem(10, -1, "HSV")
  SetGadgetState(10, 1)
  ButtonGadget(11, 480, 450, 120, 30, "Apply transform")
  ButtonGadget(12, 780, 450, 120, 30, "Load image")
  
  Repeat
    Event = WaitWindowEvent()
    If Event = #PB_Event_Gadget
      Select EventGadget()
        Case 10:
          If GetGadgetState(10)
            SetGadgetText(8, "Value")
          Else
            SetGadgetText(8, "Lightness")
          EndIf
        Case 11:
          If IsImage(0)
            If CreateImage(1, ImageWidth(0), ImageHeight(0), 32) And StartDrawing(ImageOutput(1))
              ; make a 32 bit copy of the loaded image
              DrawingMode(#PB_2DDrawing_AllChannels)
              DrawImage(ImageID(0), 0, 0)
              ; get the buffer address
              *PixelBuffer = DrawingBuffer()
              PixelCount = OutputHeight()*DrawingBufferPitch() >> 2
              ; set the transform matrix
              If GetGadgetState(10)
                If DrawingBufferPixelFormat() = #PB_PixelFormat_32Bits_RGB
                  SetTransformHSV(@m, GetGadgetState(5), GetGadgetState(7), GetGadgetState(9), 0)
                Else
                  SetTransformHSV(@m, GetGadgetState(5), GetGadgetState(7), GetGadgetState(9), 1)
                EndIf
              Else
                If DrawingBufferPixelFormat() = #PB_PixelFormat_32Bits_RGB
                  SetTransformHSL(@m, GetGadgetState(5), GetGadgetState(7), GetGadgetState(9), 0)
                Else
                  SetTransformHSL(@m, GetGadgetState(5), GetGadgetState(7), GetGadgetState(9), 1)
                EndIf
              EndIf
              ; apply the transform matrix
              t1 = ElapsedMilliseconds()
              ApplyTransform(@m, *PixelBuffer, *PixelBuffer, PixelCount)              
              t2 = ElapsedMilliseconds()
              StopDrawing()
              SetGadgetState(3, ImageID(1))
              SetGadgetAttribute(2, #PB_ScrollArea_InnerWidth, ImageWidth(1))
              SetGadgetAttribute(2, #PB_ScrollArea_InnerHeight, ImageHeight(1))
              StatusBarText(0, 0, "Processed "+Str(PixelCount)+" pixels in "+Str(t2-t1)+" ms")
            EndIf
          EndIf
        Case 12:
          File.s = OpenFileRequester("Select image file", "", "Image file | *.png;*.jpg;*.jpeg", 0)
          If File And LoadImage(0, File)
            SetGadgetState(1, ImageID(0))
            SetGadgetAttribute(0, #PB_ScrollArea_InnerWidth, ImageWidth(0))
            SetGadgetAttribute(0, #PB_ScrollArea_InnerHeight, ImageHeight(0))
          EndIf
      EndSelect
    EndIf
  Until Event = #PB_Event_CloseWindow
  
EndIf
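The loop structure of the two-pixel ApplyTransform, separated from the SIMD details: BTR clears bit 0 of the count, and if it was set, the leftover pixel at the highest index is processed on its own first; the remaining even count is then walked down two pixels at a time. A C sketch of the same visiting order, with a hypothetical per-pixel callback standing in for the matrix math:

```c
#include <stddef.h>
#include <stdint.h>

/* Per-pixel work; in the real code this is the PMADDWD matrix transform. */
typedef void (*pixel_fn)(const uint32_t *in, uint32_t *out, size_t i);

/* Hypothetical stand-in transform: flip the low three bytes, keep the top byte. */
static void invert_rgb_one(const uint32_t *in, uint32_t *out, size_t i)
{
    out[i] = in[i] ^ 0x00FFFFFFu;
}

/* Same visiting order as the two-pixel ApplyTransform. */
static void apply_pairs(const uint32_t *in, uint32_t *out, size_t n, pixel_fn one)
{
    size_t i = n & ~(size_t)1;      /* btr ecx, 0: clear bit 0 ...             */
    if (n & 1)                      /* ... and if it was set (odd count),      */
        one(in, out, i);            /* do the last pixel, index n - 1, alone   */
    while (i >= 2) {                /* sub ecx, 2 / jc: stop when fewer than 2 */
        i -= 2;
        one(in, out, i);            /* the SSE2 code handles these two with    */
        one(in, out, i + 1);        /* one MOVQ load and one MOVQ store        */
    }
}
```

Peeling the odd pixel first keeps the main loop free of a per-iteration bounds check, which is why the assembly only needs the carry flag from `sub ecx, 2` to terminate.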

Re: HSL/HSV to RGB

Post by SMaag »

There must be something wrong in all of our code!

A hue shift of +120° turns red into green, and another +120° turns green into blue,
and a hue shift of 180° turns:
red into cyan RGB(0,255,255)
green into magenta RGB(255,0,255)
blue into yellow RGB(255,255,0)


This isn't the case! But wilbert's code gives better results than mine!

I guess it is a problem of the matrix parameters!
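For comparison, the behaviour described above (red +120° gives green, blue +180° gives yellow) is what a hue shift in HSV space produces. A minimal C sketch using the standard RGB/HSV conversion with six 60° sectors; it is not code from this thread, and the function names are illustrative.

```c
#include <stdint.h>

/* RGB (0..255) -> HSV with h in degrees [0, 360), s and v in [0, 1]. */
static void rgb_to_hsv(const uint8_t rgb[3], double *h, double *s, double *v)
{
    double r = rgb[0] / 255.0, g = rgb[1] / 255.0, b = rgb[2] / 255.0;
    double max = r > g ? (r > b ? r : b) : (g > b ? g : b);
    double min = r < g ? (r < b ? r : b) : (g < b ? g : b);
    double d = max - min;
    *v = max;
    *s = max > 0 ? d / max : 0;
    if (d == 0)        *h = 0;
    else if (max == r) *h = 60 * ((g - b) / d);
    else if (max == g) *h = 60 * ((b - r) / d + 2);
    else               *h = 60 * ((r - g) / d + 4);
    if (*h < 0) *h += 360;
}

static uint8_t to_byte(double x) { return (uint8_t)(x * 255.0 + 0.5); }

/* HSV -> RGB using the usual six 60-degree sectors. */
static void hsv_to_rgb(double h, double s, double v, uint8_t rgb[3])
{
    double hp = h / 60.0;
    int i = (int)hp;                     /* sector 0..5 */
    double f = hp - i;
    double p = v * (1 - s), q = v * (1 - s * f), t = v * (1 - s * (1 - f));
    double r, g, b;
    switch (i % 6) {
    case 0:  r = v; g = t; b = p; break;
    case 1:  r = q; g = v; b = p; break;
    case 2:  r = p; g = v; b = t; break;
    case 3:  r = p; g = q; b = v; break;
    case 4:  r = t; g = p; b = v; break;
    default: r = v; g = p; b = q; break;
    }
    rgb[0] = to_byte(r); rgb[1] = to_byte(g); rgb[2] = to_byte(b);
}

/* Hue shift in HSV space, in place. */
static void hue_shift_hsv(uint8_t rgb[3], double degrees)
{
    double h, s, v;
    rgb_to_hsv(rgb, &h, &s, &v);
    h += degrees;
    while (h >= 360) h -= 360;
    while (h < 0) h += 360;
    hsv_to_rgb(h, s, v, rgb);
}
```

With this conversion, pure blue shifted by 180° becomes (255, 255, 0), exactly the expectation above; it keeps V (the maximum channel) fixed rather than perceived brightness.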

Re: HSL/HSV to RGB

Post by Piero »

For me, wilbert's code gives an assembler error:

error: use of undeclared identifier 'mov'
mov ecx, [p.v_NumPixels]
^
purebasic.c:858:28: error: use of undeclared identifier 'load'
movd xmm1, [eax + ecx*4] ; load pixel
^
purebasic.c:859:30: error: use of undeclared identifier '

Must be because I'm on an M1 Mac :(

Re: HSL/HSV to RGB

Post by wilbert »

SMaag wrote: Wed Sep 13, 2023 12:13 pm
I guess it is a problem of the matrix parameters!
I think it has to do with the YIQ colorspace.
A quote from the page where the matrix multiplication came from ...
RGB values aren’t very convenient for doing complex transforms on, especially hue. The math for doing a hue rotation on RGB is nasty. However, the math for doing a hue rotation on YIQ is very easy; YIQ is a color space which uses the perceptive-weighted brightness of the red, green and blue channels to provide a luminance (Y) channel, and places the chroma values for red, green and blue roughly 120 degrees apart in the I-Q plane.

Note that there are many color spaces that you can use for this transform which have different hue-mapping characteristics; strictly-speaking, there is no single natural “angle” between any given colors, and different effects can be achieved by using different color spaces such as YPbPr or YUV.
You can ask yourself whether it is okay that a 180 degree shift of blue (0,0,255) outputs yellow as (255,255,0).
In that case, the luminance is totally disregarded, since the yellow output is much brighter than the blue input.
While the conversion to and from the YIQ colorspace does produce different results from what you might expect, it does a better job of respecting the luminance of the source image when adjusting the hue (at least to my eyes). 8)
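The luminance argument can be checked numerically. This C sketch applies the SetTransformHSV coefficients (s = v = 100%) in Q14 the way ApplyTransform does; the caller passes cos/sin of the rotation angle to avoid the math library, and the names `yiq_hue_rotate` and `luma` are hypothetical. For pure blue rotated by 180° (cos = -1, sin = 0) it gives (58, 58, 0), a dark yellow whose Rec. 601 luma is about 51, much closer to blue's 29 than to the 226 of (255, 255, 0).

```c
#include <stdint.h>

/* Rotate hue with the SetTransformHSV coefficients at s = v = 100%,
   in R, G, B order, rounding and clipping like ApplyTransform. */
static void yiq_hue_rotate(uint8_t rgb[3], double cos_h, double sin_h)
{
    double u = cos_h, w = sin_h;
    double k[3][3] = {   /* k[out][in] */
        {0.299 + 0.701*u + 0.168*w, 0.587 - 0.587*u + 0.330*w, 0.114 - 0.114*u - 0.497*w},
        {0.299 - 0.299*u - 0.328*w, 0.587 + 0.413*u + 0.035*w, 0.114 - 0.114*u + 0.292*w},
        {0.299 - 0.300*u + 1.25*w,  0.587 - 0.588*u - 1.05*w,  0.114 + 0.886*u - 0.203*w},
    };
    uint8_t in[3] = {rgb[0], rgb[1], rgb[2]};
    for (int o = 0; o < 3; o++) {
        int32_t acc = 8192;                        /* rounding, as in ApplyTransform */
        for (int i = 0; i < 3; i++) {
            double q = k[o][i] * 16384.0;          /* Q14 factor, rounded to nearest */
            acc += in[i] * (int32_t)(q < 0 ? q - 0.5 : q + 0.5);
        }
        rgb[o] = acc < 0 ? 0 : (acc >> 14) > 255 ? 255 : (uint8_t)(acc >> 14);
    }
}

/* Rec. 601 luma, the Y of YIQ, for comparing brightness. */
static double luma(const uint8_t rgb[3])
{
    return 0.299 * rgb[0] + 0.587 * rgb[1] + 0.114 * rgb[2];
}
```

The blue output channel goes negative (factor -0.772) and is clipped to 0, which is why the result is not exactly luminance-preserving either, but it stays in the neighbourhood of the input's brightness.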

Piero wrote: Wed Sep 13, 2023 1:12 pm
For me, wilbert's code gives an assembler error:
You are right, it is because you are using an Arm-based processor. The assembly code is for x86/x64 only.
To do something similar for Arm-based processors like the M1, you could write assembly code with NEON instructions, but unfortunately I'm not familiar enough with it at the moment to write such code. :(