Demivec wrote: Just as a side note, the original PureBasic version can be improved further by simplifying the code to:
Code: Select all
Procedure Color_Mix_pb(color1.l, color2.l, percent.l)
  Protected p, r, g, b
  p = percent / 100
  r= (Red(color1) - Red(color2)) * p + Red(color2)
  g= (Green(color1) - Green(color2)) * p + Green(color2)
  b= (Blue(color1) - Blue(color2)) * p + Blue(color2)
  ProcedureReturn RGB(r,g,b)
EndProcedure
In this case p has to be a float instead of an integer.
[Solved] Fast Alpha Blending -percent based needed
Re: [Solved] Fast Alpha Blending -percent based needed
Windows (x64)
Raspberry Pi OS (Arm64)
Re: [Solved] Fast Alpha Blending -percent based needed
wilbert wrote:
Demivec wrote: Just as a side note, the original PureBasic version can be improved further by simplifying the code to:
Code: Select all
Procedure Color_Mix_pb(color1.l, color2.l, percent.l)
  Protected p, r, g, b
  p = percent / 100
  r= (Red(color1) - Red(color2)) * p + Red(color2)
  g= (Green(color1) - Green(color2)) * p + Green(color2)
  b= (Blue(color1) - Blue(color2)) * p + Blue(color2)
  ProcedureReturn RGB(r,g,b)
EndProcedure
In this case p has to be a float instead of integer
That was the problem I ran into with my assembler version too . . . until it dawned on me a little ways into testing.
Either way I corrected this in the previous post.
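One way to sidestep the float issue altogether, sketched here on the assumption that percent is an integer from 0 to 100: multiply the delta by percent first and divide by 100 afterwards, so no fractional factor ever has to be stored. The name Color_Mix_delta is made up for this illustration. Integer division drops the fraction, so a channel can come out one step different from a rounded float version.

```purebasic
; Sketch only: Color_Mix_delta is a hypothetical name, not code from this thread.
Procedure.i Color_Mix_delta(color1.i, color2.i, percent.i)
  Protected r.i, g.i, b.i
  ; (delta * percent) / 100 keeps everything in integer arithmetic;
  ; the intermediate product is at most 255 * 100 in magnitude.
  r = (Red(color1)   - Red(color2))   * percent / 100 + Red(color2)
  g = (Green(color1) - Green(color2)) * percent / 100 + Green(color2)
  b = (Blue(color1)  - Blue(color2))  * percent / 100 + Blue(color2)
  ProcedureReturn RGB(r, g, b)
EndProcedure

; Debug output is only visible with the debugger enabled.
Debug Hex(Color_Mix_delta(#Red, #Green, 80))
```

This keeps the single multiplication per channel from the delta idea, at the cost of one integer division per channel.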
Re: [Solved] Fast Alpha Blending -percent based needed
@Demivec, your idea of a delta with a single multiplication works great.
It can be applied to asm code as well.
Code: Select all
Macro M_Color_Mix(channel)
!movzx eax, byte [p.v_color1 + channel]
!movzx edx, byte [p.v_color2 + channel]
!sub eax, edx
!imul eax, ecx
!add eax, 0x800000
!shr eax, 24
!add eax, edx
!mov [p.v_color1 + channel], al
EndMacro
Procedure Color_Mix_asm(color1.l, color2.l, percent.l)
!mov ecx, [p.v_percent]
!imul ecx, 167772
M_Color_Mix(0)
M_Color_Mix(1)
M_Color_Mix(2)
!mov eax, [p.v_color1]
ProcedureReturn
EndProcedure
Procedure Color_Mix_pb(color1.l, color2.l, percent.l)
r= (Red(color1)*percent + Red(color2)*(100-percent)) / 100
g= (Green(color1)*percent + Green(color2)*(100-percent)) / 100
b= (Blue(color1)*percent + Blue(color2)*(100-percent)) / 100
ProcedureReturn RGB(r,g,b)
EndProcedure
; Debug RSet(Hex(Color_Mix_pb(#Red, #Green, 80),#PB_Long), 6, "0")
; Debug RSet(Hex(Color_Mix_asm(#Red, #Green, 80),#PB_Long), 6, "0")
;
; End
CompilerIf #PB_Compiler_Debugger
MessageRequester("Notice:", "Please turn off the debugger for this test")
End
CompilerEndIf
s=ElapsedMilliseconds()
For i=1 To 10000000
Color_Mix_pb(#Green,#Blue, 50)
Next
e=ElapsedMilliseconds()-s
MessageRequester("PB Code Version", Str(ElapsedMilliseconds()-s))
s=ElapsedMilliseconds()
For i=1 To 10000000
Color_Mix_asm(#Green,#Blue, 50)
Next
MessageRequester("asm Version", Str(ElapsedMilliseconds()-s))
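For readers who do not read assembler, here is a rough PureBasic rendering of the arithmetic in Color_Mix_asm. The constant 167772 is approximately 2^24 / 100 (16777216 / 100 = 167772.16), so percent * 167772 is the blend factor as a 24-bit fixed-point fraction. The procedure names below are hypothetical, and quads are used on the assumption that this makes the sketch behave the same on 32-bit and 64-bit compilers.

```purebasic
; Sketch only: MixChannel_fixed and Color_Mix_fixed are hypothetical names.
Procedure.i MixChannel_fixed(c1.i, c2.i, scale.q)
  Protected delta.q = c1 - c2          ; -255 .. 255
  ; Multiply by the fixed-point factor, add half a unit ($800000) for rounding,
  ; drop the 24 fraction bits, then add the base channel back.
  ; The final & $FF mirrors the asm storing only the low byte (al).
  ProcedureReturn (((delta * scale + $800000) >> 24) + c2) & $FF
EndProcedure

Procedure.i Color_Mix_fixed(color1.i, color2.i, percent.i)
  Protected scale.q = percent * 167772 ; percent / 100 as a 24-bit fraction
  Protected r.i, g.i, b.i
  r = MixChannel_fixed(Red(color1),   Red(color2),   scale)
  g = MixChannel_fixed(Green(color1), Green(color2), scale)
  b = MixChannel_fixed(Blue(color1),  Blue(color2),  scale)
  ProcedureReturn RGB(r, g, b)
EndProcedure

Debug Hex(Color_Mix_fixed(#Red, #Green, 80))
```

The division by 100 disappears because it is folded into the constant; only a multiplication and a shift remain per channel.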
Re: [Solved] Fast Alpha Blending -percent based needed
Interesting new variants.
Here is a slightly modified fast macro from user eesau, though it does not work with a percent value.
Code: Select all
Macro RGB(red, green, blue) : (((blue<<8+green)<<8)+red) : EndMacro ; Macro by eesau
Macro Red(color) : (color&$FFFFFF>>16) : EndMacro
Macro Green(color) : (color&$FFFF)>>8 : EndMacro
Macro Blue(color) : (color>>16) : EndMacro
Macro AlphaBlend(color_1, color_2, alpha)
RGB(((Red(color_2)*alpha+Red(color_1)*(256-alpha))>>8),
((Green(color_2)*alpha+Green(color_1)*(256-alpha))>>8),
((Blue(color_2)*alpha+Blue(color_1)*(256-alpha))>>8))
EndMacro
Re: [Solved] Fast Alpha Blending -percent based needed
For comparison, here is also an SSE2 version which takes a percent value from 0 to 100.
Code: Select all
Macro M_Color_Mix(channel)
!movzx eax, byte [p.v_color1 + channel]
!movzx edx, byte [p.v_color2 + channel]
!sub eax, edx
!imul eax, ecx
!add eax, 0x800000
!shr eax, 24
!add eax, edx
!mov [p.v_color1 + channel], al
EndMacro
Procedure Color_Mix_asm(color1.l, color2.l, percent.l)
!mov ecx, [p.v_percent]
!imul ecx, 167772
M_Color_Mix(0)
M_Color_Mix(1)
M_Color_Mix(2)
!mov eax, [p.v_color1]
ProcedureReturn
EndProcedure
Procedure Color_Mix_SSE2(color1.l, color2.l, percent.l)
!mov eax, [p.v_percent]
!imul eax, 167772
!shr eax, 8
!movd xmm0, [p.v_color1]
!movd xmm1, [p.v_color2]
!movd xmm2, eax
!punpcklbw xmm0, xmm0
!punpcklbw xmm1, xmm1
!pshuflw xmm2, xmm2, 0
!pcmpeqw xmm3, xmm3
!pxor xmm3, xmm2
!pmulhuw xmm0, xmm2
!pmulhuw xmm1, xmm3
!paddw xmm0, xmm1
!psrlw xmm0, 8
!packuswb xmm0, xmm0
!movd eax, xmm0
ProcedureReturn
EndProcedure
Procedure Color_Mix_pb(color1.l, color2.l, percent.l)
r= (Red(color1)*percent + Red(color2)*(100-percent)) / 100
g= (Green(color1)*percent + Green(color2)*(100-percent)) / 100
b= (Blue(color1)*percent + Blue(color2)*(100-percent)) / 100
ProcedureReturn RGB(r,g,b)
EndProcedure
; Debug RSet(Hex(Color_Mix_pb(#Red, #Green, 80),#PB_Long), 6, "0")
; Debug RSet(Hex(Color_Mix_asm(#Red, #Green, 80),#PB_Long), 6, "0")
; Debug RSet(Hex(Color_Mix_SSE2(#Red, #Green, 80),#PB_Long), 6, "0")
;
; End
CompilerIf #PB_Compiler_Debugger
MessageRequester("Notice:", "Please turn off the debugger for this test")
End
CompilerEndIf
s=ElapsedMilliseconds()
For i=1 To 10000000
Color_Mix_pb(#Green,#Blue, 50)
Next
e=ElapsedMilliseconds()-s
MessageRequester("PB Code Version", Str(ElapsedMilliseconds()-s))
s=ElapsedMilliseconds()
For i=1 To 10000000
Color_Mix_asm(#Green,#Blue, 50)
Next
MessageRequester("asm Version", Str(ElapsedMilliseconds()-s))
s=ElapsedMilliseconds()
For i=1 To 10000000
Color_Mix_SSE2(#Green,#Blue, 50)
Next
MessageRequester("SSE2 Version", Str(ElapsedMilliseconds()-s))
Re: [Solved] Fast Alpha Blending -percent based needed
Wow, nice new code!
Another one:
The macro above is very fast; I use it for drawing shapes on a canvas.
Code: Select all
Macro RGB(red, green, blue) : (((blue<<8+green)<<8)+red) : EndMacro ; Macro by eesau
Macro Red(color) : (color&$FFFFFF>>16) : EndMacro
Macro Green(color) : (color&$FFFF)>>8 : EndMacro
Macro Blue(color) : (color>>16) : EndMacro
Macro AlphaBlend(color_1, color_2, alpha)
RGB(((Red(color_2)*alpha+Red(color_1)*(256-alpha))>>8),
((Green(color_2)*alpha+Green(color_1)*(256-alpha))>>8),
((Blue(color_2)*alpha+Blue(color_1)*(256-alpha))>>8))
EndMacro
Procedure AlphaBlend_(color_1, color_2, mix); mix [0, 255] ; By wilbert
!movd xmm0, [p.v_color_1]
!movd xmm1, [p.v_color_2]
!movd xmm2, [p.v_mix]
!punpcklbw xmm0, xmm0
!punpcklbw xmm1, xmm1
!punpcklbw xmm2, xmm2
!pcmpeqw xmm3, xmm3
!pshuflw xmm2, xmm2, 0
!pxor xmm3, xmm2
!pmulhuw xmm1, xmm2
!pmulhuw xmm0, xmm3
!paddw xmm0, xmm1
!psrlw xmm0, 8
!packuswb xmm0, xmm0
!movd eax, xmm0
ProcedureReturn
EndProcedure
s=ElapsedMilliseconds()
For i=1 To 10000000
x= AlphaBlend($AAAA,#Blue, 50)
Next
Debug Hex(x)
e=ElapsedMilliseconds()-s
MessageRequester("PB Code Version", Str(ElapsedMilliseconds()-s))
s=ElapsedMilliseconds()
For i=1 To 10000000
x= AlphaBlend_($AAAA,#Blue, 50)
Next
Debug Hex(x)
MessageRequester("asm Version", Str(ElapsedMilliseconds()-s))
Re: [Solved] Fast Alpha Blending -percent based needed
walbus wrote: This macro above is very fast, i use it for output shapes on canvas
It sure is.
Here's a variation on your macro but it doesn't seem to make a big difference.
Code: Select all
Macro AlphaBlend(color_1, color_2, alpha)
((((color_2 & $FF00FF)*alpha+(color_1 & $FF00FF)*(256-alpha))>>8 & $FF00FF) |
(((color_2 & $FF00)*alpha+(color_1 & $FF00)*(256-alpha))>>8 & $FF00))
EndMacro
I'm sure that for a procedure blending an array of pixels, asm would be faster.
Here's also a macro for blending all four RGBA channels instead of only RGB.
Code: Select all
Macro AlphaBlendRGBA(color_1, color_2, alpha)
((((color_2 & $FF00FF)*alpha+(color_1 & $FF00FF)*(256-alpha))>>8 & $FF00FF) |
(((color_2 >>8 & $FF00FF)*alpha+(color_1 >> 8 & $FF00FF)*(256-alpha)) & $FF00FF00))
EndMacro
Last edited by wilbert on Sat May 27, 2017 6:42 am, edited 1 time in total.
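As a rough illustration of the array case mentioned above, here is a plain PureBasic loop that applies the same masking idea to every pixel. BlendPixels and its parameters are hypothetical names; alpha is assumed to be in the range 0 to 256 as in the macros, and both arrays are assumed to have the same size. It is not benchmarked against an asm version.

```purebasic
; Sketch only: BlendPixels and its parameters are hypothetical names.
; alpha is expected in 0..256; both arrays are assumed to have the same size.
Procedure BlendPixels(Array dst.l(1), Array src.l(1), alpha.i)
  Protected i.i, c1.q, c2.q, rb.q, g.q
  Protected inv.q = 256 - alpha
  For i = 0 To ArraySize(dst())
    c1 = dst(i) & $FFFFFF
    c2 = src(i) & $FFFFFF
    ; Red and blue are 16 bits apart, so one multiplication scales both at once
    rb = (((c2 & $FF00FF) * alpha + (c1 & $FF00FF) * inv) >> 8) & $FF00FF
    ; Green sits between them and gets its own multiplication
    g  = (((c2 & $FF00) * alpha + (c1 & $FF00) * inv) >> 8) & $FF00
    dst(i) = rb | g
  Next
EndProcedure

Dim a.l(3)
Dim b.l(3)
a(0) = #Red : b(0) = #Blue
BlendPixels(a(), b(), 128)
Debug Hex(a(0))
```

Because the two weights add up to 256, each 8-bit channel product stays within 16 bits, which is why red and blue can share one multiplication without overlapping.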
Re: [Solved] Fast Alpha Blending -percent based needed
Hi Wilbert, thanks again, your macro looks better and is much more readable.
Yes, but I am surprised that it makes so much
Both macros need only ~14 ms on my older i7.
regards werner
- Michael Vogel
- Addict
- Posts: 2680
- Joined: Thu Feb 09, 2006 11:27 pm
Re: [Solved] Fast Alpha Blending -percent based needed
I wrote a procedure some years ago which is as fast as the procedure 'Color_Mix_asm' (at least when scaling from 0 to 255, because 'n' does not have to be rescaled then)...
Code: Select all
#loop=10000000
a=#Green
b=#Blue
z=50
Macro M_Color_Mix(channel)
!movzx eax, byte [p.v_color1 + channel]
!movzx edx, byte [p.v_color2 + channel]
!sub eax, edx
!imul eax, ecx
!add eax, 0x800000
!shr eax, 24
!add eax, edx
!mov [p.v_color1 + channel], al
EndMacro
Procedure Color_Mix_asm(color1.l, color2.l, percent.l)
!mov ecx, [p.v_percent]
!imul ecx, 167772
M_Color_Mix(0)
M_Color_Mix(1)
M_Color_Mix(2)
!mov eax, [p.v_color1]
ProcedureReturn
EndProcedure
Procedure Color_Mix_SSE2(color1.l, color2.l, percent.l)
!mov eax, [p.v_percent]
!imul eax, 167772
!shr eax, 8
!movd xmm0, [p.v_color1]
!movd xmm1, [p.v_color2]
!movd xmm2, eax
!punpcklbw xmm0, xmm0
!punpcklbw xmm1, xmm1
!pshuflw xmm2, xmm2, 0
!pcmpeqw xmm3, xmm3
!pxor xmm3, xmm2
!pmulhuw xmm0, xmm2
!pmulhuw xmm1, xmm3
!paddw xmm0, xmm1
!psrlw xmm0, 8
!packuswb xmm0, xmm0
!movd eax, xmm0
ProcedureReturn
EndProcedure
Procedure Color_Mix_pb(color1.l, color2.l, percent.l)
r= (Red(color1)*percent + Red(color2)*(100-percent)) / 100
g= (Green(color1)*percent + Green(color2)*(100-percent)) / 100
b= (Blue(color1)*percent + Blue(color2)*(100-percent)) / 100
ProcedureReturn RGB(r,g,b)
EndProcedure
Procedure.i ColorScale(ColA,ColB,n)
;n*255
;n/100
ProcedureReturn ( ((ColA&$FF)*n+(ColB&$FF)*(255-n))>>8 ) | ( (((ColA&$FF00)*n+(ColB&$FF00)*(255-n))>>8)&$FF00 ) | (((ColA>>8&$FF00)*n+(ColB>>8&$FF00)*(255-n)) & $FF0000)
EndProcedure
CompilerIf #PB_Compiler_Debugger
MessageRequester("Notice:", "Please turn off the debugger for this test")
End
CompilerEndIf
m.s=""
s=ElapsedMilliseconds()
For i=1 To #loop
r=Color_Mix_pb(a,b,z)
Next
e=ElapsedMilliseconds()-s
m+"PB"+#TAB$+Str(ElapsedMilliseconds()-s)+#TAB$+Hex(r)+#CR$
s=ElapsedMilliseconds()
For i=1 To #loop
r=Color_Mix_asm(a,b,z)
Next
m+"ASM"+#TAB$+Str(ElapsedMilliseconds()-s)+#TAB$+Hex(r)+#CR$
s=ElapsedMilliseconds()
For i=1 To #loop
r=Color_Mix_SSE2(a,b,z)
Next
m+"SSE2"+#TAB$+Str(ElapsedMilliseconds()-s)+#TAB$+Hex(r)+#CR$
s=ElapsedMilliseconds()
z_=z*255/100
For i=1 To #loop
r=ColorScale(a,b,z*255/100); speeds up by using r=ColorScale(a,b,z_)
Next
m+"BIRD"+#TAB$+Str(ElapsedMilliseconds()-s)+#TAB$+Hex(r)
MessageRequester(": )",m)
Re: [Solved] Fast Alpha Blending -percent based needed
Michael Vogel wrote: I have done a procedure some years ago which is as fast as the procedure 'Color_Mix_asm' (at least when scaling from 0 to 255, because 'n' does not have to be rescaled then)...
The best (fastest) approach depends a lot on what exactly you want to do.
Blending an array of pixels is a different problem compared to only two color values.
The range of the blend value (0 - 100), (0 - 255), (0 - 256) or (0.0 - 1.0) also makes a big difference.
If you need to do a lot of iterations with the same blend value it's indeed wise to pre-calculate like you are doing in your benchmark
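To make the point about ranges concrete, here is a small sketch that converts a percent value to the 0 to 256 range once, outside the loop, so that the per-pixel work needs only multiplications and a shift. PercentToAlpha256 and BlendChannel256 are hypothetical helper names, not code from this thread.

```purebasic
; Sketch only: PercentToAlpha256 and BlendChannel256 are hypothetical names.
Procedure.i PercentToAlpha256(percent.i)
  ; Map 0..100 onto 0..256 with rounding; meant to be called once per blend value
  ProcedureReturn (percent * 256 + 50) / 100
EndProcedure

Procedure.i BlendChannel256(c1.i, c2.i, alpha.i)
  ; alpha = 256 yields c1, alpha = 0 yields c2; the divide becomes a shift
  ProcedureReturn (c1 * alpha + c2 * (256 - alpha)) >> 8
EndProcedure

Define alpha.i, i.i, r.i
alpha = PercentToAlpha256(50)          ; precalculated outside the loop
For i = 1 To 1000
  r = BlendChannel256(255, 0, alpha)   ; no division inside the loop
Next
Debug r
```

With a 0 to 255 range the weights add up to 255 instead of 256, so shifting by 8 slightly underestimates the result; that is one reason the choice of range matters.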
Re: [Solved] Fast Alpha Blending -percent based needed
Precalculating is a "must do" wherever it is possible.
On a real picture it is always best to compare color1 with color2 first.
If color1 = color2, a function call is not needed.
This is also important for color distance.
Last edited by walbus on Thu May 25, 2017 6:05 pm, edited 1 time in total.
- Michael Vogel
Re: [Solved] Fast Alpha Blending -percent based needed
wilbert wrote: If you need to do a lot of iterations with the same blend value it's indeed wise to pre-calculate like you are doing in your benchmark
If I did precalculation, I would define a 256x256 matrix, which costs 64 KB but then needs only a few shift commands. But you're right, everything depends on the needs...
About my *255/100 line in the benchmark, I find it interesting that the following variants result in different timings:
The slowest:
Code: Select all
Procedure.i ColorScale(ColA,ColB,n)
n*255
n/100
ProcedureReturn ...
EndProcedure
For i=1 To #loop
r=ColorScale(a,b,z)
Next
Code: Select all
Procedure.i ColorScale(ColA,ColB,n)
n=n*255/100
ProcedureReturn ...
EndProcedure
Code: Select all
Procedure.i ColorScale(ColA,ColB,n)
ProcedureReturn ...
EndProcedure
For i=1 To #loop
r=ColorScale(a,b,z*255/100)
Next
Code: Select all
For i=1 To #loop
r=ColorScale(a,b,z*2.55)
Next
Re: [Solved] Fast Alpha Blending -percent based needed
Michael Vogel wrote: If would do precalculation, I would define a 256x256 matrix which costs 64Kb, but allows to need only some shift commands - but you're right, everything depends on the needs...
About my *255/100 line in the benchmark, I find it interesting, that the following variants results in different timing:
Interesting idea, a 256x256 matrix with lookup. I wonder if in practice it would be faster or not.
Multiplication is very fast these days and with a lookup you have more memory access.
As for your *255/100, division is an operation which takes a lot of time. When you multiply by 2.55 you have no division.
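For anyone who wants to try the lookup idea, here is one possible shape of such a table, as a sketch only. MulTab, InitMulTab and ColorScaleLUT are hypothetical names, and the layout (one byte per entry, indexed by blend value and channel value) is an assumption about what the 256x256 matrix could look like. It has not been timed against the multiplication versions.

```purebasic
; Sketch only: MulTab, InitMulTab and ColorScaleLUT are hypothetical names.
Global Dim MulTab.a(255, 255)     ; 256 x 256 unsigned bytes = 64 KB

Procedure InitMulTab()
  Protected n.i, v.i
  For n = 0 To 255
    For v = 0 To 255
      MulTab(n, v) = (v * n + 127) / 255   ; v scaled by n/255, rounded
    Next
  Next
EndProcedure

Procedure.i ColorScaleLUT(ColA.i, ColB.i, n.i)  ; n in 0..255, weight of ColA
  Protected m.i = 255 - n
  Protected r.i, g.i, b.i
  ; Per channel: two table reads and one addition, no multiplication
  r = MulTab(n, ColA & $FF)         + MulTab(m, ColB & $FF)
  g = MulTab(n, (ColA >> 8) & $FF)  + MulTab(m, (ColB >> 8) & $FF)
  b = MulTab(n, (ColA >> 16) & $FF) + MulTab(m, (ColB >> 16) & $FF)
  ProcedureReturn r | (g << 8) | (b << 16)
EndProcedure

InitMulTab()
Debug Hex(ColorScaleLUT(#Green, #Blue, 127))
```

Each blend then costs six table reads, which is the extra memory access mentioned above; whether that beats three multiplications would have to be measured.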
Re: [Solved] Fast Alpha Blending -percent based needed
For/Next is the slowest of all loops and unsuitable for speed optimization.
Last edited by walbus on Thu May 25, 2017 8:41 pm, edited 3 times in total.
- Michael Vogel
Re: [Solved] Fast Alpha Blending -percent based needed
wilbert wrote: As for your *255/100, division is an operation which takes a lot of time. When you multiply by 2.55 you have no division
Yep, but I thought floating point would slow it down. More surprising is why 'n*255/100' is faster than 'n*255 : n/100', and why the same formula runs quicker outside the procedure than inside?!