[Solved] Fast Alpha Blending -percent based needed

walbus
Addict
Posts: 929
Joined: Sat Mar 02, 2013 9:17 am

Post by walbus »

Hi,
as a last step toward fast output of very large animated GIF frames directly on a canvas (for example the ORBO GIF), it would help to speed up
a PB-based function a little.
So I would like to ask for help converting this, or a similar function, to ASM.

Code: Select all

Procedure Color_Mix(color1.l, color2.l, percent.l)
  r= ((Red(color1)*percent)/100) + ((Red(color2)*(100-percent)) / 100)
  g= ((Green(color1)*percent)/100) + ((Green(color2)*(100-percent)) / 100)
  b= ((Blue(color1)*percent)/100) + ((Blue(color2)*(100-percent)) / 100)
  ProcedureReturn RGB(r,g,b)
EndProcedure
Last edited by walbus on Sun May 21, 2017 6:17 pm, edited 1 time in total.
wilbert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Post by wilbert »

This thread contains a mix procedure
http://www.purebasic.fr/english/viewtop ... 35&t=66220
It is not percent based but requires a value from 0 - 255.
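If a percent parameter is still wanted in front of a 0 - 255 based mix procedure, the value can be rescaled first. A minimal sketch: PercentToAlpha is a hypothetical helper name, not something taken from the linked thread, and it assumes percent is already within 0 - 100.

```purebasic
; Hypothetical helper (not from the linked thread):
; maps a percentage 0..100 to a mix value 0..255, rounded to nearest.
Procedure.l PercentToAlpha(percent.l)
  ProcedureReturn (percent * 255 + 50) / 100
EndProcedure

Debug PercentToAlpha(0)    ; 0
Debug PercentToAlpha(50)   ; 128
Debug PercentToAlpha(100)  ; 255
```

If the percentage stays constant for a whole frame, the conversion only needs to run once per frame rather than once per pixel.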
Windows (x64)
Raspberry Pi OS (Arm64)

Post by walbus »

Many thanks Wilbert, this is what I wanted!
I'm having a lot of fun with the ORBO GIF, have a look here:
https://www.reddit.com/r/orbo/

Post by wilbert »

walbus wrote: I'm having a lot of fun with the ORBO GIF, have a look here:
https://www.reddit.com/r/orbo/
Nice quality images :)
netmaestro
PureBasic Bullfrog
Posts: 8425
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Post by netmaestro »

It's not easy to beat wilbert with a solution and nearly impossible to beat the speed of his code. But I worked on this, dammit, so you are getting it. On my machine it runs more than 2x as fast as the posted PB procedure (remember to turn the debugger off). Also, bear in mind it's x86 only.

Code: Select all

Procedure Color_Mix_asm(color1.l, color2.l, percent.l)
  ; netmaestro May 2017
  
  Protected.b r, g, b
  
  ; r = Red(color1)*percent / 100
  !mov eax, [p.v_color1]
  !and eax, 0xFF
  !imul eax, [p.v_percent]
  !cdq
  !mov ebx, 0x64
  !idiv ebx
  !mov [p.v_r], al
  ; r + Red(color2)*(100-percent) / 100
  !mov eax, [p.v_color2]
  !and eax, 0xFF
  !mov ebx, 0x64
  !sub ebx, [p.v_percent]
  !imul eax, ebx
  !cdq
  !mov ebx, 0x64
  !idiv ebx
  !add [p.v_r], al
  
  ; g = Green(color1)*percent / 100
  !mov eax, [p.v_color1]
  !shr eax, 8
  !and eax, 0xFF
  !imul eax, [p.v_percent]
  !cdq
  !mov ebx, 0x64
  !idiv ebx
  !mov [p.v_g], al
  ; g + Green(color2)*(100-percent) / 100
  !mov eax, [p.v_color2]
  !shr eax, 8
  !and eax, 0xFF
  !mov ebx, 0x64
  !sub ebx, [p.v_percent]
  !imul eax, ebx
  !cdq
  !mov ebx, 0x64
  !idiv ebx
  !add [p.v_g], al
    
  ; b = Blue(color1)*percent / 100
  !mov eax, [p.v_color1]
  !shr eax, 16
  !and eax, 0xFF
  !imul eax, [p.v_percent]
  !cdq
  !mov ebx, 0x64
  !idiv ebx
  !mov [p.v_b], al
  ; b + Blue(color2)*(100-percent) / 100
  !mov eax, [p.v_color2]
  !shr eax, 16
  !and eax, 0xFF
  !mov ebx, 0x64
  !sub ebx, [p.v_percent]
  !imul eax, ebx
  !cdq
  !mov ebx, 0x64
  !idiv ebx
  !add [p.v_b], al
  
  ; ProcedureReturn RGB(r, g, b)
  !xor eax, eax
  !mov al, [p.v_b]
  !shl eax, 16
  !mov ah, [p.v_g]
  !mov al, [p.v_r]
  
  ProcedureReturn 
  
EndProcedure

Procedure Color_Mix_pb(color1.l, color2.l, percent.l)
  r= ((Red(color1)*percent)/100) + ((Red(color2)*(100-percent)) / 100)
  g= ((Green(color1)*percent)/100) + ((Green(color2)*(100-percent)) / 100)
  b= ((Blue(color1)*percent)/100) + ((Blue(color2)*(100-percent)) / 100)
  ProcedureReturn RGB(r,g,b)
EndProcedure

; Debug RSet(Hex(Color_Mix_pb(#White, #Blue, 36),#PB_Long), 6, "0")
; Debug RSet(Hex(Color_Mix_asm(#White, #Blue, 36),#PB_Long), 6, "0")
; 
; End

CompilerIf #PB_Compiler_Debugger
  MessageRequester("Notice:", "Please turn off the debugger for this test")
  End
CompilerEndIf

s=ElapsedMilliseconds()
For i=1 To 10000000
  Color_Mix_pb(#Green,#Blue, 50)
Next
MessageRequester("PB Code Version", Str(ElapsedMilliseconds()-s))

s=ElapsedMilliseconds()
For i=1 To 10000000
  Color_Mix_asm(#Green,#Blue, 50)
Next
MessageRequester("asm Version", Str(ElapsedMilliseconds()-s))

Last edited by netmaestro on Mon May 22, 2017 7:59 pm, edited 1 time in total.
BERESHEIT

Post by wilbert »

netmaestro wrote: I worked on this dammit so you are getting it. On my machine it runs more than 2x as fast as the posted PB procedure (Remember to turn the debugger off) Also, bear in mind it's x86 only.
That's a more literal conversion :)
Any reason why you use esp?
On OSX, offsets to esp can be different from Windows.
For cross-platform compatibility, it's better to use [p.v_color1] instead of [esp+16].

To get the byte value from a color, you could also have used movzx.
It would save a few lines of code but probably wouldn't make a significant difference in speed (div has the biggest impact).

Post by walbus »

@Wilbert
I have now added your routine to my shapes engine,
along with the other ASM routines for color distance and invisible-color handling, which are also from you.
Everything works very, very well!
Again, many thanks for your friendly help!

Post by netmaestro »

Ok, good point on the esp, I had forgotten that MacOS uses it differently. I made the change to named vars but I don't know how to implement movzx to streamline the code. Any light you can shed on it would be appreciated.

Post by wilbert »

netmaestro wrote: I made the change to named vars but I don't know how to implement movzx to streamline the code. Any light you can shed on it would be appreciated.
movzx allows you to load a byte (or word) into a 32 bit register. The upper 24 bits are cleared (zx means zero extend).
This way you don't need to use a shift and a mask to get the red, green or blue value.

example:

Code: Select all

  !movzx eax, byte [p.v_color1]; red
  !movzx eax, byte [p.v_color1 + 1]; green
  !movzx eax, byte [p.v_color1 + 2]; blue
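
The byte offsets work because a PB color value is laid out as $00BBGGRR and x86 stores the lowest byte first. The same thing can be illustrated at PB level as a small sketch (the variable name color is only for illustration):

```purebasic
; PB-level illustration only; PeekA reads one unsigned byte from memory,
; which is what movzx with a byte operand does in the asm version.
Define color.l = RGB(10, 20, 30)

Debug PeekA(@color)      ; 10, same as Red(color)
Debug PeekA(@color + 1)  ; 20, same as Green(color)
Debug PeekA(@color + 2)  ; 30, same as Blue(color)
```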

Post by netmaestro »

Thanks, I think I got it. There is a lot of streamlining although, as you predicted, no noticeable improvement in speed. It still executes in approx. 40% of the time the PureBasic code version takes:

Code: Select all

Procedure Color_Mix_asm(color1.l, color2.l, percent.l)
  ; netmaestro May 2017
  
  Protected result=0
  
  !xor ecx, ecx 
  !@@:
  !movzx eax, byte [p.v_color1 + ecx]
  !imul eax, [p.v_percent]
  !cdq
  !mov ebx, 0x64
  !idiv ebx
  !mov [p.v_result + ecx], al
  !movzx eax, byte [p.v_color2 + ecx]
  !mov ebx, 0x64
  !sub ebx, [p.v_percent]
  !imul eax, ebx
  !cdq
  !mov ebx, 0x64
  !idiv ebx
  !add [p.v_result + ecx], al
  !inc ecx
  !cmp ecx, 0x2
  !jle @b
   
  ProcedureReturn result
  
EndProcedure

Procedure Color_Mix_pb(color1.l, color2.l, percent.l)
  r= ((Red(color1)*percent)/100) + ((Red(color2)*(100-percent)) / 100)
  g= ((Green(color1)*percent)/100) + ((Green(color2)*(100-percent)) / 100)
  b= ((Blue(color1)*percent)/100) + ((Blue(color2)*(100-percent)) / 100)
  ProcedureReturn RGB(r,g,b)
EndProcedure

; Debug RSet(Hex(Color_Mix_pb(#White, #Black, 17),#PB_Long), 6, "0")
; Debug RSet(Hex(Color_Mix_asm(#White, #Black, 17),#PB_Long), 6, "0")
; 
; End

CompilerIf #PB_Compiler_Debugger
  MessageRequester("Notice:", "Please turn off the debugger for this test")
  End
CompilerEndIf

s=ElapsedMilliseconds()
For i=1 To 10000000
  Color_Mix_pb(#Green,#Blue, 50)
Next
MessageRequester("PB Code Version", Str(ElapsedMilliseconds()-s))

s=ElapsedMilliseconds()
For i=1 To 10000000
  Color_Mix_asm(#Green,#Blue, 50)
Next
MessageRequester("asm Version", Str(ElapsedMilliseconds()-s))

Post by wilbert »

netmaestro wrote:Thanks, I think I got it. Lots of streamlining although as you predicted, not a noticeable improvement in speed, though it does execute in approx. 40% of the time the PureBasic code version takes:
Nice idea, that loop :)
As for speed improvement, probably the only way to get a significant increase is to get rid of the idiv instruction.

Post by netmaestro »

wilbert wrote: As for speed improvement, probably the only way to get a significant increase is to get rid of the idiv instruction.
I found some algorithms on the web for dividing by 100 using only add and shift. I picked one and implemented it in asm for this procedure and it was actually slower. So I tried another shorter one and now it reduces execution time vs. PB code from 40% to 25%:

Code: Select all

Procedure Color_Mix_asm(color1.l, color2.l, percent.l)
  ; netmaestro May 2017
  
  Protected result=0
  
  !xor ecx, ecx 
  !@@:
  !movzx eax, byte [p.v_color1 + ecx]
  !imul eax, [p.v_percent]
  !mov edx, eax 	
  !shr edx, 5	  
  !add edx, eax	
  !shl edx, 2	  
  !add eax, edx	
  !shr eax, 9	  
  !mov [p.v_result + ecx], al
  !movzx eax, byte [p.v_color2 + ecx]
  !mov ebx, 0x64
  !sub ebx, [p.v_percent]
  !imul eax, ebx
  !mov edx, eax 	
  !shr edx, 5	  
  !add edx, eax	
  !shl edx, 2	  
  !add eax, edx	
  !shr eax, 9	  
  !add [p.v_result + ecx], al
  !inc ecx
  !cmp ecx, 0x2
  !jle @b
  
  ProcedureReturn result
  
EndProcedure

Procedure Color_Mix_pb(color1.l, color2.l, percent.l)
  r= ((Red(color1)*percent)/100) + ((Red(color2)*(100-percent)) / 100)
  g= ((Green(color1)*percent)/100) + ((Green(color2)*(100-percent)) / 100)
  b= ((Blue(color1)*percent)/100) + ((Blue(color2)*(100-percent)) / 100)
  ProcedureReturn RGB(r,g,b)
EndProcedure

; Debug RSet(Hex(Color_Mix_pb(#Red, #Green, 80),#PB_Long), 6, "0")
; Debug RSet(Hex(Color_Mix_asm(#Red, #Green, 80),#PB_Long), 6, "0")
; 
; End

CompilerIf #PB_Compiler_Debugger
  MessageRequester("Notice:", "Please turn off the debugger for this test")
  End
CompilerEndIf

s=ElapsedMilliseconds()
For i=1 To 10000000
  Color_Mix_pb(#Green,#Blue, 50)
Next
e=ElapsedMilliseconds()-s
MessageRequester("PB Code Version", Str(e))

s=ElapsedMilliseconds()
For i=1 To 10000000
  Color_Mix_asm(#Green,#Blue, 50)
Next
MessageRequester("asm Version", Str(ElapsedMilliseconds()-s))
There may be a more efficient way to do this but I'm not finding it... Although actually it's executing 10 million times in 124 ms here. That's pretty fast.

Post by wilbert »

netmaestro wrote:There may be a more efficient way to do this but I'm not finding it... Although actually it's executing 10 million times in 124 ms here. That's pretty fast.
Nice division algorithm. :D
Even with the same algorithm you can speed things up.
You can first add the components and then divide (see the adapted PB routine).
It's also not necessary to use ebx (which officially should be preserved) or to allocate a variable for the result.
The last change I made is using a local label instead of an anonymous one, because anonymous asm labels aren't supported on OSX (nasm/yasm instead of fasm).

Code: Select all

Procedure Color_Mix_asm(color1.l, color2.l, percent.l)
  ; netmaestro May 2017
  
  !xor ecx, ecx 
  !.loop:
  
  !mov eax, 0x64
  !sub eax, [p.v_percent]
  !movzx edx, byte [p.v_color2 + ecx]
  !imul edx, eax
  !movzx eax, byte [p.v_color1 + ecx]
  !imul eax, [p.v_percent]
  !add eax, edx
  
  !mov edx, eax    
  !shr edx, 5     
  !add edx, eax   
  !shl edx, 2     
  !add eax, edx   
  !shr eax, 9     
  !mov [p.v_color1 + ecx], al
  
  !inc ecx
  !cmp ecx, 0x2
  !jle .loop
  
  !mov eax, [p.v_color1]
  ProcedureReturn
  
EndProcedure

Procedure Color_Mix_pb(color1.l, color2.l, percent.l)
  r= (Red(color1)*percent + Red(color2)*(100-percent)) / 100
  g= (Green(color1)*percent + Green(color2)*(100-percent)) / 100
  b= (Blue(color1)*percent + Blue(color2)*(100-percent)) / 100
  ProcedureReturn RGB(r,g,b)
EndProcedure

; Debug RSet(Hex(Color_Mix_pb(#Red, #Green, 80),#PB_Long), 6, "0")
; Debug RSet(Hex(Color_Mix_asm(#Red, #Green, 80),#PB_Long), 6, "0")
; 
; End

CompilerIf #PB_Compiler_Debugger
  MessageRequester("Notice:", "Please turn off the debugger for this test")
  End
CompilerEndIf

s=ElapsedMilliseconds()
For i=1 To 10000000
  Color_Mix_pb(#Green,#Blue, 50)
Next
e=ElapsedMilliseconds()-s
MessageRequester("PB Code Version", Str(e))

s=ElapsedMilliseconds()
For i=1 To 10000000
  Color_Mix_asm(#Green,#Blue, 50)
Next
MessageRequester("asm Version", Str(ElapsedMilliseconds()-s))
Using a multiply with a shift to divide seems to be a bit faster.
There might be rounding differences between the two approaches; haven't checked for that.

Code: Select all

Procedure Color_Mix_asm(color1.l, color2.l, percent.l)
  ; netmaestro May 2017
  
  !xor ecx, ecx 
  !.loop:
  
  !mov eax, 0x64
  !sub eax, [p.v_percent]
  !movzx edx, byte [p.v_color2 + ecx]
  !imul edx, eax
  !movzx eax, byte [p.v_color1 + ecx]
  !imul eax, [p.v_percent]
  !add eax, edx
  
  !imul eax, 167773
  !shr eax, 24
  !mov [p.v_color1 + ecx], al
  
  !inc ecx
  !cmp ecx, 0x2
  !jle .loop
  
  !mov eax, [p.v_color1]
  ProcedureReturn
  
EndProcedure

Procedure Color_Mix_pb(color1.l, color2.l, percent.l)
  r= (Red(color1)*percent + Red(color2)*(100-percent)) / 100
  g= (Green(color1)*percent + Green(color2)*(100-percent)) / 100
  b= (Blue(color1)*percent + Blue(color2)*(100-percent)) / 100
  ProcedureReturn RGB(r,g,b)
EndProcedure

; Debug RSet(Hex(Color_Mix_pb(#Red, #Green, 80),#PB_Long), 6, "0")
; Debug RSet(Hex(Color_Mix_asm(#Red, #Green, 80),#PB_Long), 6, "0")
; 
; End

CompilerIf #PB_Compiler_Debugger
  MessageRequester("Notice:", "Please turn off the debugger for this test")
  End
CompilerEndIf

s=ElapsedMilliseconds()
For i=1 To 10000000
  Color_Mix_pb(#Green,#Blue, 50)
Next
e=ElapsedMilliseconds()-s
MessageRequester("PB Code Version", Str(e))

s=ElapsedMilliseconds()
For i=1 To 10000000
  Color_Mix_asm(#Green,#Blue, 50)
Next
MessageRequester("asm Version", Str(ElapsedMilliseconds()-s))
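
The rounding question can be checked by brute force, because the value being divided is always a sum c1*percent + c2*(100-percent) and therefore lies between 0 and 25500. The sketch below (variable names are my own, run it with the debugger on) compares both replacements with a real integer divide. Working it out by hand, the multiply/shift form should match x / 100 over the whole range, while the shift/add form can come out one too high near the top: 255*99 + 254*1 = 25499 gives 255 instead of 254.

```purebasic
; Sketch: compare both divide-by-100 replacements with a true integer divide.
; x covers every possible sum c1*percent + c2*(100-percent), i.e. 0..25500.
; Quads are used so x * 167773 cannot overflow a signed 32 bit value.
Define x.q, exact.q, shiftAdd.q, mulShift.q
Define shiftAddMismatches.l, mulShiftMismatches.l

For x = 0 To 25500
  exact = x / 100
  ; same steps as the shr 5 / add / shl 2 / add / shr 9 sequence
  shiftAdd = (x + (((x >> 5) + x) << 2)) >> 9
  ; same steps as imul eax, 167773 / shr eax, 24
  mulShift = (x * 167773) >> 24
  If shiftAdd <> exact : shiftAddMismatches + 1 : EndIf
  If mulShift <> exact : mulShiftMismatches + 1 : EndIf
Next

Debug "shift/add mismatches: " + Str(shiftAddMismatches)
Debug "multiply/shift mismatches: " + Str(mulShiftMismatches)
```

Whether an occasional off-by-one in a color channel matters for blending animation frames is a separate question; visually it is unlikely to be noticeable.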

Post by netmaestro »

Excellent work wilbert. You've consolidated the task nicely and the divide is better yet. It's executing 10m times in 85-90 msec here now, down from 120-124 with my latest. Thanks for the instructive input.
Demivec
Addict
Posts: 4086
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Post by Demivec »

Just as a side note, the original PureBasic version can be improved further by simplifying the code to:

Code: Select all

Procedure Color_Mix_pb(color1.l, color2.l, percent.l)
  Protected p.f, r, g, b
  p = percent / 100
  r= (Red(color1) - Red(color2)) * p + Red(color2)
  g= (Green(color1) - Green(color2)) * p + Green(color2)
  b= (Blue(color1) - Blue(color2)) * p + Blue(color2)
  ProcedureReturn RGB(r,g,b)
EndProcedure
I had hoped to improve the assembler version by implementing this same idea there but I didn't see a way to readily do so, though I did try :wink: .

wilbert's implementation incorporates a similar idea with the hoped-for speed improvements and thus more than meets the initial goal. Thanks wilbert.

@Edit: corrected the p variable type to be a float. It was correct in the production code, honest. :)
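
For comparison, the same single-multiply idea can also be written without floats. This is only a sketch under my own naming (Color_Mix_int is hypothetical). Note that integer division truncates toward zero, so when color1 is darker than color2 the result can differ by one from the sum-then-divide version: mixing 0 and 255 at 50 percent gives 128 here, but 127 there.

```purebasic
; Hypothetical integer-only variant of the same one-multiply-per-channel idea.
Procedure.l Color_Mix_int(color1.l, color2.l, percent.l)
  Protected r.l, g.l, b.l
  r = Red(color2)   + ((Red(color1)   - Red(color2))   * percent) / 100
  g = Green(color2) + ((Green(color1) - Green(color2)) * percent) / 100
  b = Blue(color2)  + ((Blue(color1)  - Blue(color2))  * percent) / 100
  ProcedureReturn RGB(r, g, b)
EndProcedure

Debug Hex(Color_Mix_int(#White, #Black, 50))  ; 7F7F7F
```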
Last edited by Demivec on Wed May 24, 2017 8:26 am, edited 1 time in total.