Code optimisation or translate to assembly?

wilbert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Code optimisation or translate to assembly?

Post by wilbert

Didaktik wrote: I made software for VJing. As footage I use ZX Spectrum demos, games, GIFs, etc. in the native ZX Spectrum screen format.
One video frame = one unpacked 6912-byte screen.
That's great :) 8)

I understand now why you needed the additional performance.
You mentioned you are using OpenGL for your graphics.
Maybe the Atom-based tablet also has fewer GPU cores to work with. :?
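For reference, an unpacked 6912-byte ZX Spectrum screen is 6144 bytes of bitmap (256x192 pixels, 1 bit per pixel, stored in the Spectrum's interleaved row order) followed by 768 attribute bytes, one per 8x8 cell (32x24 cells). Each attribute byte holds ink in bits 0-2, paper in bits 3-5, BRIGHT in bit 6 and FLASH in bit 7. The C sketch below shows how a pixel's colour index can be looked up; the function names are hypothetical and not part of any module posted in this thread.

```c
#include <stdint.h>

#define ZX_BITMAP_SIZE 6144
#define ZX_SCREEN_SIZE 6912

/* Offset of the bitmap byte for pixel row y (0..191) and byte column col (0..31).
   The row is stored as y7 y6 | y2 y1 y0 | y5 y4 y3, which gives the interleaving. */
static unsigned zx_bitmap_offset(unsigned y, unsigned col) {
    return ((y & 0xC0u) << 5) | ((y & 0x07u) << 8) | ((y & 0x38u) << 2) | col;
}

/* Offset of the attribute byte for the 8x8 cell that contains pixel (x, y). */
static unsigned zx_attr_offset(unsigned x, unsigned y) {
    return ZX_BITMAP_SIZE + (y / 8) * 32 + x / 8;
}

/* Colour index 0..15 of pixel (x, y): ink or paper, plus 8 if BRIGHT is set.
   FLASH (bit 7) is ignored in this sketch. */
static unsigned zx_pixel_colour(const uint8_t *screen, unsigned x, unsigned y) {
    uint8_t bits = screen[zx_bitmap_offset(y, x / 8)];
    uint8_t attr = screen[zx_attr_offset(x, y)];
    unsigned ink    = attr & 0x07u;           /* bits 0-2 */
    unsigned paper  = (attr >> 3) & 0x07u;    /* bits 3-5 */
    unsigned bright = (attr >> 6) & 0x01u;    /* bit 6 */
    int is_ink = (bits >> (7 - (x % 8))) & 1; /* leftmost pixel is bit 7 */
    return (is_ink ? ink : paper) + 8 * bright;
}
```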
Windows (x64)
Raspberry Pi OS (Arm64)

Post by wilbert

A few more thoughts ...
- If the Atom supports AVX, that might be a way to speed up the module I posted some more.
- It's also possible to add two additional procedures that output directly at 25% or 50% size if you need smaller images.
- Did you also use SSE2 for the mixing? If not, you might consider it if you still need more performance.
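Regarding SSE2 for the mixing: the bitmap part of the screen (the first 6144 bytes) is blended with plain bytewise copy/OR/XOR/AND, so it maps directly onto 16-byte SSE2 operations. A minimal C sketch using the standard SSE2 intrinsics from `<emmintrin.h>`, assuming an x86 or x86-64 target (every x86-64 CPU has SSE2); `blend_bitmap_sse2` and the enum are hypothetical names:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

#define ZX_BITMAP_SIZE 6144  /* 16 * 384, so no scalar tail loop is needed */

enum blend_mode { BLEND_PUT, BLEND_OR, BLEND_XOR, BLEND_AND };

/* Blends the bitmap part of src into dst, 16 bytes per iteration.
   Unaligned loads/stores are used, so the buffers need no special alignment. */
static void blend_bitmap_sse2(const uint8_t *src, uint8_t *dst, enum blend_mode mode) {
    for (int i = 0; i < ZX_BITMAP_SIZE; i += 16) {
        __m128i s = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i d = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i r;
        switch (mode) {
        case BLEND_OR:  r = _mm_or_si128(s, d);  break;
        case BLEND_XOR: r = _mm_xor_si128(s, d); break;
        case BLEND_AND: r = _mm_and_si128(s, d); break;
        default:        r = s;                   break; /* BLEND_PUT */
        }
        _mm_storeu_si128((__m128i *)(dst + i), r);
    }
}
```

A compiler may move the `switch` out of the loop on its own; writing one loop per mode, as the PureBasic version does, avoids relying on that.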
Didaktik
User
Posts: 79
Joined: Fri Mar 14, 2014 2:12 pm


Post by Didaktik


I haven't optimized the blending. I think it takes a lot of time. The current version could be converted to lookup tables, so that instead of four separate loops for pixel, ink, paper and bright there would be only one. I'm still busy with other things: I'm rewriting the entire program from scratch. But again, I'm not sure I made the right choice. The Windows GUI is very clumsy for this. I think the ideal would be to write my own mini GUI and build the whole interface with OpenGL or sprites, so that the entire screen is one big sprite. If you do it the other way and put 500 images into a ScrollArea, moving it takes the processor significantly longer, and there are other inconveniences too.

Here is the current version of the blending code:

Code:


; Blend modes and attribute masks. These values are assumptions (the original
; program defines them elsewhere); remove them if they are already defined.
Enumeration
  #put
  #or
  #xor
  #and
  #add
EndEnumeration

; ZX Spectrum attribute byte: bits 0-2 ink, bits 3-5 paper, bit 6 bright, bit 7 flash
#mask_ink        = %00000111
#mask_ink_inv    = %11111000
#mask_paper      = %00111000
#mask_paper_inv  = %11000111
#mask_bright     = %01000000
#mask_bright_inv = %10111111

; Blends the 6912-byte screen at screen_src into the screen at screen_dst.
; Bytes 0..6143 are the bitmap, bytes 6144..6911 the attributes.
; pixel, ink, paper and bright select the blend mode for each part.
Procedure BlendLayers(screen_src, screen_dst, pixel, ink, paper, bright)
  Protected pos, s, d, sum
  
  ; Bitmap, 4 bytes per step. PeekL/PokeL always move exactly 4 bytes.
  ; PeekI/PokeI move 8 bytes on x64, which overlaps with Step 4 (most bytes
  ; get XORed twice) and writes past the bitmap into the attributes.
  Select pixel
    Case #put
      CopyMemory(screen_src, screen_dst, 6144)
      
    Case #or
      For pos = 0 To 6143 Step 4
        PokeL(screen_dst + pos, PeekL(screen_dst + pos) | PeekL(screen_src + pos))
      Next
      
    Case #xor
      For pos = 0 To 6143 Step 4
        PokeL(screen_dst + pos, PeekL(screen_dst + pos) ! PeekL(screen_src + pos))
      Next
      
    Case #and
      For pos = 0 To 6143 Step 4
        PokeL(screen_dst + pos, PeekL(screen_dst + pos) & PeekL(screen_src + pos))
      Next
  EndSelect
  
  ; Ink (attribute bits 0-2)
  Select ink
    Case #put
      For pos = 6144 To 6911
        PokeA(screen_dst + pos, (PeekA(screen_dst + pos) & #mask_ink_inv) | (PeekA(screen_src + pos) & #mask_ink))
      Next
      
    Case #or
      For pos = 6144 To 6911
        PokeA(screen_dst + pos, PeekA(screen_dst + pos) | (PeekA(screen_src + pos) & #mask_ink))
      Next
      
    Case #xor
      For pos = 6144 To 6911
        PokeA(screen_dst + pos, PeekA(screen_dst + pos) ! (PeekA(screen_src + pos) & #mask_ink))
      Next
      
    Case #and
      For pos = 6144 To 6911
        PokeA(screen_dst + pos, PeekA(screen_dst + pos) & (PeekA(screen_src + pos) | #mask_ink_inv))
      Next
      
    Case #add   ; saturating add of the two ink colours (result 0..7)
      For pos = 6144 To 6911
        d = PeekA(screen_dst + pos)
        s = PeekA(screen_src + pos)
        sum = (d & #mask_ink) + (s & #mask_ink)
        If sum > 7 : sum = 7 : EndIf
        PokeA(screen_dst + pos, (d & #mask_ink_inv) | sum)
      Next
  EndSelect
  
  ; Paper (attribute bits 3-5)
  Select paper
    Case #put
      For pos = 6144 To 6911
        PokeA(screen_dst + pos, (PeekA(screen_dst + pos) & #mask_paper_inv) | (PeekA(screen_src + pos) & #mask_paper))
      Next
      
    Case #or
      For pos = 6144 To 6911
        PokeA(screen_dst + pos, PeekA(screen_dst + pos) | (PeekA(screen_src + pos) & #mask_paper))
      Next
      
    Case #xor
      For pos = 6144 To 6911
        PokeA(screen_dst + pos, PeekA(screen_dst + pos) ! (PeekA(screen_src + pos) & #mask_paper))
      Next
      
    Case #and
      For pos = 6144 To 6911
        PokeA(screen_dst + pos, PeekA(screen_dst + pos) & (PeekA(screen_src + pos) | #mask_paper_inv))
      Next
      
    Case #add   ; saturating add of the two paper colours (result 0..7)
      For pos = 6144 To 6911
        d = PeekA(screen_dst + pos)
        s = PeekA(screen_src + pos)
        sum = ((d & #mask_paper) >> 3) + ((s & #mask_paper) >> 3)
        If sum > 7 : sum = 7 : EndIf
        PokeA(screen_dst + pos, (d & #mask_paper_inv) | (sum << 3))
      Next
  EndSelect
  
  ; Bright (attribute bit 6)
  Select bright
    Case #put
      For pos = 6144 To 6911
        PokeA(screen_dst + pos, (PeekA(screen_dst + pos) & #mask_bright_inv) | (PeekA(screen_src + pos) & #mask_bright))
      Next
      
    Case #or
      For pos = 6144 To 6911
        PokeA(screen_dst + pos, PeekA(screen_dst + pos) | (PeekA(screen_src + pos) & #mask_bright))
      Next
      
    Case #xor
      For pos = 6144 To 6911
        PokeA(screen_dst + pos, PeekA(screen_dst + pos) ! (PeekA(screen_src + pos) & #mask_bright))
      Next
      
    Case #and
      For pos = 6144 To 6911
        PokeA(screen_dst + pos, PeekA(screen_dst + pos) & (PeekA(screen_src + pos) | #mask_bright_inv))
      Next
  EndSelect
  
EndProcedure

Screenshot of the old version:

[screenshot not available]
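The lookup-table idea mentioned above (one loop over the attributes instead of separate ink, paper and bright loops) could look roughly like the C sketch below. It assumes the attribute layout used by the masks in BlendLayers (ink bits 0-2, paper bits 3-5, bright bit 6, FLASH left untouched); the enum, the function names and the 64 KB table indexed by (destination attribute, source attribute) are illustrative choices, not code from this thread. The table only needs rebuilding when the blend modes change.

```c
#include <stdint.h>

enum blend_mode { BLEND_PUT, BLEND_OR, BLEND_XOR, BLEND_AND, BLEND_ADD };

#define MASK_INK    0x07u /* bits 0-2 */
#define MASK_PAPER  0x38u /* bits 3-5 */
#define MASK_BRIGHT 0x40u /* bit 6 */

/* Blends one attribute field (mask, starting at bit `shift`) of src into dst;
   all bits outside the mask are kept from dst. */
static unsigned blend_field(unsigned dst, unsigned src, unsigned mask,
                            unsigned shift, enum blend_mode mode) {
    unsigned d = dst & mask, s = src & mask, r;
    switch (mode) {
    case BLEND_PUT: r = s;     break;
    case BLEND_OR:  r = d | s; break;
    case BLEND_XOR: r = d ^ s; break;
    case BLEND_AND: r = d & s; break;
    case BLEND_ADD: {          /* saturating add of the field values */
        unsigned max = mask >> shift;
        unsigned sum = (d >> shift) + (s >> shift);
        r = (sum > max ? max : sum) << shift;
        break;
    }
    default: r = d; break;
    }
    return (dst & ~mask) | r;
}

/* table[dst * 256 + src] = blended attribute for one combination of modes. */
static void build_attr_table(uint8_t table[256 * 256], enum blend_mode ink,
                             enum blend_mode paper, enum blend_mode bright) {
    for (unsigned d = 0; d < 256; d++)
        for (unsigned s = 0; s < 256; s++) {
            unsigned r = blend_field(d, s, MASK_INK, 0, ink);
            r = blend_field(r, s, MASK_PAPER, 3, paper);
            r = blend_field(r, s, MASK_BRIGHT, 6, bright);
            table[d * 256 + s] = (uint8_t)r;
        }
}

/* One pass over the 768 attribute bytes (offsets 6144..6911). */
static void blend_attrs(const uint8_t *table, const uint8_t *src, uint8_t *dst) {
    for (int i = 6144; i < 6912; i++)
        dst[i] = table[dst[i] * 256 + src[i]];
}
```

Whether a 64 KB table beats three short bitwise loops on an Atom depends on cache behaviour, so timing both would be worthwhile.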

Post by wilbert

Your screenshot looks great! Nice interface.

The layer blending could be optimized with SSE2 (especially the pixel blending), but I don't know whether it currently takes up much time.
SSE also has an AndN operation (PANDN for integer data, intrinsic _mm_andnot_si128) which first inverts all destination bits and then does an AND.
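As an illustration of AndN: the attribute "put" operations in BlendLayers have the form (dst & ~mask) | (src & mask). Since `_mm_andnot_si128(a, b)` returns `(~a) & b`, the separate inverted-mask constant is not needed. A minimal C sketch, assuming an x86 target with SSE2; `put_attr_field_sse2` is a hypothetical name:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

#define ZX_ATTR_START 6144
#define ZX_ATTR_SIZE  768  /* 16 * 48 */

/* "Put" one attribute field from src into dst for all 768 attribute bytes:
   dst = (dst & ~mask) | (src & mask), 16 bytes per iteration. */
static void put_attr_field_sse2(const uint8_t *src, uint8_t *dst, uint8_t mask) {
    const __m128i m = _mm_set1_epi8((char)mask);
    for (int i = ZX_ATTR_START; i < ZX_ATTR_START + ZX_ATTR_SIZE; i += 16) {
        __m128i s = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i d = _mm_loadu_si128((const __m128i *)(dst + i));
        /* andnot clears the field in dst, and keeps only the field from src */
        __m128i r = _mm_or_si128(_mm_andnot_si128(m, d), _mm_and_si128(s, m));
        _mm_storeu_si128((__m128i *)(dst + i), r);
    }
}
```

Calling it with 0x07, 0x38 or 0x40 would cover the ink, paper and bright "put" cases.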

I can imagine that scrolling so many images is difficult. I don't know whether you store thumbnails on your hard drive or generate them in real time.
If you generate them in real time, it would be best to render directly at 25% size so you don't have to resize them afterwards.
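One possible way to render directly at 25% size (256x192 to 64x48): each output pixel covers a 4x4 block that always lies inside a single 8x8 attribute cell, so its colour can be a mix of that cell's ink and paper, weighted by how many of the 16 pixels are set. This C sketch rests on assumptions: the RGB intensities (0xD7 normal, 0xFF bright) are a common emulator choice rather than a fixed standard, FLASH is ignored, and all names are hypothetical.

```c
#include <stdint.h>

#define THUMB_W 64
#define THUMB_H 48

/* Offset of the bitmap byte for pixel row y and byte column col (Spectrum row order). */
static unsigned zx_bitmap_offset(unsigned y, unsigned col) {
    return ((y & 0xC0u) << 5) | ((y & 0x07u) << 8) | ((y & 0x38u) << 2) | col;
}

/* 0xRRGGBB for colour 0..7 (bit 0 blue, bit 1 red, bit 2 green); intensities assumed. */
static uint32_t zx_rgb(unsigned colour, unsigned bright) {
    uint32_t level = bright ? 0xFF : 0xD7;
    uint32_t r = (colour & 2) ? level : 0;
    uint32_t g = (colour & 4) ? level : 0;
    uint32_t b = (colour & 1) ? level : 0;
    return (r << 16) | (g << 8) | b;
}

/* Number of set bits in the low 4 bits of v. */
static unsigned bits4(unsigned v) {
    return (v & 1) + ((v >> 1) & 1) + ((v >> 2) & 1) + ((v >> 3) & 1);
}

/* Writes a 64x48 thumbnail; each output pixel averages a 4x4 block of the screen. */
static void zx_thumbnail_25(const uint8_t *screen, uint32_t *out) {
    for (unsigned ty = 0; ty < THUMB_H; ty++) {
        for (unsigned tx = 0; tx < THUMB_W; tx++) {
            unsigned x0 = tx * 4, y0 = ty * 4;
            uint8_t attr = screen[6144 + (y0 / 8) * 32 + x0 / 8];
            unsigned shift = (x0 % 8) ? 0 : 4; /* left half of a byte = high nibble */
            unsigned n = 0;                    /* ink pixels in the 4x4 block */
            for (unsigned y = y0; y < y0 + 4; y++)
                n += bits4(screen[zx_bitmap_offset(y, x0 / 8)] >> shift);
            uint32_t ink = zx_rgb(attr & 7, (attr >> 6) & 1);
            uint32_t paper = zx_rgb((attr >> 3) & 7, (attr >> 6) & 1);
            uint32_t rgb = 0;
            for (unsigned c = 0; c < 24; c += 8) { /* mix each channel by ink coverage */
                uint32_t i = (ink >> c) & 0xFF, p = (paper >> c) & 0xFF;
                rgb |= ((i * n + p * (16 - n)) / 16) << c;
            }
            out[ty * THUMB_W + tx] = rgb;
        }
    }
}
```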
If you are using Windows components, you could consider one big 'contact sheet' image with all thumbnails.
If a thumbnail is 64x48 pixels and you use an image 640 pixels wide and 7200 pixels high, it can hold 1500 thumbnails (10 columns, 150 rows).
Scrolling a single image might be much easier.
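The contact-sheet arithmetic is simple to sketch: with 64x48 thumbnails on a 640-pixel-wide sheet there are 10 columns, thumbnail i sits at column i mod 10 and row i / 10, and a mouse position maps back to an index the same way. A C sketch with hypothetical names:

```c
#define THUMB_W 64
#define THUMB_H 48
#define SHEET_W 640
#define SHEET_COLS (SHEET_W / THUMB_W) /* 10 */

typedef struct { int x, y; } point;

/* Sheet height needed for `count` thumbnails (rows rounded up). */
static int sheet_height(int count) {
    return (count + SHEET_COLS - 1) / SHEET_COLS * THUMB_H;
}

/* Top-left corner of thumbnail `index` on the sheet (row-major order). */
static point thumb_position(int index) {
    point p = { (index % SHEET_COLS) * THUMB_W, (index / SHEET_COLS) * THUMB_H };
    return p;
}

/* Index of the thumbnail under sheet pixel (x, y), or -1 if outside the sheet. */
static int thumb_at(int x, int y, int sheet_h) {
    if (x < 0 || y < 0 || x >= SHEET_W || y >= sheet_h) return -1;
    return (y / THUMB_H) * SHEET_COLS + x / THUMB_W;
}
```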

It's a nice project you are working on :)