
help with asm

Posted: Tue Aug 26, 2003 1:12 am
by coma
I have a simple program:

Code:

InitSprite() : InitKeyboard()
OpenScreen(800,600, 32,"test")
CreateSprite(0,800,600,0)

;Draw some stuff on sprite(0)
StartDrawing(SpriteOutput(0))
  For i=0 To 50
    Box (Random(800),Random(600),Random(300),Random(300),Random($ffffff))
  Next
StopDrawing ()

Repeat
  ExamineKeyboard ()
  DisplaySprite (0,-x,0)
  x+1
  FlipBuffers() 
Until KeyboardPushed(#PB_Key_Escape)
End

OK, good news: this is perfectly smooth on my PC (not a powerful PC) :D



Now, if I use CreateSprite(0,800,600,#PB_Sprite_Memory) instead of CreateSprite(0,800,600,0) it's slower, but that's expected, because copying from VRAM to VRAM is faster than copying from RAM to VRAM.

But when I try to use asm to copy the sprite from VRAM to the screen, it's VERY slooooooooooooow:

Code:

InitSprite() : InitKeyboard()
OpenScreen(800,600, 32,"test")
CreateSprite(0,800,600,0)

StartDrawing(SpriteOutput(0))
  For i=0 To 50
    Box (Random(800),Random(600),Random(300),Random(300),Random($ffffff))
  Next
StopDrawing ()

Repeat
  ExamineKeyboard ()

  StartDrawing(SpriteOutput(0))
  StopDrawing ()
  *source.l = DrawingBuffer()+4*x
  StartDrawing(ScreenOutput())
  StopDrawing ()
  *dest.l= DrawingBuffer()

  MOV esi,*source
  MOV edi,*dest
  MOV ecx,800*600
  REP MOVSD

  x+1
  FlipBuffers() 
Until KeyboardPushed(#PB_Key_Escape)
End

Why is this memory copy in asm so slow ("relatively" slow)?
Did I do something wrong?
Do I have to "unlock" the screen to speed up the copy?
(Similar to the UnlockBuffer command in Blitz, for high-speed pixel operations.)

Posted: Tue Aug 26, 2003 9:37 am
by Motu23
Does the second code work for you? I just get a lot of garbage lines on the screen...

Anyway, you use two StartDrawing()/StopDrawing() pairs per frame in the second example; that alone eats time.

But I guess the more important fact is that, on the one hand, the 2DDrawing library is not the fastest.
And on the other hand, ESI and EDI are processor registers. So you move every pixel from the video memory to the CPU and from the CPU back to the video memory. That's not a good one...
It gets a lot faster if you put the sprite in RAM instead of video memory (CreateSprite(0,800,600,#PB_Sprite_Memory)).
Maybe this helps you.

Posted: Tue Aug 26, 2003 10:58 am
by coma
Thanks for your answer, Motu23.

But when I copy from RAM to VRAM, it's not smooth (and it's worse when I copy from VRAM to VRAM).

So the only way I have to get smooth scrolling is to use the
DisplaySprite() command (with the sprite in video RAM).

The problem is that the sprite commands have some limitations: for example, I can't copy the sprite onto only half of the screen (a clip), which I need to program a "split screen mode".
Is it not possible to get smooth scrolling with a memory copy? (I have a Celeron 566 + Riva TNT.)

Posted: Tue Aug 26, 2003 11:40 am
by Motu23
Try this piece of code:

InitSprite()
InitKeyboard()
OpenScreen(800,600,16,"Test Clip")

CreateSprite(0,64,64,0)
StartDrawing(SpriteOutput(0))
Box(0,0,64,64,255+255*256)
Line(0,0,64,64,255)
StopDrawing()

Dim Sprite_Width.l(100) ; SpriteWidth() will return the clipped length!
Dim Sprite_Height.l(100) ; So I use an array instead of the function.

Sprite_Width.l(0) = SpriteWidth(0)
Sprite_Height.l(0) = SpriteHeight(0)

; BTW: this way is 4 times faster than calling SpriteWidth(Nr) when you
; need it

Procedure Display_TransparentSprite(Sprite_Nr.l,PosX.l,PosY.l, MinX.l,MinY.l,MaxX.l,MaxY.l)
LongX.l = Sprite_Width(Sprite_Nr.l)
LongY.l = Sprite_Height(Sprite_Nr.l)
Clip_X1 = 0
Clip_Y1 = 0
Clip_X2 = LongX.l
Clip_Y2 = LongY.l
If PosX.l < MinX.l
Clip_X1.l = MinX.l - PosX.l
PosX.l + Clip_X1.l
ElseIf PosX.l + LongX.l > MaxX
Clip_X2.l = LongX.l - ((PosX.l + LongX.l)-MaxX)
EndIf

If PosY.l < MinY.l
Clip_Y1.l = MinY.l - PosY.l
PosY.l + Clip_Y1.l
ElseIf PosY.l + LongY.l > MaxY
Clip_Y2.l = LongY.l - ((PosY.l + LongY.l)-MaxY)
EndIf
ClipSprite(Sprite_Nr.l,Clip_X1.l,Clip_Y1.l,Clip_X2.l,Clip_Y2.l)
DisplayTransparentSprite(Sprite_Nr.l,PosX.l,PosY.l)
EndProcedure


HeroX = 200
HeroY = 200

Repeat

ExamineKeyboard()
If KeyboardPushed(200): HeroY - 1: EndIf
If KeyboardPushed(203): HeroX - 1: EndIf
If KeyboardPushed(205): HeroX + 1: EndIf
If KeyboardPushed(208): HeroY + 1: EndIf

ClearScreen(0,0,0)
StartDrawing(ScreenOutput())
Line(100,100,300,0,255)
Line(100,100,0,200,255)
Line(400,300,0,-200,255)
Line(400,300,-300,0,255)
StopDrawing()
Display_TransparentSprite(0,HeroX,HeroY,100,100,400,300)
FlipBuffers()

Until KeyboardReleased(1)
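The clipping arithmetic in Display_TransparentSprite() can be checked on its own. Below is a minimal C sketch of the same idea, assuming a window whose right and bottom edges (MaxX, MaxY) are exclusive, as in the procedure above; clip_sprite() and ClipResult are hypothetical names. Unlike the If/ElseIf chain in the procedure, it also handles a sprite that overhangs both edges of the window at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Visible part of a clipped sprite (hypothetical helper type). */
typedef struct {
    int src_x, src_y; /* offset of the visible part inside the sprite */
    int w, h;         /* size of the visible part */
    int dst_x, dst_y; /* screen position of the visible part */
} ClipResult;

/* Clip a w x h sprite drawn at (x, y) against the window
   [min_x, max_x) x [min_y, max_y); right and bottom edges are exclusive.
   Returns false when no part of the sprite is visible. */
bool clip_sprite(int x, int y, int w, int h,
                 int min_x, int min_y, int max_x, int max_y,
                 ClipResult *out)
{
    assert(out != NULL);

    /* Intersect the sprite rectangle with the window rectangle. */
    int left = x < min_x ? min_x : x;
    int top = y < min_y ? min_y : y;
    int right = x + w > max_x ? max_x : x + w;
    int bottom = y + h > max_y ? max_y : y + h;

    if (right <= left || bottom <= top)
        return false; /* entirely outside the window */

    out->src_x = left - x;
    out->src_y = top - y;
    out->w = right - left;
    out->h = bottom - top;
    out->dst_x = left;
    out->dst_y = top;
    return true;
}
```

The src_x, src_y, w and h fields play the role of the x, y, width and height passed to ClipSprite(), and dst_x, dst_y the position passed to DisplayTransparentSprite().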

Posted: Tue Aug 26, 2003 12:17 pm
by coma
OK Motu, thanks, it's a good solution for split-screen mode :)

Another solution is to draw into small virtual buffers and then copy them to the screen.
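Copying a virtual buffer to the screen comes down to a row-by-row copy into a rectangle. A minimal C sketch, assuming 32-bit pixels, pitches given in bytes for both surfaces (for the screen, what DrawingBufferPitch() reports), and a rectangle that has already been clipped to the screen; blit_to_region() is a hypothetical name. Two calls, one per half of the screen, give a split-screen layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a w x h block of 32-bit pixels from a virtual buffer into the
   screen at (dx, dy). Both pitches are in bytes; the caller is assumed
   to have clipped the rectangle so that it fits on the screen. */
void blit_to_region(uint8_t *screen, size_t screen_pitch, int dx, int dy,
                    const uint8_t *buf, size_t buf_pitch, int w, int h)
{
    size_t row_bytes = (size_t)w * 4;
    assert(buf_pitch >= row_bytes && screen_pitch >= row_bytes);

    uint8_t *dst = screen + (size_t)dy * screen_pitch + (size_t)dx * 4;
    for (int y = 0; y < h; y++) {
        memcpy(dst, buf, row_bytes); /* visible pixels of this row only */
        dst += screen_pitch;         /* next screen row, skipping padding */
        buf += buf_pitch;            /* next buffer row */
    }
}
```

As Motu23 points out, this costs an extra copy per frame and extra memory for the buffers.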

Posted: Tue Aug 26, 2003 12:24 pm
by Motu23
I haven't tested this, but I guess extra buffers are a little bit slower, as you have to do every drawing operation twice. They cost extra memory as well...
You have to choose for yourself.
Good luck with your split screen. :-)

Posted: Tue Aug 26, 2003 12:45 pm
by spangly
The reason your screen looks like garbage is that on a lot of graphics cards (my GF4 included) the display memory isn't 100% linear. Each row of pixels is aligned to a certain memory boundary, and at the end of each row is a section of memory that isn't displayed. Use the function DrawingBufferPitch() to find the real row length in bytes.

For instance, a row of 800 pixels uses 3200 bytes in 32-bit mode, but DrawingBufferPitch() tells me my card actually uses 3258 bytes, so you have to copy 3200 bytes and then skip 58 to get to the next row.
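In other words, pixel (x, y) lives at y * pitch + x * 4 bytes from the start of the buffer, not at (y * 800 + x) * 4. A minimal C sketch of a pitch-aware clear, assuming a 32-bit surface whose base address and pitch (in bytes) come from DrawingBuffer() and DrawingBufferPitch(); pixel_offset() and fill_surface() are hypothetical helpers. Each pixel is stored with memcpy so the sketch stays correct even for a pitch that is not a multiple of 4, like the 3258 reported above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of pixel (x, y) in a 32-bit surface: rows are pitch bytes
   apart, which may be more than width * 4. */
size_t pixel_offset(int x, int y, size_t pitch)
{
    return (size_t)y * pitch + (size_t)x * 4;
}

/* Fill the visible width x height area with one 32-bit colour: write
   width * 4 bytes per row, then skip the undisplayed padding. */
void fill_surface(uint8_t *base, size_t pitch, int width, int height,
                  uint32_t color)
{
    assert(pitch >= (size_t)width * 4);
    for (int y = 0; y < height; y++) {
        uint8_t *row = base + (size_t)y * pitch;
        for (int x = 0; x < width; x++)
            memcpy(row + (size_t)x * 4, &color, sizeof color); /* alignment-safe store */
    }
}
```

With the numbers above, pixel (0, 599) sits at 599 * 3258 = 1,951,542 bytes, while a linear layout would put it at 599 * 3200 = 1,916,800, a drift of 34,742 bytes by the last row.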

Posted: Tue Aug 26, 2003 12:49 pm
by Motu23
Yuck... That's...

Posted: Tue Aug 26, 2003 3:03 pm
by coma
But I still don't have my answer about asm :) => why is it so slow when I try to access the screen directly?
Even for a simple clear screen (which should be very fast), it's slower than when I use the Box() function.

Here, I use the Box() function to clear the screen:

Code:

InitSprite() : InitKeyboard()
OpenScreen(800,600, 32,"test") 

CreateSprite(0,64,64,0) 
StartDrawing (SpriteOutput(0))
  Box (0,0,64,64,RGB(255,0,0))
StopDrawing()

Repeat 
  ExamineKeyboard () 
  
  StartDrawing (ScreenOutput())
    Box (0,0,800,600,RGB(0,0,128))  ; clearscreen
  StopDrawing()

  DisplaySprite (0,x,200)
  x+1
  FlipBuffers() 
Until KeyboardPushed(#PB_Key_Escape) 
End 

Now, the same program using asm to clear the screen:

Code:

InitSprite() : InitKeyboard()
OpenScreen(800,600, 32,"test") 

CreateSprite(0,64,64,0) 
StartDrawing (SpriteOutput(0))
  Box (0,0,64,64,RGB(255,0,0))
StopDrawing()

Repeat 
  ExamineKeyboard () 

  StartDrawing(ScreenOutput()) 
  StopDrawing () 
  *dest.l= DrawingBuffer() 
  MOV eax,$5000 
  MOV edi,*dest 
  MOV ecx,800*600
  REP STOSD 

  DisplaySprite (0,x,200)
  x+1
  FlipBuffers() 
Until KeyboardPushed(#PB_Key_Escape) 
End 

This second program is not smooth. Why?

Posted: Tue Aug 26, 2003 4:17 pm
by Motu23
How about this:

"ESI and EDI are processor registers. So you move every pixel from the video memory to the CPU and from the CPU back to the video memory. That's not a good one..." Motu23


...

Posted: Tue Aug 26, 2003 4:23 pm
by coma
OK Motu, but in my last example I don't copy from one buffer to another; I just write a value (EAX) to the screen...

Posted: Tue Aug 26, 2003 6:05 pm
by Fred
There could be several reasons why the asm code isn't fast:

1) The video bus restriction. Writing from system memory to video memory can be very slow, especially on old hardware.

2) Instruction problems. The REP STOSD combo seems slow to me. I would write a routine that unrolls the loop and uses DEC/CMP/JZ, which are faster instructions, so you benefit from the CPU pipeline and superscalar structure (don't forget to align the whole thing).

Good luck :)
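The loop unrolling from point 2 can be sketched in C as below: four stores per iteration, so the counter update and the branch run once per four doublewords, plus a short tail loop for the remainder. fill_unrolled() is a hypothetical name; a modern compiler may unroll or vectorize a plain loop by itself, and if the bottleneck is the video bus (point 1), no choice of instructions will get around it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill count 32-bit words with value, unrolled by 4: one loop-counter
   update and one branch per four stores instead of one per store. */
void fill_unrolled(uint32_t *dst, size_t count, uint32_t value)
{
    assert(dst != NULL || count == 0);

    size_t blocks = count / 4;
    while (blocks--) {
        dst[0] = value;
        dst[1] = value;
        dst[2] = value;
        dst[3] = value;
        dst += 4;
    }
    for (size_t i = 0; i < count % 4; i++) /* remaining 0..3 words */
        dst[i] = value;
}
```

On a surface with padding, this would be called once per row with count = width, advancing the row pointer by the pitch between calls.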

Posted: Tue Aug 26, 2003 6:34 pm
by coma
I have tested this:

Code:

  MOV eax,$5000 
  MOV edi,*dest 
  MOV ecx,800*600
  !Looop:
    MOV [edi],eax
    ADD edi,4
    DEC ecx
  !jnz Looop    

Nothing changed.

I don't understand when you say that writing to video memory can be very slow: your ClearScreen() and Box() functions are much faster than my asm code. How do you do it?
Is there something that slows down the copy? A sort of "screen lock protection" that I can disable?

Posted: Tue Aug 26, 2003 6:59 pm
by freak
Do you have the debugger enabled?
Doesn't the debugger slow down InlineASM code a lot?

Just a guess...

Timo

Direct Copy Instead of Using Video Card Drivers

Posted: Tue Aug 26, 2003 7:28 pm
by oldefoxx
The non-linear explanation above is the correct one. As it happens, most advanced video cards, in addition to laying out screen memory in a non-linear fashion, have hardware accelerators that allow faster updates by writing directly to video RAM using DMA. They do this partly by using their own internal registers, and with several pages of memory they can rapidly switch whole screen buffers between front and back with a single register change. Your direct-write approach turns this into a CPU-intensive job instead of offloading it to the video card. As you said, your PC is not particularly fast, and moving 800*600 doublewords, which is almost 2 megabytes, is not going to happen in a flash. I also note that you need to move fewer bytes than you indicate (4 fewer for each step of x): the source pointer starts 4*x bytes into the sprite, so the copy is not a direct overlay, and you do not want to read bytes from outside the buffer at the tail end of your move.

But as several have noted above, most people are more likely to just see horizontal tear lines, because not only is the memory used in a non-linear fashion, its order and arrangement also depend on the screen mode, the screen size (640x480, 800x600, etc.), and the color depth. In other words, this is not a method of screen updates that everyone can use, or even you, if you select a different screen setup. In DOS, it was a major pain to figure out how to work with every screen mode of every adapter, or even with the various sound cards out there. With Windows, you have API calls where all that fiddly work has been taken care of for you.
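The sizes in this thread are easy to work out. A short C sketch, assuming 32-bit pixels and, for the per-second figure, a 60 Hz refresh rate (an assumption, not a figure from the thread); frame_bytes() and safe_copy_words() are hypothetical helpers. The second one gives how many doublewords can be read from a source buffer when the copy starts offset_words into it, which is the x in DrawingBuffer()+4*x from the first example.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes moved for one full frame of 32-bit pixels. */
size_t frame_bytes(size_t width, size_t height)
{
    return width * height * 4;
}

/* Doublewords that can be copied from a buffer of total_words when the
   copy starts offset_words in, without reading past the end of it. */
size_t safe_copy_words(size_t total_words, size_t offset_words)
{
    assert(offset_words <= total_words);
    return total_words - offset_words;
}
```

One 800x600 frame is 1,920,000 bytes, so redrawing the whole screen 60 times per second pushes about 115 MB/s through the CPU, and with the source offset by x doublewords only 800*600 - x of them can be copied before running off the end of the sprite.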