help with asm

coma
Enthusiast
Posts: 164
Joined: Fri Aug 15, 2003 3:46 am
Location: Canada

Post by coma »

I have a simple program:

Code: Select all

InitSprite() : InitKeyboard()
OpenScreen(800,600, 32,"test")
CreateSprite(0,800,600,0)

;Draw some stuff on sprite(0)
StartDrawing(SpriteOutput(0))
  For i=0 To 50
    Box (Random(800),Random(600),Random(300),Random(300),Random($ffffff))
  Next
StopDrawing ()

Repeat
  ExamineKeyboard ()
  DisplaySprite (0,-x,0)
  x+1
  FlipBuffers() 
Until KeyboardPushed(#PB_Key_Escape)
End

OK, good news: this is perfectly smooth on my PC (not a powerful PC) :D



Now, if I use CreateSprite(0,800,600,#PB_Sprite_Memory) instead of CreateSprite(0,800,600,0), it's slower, but that's normal, because copying from VRAM to VRAM is faster than copying from RAM to VRAM.

But when I try to use asm to copy the sprite from VRAM to the screen, it's VERY slow:

Code: Select all

InitSprite() : InitKeyboard()
OpenScreen(800,600, 32,"test")
CreateSprite(0,800,600,0)

StartDrawing(SpriteOutput(0))
  For i=0 To 50
    Box (Random(800),Random(600),Random(300),Random(300),Random($ffffff))
  Next
StopDrawing ()

Repeat
  ExamineKeyboard ()

  StartDrawing(SpriteOutput(0))
  StopDrawing ()
  *source.l = DrawingBuffer()+4*x
  StartDrawing(ScreenOutput())
  StopDrawing ()
  *dest.l= DrawingBuffer()

  MOV esi,*source
  MOV edi,*dest
  MOV ecx,800*600
  REP MOVSD

  x+1
  FlipBuffers() 
Until KeyboardPushed(#PB_Key_Escape)
End

Why is this memory copy using asm so slow ("relatively" slow)?
Did I do something wrong?
Do I have to "unlock" the screen to speed up the copy?
(similar to the UnlockBuffer command in Blitz, for high-speed pixel operations)
Motu23
New User
Posts: 9
Joined: Mon Aug 04, 2003 9:48 pm
Location: Germany

Post by Motu23 »

Does the second code work for you? I just get a lot of stupid lines on the screen...

Anyway, you use two StartDrawing/StopDrawing operations in the second example, and that is a time eater by itself.

But I guess the more important fact is that, on the one hand, the 2DDrawing lib is never the fastest,
and on the other hand, ESI and EDI are processor registers. So you move every pixel from video memory to the CPU and from the CPU back to video memory. That's not a good one...
It gets a lot faster if you put the sprite in RAM instead of video memory (CreateSprite(0,800,600,#PB_Sprite_Memory)).
Maybe this helps you.

Post by coma »

Thanks for your answer, Motu23.

But when I copy from RAM to VRAM, it's not smooth (and it's worse when I copy from VRAM to VRAM).

So the only solution I have for smooth scrolling is to use the
DisplaySprite() command (with the sprite in video RAM).

The problem is that the sprite commands have some limitations. For example, I can't copy the sprite to only half of the screen (a clip), which I need in order to program a "split screen mode".
Isn't it possible to do smooth scrolling using a memory copy? (I have a Celeron 566 + Riva TNT)

Post by Motu23 »

Try this piece of code:

Code: Select all

InitSprite()
InitKeyboard()
OpenScreen(800,600,16,"Test Clip")

CreateSprite(0,64,64,0)
StartDrawing(SpriteOutput(0))
  Box(0,0,64,64,255+255*256)
  Line(0,0,64,64,255)
StopDrawing()

Dim Sprite_Width.l(100)  ; SpriteWidth() will return the clipped length!
Dim Sprite_Height.l(100) ; So I use an array instead of the function.

Sprite_Width.l(0) = SpriteWidth(0)
Sprite_Height.l(0) = SpriteHeight(0)

; BTW: This way is 4 times faster than calling SpriteWidth(Nr) when you
; need it

; Display a sprite clipped to the rectangle (MinX,MinY)-(MaxX,MaxY)
Procedure Display_TransparentSprite(Sprite_Nr.l,PosX.l,PosY.l, MinX.l,MinY.l,MaxX.l,MaxY.l)
  LongX.l = Sprite_Width(Sprite_Nr.l)
  LongY.l = Sprite_Height(Sprite_Nr.l)
  Clip_X1 = 0
  Clip_Y1 = 0
  Clip_X2 = LongX.l
  Clip_Y2 = LongY.l

  ; Horizontal clipping
  If PosX.l < MinX.l
    Clip_X1.l = MinX.l - PosX.l
    PosX.l + Clip_X1.l
  ElseIf PosX.l + LongX.l > MaxX
    Clip_X2.l = LongX.l - ((PosX.l + LongX.l)-MaxX)
  EndIf

  ; Vertical clipping
  If PosY.l < MinY.l
    Clip_Y1.l = MinY.l - PosY.l
    PosY.l + Clip_Y1.l
  ElseIf PosY.l + LongY.l > MaxY
    Clip_Y2.l = LongY.l - ((PosY.l + LongY.l)-MaxY)
  EndIf

  ClipSprite(Sprite_Nr.l,Clip_X1.l,Clip_Y1.l,Clip_X2.l,Clip_Y2.l)
  DisplayTransparentSprite(Sprite_Nr.l,PosX.l,PosY.l)
EndProcedure


HeroX = 200
HeroY = 200

Repeat

  ExamineKeyboard()
  If KeyboardPushed(200): HeroY - 1: EndIf ; up
  If KeyboardPushed(203): HeroX - 1: EndIf ; left
  If KeyboardPushed(205): HeroX + 1: EndIf ; right
  If KeyboardPushed(208): HeroY + 1: EndIf ; down

  ClearScreen(0,0,0)

  ; Outline of the clipping rectangle
  StartDrawing(ScreenOutput())
    Line(100,100,300,0,255)
    Line(100,100,0,200,255)
    Line(400,300,0,-200,255)
    Line(400,300,-300,0,255)
  StopDrawing()

  Display_TransparentSprite(0,HeroX,HeroY,100,100,400,300)
  FlipBuffers()

Until KeyboardReleased(1) ; Escape

Post by coma »

OK Motu, thanks, it's a good solution for split screen mode :)

Another solution is to use small virtual buffers and then copy them to the screen.

Post by Motu23 »

I haven't tested this, but I guess extra buffers are a little slower, as you have to do every drawing operation twice. They cost extra memory as well...
You'll have to decide for yourself.
Good luck with your split screen. :-)
spangly
User
Posts: 54
Joined: Mon Apr 28, 2003 8:26 pm

Post by spangly »

The reason your screen looks like garbage is that on a lot of graphics cards (my GF4 included) the display memory isn't 100% linear. Each row of pixels is aligned to a certain memory boundary. At the end of each row is a section of memory that isn't displayed. Use the DrawingBufferPitch() function to find the real length.

For instance, a row of 800 pixels uses 3200 bytes in 32-bit mode, but the DrawingBufferPitch() function tells me my card actually uses 3258 bytes, so you have to copy 3200 bytes, then skip 58 to get to the next row.
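
A minimal sketch of the row-by-row idea, using a plain fill rather than a copy to keep it short. It assumes the 800x600, 32-bit screen from the programs above (4 bytes per pixel), and it reads the buffer address while the screen is still locked between StartDrawing() and StopDrawing(). The names *row, pitch, y and px are only illustrative.

Code: Select all

InitSprite() : InitKeyboard()
OpenScreen(800,600, 32,"test")

Repeat
  ExamineKeyboard()

  StartDrawing(ScreenOutput())
    *row = DrawingBuffer()          ; address of the first row
    pitch = DrawingBufferPitch()    ; real length of one row, in bytes
    For y = 0 To 599
      For px = 0 To 799
        PokeL(*row + px*4, $5000)   ; write one 32-bit pixel
      Next
      *row = *row + pitch           ; step to the next row, skipping the padding
    Next
  StopDrawing()

  FlipBuffers()
Until KeyboardPushed(#PB_Key_Escape)
End

This only gets the rows in the right place. A PokeL() per pixel will not be fast.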

Post by Motu23 »

IHH.. That's ...

Post by coma »

But I still don't have my answer concerning asm :) => why is it so slow when I try to access the screen directly?
Even for just clearing the screen (which is supposed to be very fast), it's slower than using the Box() function.

Here, I use the Box() function to clear the screen:

Code: Select all

InitSprite() : InitKeyboard()
OpenScreen(800,600, 32,"test") 

CreateSprite(0,64,64,0) 
StartDrawing (SpriteOutput(0))
  Box (0,0,64,64,RGB(255,0,0))
StopDrawing()

Repeat 
  ExamineKeyboard () 
  
  StartDrawing (ScreenOutput())
    Box (0,0,800,600,RGB(0,0,128))  ; clearscreen
  StopDrawing()

  DisplaySprite (0,x,200)
  x+1
  FlipBuffers() 
Until KeyboardPushed(#PB_Key_Escape) 
End 

Now, the same program using asm to clear the screen:

Code: Select all

InitSprite() : InitKeyboard()
OpenScreen(800,600, 32,"test") 

CreateSprite(0,64,64,0) 
StartDrawing (SpriteOutput(0))
  Box (0,0,64,64,RGB(255,0,0))
StopDrawing()

Repeat 
  ExamineKeyboard () 

  StartDrawing(ScreenOutput()) 
  StopDrawing () 
  *dest.l= DrawingBuffer() 
  MOV eax,$5000 
  MOV edi,*dest 
  MOV ecx,800*600
  REP STOSD 

  DisplaySprite (0,x,200)
  x+1
  FlipBuffers() 
Until KeyboardPushed(#PB_Key_Escape) 
End 

This second program is not smooth. Why?

Post by Motu23 »

How about this:

"ESI and EDI are processor registers. So you move every pixel from video memory to the CPU and from the CPU back to video memory. That's not a good one..." Motu23


...

Post by coma »

OK Motu, but in my last example I don't copy from one buffer to another, I just put a value (eax) on the screen...
Fred
Administrator
Posts: 18162
Joined: Fri May 17, 2002 4:39 pm
Location: France

Post by Fred »

There could be several reasons why the asm code isn't fast:

1) The video bus restriction. Writing from system memory to video memory can be very slow, especially on old hardware.

2) Instruction problems. The REP STOSD combo seems slow to me. I would write a routine which unrolls the loop and uses DEC/CMP/JZ, which are faster instructions, so you will benefit from the CPU pipeline and superscalar structure (don't forget to align the whole thing).
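
Very roughly, the unrolling in point 2 could look like the fragment below. It is an untested sketch, not a measured improvement. It assumes *dest is set up as in your clearscreen program and that the buffer is linear, so one row is exactly 3200 bytes. The label ClearLoop and the use of edx as the pointer are only illustrative choices.

Code: Select all

  MOV eax,$5000          ; fill value
  MOV edx,*dest          ; start of the screen buffer
  MOV ecx,800*600/4      ; four pixels are written per pass
  !ClearLoop:
    MOV [edx],eax
    MOV [edx+4],eax
    MOV [edx+8],eax
    MOV [edx+12],eax
    ADD edx,16           ; advance by four 32-bit pixels
    DEC ecx
  !JNZ ClearLoop

If the bus from point 1 is the real limit, this may change nothing.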

Good luck :)

Post by coma »

I have tested this:

Code: Select all

  MOV eax,$5000 
  MOV edi,*dest 
  MOV ecx,800*600
  !Looop:
    MOV [edi],eax
    ADD edi,4
    DEC ecx
  !jnz Looop    

Nothing changes.

I don't understand what you mean when you say that writing to video memory can be very slow: your ClearScreen() and Box() functions are much faster than my asm code. How do you do this?
Is there something that slows down the copy? A sort of "screen lock protection" that I can disable?
Last edited by coma on Tue Aug 26, 2003 7:07 pm, edited 2 times in total.
freak
PureBasic Team
Posts: 5940
Joined: Fri Apr 25, 2003 5:21 pm
Location: Germany

Post by freak »

Do you have the Debugger enabled?
Doesn't the debugger slow down InlineASM code very much?

Just a guess...

Timo
quidquid Latine dictum sit altum videtur
oldefoxx
Enthusiast
Posts: 532
Joined: Fri Jul 25, 2003 11:24 pm

Direct Copy Instead of Using Video Card Drivers

Post by oldefoxx »

The non-linear explanation above is the correct one. As it happens, most advanced video cards, in addition to having screen memory laid out in a nonlinear fashion, have hardware accelerators that allow faster updates by direct video RAM writes using DMA access. They do this in part by using their own internal registers, and with several pages of memory, they can rapidly switch whole screen buffer areas between front and back buffers with just a single register change. Your direct write effort is making this a CPU-intensive effort, instead of offloading the task to the video card. As you said, your PC is not particularly fast, and moving 800*600 doublewords, which is almost 2 megabytes, is not going to be done in a flash. I also note that you need to move 4 bytes less than you indicate, since the move is not a direct overlay, and you do not want to draw in 4 random bytes from outside the buffer area at the tail end of your move.
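
To put rough numbers on that, here is the arithmetic as a small snippet. The 60 frames per second is my own assumption for illustration, not a figure from your setup, and the variable name is arbitrary.

Code: Select all

; One full-screen copy at 800x600 in 32-bit mode
bytesPerFrame.l = 800 * 600 * 4
Debug bytesPerFrame        ; 1920000 bytes, just under 2 megabytes
Debug bytesPerFrame * 60   ; 115200000 bytes per second at an assumed 60 frames/s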

But as several have noted above, they are more likely to just see horizontal tear lines, because not only is the memory used in a non-linear fashion, its order and arrangement are also affected by the screen mode, screen size (640x480, 800x600, etc.), and color depth. In other words, this is not a method of screen updates that can be used by everyone, or even by yourself if you select a different screen setup. In DOS, it was a major pain to try and figure out how to work with every screen mode for every adapter, or even the various sound cards out there. With Windows, you have API calls where all that fumblework has been taken care of for you.
has-been wanna-be (You may not agree with what I say, but it will make you think).