Posted: Sun Mar 06, 2005 10:49 am
by blueznl
lemme rewrite my sample a little :-)

Posted: Sun Mar 06, 2005 10:53 am
by Psychophanta
Rescator wrote:So, if I were to make, say, a 2D game (which I plan to do), and I wanted the game's main loop to do 100 or even 1000 loops per second, for calculating and moving things etc.
Let's call this "async rendering".
No problem, as long as the PC is fast enough for that. And if you want smooth movement, then stick to what you said yourself: "but only redraw the graphics as often as the user has his refresh set to".
Rescator wrote:How would one do that?

I.e. The user has say 85Hz set.
But you want your game to do 100 loops per second.

How the heck does one do that?
By using a triple-buffered, or even quad-buffered, screen. See viewtopic.php?t=8037
Rescator wrote:Is the first post in this thread still suitable for that or?
If you are referring to "FlipBuffers():Delay(n)", that is not an ideal solution for reliably smooth movement on screen. The solution i see for async rendering is to use more than 2 screen buffers. And, to optimize CPU time while still ensuring smooth movement, use a tip like the one i wrote yesterday at viewtopic.php?t=12825

Rescator wrote:Remember 1000ms/60Hz would be 16,666666666666666666666666666667
That is like 16 and a half "loops" possible per second.
No! Be careful: 1000 ms / 60 Hz is about 16.67 ms per frame, not loops per second. Or are you assuming one main program loop per ms? Think again!
Rescator wrote:I want the user to be able to set whatever refresh rate he/she wants.
(i.e. a refresh rate independent game)
because to me the refresh rate is just how often a game/program
should update the display (if necessary, i.e. stuff has changed visually)
I want to have that part completely separate from the game speed.

The first post in this thread is darn interesting in that respect though.

The reason I wish to fully separate the display rate and game speed
is because, if a computer is darn slow, no matter what refresh rate is used
the graphics will still "lag". And if the computer is superfast,
well, the graphics will definitely not "lag".

So, how do I go about making a game engine timer/loop that
runs 100 times per second,
with the graphics updated only if there are actual changes?
Minimum data update rate would be 100th of a second,
maximum would be, well, possibly unknown in case the player doesn't move at all, if you know what I mean.
The refresh rate in my eyes is a graphics engine thing only,
and I'd love to be able to separate all graphics timing from game timing.
Well, keep in mind that meeting all those requirements at the same time can be hard. Because you want:
- To save as much CPU time as possible: no CPU time wasted on polling, and no screen refresh if nothing in the graphics has changed. WOW!
- 2 independent processes: one for rendering and one for updating the screen AT THE RIGHT TIME to get smooth movement.

My answer:
1. Use a triple screen buffer.
2. Use 2 threads:
-one for rendering, in which you will ALWAYS have some free time (time left to Sleep) per screen frame if the PC is fast enough (otherwise the game will run slow, which means not smooth).
-the other one to update the screen at the right time, using a 1000/ScreenRefresh ms timer.

NOTE: I say "will ALWAYS have some free time (time left to Sleep) per screen frame" because a game is an interactive program (just like when 2 people get married; no one knows what will happen with interactive stuff), which means nobody can know in advance what to render for the next frame. If it were a video clip player (or, in general, a non-interactive program), things could be done differently, because the next frames are known long before they are displayed.
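
A minimal sketch of the separation discussed above, written in C so that it stays portable and can be run anywhere. Game logic advances in fixed 10 ms steps (100 per second) while exactly one frame is drawn per display refresh (85 Hz, the rate used as an example in the thread). The clock is simulated so the result is deterministic; `run_simulated`, `LOGIC_HZ`, `REFRESH_HZ` and the microsecond bookkeeping are hypothetical names chosen for this illustration, and a real program would read a timer after each vsync instead. It uses a single thread with a time accumulator, which is a swapped-in technique, not the two-thread design proposed in the post.

```c
#include <stdint.h>

#define LOGIC_HZ   100   /* game updates per second, as asked for in the thread */
#define REFRESH_HZ 85    /* example monitor refresh rate from the thread */

/* Simulate `seconds` of running time, one loop pass per display refresh.
   Counts how many logic steps and how many rendered frames occurred. */
void run_simulated(int seconds, int *updates, int *renders)
{
    const int64_t step_us = 1000000 / LOGIC_HZ;  /* 10000 us per logic step */
    int64_t accumulator = 0;                     /* elapsed time not yet simulated */
    int64_t prev_us = 0;
    int total_frames = seconds * REFRESH_HZ;

    *updates = 0;
    *renders = 0;

    for (int frame = 1; frame <= total_frames; frame++) {
        /* Time of this refresh; a real loop would read a timer here. */
        int64_t now_us = (int64_t)frame * 1000000 / REFRESH_HZ;
        accumulator += now_us - prev_us;
        prev_us = now_us;

        /* Run as many fixed logic steps as have elapsed since last frame. */
        while (accumulator >= step_us) {
            (*updates)++;            /* move objects, check collisions, ... */
            accumulator -= step_us;
        }

        (*renders)++;                /* draw and flip once per refresh */
    }
}
```

Calling `run_simulated(1, &u, &r)` gives 100 logic updates and 85 rendered frames: the game speed no longer depends on the refresh rate, only on elapsed time.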

Posted: Mon Mar 07, 2005 11:58 am
by nco2k
@Psychophanta
Threads and DirectX ?? ouch...

c ya,
nco2k

Posted: Mon Mar 07, 2005 1:50 pm
by Psychophanta
nco2k wrote:Threads and DirectX ?? ouch...
Yeah! Take a look at the explanation by Daniel Vik (author of the BlueMSX emulator) at viewtopic.php?t=12747 (Posted: Mon Oct 11, 2004 7:51 pm)

Posted: Fri Apr 08, 2005 10:38 pm
by blueznl
i'm a bit puzzled... can someone explain to me why the following code returns fluctuating numbers for the high performance counter?

Code:

InitSprite()

x_peek64_multiplier = Pow(2,32)

Procedure.f x_peek64(m.l)
  ProcedureReturn PeekL(m+4)*x_peek64_multiplier+PeekL(m)
EndProcedure

w = 200
h = 200
window_nr = 1
window_h = OpenWindow(window_nr,0,0,w,h,#PB_Window_ScreenCentered|#PB_Window_SystemMenu,"test")
OpenWindowedScreen(window_h,0,0,w,h,0,0,0)
;
sixtyfour.LARGE_INTEGER
QueryPerformanceFrequency_(@sixtyfour)                         
hpc_freq.f = x_peek64(@sixtyfour)
;
Repeat
  event = WindowEvent()
  ;
  FlipBuffers(1)
  QueryPerformanceCounter_(@sixtyfour)
  t1.f = x_peek64(@sixtyfour)
  ;
  FlipBuffers(1)
  QueryPerformanceCounter_(@sixtyfour)
  t2.f = x_peek64(@sixtyfour)
  ;
  dt.f = t2-t1
  Debug dt
  Delay(200)
Until event = 513 Or event = #PB_Event_CloseWindow

Posted: Fri Apr 15, 2005 8:13 am
by Psychophanta
blueznl wrote:i'm a bit puzzled... can someone explain to me why the following code returns fluctuating numbers for the high performance counter?
Hi, blueznl. As a side note: since there are no 64-bit floats, it makes no sense to do this just to return a 32-bit float:

Code:

ProcedureReturn PeekL(m+4)*x_peek64_multiplier+PeekL(m)
And the correct answer to your question could be that the time spent in the loop is not always the same, i think because of interrupt service routines (ISRs), multitasking (running services), and other running threads.
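
The float remark can be made concrete with a small sketch in C. A 32-bit float has a 24-bit significand, so once the counter has passed 2^24 ticks, neighbouring counter values collapse onto the same float, and a difference computed in floats moves in coarse steps whatever the loop actually did. The sketch assumes the two halves are read as unsigned 32-bit values (PeekL returns a signed long, which would add a second source of error whenever the low half has its top bit set). `combine64`, `delta_exact` and `delta_float` are names made up for this illustration.

```c
#include <stdint.h>

/* Join the low and high 32-bit halves of a LARGE_INTEGER-style counter. */
uint64_t combine64(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Difference computed exactly, in 64-bit integers. */
uint64_t delta_exact(uint64_t t1, uint64_t t2)
{
    return t2 - t1;
}

/* Difference computed after squeezing both readings into 32-bit floats,
   which is what returning the counter as a float amounts to.
   volatile forces a real 32-bit store even on compilers that would
   otherwise keep extra precision in registers. */
float delta_float(uint64_t t1, uint64_t t2)
{
    volatile float f1 = (float)t1;
    volatile float f2 = (float)t2;
    return f2 - f1;
}
```

For example, with `t1 = combine64(0x80000000u, 2)` and `t2 = t1 + 300`, `delta_exact` returns 300 while `delta_float` returns 0 on an IEEE 754 system, because floats of that magnitude are spaced 1024 apart.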

Posted: Fri Apr 15, 2005 8:19 pm
by blueznl
about the float: dunno, and i'm not sure i'd entirely agree with you :-) the difference could be extreme between low and high speed machines, thus (until a 64 bit var is available) the only thing i can do is use either floats (even though they're 32 bits) or use something else (which i don't want to :-))

yeah, it could be background stuff, then again that should not matter, as i'm tying into the vsync, am i not?

Posted: Tue May 10, 2005 3:39 pm
by Psychophanta
Above, i pointed out the benefits of using more than one screen buffer and more than one execution thread, but take a look at this reply from the coders of Steem, the Atari ST emulator (the best ST emulator right now, of course). I think it is very interesting, because they use only one screen buffer and only one thread :shock: and still get the best results i have seen in comparison with other emulators.
That means the range of tricks for getting smooth movement together with good CPU savings is quite wide:

My question:
You wrote:
"Steem's highly optimised scanline drawing routines are written in assembler ..."
And in fact, after comparing your optimization in this area with other emulators (BLUEMSX, MAME, ZSNES, and some others), yours wins.

As you know, the WAITforVsync functionality of DirectX takes 100% CPU time, and the program
blocks on this command before flipping screen buffers. I have noticed
that STEEM (and others like ZSNES, etc.) doesn't eat 100% of the CPU
time while waiting to swap screen buffers when displaying MSX screen
data.

My questions are these, though a single answer may cover them all:
- How many screen buffers do you use in Steem?
- How do you get that CPU time optimization while still getting perfectly synchronized, smooth graphics? Using timer(s)? Using 2 or more threads?
- Do you think it would be possible to do it even better than STEEM already does?

I know you have the best answer for this.


Best regards.

Albert
Their reply:
Hi Albert,

> You wrote:
> "Steem's highly optimised scanline drawing routines are written
> in assembler ..."
> And in fact, after comparing your optimization in this area
> with other emulators (BLUEMSX, MAME, ZSNES, and some others),
> yours wins.

Really, that's good news! :D It is nice to know something in Steem is faster
than the other emulators; most of the time it is slower.

> As you know, the WAITforVsync functionality of DirectX takes
> 100% CPU time, and the program blocks on this command before
> flipping screen buffers. I have noticed that STEEM (and others
> like ZSNES, etc.) doesn't eat 100% of the CPU time while
> waiting to swap screen buffers when displaying MSX screen data.
>
> My questions are these, though a single answer may cover them all:
> - How many screen buffers do you use in Steem?

Only one actually, that is one reason why we can't implement many effects
easily, we don't really want to fiddle with it too much in case we slow it
down or mess things up.

> - How do you get that CPU time optimization while still getting
> perfectly synchronized, smooth graphics? Using timer(s)? Using 2
> or more threads?

No actually we do it an incredibly dumb way! Steem only uses one thread, it
runs one frame of CPU drawing the screen to the buffer as it goes. Then if
vsync is off it just blits it to the primary display (or screen flips)
straight away, but I expect you are more interested in vsync on. Next it
checks for a few Windows messages - this is where it gets quite different
with vsync on. Normally after getting messages it would sleep in a low CPU
load way until it is time for the next frame to be drawn. But when vsync is
on it sleeps to about 75% of the way through the frame and then waits for a
vsync. As you said WAITforVsync is rubbish, not only for CPU load but also
on my system it missed some vsyncs. Instead Steem uses
GetVerticalBlankStatus to see if a vsync is in progress - in which case it
blits/flips immediately. If there is no vsync Steem uses GetScanLine to find
out when the gun is near the bottom of the screen and, when it is, assumes it
is a vsync. It's a bit of a hack and you can see this on some ST screens
(the frame wobbles right at the bottom) but it is a small price to pay for a
working vsync.

I don't know how this ends up taking up less CPU time than other drawing
systems, it is probably down to the short amount of time Steem has to spend
on its CPU emulation (only being 8 MHz). That would give it more time to be
in a sleep state before the blit. But also the drawing routines must help,
Ant wrote 4 different versions to see which would be fastest. There are
about 20 different routines, one for every combination of ST resolution and
PC bit depth, plus a few more for double width and/or double height lines.
We basically sacrificed flexibility for speed, which is good most of the
time but annoying when people request special effects.

> - Do you think it would be possible to do it even better than
> STEEM already does?

Oh yes I'm sure, two threads would probably be a good idea, we just couldn't
be bothered with the thread safety issues. Some more options regarding how
the video memory is used would be good for some PCs, but the current display
options are quite obscure, it would just add to confusion.

I hope this has helped a bit, although I'm not sure Steem's system would
work very well in other emulators.

Regards,
Russell Hayward
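
The waiting scheme described in the reply can be sketched as follows, in C. This is only an outline under stated assumptions, not Steem's actual code: the `Display` struct of callbacks, `wait_for_flip`, and the `FakeDisplay` simulation are all hypothetical names invented here. In a DirectDraw program the `in_vblank` and `scanline` callbacks would presumably be backed by GetVerticalBlankStatus and GetScanLine, the two calls named in the reply. The simulated display exists only so the logic can be exercised without any graphics API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical timer/display interface used by the wait routine. */
typedef struct {
    void    *ctx;
    int64_t (*now_us)(void *ctx);               /* current time, microseconds */
    void    (*sleep_us)(void *ctx, int64_t us); /* low-CPU sleep */
    bool    (*in_vblank)(void *ctx);            /* vertical blank in progress? */
    int     (*scanline)(void *ctx);             /* line currently being drawn */
} Display;

/* Wait for a good moment to blit/flip. Returns the number of polls spent
   busy-waiting, which is the only part that costs CPU time. */
int wait_for_flip(const Display *d, int64_t frame_start_us,
                  int64_t frame_us, int near_bottom)
{
    /* 1. Sleep cheaply through roughly the first 75% of the frame. */
    int64_t wake_at = frame_start_us + frame_us * 3 / 4;
    int64_t now = d->now_us(d->ctx);
    if (now < wake_at)
        d->sleep_us(d->ctx, wake_at - now);

    /* 2. Poll for the rest: stop on a real vblank, or when the beam is
          near the bottom of the screen (treated as "close enough"). */
    int polls = 0;
    while (!d->in_vblank(d->ctx) && d->scanline(d->ctx) < near_bottom)
        polls++;
    return polls;
}

/* --- A tiny simulated display so the sketch can be exercised --- */
typedef struct {
    int64_t t_us;        /* simulated clock */
    int64_t frame_us;    /* duration of one refresh */
    int     total_lines; /* visible lines plus vertical blank lines */
    int     visible;     /* visible lines only */
} FakeDisplay;

static int fake_line(const FakeDisplay *f)
{
    return (int)((f->t_us % f->frame_us) * f->total_lines / f->frame_us);
}

int64_t fake_now(void *c) { return ((FakeDisplay *)c)->t_us; }
void fake_sleep(void *c, int64_t us) { ((FakeDisplay *)c)->t_us += us; }
bool fake_vblank(void *c)
{
    FakeDisplay *f = c;
    return fake_line(f) >= f->visible;
}
int fake_scanline(void *c)
{
    FakeDisplay *f = c;
    f->t_us += 10;       /* each poll costs a little simulated time */
    return fake_line(f);
}
```

With a simulated 10 ms frame of 625 lines (600 visible) and `near_bottom` set to 590, the routine sleeps until 7.5 ms and then polls until about 9.44 ms, so only the last fifth of the frame is spent busy-waiting.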

Posted: Tue May 24, 2005 4:52 pm
by blueznl
i'm a total loser when it comes to com or interfaces, but the solution above is quite interesting... they are looking at vblank information and using that, could anyone help me translate that into regular (non-oo :-)) code?

the high resolution timer approach gives me rather unpredictable results on some machines, i'd need 64 bit ints to make sure it returns proper results, i'm afraid...

Posted: Mon Jun 06, 2005 8:37 pm
by blueznl
still working on this subject, and came up with this:

Code:

Procedure.s x_peekbin(addr.l,Length.l,mode.l)
  Protected n.l, s.s
  ;
  If mode = 0
    ;
    ; big endian or byte sequence
    ;
    For n = 0 To Length-1
      s = s+RSet(Bin(PeekB(addr+n) & $FF),8,"0")
    Next n
    ProcedureReturn s
    ;
  ElseIf mode = 1
    ;
    ; little endian
    ;
    For n = 0 To Length-1
      s = RSet(Bin(PeekB(addr+n) & $FF),8,"0")+s
    Next n
    ProcedureReturn s
    ;
  Else
    ;
    ; special mode for debugging purposes, little endian with separation spaces, THIS USE MAY CHANGE
    ;
    For n = 0 To Length-1
      s = RSet(Bin(PeekB(addr+n) & $FF),8,"0")+s
      If n < Length -1
        s = " "+s
      EndIf        
    Next n
    ProcedureReturn s
    ;
  EndIf
  ;
EndProcedure


z = 10000
Dim a.s(z)
;
sixtyfour.LARGE_INTEGER
QueryPerformanceFrequency_(@sixtyfour)                         

For n = 0 To z-1
  QueryPerformanceCounter_(@sixtyfour)
  a(n) = x_peekbin(@sixtyfour,8,2)
Next n

For n = 0 To z-1
  Debug a(n)
Next n
as pure doesn't support 64 bit ints yet (shame shame) i now have to cook up something else here... keep you posted...

Posted: Tue Jun 07, 2005 5:50 am
by Hatonastick
Yeah I'm still trying to work this one out myself. I thought I had an answer until I realised I was oversimplifying the problem. :roll: Vastly oversimplifying. Mind you I haven't written a game in years - since the days when computers all ran at the same speed so you didn't have to worry about timing. I don't think GameMaker counts. :) This problem needs to be cracked I think for all PB users who want to write games. I know I definitely need a solution instead of the hacks and band-aids I've been using. If I can come up with anything I'll definitely let you know.

Posted: Tue Jun 07, 2005 5:43 pm
by blueznl
the way to do this (i may need some assembly help here people) is reading that part of the high res timer that is useful, and ignore all bits below and above

1. we should be able to figure out how many bits we can drop from the frequency

2. then read it into a 64 bit block (8 bytes)

3. then shift it as many times left as need be

4. then do a peekl and we have a good high resolution counter in 32 bits that is good enough for game timing

could any of you machine code experts tell me how to shift a block of 8 bytes a bit to the right (or left)?
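
A sketch of these steps in C, where a native 64-bit integer makes the shift trivial. It reads step 3 as a right shift, so that the low, too-fine bits fall off and the low 32 bits of the result are what remains. The function names and the idea of choosing the shift from a minimum wanted resolution (`min_ticks_per_sec`) are assumptions made for this illustration, not something specified in the post.

```c
#include <stdint.h>

/* Step 1: how many low bits can be dropped from the counter while still
   keeping at least `min_ticks_per_sec` ticks per second of resolution? */
unsigned droppable_bits(uint64_t frequency, uint64_t min_ticks_per_sec)
{
    unsigned shift = 0;
    while (shift < 63 && (frequency >> (shift + 1)) >= min_ticks_per_sec)
        shift++;
    return shift;
}

/* Steps 2 to 4: take the 64-bit reading, shift the unneeded low bits out,
   and keep the low 32 bits of what is left. */
uint32_t reduced_counter(uint64_t counter, unsigned shift)
{
    return (uint32_t)(counter >> shift);
}

/* Elapsed reduced ticks between two readings. Unsigned subtraction stays
   correct across a single wrap-around of the 32-bit value. */
uint32_t reduced_elapsed(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}
```

For a counter frequency of 3579545 Hz and a wanted resolution of at least 1000 ticks per second, `droppable_bits` returns 11, leaving about 1747 reduced ticks per second. The reduced frequency is simply `frequency >> shift`, so elapsed seconds are `reduced_elapsed(...)` divided by that value.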

Posted: Tue Jun 07, 2005 6:50 pm
by Pupil
blueznl wrote:could any of you machine code experts tell me how to shift a block of 8 bytes a bit to the right (or left)?
Ok

Code:

Structure Quad
  lo.l
  hi.l
EndStructure

Procedure ShiftQuadRight(*source.Quad, *dest.Quad, shift.l)
  !MOV ebp, [esp];*source
  !MOV eax, [ebp]
  !MOV ebx, [ebp+4]
  !MOV ecx, [esp+8];shift
  !MOV ebp, [esp+4];*dest
  !SHRD eax, ebx, cl
  !SHR ebx, cl
  !MOV [ebp], eax
  !MOV [ebp+4], ebx
EndProcedure

a.Quad\lo = %10101010101010101010101010101010
a\hi =      %11001100110011001100110011001111

Debug "%"+RSet(Bin(a\hi), 32, "0")+" "+RSet(Bin(a\lo), 32, "0")

ShiftQuadRight(@a, @b.Quad, 8)

Debug "%"+RSet(Bin(b\hi), 32, "0")+" "+RSet(Bin(b\lo), 32, "0")
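
For comparison, the same two-halves shift can be written in plain C. This is a sketch mirroring the SHRD/SHR pair above, with `Quad` and `shift_quad_right` redefined here as C names for illustration. It takes and returns the value by copy, so the result can be assigned back to the same variable. The shift count is assumed to be in the range 0..31, which is also the range the 32-bit SHRD and SHR instructions handle, since they only use the low 5 bits of the count.

```c
#include <stdint.h>

typedef struct {
    uint32_t lo;   /* low 32 bits */
    uint32_t hi;   /* high 32 bits */
} Quad;

/* Shift a 64-bit value held as two 32-bit halves right by `shift` bits.
   The caller must keep `shift` within 0..31. */
Quad shift_quad_right(Quad src, unsigned shift)
{
    Quad dst;

    if (shift == 0) {
        /* Avoid hi << 32, which is undefined for a 32-bit operand in C. */
        return src;
    }

    /* Low half: its own bits moved down, with the bits that fall out of
       the high half moved in at the top (what SHRD eax, ebx, cl does). */
    dst.lo = (src.lo >> shift) | (src.hi << (32 - shift));

    /* High half: plain logical shift (what SHR ebx, cl does). */
    dst.hi = src.hi >> shift;

    return dst;
}
```

With the test values from the post, lo = %1010...1010 ($AAAAAAAA) and hi = $CCCCCCCF, a shift by 8 gives lo = $CFAAAAAA and hi = $00CCCCCC.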

Posted: Tue Jun 07, 2005 7:31 pm
by El_Choni
MMX version :)

Code:

Procedure MMXShiftQuadRight(*source.Quad, *dest.Quad, shift.l)
  !MOVD mm1, [esp+8]
  !MOV eax, [esp]
  !MOVQ mm0, [eax]
  !PSRLQ mm0, mm1
  !MOV eax, [esp+4]
  !MOVQ [eax], mm0
  !EMMS ; clear floating point tag word
EndProcedure

Posted: Tue Jun 07, 2005 11:18 pm
by blueznl
hey el choni, pupil, thx :-) euhm... it can operate on the same 64 bit block, ie. there are no different source and destinations...

would that be faster?

:P