computer speed independent programming

blueznl
PureBasic Expert
Posts: 6166
Joined: Sat May 17, 2003 11:31 am

Post by blueznl »

lemme rewrite my sample a little :-)
( PB6.00 LTS Win11 x64 Asrock AB350 Pro4 Ryzen 5 3600 32GB GTX1060 6GB)
( The path to enlightenment and the PureBasic Survival Guide right here... )
Psychophanta
Always Here
Posts: 5153
Joined: Wed Jun 11, 2003 9:33 pm
Location: Anare

Post by Psychophanta »

Rescator wrote: So, if I were to make, say, a 2D game (which I plan to do), and I wanted the game's main loop to run 100 or even 1000 times per second, for calculating and moving things, etc.
Let's call this "async rendering".
No problem, if the PC is fast enough for that. And if you want smooth movement, then do what you said yourself: "but only redraw the graphics as often as the user has his refresh to".
Rescator wrote: How would one do that?

I.e. the user has, say, 85 Hz set.
But you want your game to do 100 loops per second.

How the heck does one do that?
Use a triple-buffered, or even quad-buffered, screen. See viewtopic.php?t=8037
Rescator wrote: Is the first post in this thread still suitable for that, or?
If you are referring to "FlipBuffers():Delay(n)", that is not an ideal solution for reliably smooth movement on screen. The solution I see for async rendering is to use more than two screen buffers. And to try to save CPU time while still ensuring smooth movement, use a trick like the one I posted yesterday at viewtopic.php?t=12825

Rescator wrote: Remember, 1000 ms / 60 Hz would be 16.666666666666666666666666666667
That is like 16 and a half "loops" possible per second.
No! Be careful: 16.67 ms is the length of one frame at 60 Hz, not a number of loops per second. Are you assuming one main program loop per ms? Think again!
Rescator wrote: I want the user to be able to set whatever refresh rate he/she wants
(i.e. a refresh-rate-independent game),
because to me the refresh rate is just how often a game/program
should update the display (if necessary, i.e. stuff has changed visually).
I want to have that part completely separate from the game speed.

The first post in this thread is darn interesting in that respect, though.

The reason I wish to fully separate the display rate and game speed
is because, if a computer is darn slow, no matter what refresh rate is used
the graphics will still "lag". And if the computer is superfast,
well, the graphics will definitely not "lag".

So, how do I go about making a game engine timer/loop that
runs 100 times per second,
where the graphics are updated only if there are actual changes?
The shortest interval between updates would be a 100th of a second;
the longest would be, well, possibly unknown in case the player doesn't move at all, if you know what I mean.
The refresh rate in my eyes is a graphics engine thing only,
and I'd love to be able to separate all graphics timing from game timing.
Well, keep in mind that meeting all those requirements at the same time can be hard. Because you want:
- To save as much CPU time as possible: no CPU time wasted on polling, and no screen refresh when nothing has changed graphically. WOW!
- Two independent processes: one for rendering and one to update the screen AT THE RIGHT TIME to get smooth movement.

My answer:
1. Use a triple screen buffer.
2. Use 2 threads:
- one for rendering, in which you will ALWAYS have some free time (time left to Sleep) per screen frame if the PC is fast enough (otherwise the game will run slowly, which means not smoothly).
- the other one to update the screen at the right time, using a timer with a period of 1000/ScreenRefresh ms.

NOTE: I say "you will ALWAYS have some free time (time left to Sleep) per screen frame" because a game is an interactive program (just like when two people marry; no one knows what will happen with interactive stuff), and this means that nobody can know in advance what to render for the next frame. If it were a video clip player (in general, a non-interactive program), things could be done differently, because the next frames can be known long before they are displayed.
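As an aside, the "100 updates per second, whatever the refresh rate" requirement can also be sketched without threads. The code below is not the two-thread, triple-buffer design described above; it is a single-threaded fixed-timestep loop, shown only to illustrate how the logic rate can be decoupled from the redraw rate. UpdateGame() and RenderFrame() are hypothetical placeholders, and a PureBasic version providing ElapsedMilliseconds() is assumed.

```purebasic
; Sketch only: game logic at a fixed 100 Hz, drawing as often as the loop allows.
; UpdateGame() and RenderFrame() are hypothetical placeholders.
#StepMs = 10                      ; 1000 ms / 100 logic updates per second

Global logicSteps, framesDrawn

Procedure UpdateGame()
  logicSteps + 1                  ; move objects by one fixed 10 ms step here
EndProcedure

Procedure RenderFrame()
  framesDrawn + 1                 ; draw here, then FlipBuffers() in a real game
EndProcedure

startTime = ElapsedMilliseconds()
lastTime  = startTime
accumulator = 0

Repeat
  now = ElapsedMilliseconds()
  accumulator + (now - lastTime)  ; real time that has passed since the last pass
  lastTime = now

  While accumulator >= #StepMs    ; catch up in fixed steps, however long the frame took
    UpdateGame()
    accumulator - #StepMs
  Wend

  RenderFrame()
  Delay(1)                        ; stand-in for waiting on the display; frees the CPU
Until now - startTime >= 1000     ; run the demo for about one second

Debug "logic steps: " + Str(logicSteps)
Debug "frames drawn: " + Str(framesDrawn)
```

The number of logic steps should stay close to 100 per second no matter how many frames get drawn, which is the separation Rescator asked for. Skipping RenderFrame() when nothing changed would be a further refinement.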
http://www.zeitgeistmovie.com

while (world==business) world+=mafia;
nco2k
Addict
Posts: 1344
Joined: Mon Sep 15, 2003 5:55 am

Post by nco2k »

@Psychophanta
Threads and DirectX ?? ouch...

c ya,
nco2k
Psychophanta
Always Here
Posts: 5153
Joined: Wed Jun 11, 2003 9:33 pm
Location: Anare

Post by Psychophanta »

nco2k wrote:Threads and DirectX ?? ouch...
Yeah! Take a look at the explanation by Daniel Vik (author of the BlueMSX emulator) at viewtopic.php?t=12747 (Posted: Mon Oct 11, 2004 7:51 pm)
blueznl
PureBasic Expert
Posts: 6166
Joined: Sat May 17, 2003 11:31 am

Post by blueznl »

i'm a bit puzzled... can someone explain to me why the following code returns fluctuating numbers for the high performance counter?

Code: Select all

InitSprite()

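; Must be Global (and a float): a main-scope variable is not visible inside a
; procedure, and 2^32 does not fit in a 32-bit long.
Global x_peek64_multiplier.f
x_peek64_multiplier = Pow(2,32)

; Combine the two 32-bit halves of a 64-bit value into one float.
; Note: PeekL() is signed, and a 32-bit float keeps only about 24 significant bits.
Procedure.f x_peek64(m.l)
  ProcedureReturn PeekL(m+4)*x_peek64_multiplier+PeekL(m)
EndProcedure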

w = 200
h = 200
window_nr = 1
window_h = OpenWindow(window_nr,0,0,w,h,#PB_Window_ScreenCentered|#PB_Window_SystemMenu,"test")
OpenWindowedScreen(window_h,0,0,w,h,0,0,0)
;
sixtyfour.LARGE_INTEGER
QueryPerformanceFrequency_(@sixtyfour)                         
hpc_freq.f = x_peek64(@sixtyfour)
;
Repeat
  event = WindowEvent()
  ;
  FlipBuffers(1)
  QueryPerformanceCounter_(@sixtyfour)
  t1.f = x_peek64(@sixtyfour)
  ;
  FlipBuffers(1)
  QueryPerformanceCounter_(@sixtyfour)
  t2.f = x_peek64(@sixtyfour)
  ;
  dt.f = t2-t1
  Debug dt
  Delay(200)
Until event = 513 Or event = #PB_Event_CloseWindow
Psychophanta
Always Here
Posts: 5153
Joined: Wed Jun 11, 2003 9:33 pm
Location: Anare

Post by Psychophanta »

blueznl wrote:i'm a bit puzzled... can someone explain me why the following code returns fluctuating numbers for the high performance counter?
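Hi, blueznl. As a side note: since there are no 64-bit floats, it makes no sense to do this and return the result as a 32-bit float (a 32-bit float carries only 24 significant bits, so the low bits of the counter are lost):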

Code: Select all

ProcedureReturn PeekL(m+4)*x_peek64_multiplier+PeekL(m)
And the actual answer to your question could be that the time spent in the loop is not always the same, I think because of interrupt service routines (ISRs), multitasking (running services), and other running threads.
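One way around the float problem that needs no 64-bit type is to subtract two counter readings as integers first and only then convert the (small) difference. The sketch below assumes the measured interval is shorter than 2^31 ticks and that the counter frequency fits in a positive 32-bit long; neither assumption comes from the posts above.

```purebasic
; Sketch only: time an interval using just the low 32 bits of the counter.
; Assumes the interval is shorter than 2^31 ticks and that the counter
; frequency fits in a positive 32-bit long.
freq.LARGE_INTEGER
c1.LARGE_INTEGER
c2.LARGE_INTEGER

QueryPerformanceFrequency_(@freq)

QueryPerformanceCounter_(@c1)
Delay(20)                               ; the work being timed
QueryPerformanceCounter_(@c2)

; Integer subtraction first: wrap-around of the low half cancels out,
; and no large value is ever squeezed into a 32-bit float.
dticks.l = PeekL(@c2) - PeekL(@c1)

If PeekL(@freq) > 0
  ms.f = dticks * 1000.0 / PeekL(@freq)
  Debug ms                              ; elapsed time in milliseconds
EndIf
```

Because only the difference is converted, the result keeps its precision even when the absolute counter value is far beyond what a float can represent.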
blueznl
PureBasic Expert
Posts: 6166
Joined: Sat May 17, 2003 11:31 am

Post by blueznl »

about the float: dunno, and i'm not sure i'd entirely agree with you :-) the difference could be extreme between low and high speed machines, thus (until a 64 bit var is available) the only thing i can do is use either floats (even though they're 32 bits) or use something else (which i don't want to :-))

yeah, it could be background stuff, then again that should not matter, as i'm tying into the vsync, am i not?
Psychophanta
Always Here
Posts: 5153
Joined: Wed Jun 11, 2003 9:33 pm
Location: Anare

Post by Psychophanta »

Above, I suggested the benefits of using more than one screen buffer and more than one execution thread, but take a look at this reply from the coders of the Atari ST emulator Steem (the best ST emulator right now, of course). I think it is very interesting, because they use only one screen buffer and only one thread :shock: and they get the best results I have seen compared with other emulators.
That means the range of tricks for achieving smooth movement while saving CPU resources is quite wide:

My question:
You wrote:
"Steem's highly optimised scanline drawing routines are written in assembler ..."
And in fact, having compared your optimization in this area with other emulators (BLUEMSX, MAME, ZSNES, and some others), yours wins.

As you know, with the WAITforVsync functionality of DirectX, CPU usage is 100% and the program
blocks on this command before flipping screen buffers. I have noticed
that STEEM (and others like ZSNES, etc.) does not eat 100% of the CPU
time while waiting to swap screen buffers when displaying MSX screen
data.

My questions are these, but a single answer might cover them all:
- How many screen buffers do you use in Steem?
- How do you achieve that saving of CPU time while getting perfectly synchronized, smooth graphics? Using timer(s)? Using 2 or more threads?
- Do you think it would be possible to do it even better than STEEM already does?

I know you have the best answer for this.


Best regards.

Albert
Their reply:
Hi Albert,

> You wrote:
> "Steem's highly optimised scanline drawing routines are written
> in assembler ..."
> And in fact, having compared your optimization in this area
> with other emulators (BLUEMSX, MAME, ZSNES, and some others),
> yours wins.

Really, that's good news! :D It is nice to know something in Steem is faster
than the other emulators, most of the time it is slower.

> As you know, with the WAITforVsync functionality of DirectX,
> CPU usage is 100% and the program blocks on this command
> before flipping screen buffers. I have noticed that STEEM (and
> others like ZSNES, etc.) does not eat 100% of the CPU time
> while waiting to swap screen buffers when displaying MSX
> screen data.
>
> My questions are these, but a single answer might cover them all:
> - How many screen buffers do you use in Steem?

Only one actually, that is one reason why we can't implement many effects
easily, we don't really want to fiddle with it too much in case we slow it
down or mess things up.

> - How do you achieve that saving of CPU time while getting
> perfectly synchronized, smooth graphics? Using timer(s)? Using 2
> or more threads?

No actually we do it an incredibly dumb way! Steem only uses one thread, it
runs one frame of CPU drawing the screen to the buffer as it goes. Then if
vsync is off it just blits it to the primary display (or screen flips)
straight away, but I expect you are more interested in vsync on. Next it
checks for a few Windows messages - this is where it gets quite different
with vsync on. Normally after getting messages it would sleep in a low CPU
load way until it is time for the next frame to be drawn. But when vsync is
on it sleeps to about 75% of the way through the frame and then waits for a
vsync. As you said WAITforVsync is rubbish, not only for CPU load but also
on my system it missed some vsyncs. Instead Steem uses
GetVerticalBlankStatus to see if a vsync is in progress - in which case it
blits/flips immediately. If there is no vsync Steem uses GetScanLine to find
out when the gun is near the bottom of the screen, and when it is, it assumes
it is a vsync. It's a bit of a hack and you can see this on some ST screens
(the frame wobbles right at the bottom) but it is a small price to pay for a
working vsync.

I don't know how this ends up taking up less CPU time than other drawing
systems, it is probably down to the short amount of time Steem has to spend
on its CPU emulation (only being 8 MHz). That would give it more time to be
in a sleep state before the blit. But also the drawing routines must help,
Ant wrote 4 different versions to see which would be fastest. There are
about 20 different routines, one for every combination of ST resolution and
PC bit depth, plus a few more for double width and/or double height lines.
We basically sacrificed flexibility for speed, which is good most of the
time but annoying when people request special effects.

> - Do you think it would be possible to do it even better than
> STEEM already does?

Oh yes I'm sure, two threads would probably be a good idea, we just couldn't
be bothered with the thread safety issues. Some more options regarding how
the video memory is used would be good for some PCs, but the current display
options are quite obscure, it would just add to confusion.

I hope this has helped a bit, although I'm not sure Steem's system would
work very well in other emulators.

Regards,
Russell Hayward
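
The timing part of what Russell describes (sleep through most of the frame, then poll the beam position) can be sketched roughly as follows. This is not Steem's code. CurrentScanLine() and WaitForFlipPoint() are hypothetical names, the raster is simulated from the millisecond clock so that the sketch runs on its own, and the frame length and line counts are assumed values. A real program would query DirectDraw (GetVerticalBlankStatus / GetScanLine) instead.

```purebasic
; Sketch only: "sleep through ~75% of the frame, then poll the beam position".
; CurrentScanLine() is a hypothetical stand-in that simulates a raster from the
; millisecond clock; a real program would query DirectDraw instead.
#FrameMs    = 16      ; assumed frame length, roughly 1000/60
#TotalLines = 525     ; assumed scanlines per frame, including vertical blank
#NearBottom = 470     ; from this line on, treat the beam as "at vsync"

Global rasterStart
rasterStart = ElapsedMilliseconds()

Procedure CurrentScanLine()
  ProcedureReturn ((ElapsedMilliseconds() - rasterStart) % #FrameMs) * #TotalLines / #FrameMs
EndProcedure

Procedure WaitForFlipPoint(frameStart)
  Protected remaining
  ; 1. Sleep cheaply until about 75% of the frame has passed.
  remaining = frameStart + (#FrameMs * 3) / 4 - ElapsedMilliseconds()
  If remaining > 0
    Delay(remaining)
  EndIf
  ; 2. Poll only during the last part of the frame.
  While CurrentScanLine() < #NearBottom
  Wend
EndProcedure

For frame = 1 To 5
  frameStart = ElapsedMilliseconds()
  ; ... emulate and draw one frame into the buffer here ...
  WaitForFlipPoint(frameStart)
  Debug "blit/flip near scanline " + Str(CurrentScanLine())
Next
```

The busy-wait is confined to roughly the last quarter of each frame, which is why this approach can use far less CPU than blocking in a vsync wait for the whole frame.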
blueznl
PureBasic Expert
Posts: 6166
Joined: Sat May 17, 2003 11:31 am

Post by blueznl »

i'm a total loser when it comes to com or interfaces, but the solution above is quite interesting... they are looking at vblank information and using that, could anyone help me translate that into regular (non-oo :-)) code?

the high resolution timer approach gives me rather unpredictable results on some machines, i'd need 64 bit ints to make sure it returns proper results, i'm afraid...
blueznl
PureBasic Expert
Posts: 6166
Joined: Sat May 17, 2003 11:31 am

Post by blueznl »

still working on this subject, and came up with this:

Code: Select all

Procedure.s x_peekbin(addr.l,Length.l,mode.l)
  Protected n.l, s.s
  ;
  If mode = 0
    ;
    ; big endian or byte sequence
    ;
    For n = 0 To Length-1
      s = s+RSet(Bin(PeekB(addr+n) & $FF),8,"0")
    Next n
    ProcedureReturn s
    ;
  ElseIf mode = 1
    ;
    ; little endian
    ;
    For n = 0 To Length-1
      s = RSet(Bin(PeekB(addr+n) & $FF),8,"0")+s
    Next n
    ProcedureReturn s
    ;
  Else
    ;
    ; special mode for debugging purposes, little endian with separation spaces, THIS USE MAY CHANGE
    ;
    For n = 0 To Length-1
      s = RSet(Bin(PeekB(addr+n) & $FF),8,"0")+s
      If n < Length -1
        s = " "+s
      EndIf        
    Next n
    ProcedureReturn s
    ;
  EndIf
  ;
EndProcedure


z = 10000
Dim a.s(z)
;
sixtyfour.LARGE_INTEGER
QueryPerformanceFrequency_(@sixtyfour)                         

For n = 0 To z-1
  QueryPerformanceCounter_(@sixtyfour)
  a(n) = x_peekbin(@sixtyfour,8,2)
Next n

For n = 0 To z-1
  Debug a(n)
Next n
as pure doesn't support 64 bit ints yet (shame shame) i now have to cook up something else here... keep you posted...
Hatonastick
Enthusiast
Posts: 149
Joined: Wed Apr 27, 2005 11:50 am
Location: Adelaide, Australia

Post by Hatonastick »

Yeah, I'm still trying to work this one out myself. I thought I had an answer until I realised I was oversimplifying the problem. :roll: Vastly oversimplifying. Mind you, I haven't written a game in years, not since the days when computers all ran at the same speed so you didn't have to worry about timing. I don't think GameMaker counts. :) I think this problem needs to be cracked for all PB users who want to write games. I know I definitely need a solution instead of the hacks and band-aids I've been using. If I can come up with anything I'll definitely let you know.
blueznl
PureBasic Expert
Posts: 6166
Joined: Sat May 17, 2003 11:31 am

Post by blueznl »

the way to do this (i may need some assembly help here people) is reading that part of the high res timer that is useful, and ignore all bits below and above

1. we should be able to figure out how many bits we can drop from the frequency

2. then read it into a 64 bit block (8 bytes)

3. then shift it right as many times as need be (dropping the low bits)

4. then do a peekl and we have a good high resolution counter in 32 bits that is good enough for game timing

could any of you machine code experts tell me how to shift a block of 8 bytes a bit to the right (or left)?
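
For readers on a PureBasic version that has the quad type (.q), which the version discussed in this thread did not, the four steps can be sketched without assembly. ChooseShift() is a hypothetical helper name, and the target of 10000 ticks per second is an arbitrary assumption.

```purebasic
; Sketch only, assuming quad (.q) support. ChooseShift() is a hypothetical helper.
; minTicksPerSecond must be at least 1.
Procedure ChooseShift(freq.q, minTicksPerSecond)
  Protected shift = 0
  ; Step 1: drop low bits while the reduced counter still ticks often enough.
  While (freq >> (shift + 1)) >= minTicksPerSecond
    shift + 1
  Wend
  ProcedureReturn shift
EndProcedure

Define freq.q, counter.q
QueryPerformanceFrequency_(@freq)
shift = ChooseShift(freq, 10000)           ; keep at least 10000 ticks per second

QueryPerformanceCounter_(@counter)         ; step 2: read all 64 bits
ticks.l = (counter >> shift) & $7FFFFFFF   ; steps 3-4: shift right, keep 31 bits

Debug "shift: " + Str(shift)
Debug "reduced ticks per second: " + Str(freq >> shift)
Debug "reduced counter: " + Str(ticks)
```

The reduced counter still wraps around eventually, so it is best used for differences between two nearby readings rather than as an absolute time.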
Pupil
Enthusiast
Posts: 715
Joined: Fri Apr 25, 2003 3:56 pm

Post by Pupil »

blueznl wrote:could any of you machine code experts tell me how to shift a block of 8 bytes a bit to the right (or left)?
Ok

Code: Select all

Structure Quad
  lo.l
  hi.l
EndStructure

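Procedure ShiftQuadRight(*source.Quad, *dest.Quad, shift.l)
  ; 32-bit x86 only; valid for shift counts 0..31.
  ; Uses only eax, ecx and edx, so ebx and ebp are left untouched.
  !MOV edx, [esp]     ; *source
  !MOV eax, [edx]     ; low dword
  !MOV edx, [edx+4]   ; high dword
  !MOV ecx, [esp+8]   ; shift
  !SHRD eax, edx, cl  ; low = low >> cl, filled from the bottom of high
  !SHR edx, cl        ; high = high >> cl
  !MOV ecx, [esp+4]   ; *dest
  !MOV [ecx], eax
  !MOV [ecx+4], edx
EndProcedure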

a.Quad\lo = %10101010101010101010101010101010
a\hi =      %11001100110011001100110011001111

Debug "%"+RSet(Bin(a\hi), 32, "0")+" "+RSet(Bin(a\lo), 32, "0")

ShiftQuadRight(@a, @b.Quad, 8)

Debug "%"+RSet(Bin(b\hi), 32, "0")+" "+RSet(Bin(b\lo), 32, "0")
El_Choni
TailBite Expert
Posts: 1007
Joined: Fri Apr 25, 2003 6:09 pm
Location: Spain

Post by El_Choni »

MMX version :)

Code: Select all

Procedure MMXShiftQuadRight(*source.Quad, *dest.Quad, shift.l)
  !MOVD mm1, [esp+8]
  !MOV eax, [esp]
  !MOVQ mm0, [eax]
  !PSRLQ mm0, mm1
  !MOV eax, [esp+4]
  !MOVQ [eax], mm0
  !EMMS ; clear floating point tag word
EndProcedure
Last edited by El_Choni on Wed Jun 08, 2005 1:20 am, edited 1 time in total.
El_Choni
blueznl
PureBasic Expert
Posts: 6166
Joined: Sat May 17, 2003 11:31 am

Post by blueznl »

hey el choni, pupil, thx :-) euhm... it can operate on the same 64 bit block, ie. there are no different source and destinations...

would that be faster?

:P
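
For reference, an in-place variant without inline assembly might look like the sketch below. It makes no claim about speed. QuadParts and ShiftRightInPlace() are hypothetical names, the shift count is assumed to be between 1 and 31, and the #PB_Long argument to Hex() assumes a reasonably recent PureBasic version.

```purebasic
; Sketch only: in-place right shift of a 64-bit value held as two longs,
; valid for shift counts 1..31. QuadParts is a hypothetical structure name.
Structure QuadParts
  lo.l
  hi.l
EndStructure

Procedure ShiftRightInPlace(*v.QuadParts, shift)
  Protected mask
  mask = (1 << (32 - shift)) - 1          ; clears the sign bits that >> copies in
  ; Low half first, while the high half still holds its original value.
  *v\lo = ((*v\lo >> shift) & mask) | (*v\hi << (32 - shift))
  *v\hi = (*v\hi >> shift) & mask
EndProcedure

a.QuadParts
a\lo = $AAAAAAAA
a\hi = $CCCCCCCF
ShiftRightInPlace(@a, 8)
Debug Hex(a\hi, #PB_Long) + " " + Hex(a\lo, #PB_Long)
```

With the values above, the high half should end up as $00CCCCCC and the low half as $CFAAAAAA. The mask is needed because PureBasic's >> is an arithmetic shift on signed values.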