
computer speed independent programming
lemme rewrite my sample a little 

( PB6.00 LTS Win11 x64 Asrock AB350 Pro4 Ryzen 5 3600 32GB GTX1060 6GB)
( The path to enlightenment and the PureBasic Survival Guide right here... )
- Psychophanta
- Always Here
- Posts: 5153
- Joined: Wed Jun 11, 2003 9:33 pm
- Location: Anare
- Contact:
Rescator wrote:
> So.. if I were to make, say, a 2D game (which I plan to do), and I wanted the game main loop to do 100 or even 1000 loops per second, for calculating and moving things etc.
Let's call this "async rendering".
No problem, if the PC is fast enough for that. And if you want smooth movement, then stick to what you said yourself: "but only redraw the graphics as often as the user has his refresh to".
Rescator wrote:
> How would one do that?
> I.e. the user has, say, 85Hz set.
> But you want your game to do 100 loops per second.
> How the heck does one do that?
Using a triple buffer, or even a quad buffer screen. See viewtopic.php?t=8037
Rescator wrote:
> Is the first post in this thread still suitable for that or?
If you are referring to "FlipBuffers():Delay(n)", that is not an ideal solution for reliably smooth movement on screen. The solution I see for async rendering is more than 2 screen buffers. And, to try to optimize CPU time while still ensuring smooth movement, use a tip like the one I wrote yesterday at viewtopic.php?t=12825
Rescator wrote:
> Remember 1000ms/60Hz would be 16.666...
> That is like 16 and a half "loops" possible per frame.
Noo! Be careful, are you talking about one main program loop per ms? Think again!
Rescator wrote:
> I want the user to be able to set whatever refresh rate he/she wants
> (i.e. a refresh rate independent game),
> because to me the refresh rate is just how often a game/program
> should update the display (if necessary, i.e. stuff has changed visually).
> I want to have that part completely separate from the game speed.
> The first post in this thread is darn interesting in that respect though.
> The reason I wish to fully separate the display rate and game speed
> is because, if a computer is darn slow, no matter what refresh rate is used
> the graphics will still "lag". And if the computer is superfast,
> well, the graphics will definitely not "lag".
> So, how do I go about making a game engine timer/loop that
> runs 100 times per second,
> with the graphics updated only if there are actual changes?
> Minimum data update rate would be 100th of a second,
> maximum would be, well, possibly unknown in case the player doesn't move at all, if you know what I mean.
> The refresh rate in my eyes is a graphics engine thing only,
> and I'd love to be able to separate all graphics timing from game timing.
Well, keep in mind that meeting all those requirements at the same time can be hard to develop, because you want:
- To save as much CPU time as possible: no CPU time wasted on polling, and no screen refresh if there are no graphics changes. WOW!
- 2 independent processes: one for rendering and one to update the screen AT THE RIGHT TIME to get smooth movement.
My answer:
1. Use a triple screen buffer.
2. Use 2 threads:
- one for rendering, in which you will ALWAYS have some free time (time left to Sleep) per screen frame if the PC is fast enough (otherwise the game will run slow, which means not smooth).
- the other one to update the screen at the right time, using a timer with a period of 1000/ScreenRefresh ms.
NOTE: I say "you will ALWAYS have some free time (time left to Sleep) per screen frame" because a game is an interactive program (just like when 2 people get married; nobody knows what will happen with interactive stuff), and this means that nobody can know in advance what to render for the next frame. If it were a video clip player (or, in general, a non-interactive program), things could be done another way, because the next frames are known long before they are displayed.
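A single-threaded alternative to the two-thread design above, and not something proposed in this thread, is a fixed-timestep loop with a time accumulator: the logic always advances in 10 ms steps (100 per second), while drawing happens once per pass at whatever pace the display allows. The following is only a sketch under stated assumptions: it targets a current PureBasic where ElapsedMilliseconds() and quad (.q) variables exist, UpdateGame() and RenderFrame() are hypothetical placeholders, and Delay(12) merely stands in for waiting on an 85 Hz refresh.

```purebasic
; Sketch only. UpdateGame() and RenderFrame() are hypothetical placeholders.
EnableExplicit

#TICK_MS = 10                 ; 10 ms per logic step = 100 steps per second

Global logicTicks.i, framesDrawn.i

Procedure UpdateGame()
  logicTicks + 1              ; move objects, check collisions, ...
EndProcedure

Procedure RenderFrame()
  framesDrawn + 1             ; a real game would draw and call FlipBuffers() here
  Delay(12)                   ; stand-in for waiting on an 85 Hz refresh (about 11.8 ms)
EndProcedure

Define startTime.q = ElapsedMilliseconds()
Define previous.q = startTime
Define current.q, accumulator.q

Repeat
  current = ElapsedMilliseconds()
  accumulator + (current - previous)   ; real time that has not been simulated yet
  previous = current

  While accumulator >= #TICK_MS        ; catch up in fixed 10 ms steps
    UpdateGame()
    accumulator - #TICK_MS
  Wend

  RenderFrame()                        ; at most one redraw per pass
Until current - startTime >= 1000      ; stop after roughly one second

Debug "logic ticks:  " + Str(logicTicks)
Debug "frames drawn: " + Str(framesDrawn)
```

The logic tick count should come out close to 100 however long RenderFrame() takes, which is the point of the exercise; how exact it is depends on the resolution of the system clock.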
- Psychophanta
nco2k wrote:
> Threads and DirectX ?? ouch...
Yeah! Take a look at the explanation by Daniel Vik (BlueMSX emulator author) at viewtopic.php?t=12747 (Posted: Mon Oct 11, 2004 7:51 pm)
i'm a bit puzzled... can someone explain me why the following code returns fluctuating numbers for the high performance counter?
Code:
InitSprite()
x_peek64_multiplier = Pow(2,32)
Procedure.f x_peek64(m.l)
  ProcedureReturn PeekL(m+4)*x_peek64_multiplier+PeekL(m)
EndProcedure
w = 200
h = 200
window_nr = 1
window_h = OpenWindow(window_nr,0,0,w,h,#PB_Window_ScreenCentered|#PB_Window_SystemMenu,"test")
OpenWindowedScreen(window_h,0,0,w,h,0,0,0)
;
sixtyfour.LARGE_INTEGER
QueryPerformanceFrequency_(@sixtyfour)
hpc_freq.f = x_peek64(@sixtyfour)
;
Repeat
  event = WindowEvent()
  ;
  ; read the counter right after two consecutive flips
  ;
  FlipBuffers(1)
  QueryPerformanceCounter_(@sixtyfour)
  t1.f = x_peek64(@sixtyfour)
  ;
  FlipBuffers(1)
  QueryPerformanceCounter_(@sixtyfour)
  t2.f = x_peek64(@sixtyfour)
  ;
  dt.f = t2-t1
  Debug dt
  Delay(200)
Until event = 513 Or event = #PB_Event_CloseWindow ; 513 = WM_LBUTTONDOWN
- Psychophanta
blueznl wrote:
> i'm a bit puzzled... can someone explain me why the following code returns fluctuating numbers for the high performance counter?
Hi, blueznl. As a note: since there are no 64-bit floats, it makes no sense to do this just to return a 32-bit float:
Code:
ProcedureReturn PeekL(m+4)*x_peek64_multiplier+PeekL(m)
about the float: dunno, and i'm not sure i'd entirely agree with you
the difference could be extreme between low and high speed machines, thus (until a 64 bit var is available) the only thing i can do is use either floats (even though they're 32 bits) or use something else (which i don't want to)
yeah, it could be background stuff, then again that should not matter, as i'm tying into the vsync, am i not?
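One likely contributor to the fluctuation is float precision rather than vsync: a 32-bit float carries only 24 significant bits, so two large counter readings taken close together can round to the same value, or to values thousands of ticks apart. A minimal sketch, assuming a PureBasic version with quad (.q) variables (which did not exist when this thread was written); the counter readings are made up for illustration:

```purebasic
; Sketch only: the readings are invented, not taken from a real counter.
EnableExplicit

Define t1.q = 123456789012      ; pretend counter reading
Define t2.q = t1 + 1000         ; a reading taken 1000 ticks later

Define f1.f = t1                ; the same readings squeezed into 32-bit floats
Define f2.f = t2

Debug "exact difference: " + Str(t2 - t1)
Debug "float difference: " + StrF(f2 - f1, 1)
```

Near 1.2e11 the representable 32-bit float values are 8192 apart, so the float difference can only be a multiple of 8192 (including 0) and never the true 1000. As far as I can tell, it is also worth checking that x_peek64_multiplier is declared Global with a float type: as posted it is an untyped main-scope variable, which a procedure does not see.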


- Psychophanta
Above, I pointed out the benefits of using more than one screen buffer and more than one execution thread, but take a look at this reply from the coders of Steem, the Atari ST emulator (the best ST emulator right now, of course). I think it is very interesting, because they use only one screen buffer and only one thread, and they get the best results I have seen in comparison with other emulators.
That means there is quite a range of tricks for getting good, smooth movement together with good CPU savings:
My question:
You wrote:
"Steem's highly optimised scanline drawing routines are written in assembler ..."
And in fact, after comparing your optimization of this stuff with other emulators (BLUEMSX, MAME, ZSNES, and some others), yours wins.
As you know, the WAITforVsync functionality of DirectX takes 100% of the CPU time, and the program blocks on this command before flipping screen buffers. I have noticed that STEEM (and others like ZSNES, etc.) do not eat 100% of the CPU time while waiting to swap screen buffers when displaying MSX screen data.
My questions are these, though a single answer would cover them all:
- How many screen buffers do you use in Steem?
- How do you get that optimization of CPU time while keeping perfectly synchronized, smooth graphics? Using timer(s)? Using 2 or more threads?
- Do you think it would be possible to do it even better than STEEM already does?
I know you have the best answer for this.
Best regards.
Albert
Their reply:
Hi Albert,
> You wrote:
> "Steem's highly optimised scanline drawing routines are written
> in assembler ..."
> And in fact, after comparing your optimization about this
> stuff, with other emulators; BLUEMSX, MAME, ZSNES, and some
> others; yours wins.
Really, that's good news! It is nice to know something in Steem is faster
than the other emulators, most of the time it is slower.
> As you know, in the WAITforVsync functionnality of DirectX, the
> CPU time is 100% and the program
> locks on this command before flipping screen buffers, so, i
> have noticed
> that STEEM (and others like ZSNES, etc.) don't take (eat) the
> 100% CPU
> time when waiting before to swap screen buffers when displaying
> MSX screen
> data.
>
> My questions are these, but an unique answer would be for all:
> - How many screen buffers do you use in Steem?
Only one actually, that is one reason why we can't implement many effects
easily, we don't really want to fiddle with it too much in case we slow it
down or mess things up.
> - How do you get that optimization of the CPU time while get a
> perfect syncronized smooth graphics? Using timer/s? Using 2 or
> more threads?
No actually we do it an incredibly dumb way! Steem only uses one thread, it
runs one frame of CPU drawing the screen to the buffer as it goes. Then if
vsync is off it just blits it to the primary display (or screen flips)
straight away, but I expect you are more interested in vsync on. Next it
checks for a few Windows messages - this is where it gets quite different
with vsync on. Normally after getting messages it would sleep in a low CPU
load way until it is time for the next frame to be drawn. But when vsync is
on it sleeps to about 75% of the way through the frame and then waits for a
vsync. As you said WAITforVsync is rubbish, not only for CPU load but also
on my system it missed some vsyncs. Instead Steem uses
GetVerticalBlankStatus to see if a vsync is in progress - in which case it
blits/flips immediately. If there is no vsync Steem uses GetScanLine to find
out when the gun is near the bottom of the screen, and when it is, assumes it
is a vsync. It's a bit of a hack and you can see this on some ST screens
(the frame wobbles right at the bottom) but it is a small price to pay for a
working vsync.
I don't know how this ends up taking up less CPU time than other drawing
systems, it is probably down to the short amount of time Steem has to spend
on its CPU emulation (only being 8 MHz). That would give it more time to be
in a sleep state before the blit. But also the drawing routines must help,
Ant wrote 4 different versions to see which would be fastest. There are
about 20 different routines, one for every combination of ST resolution and
PC bit depth, plus a few more for double width and/or double height lines.
We basically sacrificed flexibility for speed, which is good most of the
time but annoying when people request special effects.
> - Do you think it should be possible to do it still better than
> STEEM already do?
Oh yes I'm sure, two threads would probably be a good idea, we just couldn't
be bothered with the thread safety issues. Some more options regarding how
the video memory is used would be good for some PCs, but the current display
options are quite obscure, it would just add to confusion.
I hope this has helped a bit, although I'm not sure Steem's system would
work very well in other emulators.
Regards,
Russell Hayward
i'm a total loser when it comes to COM or interfaces, but the solution above is quite interesting... they are looking at vblank information and using that, could anyone help me translating that into regular (non-oo) code?
the high resolution timer approach gives me rather unpredictable results on some machines, i'd need 64 bit ints to make sure it returns proper results, i'm afraid...
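The timing structure of the Steem approach (sleep cheaply through about 75% of the frame, then poll for the vertical blank only during the short remainder) can be sketched without any COM code. In this sketch everything display-related is an assumption: InVerticalBlank() is a hypothetical stand-in that fakes a 60 Hz display from the system clock, where a real program would query the graphics API (Steem uses DirectDraw's GetVerticalBlankStatus and GetScanLine). It targets a current PureBasic with quad (.q) variables, Mod() and Bool().

```purebasic
; Sketch only. InVerticalBlank() fakes a 60 Hz display from the clock;
; it is a placeholder for a real vertical blank query.
EnableExplicit

#REFRESH_HZ = 60
Global framePeriod.d = 1000.0 / #REFRESH_HZ        ; about 16.67 ms per frame

Procedure.i InVerticalBlank()
  ; pretend the last 10% of each frame period is the vertical blank
  Protected phase.d = Mod(ElapsedMilliseconds(), framePeriod)
  ProcedureReturn Bool(phase >= framePeriod * 0.9)
EndProcedure

Procedure WaitForFrameEnd(frameStart.q)
  ; 1. sleep cheaply until about 75% of the frame time has passed
  Protected remaining.q = frameStart + Int(framePeriod * 0.75) - ElapsedMilliseconds()
  If remaining > 0
    Delay(remaining)
  EndIf
  ; 2. poll only for the short rest of the frame
  While Not InVerticalBlank()
  Wend
EndProcedure

Define frame.i, frameStart.q
Define startTime.q = ElapsedMilliseconds()
For frame = 1 To 30
  frameStart = ElapsedMilliseconds()
  ; ... run the game logic and draw one frame here ...
  WaitForFrameEnd(frameStart)
  ; ... blit or flip here ...
Next
Debug "30 frames took " + Str(ElapsedMilliseconds() - startTime) + " ms"
```

The busy-wait is confined to the last part of each frame, which is where the CPU saving comes from. How well the faked blank behaves depends on the resolution of the system clock, so the printed total is only indicative.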

still working on this subject, and came up with this:
as pure doesn't support 64 bit ints yet (shame shame) i now have to cook up something else here... keep you posted...
Code:
Procedure.s x_peekbin(addr.l,Length.l,mode.l)
  Protected n.l, s.s
  ;
  If mode = 0
    ;
    ; big endian or byte sequence
    ;
    For n = 0 To Length-1
      s = s+RSet(Bin(PeekB(addr+n) & $FF),8,"0")
    Next n
    ProcedureReturn s
    ;
  ElseIf mode = 1
    ;
    ; little endian
    ;
    For n = 0 To Length-1
      s = RSet(Bin(PeekB(addr+n) & $FF),8,"0")+s
    Next n
    ProcedureReturn s
    ;
  Else
    ;
    ; special mode for debugging purposes, little endian with separation spaces, THIS USE MAY CHANGE
    ;
    For n = 0 To Length-1
      s = RSet(Bin(PeekB(addr+n) & $FF),8,"0")+s
      If n < Length -1
        s = " "+s
      EndIf
    Next n
    ProcedureReturn s
    ;
  EndIf
  ;
EndProcedure

z = 10000
Dim a.s(z)
;
sixtyfour.LARGE_INTEGER
QueryPerformanceFrequency_(@sixtyfour)
For n = 0 To z-1
  QueryPerformanceCounter_(@sixtyfour)
  a(n) = x_peekbin(@sixtyfour,8,2)
Next n
For n = 0 To z-1
  Debug a(n)
Next n
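For readers on a PureBasic release with quad (.q) variables, which did not exist when this was posted, the counter can be read straight into 64-bit integers and the difference taken before any conversion to floating point. A minimal, Windows-only sketch under that assumption; the Delay(50) is just something to time:

```purebasic
; Sketch only, Windows only. Assumes quad (.q) variables are available.
EnableExplicit

CompilerIf #PB_Compiler_OS = #PB_OS_Windows
  Define frequency.q, t1.q, t2.q

  QueryPerformanceFrequency_(@frequency)   ; counter ticks per second
  QueryPerformanceCounter_(@t1)
  Delay(50)                                ; the thing being timed
  QueryPerformanceCounter_(@t2)

  ; subtract as 64-bit integers first, convert to a fraction afterwards
  Debug "elapsed: " + StrD((t2 - t1) * 1000.0 / frequency, 3) + " ms"
CompilerEndIf
```

Subtracting first keeps the small difference exact, so only the final division involves floating point.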
-
- Enthusiast
- Posts: 149
- Joined: Wed Apr 27, 2005 11:50 am
- Location: Adelaide, Australia
- Contact:
Yeah I'm still trying to work this one out myself. I thought I had an answer until I realised I was over simplifying the problem. :roll: Vastly over simplifying. Mind you I haven't written a game in years - since the days with computers that ran at the same speed so you didn't have to worry about timing. I don't think GameMaker counts.
This problem needs to be cracked I think for all PB users who want to write games. I know I definitely need a solution instead of the hacks and band-aids I've been using. If I can come up with anything I'll definitely let you know.

the way to do this (i may need some assembly help here, people) is to read the part of the high res timer that is useful, and ignore all bits below and above:
1. we should be able to figure out from the frequency how many bits we can drop
2. then read the counter into a 64 bit block (8 bytes)
3. then shift it as many times left as need be
4. then do a PeekL and we have a high resolution counter in 32 bits that is good enough for game timing
could any of you machine code experts tell me how to shift a block of 8 bytes a bit to the right (or left)?
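The steps above can be sketched without assembly on a PureBasic version that has quad (.q) variables, which is an assumption here since the thread predates them. In this sketch the low bits are dropped with a right shift and the next 32 bits are kept; BitsToDrop() and ReduceCounter() are hypothetical names, the frequency and counter value are made up, and the counter is assumed to be positive (PureBasic's >> keeps the sign bit).

```purebasic
; Sketch only, assuming quad (.q) variables and a positive counter value.
EnableExplicit

; Step 1: how many low bits can go while keeping at least wantedRate ticks/s?
Procedure.i BitsToDrop(frequency.q, wantedRate.q)
  Protected shift.i = 0
  If wantedRate < 1
    wantedRate = 1
  EndIf
  While (frequency >> (shift + 1)) >= wantedRate
    shift + 1
  Wend
  ProcedureReturn shift
EndProcedure

; Steps 2-4: shift the 64-bit reading and keep the low 32 bits
Procedure.q ReduceCounter(counter.q, shift.i)
  ProcedureReturn (counter >> shift) & $FFFFFFFF
EndProcedure

Define frequency.q = 3579545            ; made-up counter frequency in Hz
Define shift.i = BitsToDrop(frequency, 1000)
Debug "bits dropped: " + Str(shift)
Debug "reduced rate: " + Str(frequency >> shift) + " ticks/s"
Debug "sample value: " + Hex(ReduceCounter($0000012345678900, shift))
```

With these made-up numbers 11 bits can be dropped, leaving 1747 ticks per second. Because the reduced value wraps around after 32 bits, differences between two readings are best taken as (t2 - t1) & $FFFFFFFF.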
blueznl wrote:
> could any of you machine code experts tell me how to shift a block of 8 bytes a bit to the right (or left)?
Ok
Code:
Structure Quad
  lo.l
  hi.l
EndStructure

Procedure ShiftQuadRight(*source.Quad, *dest.Quad, shift.l)
  !MOV ebp, [esp]      ; *source
  !MOV eax, [ebp]      ; low dword
  !MOV ebx, [ebp+4]    ; high dword
  !MOV ecx, [esp+8]    ; shift
  !MOV ebp, [esp+4]    ; *dest
  !SHRD eax, ebx, cl   ; shift low dword right, filling from the high dword
  !SHR ebx, cl         ; shift high dword right, filling with zeros
  !MOV [ebp], eax
  !MOV [ebp+4], ebx
EndProcedure

a.Quad\lo = %10101010101010101010101010101010
a\hi = %11001100110011001100110011001111
Debug "%"+RSet(Bin(a\hi), 32, "0")+" "+RSet(Bin(a\lo), 32, "0")
ShiftQuadRight(@a, @b.Quad, 8)
Debug "%"+RSet(Bin(b\hi), 32, "0")+" "+RSet(Bin(b\lo), 32, "0")
MMX version:
Code:
Procedure MMXShiftQuadRight(*source.Quad, *dest.Quad, shift.l)
  !MOVD mm1, [esp+8]   ; shift count
  !MOV eax, [esp]      ; *source
  !MOVQ mm0, [eax]     ; load all 64 bits
  !PSRLQ mm0, mm1      ; logical right shift of the whole quadword
  !MOV eax, [esp+4]    ; *dest
  !MOVQ [eax], mm0
  !EMMS                ; clear floating point tag word
EndProcedure
Last edited by El_Choni on Wed Jun 08, 2005 1:20 am, edited 1 time in total.
El_Choni
hey el choni, pupil, thx
euhm... it can operate on the same 64 bit block, ie. there are no different source and destinations...
would that be faster?

