Hi Performance Counter with MicroSeconds

Share your advanced PureBasic knowledge/code with the community.
SMaag
Enthusiast
Enthusiast
Posts: 325
Joined: Sat Jan 14, 2023 6:55 pm
Location: Bavaria/Germany

Hi Performance Counter with MicroSeconds

Post by SMaag »

Because ElapsedMilliseconds() is not good enough in some circumstances, I searched for a better solution to get a higher resolution.
Like ElapsedMicroseconds(). I started to develop ElapsedMicroSeconds(). But I wanted a more comfortable function.

Here is the result of ElapsedMicroSeconds() and more comfortable HPT() HighPerformanceTimer

Updated: 2023/08/16 19:25
Updated: 2023/08/17 10:06, fixed Abs() Bug, PB's Abs is only for floats

Code: Select all

Procedure.q ElapsedMicroSeconds()
  Protected v.Quad, f.Quad
  
  QueryPerformanceFrequency_(f)
  QueryPerformanceCounter_(v)
  
  ProcedureReturn (v\q / (f\q /1e6) ) ; normalize value to MicroSeconds : 1e6 = 1.000.000 = 1Mhz
EndProcedure

#HPT_START = 0      ; START Timer and return RAW value of QueryPerformanceCounter [Ticks]
#HPT_STOP = 1       ; STOP Timer and return the time difference in MicroSeconds since start
#HPT_READ = 2       ; Read the time value [µs] (if Timer is stopped, it's the differntial time. If timer is started it is the START-Time)
#HPT_CALIBRATE = 3  ; Calibrate the timers (it captures the calling Time of HPT(), so timers will be automatically corrected by that ticks)
#HPT_GetAll = 4     ; #HPT_GetAll returns a Pointer to the TimerStructure THPT. Get a copy of all Timers with CopyStructure( HPT(#HPT_GetAll), *ReturnTmr.THPT, THPT) 

Procedure.q HPT(cmd=#HPT_START, TimerNo=0)
  
  Structure THPT    ; helper Structure to overlay as Array on Datasection
    T.QUAD[16]      ; [16] =[0..15] 
  EndStructure
  
  Static *HPT.THPT  ; Pointer to Datasection HPT_DATA:
  Static    k.Quad  ; Factor QueryPerformanceFrequency_ / 1 Mhz  =>  Factor Ticks to MicroSeconds  
  Static  cal.Quad  ; calibration ticks : Time for calling HPT() in Ticks of QueryPerformanceFrequency
  
  Protected v.Quad  ; actual value for QueryPerformanceCounter
    
  ; INIT
  If Not k\q  ; k must be 1 or greater! k is the factor of what QueryPerofrmanceFrequency is higher than 1Mhz 
    QueryPerformanceFrequency_(k)
    k\q  / 1e6          ; Frequ/1.000.000 => Factor Mhz, MicroSeconds
    If k\q < 1 
      k\q = 1           ; to prevent Errors if QueryPerofrmanceFrequency is less than 1MHz (should not happen!)
    EndIf 
    *HPT = ?HPT_DATA    ; set Pointer to HPT_DATA
  EndIf 
  
  Select cmd
      
    Case #HPT_START       ; START Timer --> save actual value of QueryPerformanceCounter in DataSection
      
      QueryPerformanceCounter_(*HPT\T[TimerNo])
      ProcedureReturn *HPT\T[TimerNo]\q             ; Return the saved value
      
    Case #HPT_STOP        ; STOP Timer -->  Difference of QueryPerformanceCounter and saved value as MicroSeconds
      
      QueryPerformanceCounter_(v) 
      
      *HPT\T[TimerNo]\q = ( (v\q - *HPT\T[TimerNo]\q)) -cal\q
      If *HPT\T[TimerNo]\q < 0        ; Abs() because PB's Abs() is for floats only
        *HPT\T[TimerNo]\q = - *HPT\T[TimerNo]\q 
      EndIf
      ProcedureReturn *HPT\T[TimerNo]\q  / k\q  ; Return MicroSeconds 
      
    Case #HPT_READ
      ProcedureReturn *HPT\T[TimerNo]\q / k\q
      
    Case #HPT_CALIBRATE         ; gets the Ticks for HPT() calls and save it!
      
      cal\q = HPT(#HPT_START)   ; with Parameter #HP_START we get back the RAW Ticks
      cal\q = (HPT(#HPT_START) - cal\q)
      If cal\q <0               ; Abs(cal\q) because PB's ABS() is for floats only
        cal\q = - cal\q
      EndIf
      ProcedureReturn cal\q     ; No of Ticks at QueryPerformanceFrequency
      
    Case #HPT_GetAll
      ProcedureReturn *HPT ; Return Pointer to TimerStructure to get access to all timers
       
  EndSelect
     
  DataSection
    HPT_DATA:   ; Reserve Space in DataSection for each Timer. Access a Timer with *HPT\T[No]
    ;      [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15]; for mot Timers, add more values here
    Data.q  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
  EndDataSection
  
EndProcedure

; TEST - CODE 

Define uSec.q
uSec = ElapsedMicroSeconds()
Delay(10)
Debug "measure Delay(10)" 
Debug "Delay of 10ms :  measured µs = " + Str(ElapsedMicroSeconds() - uSec)   ; should be little more than 10.000
Debug #Null$

#cstLoops = 100000

Define I, sq.f
Define cal.q 

; calibrate the Timer if necessary. On my Ryzen 5800 this are 6..8 Ticks with Debugger and 0..2 without at 10Mhz QueryPerofrmanceFrequency
; if you do not need it that exact, calibration is not necessary!
cal = HPT(#HPT_CALIBRATE)   


; -------------------------------------------
;  Demo for fast use of Timer 0
; -------------------------------------------

HPT()                   ; Start Timer (0)
For I = 0 To #cstLoops
  sq = I*I
Next
Debug "Demo for fast use of Timer 0"
Debug HPT(#HPT_STOP)    ; Stop Timer(0) and Print Time

Debug #Null$

; -------------------------------------------
;  Demo for MultiTimer
; -------------------------------------------

HPT(#HPT_START, 7)  ; use Timer 7 as total time

; -------------------------------------------

HPT()             ; Start Timer (0)
For I = 0 To #cstLoops
  sq = Sqr(I)
Next
HPT(#HPT_STOP)    ; Stop Timer(0)

; -------------------------------------------

HPT(#HPT_START,1) ; Start Timer (1)
I=0
While I<= #cstLoops
  sq = Sqr(I)
  I + 1  
Wend
HPT(#HPT_STOP,1) ; Stop Timer (1)

; -------------------------------------------

HPT(#HPT_START,2) ; Start Timer (2)
I=0
Repeat
  sq = Sqr(I)
  I + 1    
Until I> #cstLoops
HPT(#HPT_STOP,2)  ; Start Timer (2)

; -------------------------------------------

HPT(#HPT_STOP,7)  ; Stop Timer (7), total time

; -------------------------------------------


Define txt.s
Define f.Quad 

QueryPerformanceFrequency_(f)   ; get the QueryPerofrmanceFrequency 
f\q / 1e6                       ; convert Frequency to MHz   

; -------------------------------------------

; DEMO how to get a BackupCopy of all Timers
; now we use HPT(#HPT_GetAll) to get the Pointer to the Timer Structure in DataSection
; with this Pointer we can get a copy of all TimerValues  

Define AllTimerValues.THPT    ; Define a variable for the timer copy
CopyStructure( HPT(#HPT_GetAll), AllTimerValues, THPT) ; copy the complete TimerStructre form HPT() DataSection 

Debug "Get All Timer values with HPT(#HPT_GetAll)"
For I = 0 To 15
  Debug Str(I) + " : " + Str(AllTimerValues\T[I]\q)
Next
; -------------------------------------------


txt = "For-Next     µs = " + Str(HPT(#HPT_READ)) + #CRLF$
txt + "While-Wend   µs = " + Str(HPT(#HPT_READ,1)) + #CRLF$
txt + "Repeat-Until µs = " + Str(HPT(#HPT_READ,2)) + #CRLF$
txt + "total time   µs = " + Str(HPT(#HPT_READ,7)) + #CRLF$
txt + "Calibration Ticks = " + Str(cal) + " of QueryPerformanceFrequency " + Str(f\q) + "MHz"

MessageRequester( "Timing Test for PB Loops with " + FormatNumber(#cstLoops, 0, "," , ".") + " Loops of Squre() Calculations", txt, #PB_MessageRequester_Info )

Last edited by SMaag on Thu Aug 17, 2023 9:10 am, edited 3 times in total.
Rinzwind
Enthusiast
Enthusiast
Posts: 690
Joined: Wed Mar 11, 2009 4:06 pm
Location: NL

Re: Hi Performance Counter with MicroSeconds

Post by Rinzwind »

Interesting piece of code. Thanks.
User avatar
jacdelad
Addict
Addict
Posts: 2010
Joined: Wed Feb 03, 2021 12:46 pm
Location: Riesa

Re: Hi Performance Counter with MicroSeconds

Post by jacdelad »

Nice. Windows-only, I assume.
Good morning, that's a nice tnetennba!

PureBasic 6.21/Windows 11 x64/Ryzen 7900X/32GB RAM/3TB SSD
Synology DS1821+/DX517, 130.9TB+50.8TB+2TB SSD
GoodNPlenty
Enthusiast
Enthusiast
Posts: 112
Joined: Wed May 13, 2009 8:38 am
Location: Arizona, USA

Re: Hi Performance Counter with MicroSeconds

Post by GoodNPlenty »

Excellent work. Thank You for sharing. :D
SMaag
Enthusiast
Enthusiast
Posts: 325
Joined: Sat Jan 14, 2023 6:55 pm
Location: Bavaria/Germany

Re: Hi Performance Counter with MicroSeconds

Post by SMaag »

Yes, it is Windows only! Until one of us posts a code how to use High Performance Counter on Linux!
On Linux it 's a harder work! See description of Linux timing here!
https://man7.org/linux/man-pages/man7/time.7.html

The big problem of HighPerformance counter is: We have to use OS Functions because reading of the Processor specific registers
is limited to Ring0 what's OS level.

We can read the TSC TimeStampCounter at User Level, but we get the ticks of the Processor at the actual frequency. The frequency is not
fix and change with CPU load! I didn't find a methode to get the correct frequency (it's only possible in Ring 0).

here is how to read the TSC with ASM-Code

Code: Select all

Procedure.q Read_TSC()
  ; ReadTimeStampCounter
  ; Problem: we get the ticks at actal CPU frequency what changes up and down according to CPU load!
  ; RDTSC copy the ticks into EAX,EDX
  !RDTSC
  !SHL RDX, 32
  !ADD RAX, RDX
  ProcedureReturn ; return the TimeStampCounter
EndProcedure
User avatar
skywalk
Addict
Addict
Posts: 4218
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Hi Performance Counter with MicroSeconds

Post by skywalk »

original post
Fred wrote: Thu Jan 26, 2017 9:24 am ElapsedMilliseconds() should be precise on all OS now as we use PerformanceCounter on Windows and gettimeofday() on Linux/OSX which both have microsecond support. But yes, non of these OS are realtime, so you could approach precise timing but not reach it
Is this incorrect?
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
User avatar
idle
Always Here
Always Here
Posts: 5899
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: Hi Performance Counter with MicroSeconds

Post by idle »

I did a time profile macros here, it gets an estimate of the cpu frequency
viewtopic.php?p=597586#p597586

Code: Select all

;Time profiler   
;Author Idle 
;PB LTS 6.01 x86,x64 asm and c backend 
;v1.02a
;Notes Cycle counts are indicative of relative performance and times are approximate  
;If the time cpu time stamp rolls over it subtracts from the count and adds to the err  
;compile As console
#ENABLE_PROFILE=1   ;Enables profiler or to disable ENABLE_PROFILE=0   
CompilerIf #ENABLE_PROFILE=1
  CompilerIf Defined(PB_Compiler_Backend,#PB_Constant) 
    CompilerIf #PB_Compiler_Backend=#PB_Backend_C
      CompilerIf #PB_Compiler_Processor = #PB_Processor_x64 Or #PB_Compiler_Processor = #PB_Processor_x86 
        Procedure.q GetRDTSC()
          Protected t.q
          !unsigned hi, lo;
          !__asm__ __volatile__ ("lfence\n rdtsc\n lfence" : "=a"(lo), "=d"(hi));
          !v_t =((unsigned long long)lo)|(((unsigned long long)hi)<<32 );
          ProcedureReturn t
        EndProcedure
      CompilerElseIf #PB_Compiler_Processor = #PB_Processor_Arm64 Or #PB_Compiler_Processor = #PB_Processor_Arm32 
        Procedure.q GetRDTSC()
        Protected pmuseren.l,pmcntenset.l,pmccntr.l;
        !asm volatile("mrc p15, 0, %0, c9, c14, 0" : "=r"(v_pmuseren));
        If pmuseren & 1 
          !asm volatile("mrc p15, 0, %0, c9, c12, 1" : "=r"(v_pmcntenset));
          If pmcntenset & $80000000 
            !asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(v_pmccntr));
            t = pmccntr
            ProcedureReturn t << 6 
          EndIf 
        EndIf 
        EndProcedure 
     CompilerEndIf 
    CompilerElse
      Procedure.q GetRDTSC()
        Protected hi.q,lo.q,res.q 
        DisableDebugger
        !lfence 
        !rdtsc
        !lfence
        !mov [p.v_hi], edx
        !mov [p.v_lo], eax 
        res = hi
        res << 32
        res | lo
        ProcedureReturn res
        EnableDebugger 
      EndProcedure
    CompilerEndIf
  CompilerElse
    Procedure.q GetRDTSC()
      Protected hi.q,lo.q,res.q 
      DisableDebugger
      !lfence 
      !rdtsc
      !lfence
      !mov [p.v_hi], edx
      !mov [p.v_lo], eax 
      res = hi
      res << 32
      res | lo
      EnableDebugger 
    EndProcedure
  CompilerEndIf  
  
  Procedure.q GetFrequency() 
    Protected t1.q,t2.q,result 
    Repeat     
      t1 = GetRDTSC()
      Delay(128) 
      t2 = GetRDTSC()
      If t2 - t1 > 0 
        result = ((t2-t1) / (128)) 
        Break  
      EndIf    
    ForEver 
    ProcedureReturn result 
  EndProcedure   
  
  Structure profile 
    count.i
    err.i
    sumtime.q 
    st.q
    et.q 
  EndStructure 
  
  Global NewMap prof.profile() 
  Global prof_msfreq.i = GetFrequency() 
  
  Macro BeginProfile(name) 
    prof(name)\count + 1 
    prof(name)\st = GetRDTSC() 
  EndMacro 
  
  Macro EndProfile(name) 
    prof(name)\et = GetRDTSC() 
    If prof(name)\et > prof(name)\st 
      prof()\sumtime + prof(name)\et - prof(name)\st
    Else  
      prof()\count - 1    ;ignore rollover if st > et    
      prof()\err + 1      ;indicates a timeing error     
    EndIf   
  EndMacro   
  
  Macro PrintProfile() 
    PrintN("CPU Frequency " + FormatNumber(prof_msfreq/1000000.0,2,".",",") + " ghz") 
    ForEach prof()  
      If prof()\count 
        PrintN(MapKey(prof())  + " sum time " + FormatNumber(prof()\sumtime / prof_msfreq ,0,".",",") + " ms avg cycles " + FormatNumber(prof()\sumtime / prof()\count,0,".",",") + " " + Str(prof()\err )) 
      EndIf   
    Next 
  EndMacro    
CompilerElse 
  Macro BeginProfile(name) : EndMacro 
  Macro EndProfile(name) : EndMacro 
  Macro PrintProfile() : EndMacro 
CompilerEndIf   

CompilerIf #PB_Compiler_IsMainFile 
  ;example 
  Procedure bar() 
   
    Protected b 
    BeginProfile(#PB_Compiler_Procedure)  ;Add BeginProfile / EndProfile blocks to code you want to time it will only compile if #ENABLE_PROFILE=1 is declared  
    For a = 1 To 50000 
      b*a  
    Next   
    EndProfile(#PB_Compiler_Procedure)     
  EndProcedure   
  
  Procedure foo() 
    Protected b 
    BeginProfile(#PB_Compiler_Procedure) 
    For a = 1 To 5000 
      b * 2 
    Next 
    EndProfile(#PB_Compiler_Procedure) 
  EndProcedure   
  
  OpenConsole()
  
  BeginProfile("main") 
  st = ElapsedMilliseconds() 
  For a = 1 To 500 
    foo()
    bar()
  Next 
  et = ElapsedMilliseconds() 
  EndProfile("main") 
  
  PrintProfile() ; dump profile results           
  
  PrintN("main elapsed ms check " + Str(et-st)) 
  PrintN("press enter key to end") 
  Input() 
  
CompilerEndIf 

Fred
Administrator
Administrator
Posts: 18224
Joined: Fri May 17, 2002 4:39 pm
Location: France
Contact:

Re: Hi Performance Counter with MicroSeconds

Post by Fred »

skywalk wrote: Wed Aug 16, 2023 8:42 pm original post
Fred wrote: Thu Jan 26, 2017 9:24 am ElapsedMilliseconds() should be precise on all OS now as we use PerformanceCounter on Windows and gettimeofday() on Linux/OSX which both have microsecond support. But yes, non of these OS are realtime, so you could approach precise timing but not reach it
Is this incorrect?
It's correct, but you don't have the microseconds part as only milliseconds are returned.
SMaag
Enthusiast
Enthusiast
Posts: 325
Joined: Sat Jan 14, 2023 6:55 pm
Location: Bavaria/Germany

Re: Hi Performance Counter with MicroSeconds

Post by SMaag »

Now I found the following for Linux
https://man7.org/linux/man-pages/man3/c ... ime.3.html

Since Linux Kernel 2.6 there are functions for high resolution time

but I don' know how to call Linux Kernel functions in PB.
Maybe anyone out there who can add the Linux System calls.
In C all is defined in time.h

Code: Select all

/* Get resolution of clock CLOCK_ID.  */
extern int clock_getres (clockid_t __clock_id, struct timespec *__res) __THROW;
/* Get current value of clock CLOCK_ID and store it in TP.  */
extern int clock_gettime (clockid_t __clock_id, struct timespec *__tp) __THROW;
/* Set clock CLOCK_ID to value TP.  */
extern int clock_settime (clockid_t __clock_id, const struct timespec *__tp)    __THROW;

first PseudoCode for GetTime on Linux.

Code: Select all

Structure timespec
  time_t.l      ; tv_sec long int
  tn_nsec.l     ; long int
EndStructure

Prototype clock_getres(clock_id, ts.timespec)    ; Part of Linux Kernel 
Prototype clock_gettime(clock_id, ts.timespec)  ; Part of Linux Kernel 

#CLOCK_REALTIME = 0           ; Identifier for system-wide realtime clock.
#CLOCK_MONOTONIC =1           ; Monotonic system-wide clock
#CLOCK_PROCESS_CPUTIME_ID = 2 ; High-resolution timer from the CPU
#CLOCK_THREAD_CPUTIME_ID = 3  ; Thread-specific CPU-time clock
#CLOCK_MONOTONIC_RAW = 4      ; Monotonic system-wide clock, Not adjusted For frequency scaling
#CLOCK_REALTIME_COARSE = 5    ; Identifier For system-wide realtime clock, updated only on ticks
#CLOCK_MONOTONIC_COARSE = 6   ; Monotonic system-wide clock, updated only on ticks
#CLOCK_BOOTTIME =7            ; Monotonic system-wide clock that includes time spent in suspension
#CLOCK_REALTIME_ALARM = 8     ; Like CLOCK_REALTIME but also wakes suspended system
#CLOCK_BOOTTIME_ALARM = 9     ; Like CLOCK_BOOTTIME but also wakes suspended system
#CLOCK_TAI = 11               ; Like CLOCK_REALTIME but in International Atomic Time

Define tm_res.timespec
Define tm.timespec

clock_gettime(#CLOCK_REALTIME, tm)
clock_getres((#CLOCK_REALTIME, tm_res)

User avatar
skywalk
Addict
Addict
Posts: 4218
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Hi Performance Counter with MicroSeconds

Post by skywalk »

Yup, makes sense.
Fred wrote: It's correct, but you don't have the microseconds part as only milliseconds are returned.
Is it time to add elapsedmicroseconds()?
I also must use a small timer lib for usec resolution.
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
tatanas
Enthusiast
Enthusiast
Posts: 260
Joined: Wed Nov 06, 2019 10:28 am
Location: France

Re: Hi Performance Counter with MicroSeconds

Post by tatanas »

I posted on a similar subject too (microsecond delay() for linux). I didn't find this one when looking around on the forum...

If the actual ElapsedMilliseconds() function can return microsecond why not create, as skywalk said, a ElapsedMicroseconds() ?
Is there some problem implementing it ?
Windows 10 Pro x64
PureBasic 6.20 x64
Post Reply