Statistics functions?

Everything else that doesn't fall into one of the other PB categories.
Davy
User
User
Posts: 45
Joined: Wed Jun 13, 2012 7:43 pm

Statistics functions?

Post by Davy »

Is there any code available in the public domain for PB statistics functions? I haven't seen anything on the code archive site. e.g. mean, mode, variance, standard deviation, binomial distribution etc.

Thanks.
gnasen
Enthusiast
Enthusiast
Posts: 282
Joined: Wed Sep 24, 2008 12:21 am

Re: Statistics functions?

Post by gnasen »

I havent seen something like this here until now. Otherwise it shouldnt be that hard to implement it, as long as its only discrete stochastic.
pb 5.11
User avatar
STARGÅTE
Addict
Addict
Posts: 2315
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke
Contact:

Re: Statistics functions?

Post by STARGÅTE »

For float-lists I have something like this:

Code: Select all

Enumeration
	#Average_Arithmetic
	#Average_Geometric
	#Average_Harmonic
	#Average_Median
EndEnumeration

Procedure.f AverageList(List Float.f(), Mode.i=#Average_Arithmetic)
	Protected Average.f
	Select Mode
		Case #Average_Arithmetic
			Average = 0.0
			ForEach Float()
				Average + Float()
			Next
			Average = Average / ListSize(Float())
		Case #Average_Geometric
			Average = 1.0
			ForEach Float()
				Average * Float()
			Next
			Average = Pow(Average, 1/ListSize(Float()))
		Case #Average_Harmonic
			Average = 0.0
			ForEach Float()
				Average + 1/Float()
			Next
			Average = ListSize(Float()) / Average
		Case #Average_Median
			If ListSize(Float()) % 2
				SelectElement(Float(), (ListSize(Float())-1)/2)
				Average = Float()
			Else
				SelectElement(Float(), ListSize(Float())/2-1)
				Average = Float()/2
				NextElement(Float())
				Average + Float()/2
			EndIf
	EndSelect
	ProcedureReturn Average 
EndProcedure

NewList Test.f()

AddElement(Test()) : Test() = 1
AddElement(Test()) : Test() = 2
AddElement(Test()) : Test() = 3
AddElement(Test()) : Test() = 4
AddElement(Test()) : Test() = 7

Debug "Arithmetic: "+StrF(AverageList(Test(), #Average_Arithmetic))
Debug "Geometric:  "+StrF(AverageList(Test(), #Average_Geometric))
Debug "Harmonic:   "+StrF(AverageList(Test(), #Average_Harmonic))
Debug "Median:     "+StrF(AverageList(Test(), #Average_Median))
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and moreTypeface - Sprite-based font include/module
Little John
Addict
Addict
Posts: 4869
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Statistics functions?

Post by Little John »

Code: Select all

Procedure MeanAndVariance (Array x.d(1), *mean.Double, *variance.Double, last=-1)
   ; -- Fast calculation of Arithmetic Mean and Estimated Population Variance
   ; (only for *quantitative* data)
   ; in : x() : array of measured data
   ;      last: last element that is taken into account
   ; out: *mean, *variance
   Protected i.i, n.i
   Protected.d sum=0, sum_q=0

   If last = -1
      last = ArraySize(x())
   EndIf

   For i = 0 To last
      sum + x(i)
      sum_q + x(i)*x(i)
   Next

   n = last + 1
   *mean\d = sum / n
   *variance\d = (sum_q - *mean\d*sum) / (n-1)
EndProcedure

StandardDeviation = Sqr(Variance)
I have written more procedures in PB, but currently I don't have time to check them again, and to add necessary explanations or to translate existing comments to proper English. Sorry.

Regards, Little John
Davy
User
User
Posts: 45
Joined: Wed Jun 13, 2012 7:43 pm

Re: Statistics functions?

Post by Davy »

Thanks Little John and STARGATE. It shouldn't involve too much work to write the procedures I need, but maybe I'll put in a request to have some stats routines added to PB for future versions. :)
User avatar
skywalk
Addict
Addict
Posts: 4318
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Statistics functions?

Post by skywalk »

Some old code I use for stats of 1 dimensional arrays of Doubles and ignoring nulls if necessary.

Code: Select all

EnableExplicit
;-{ CONSTANTS
#small1     = 1e-16      ; small '1' to add to double calcs to round up or prevent overflows.
#nNull      = -999.00000000000000000
#nNullSTOP  = -999999.00000000000000
;- Array Math Operation Units
#arm_unitsNA      = 0
#arm_unitslinear  = 1
#arm_unitsLog     = 2
#arm_unitsLogPwr  = 3
;- Array Stats
#ars_nPts         = 0   ; nPts of array
#ars_min          = 1
#ars_max          = 2
#ars_range        = 3
#ars_sum          = 4
#ars_avg          = 5
#ars_stddev       = 6
#ars_SkipNull     = 7   ; nPts - nulls
#ars_pcstddev     = 8   ; %Stddev = Sigma/Avg*100
#ars_mnsigma      = 9   ; minus n Sigma
#ars_pnsigma      = 10  ; plus n Sigma
#ars_Skew         = 11  ; Skew of data set
#ars_Kurt         = 12  ; Kurtosis of data set
#ars_Xmonotonic   = 13
#ars_Ymonotonic   = 14
#ars_minp         = 15  ; Pointer to min
#ars_maxp         = 16  ; Pointer to max
#ars_pcrange      = 17  ; %range = range/Avg*100
#ars_nStats       = 18  ; Current number of Stats Fields
;-}
Macro UC_mW_dB(X_mW)
  ; Convert mW To dB
  Log10(X_mW) * 20.0
EndMacro

Macro UC_mW_dBm(X_mW)
  ; Convert mW To dBm
  Log10(X_mW) * 10.0
EndMacro

Macro UC_dB_LinMag(Y)
  Pow(10.0, ((y) / 20.0))
EndMacro

Macro UC_dBm_LinMag(Y)
  Pow(10.0, ((y) / 10.0))
EndMacro

Procedure.i arDelD(Array X.d(1), N.d=#nNull)
  ; Delete any N Values from the array, then redim the array
  Protected.i nPts
  nPts = ArraySize(x()) + 1
  If nPts > 1
    Protected.i r, k
    Protected Dim Xt.d(nPts - 1)
    k = 0
    For r = 0 To nPts - 1
      If X(r) <> N 
        Xt(k) = X(r)
        k = k + 1
      EndIf
    Next r
    ; Assign the reduced array to the calling array
    If k > 0
      ReDim Xt(k-1)
      ; copy array contents to X()
      CopyArray(Xt(), X())   ; Dim X() not req'd
    Else  ; no non-null pts found
      k = 1
      Dim X.d(0)
      X(0) = N
    EndIf
    ProcedureReturn k
  Else
    ProcedureReturn 0
  EndIf
EndProcedure

Procedure arFillD(Array x.d(1), nPts.i, N.d=#nNull)
  Protected.i r
  Dim x.d(nPts - 1)
  For r = 0 To nPts - 1
    x(r) = N
  Next r
EndProcedure

Procedure.i CT_LoadDataD(DataLabel.i, Array Y.D(1))
  ; REV:  110507, skywalk
  ;       Code tool to dynamically load sample data from a DataSection.
  ;       Change data type suffixes as required. ie; CT_LoadDataS
  Protected.i k
  ReDim Y.d(10)
  !MOV eax, [p.v_DataLabel]
  !MOV [PB_DataPointer], eax
  Repeat
    Read.d Y(k)
    If Y(k) <> #nNullSTOP ; End of List
      k + 1
    Else
      Break
    EndIf
    If (k % 10) = 0
      ReDim Y(k+10)
    EndIf
  ForEver
  ReDim Y(k-1)
  ProcedureReturn k
EndProcedure

Procedure.i ML_arrStats(Array x.d(1), Array arS.d(1), SkipNull.i=0, nSigma.d=3.0, isLog.i=0)
  ; REV:  040927, skywalk
  ;       Changed sub to function returning errcode if any, otherwise = 0
  ; REV:  110110, skywalk
  ;       Changed return value to skipnull, since PB does not return arrays()
  ;       arS(1) is passed ByRef, so edited contents will be seen globally.
  ; #ars_nPts         = 0   ; nPts of array
  ; #ars_min          = 1
  ; #ars_max          = 2
  ; #ars_range        = 3
  ; #ars_sum          = 4
  ; #ars_avg          = 5
  ; #ars_stddev       = 6
  ; #ars_SkipNull     = 7   ; nPts - nulls
  ; #ars_pcstddev     = 8   ; %Stddev = Sigma/Avg*100
  ; #ars_m3sigma      = 9   ; minus 3 Sigma
  ; #ars_p3sigma      = 10  ; plus 3 Sigma
  ; #ars_Skew         = 11  ; Skew of data set
  ; #ars_Kurt         = 12  ; Kurtosis of data set
  ; #ars_Xmonotonic   = 13
  ; #ars_Ymonotonic   = 14
  ; #ars_minp         = 15  ; Pointer to min
  ; #ars_maxp         = 16  ; Pointer to max
  ; #ars_pcrange      = 17  ; %range = range/Avg*100
  Protected.d SumSq, d
  Dim ars.d(#ars_nStats-1)
  arFillD(ars(), #ars_nStats) ; Pre-fill with nulls <> 0
  Protected.i i, nPts = ArraySize(X()) + 1
  ars(#ars_nPts) = nPts
  If SkipNull               ; delete all nulls 1st and then proceed w/non-null array
    Protected Dim Xn.d(nPts-1)
    CopyArray(x(), Xn())
    nPts = arDelD(Xn())     ; Del all nulls and recalc npts
    skipnull = nPts         ; Could = 0 if there were no actual nulls
    ars(#ars_SkipNull) = skipnull
  EndIf
  If isLog > #arm_unitslinear     ; Use Xn(), since log data will always ignore nulls.
    Dim xc.d(nPts - 1)            ; Use the Xc() convert array for log data
    If isLog = #arm_unitsLogPwr   ; Convert to linear before computing stats.
      For i = 0 To nPts - 1
        xc(i) = UC_dBm_LinMag(Xn(i))
      Next i
    ElseIf isLog = #arm_unitsLog  ; Convert to linear before computing stats.
      For i = 0 To nPts - 1
        xc(i) = UC_dB_LinMag(Xn(i))
      Next i
    EndIf
  EndIf
  If skipnull
    ars(#ars_min) = Xn(0)
  Else
    ars(#ars_min) = x(0)
  EndIf
  ars(#ars_minp) = 0
  ars(#ars_max) = ars(#ars_min)
  ars(#ars_sum) = 0
  SumSq = 0
  For i = 0 To nPts - 1
    If skipnull
      d = Xn(i)
    Else
      d = x(i)
    EndIf
    If d < ars(#ars_min)
      ars(#ars_min) = d
      ars(#ars_minp) = i
    EndIf
    If d > ars(#ars_max)
      ars(#ars_max) = d
      ars(#ars_maxp) = i
    EndIf
    If isLog > #arm_unitslinear
      ars(#ars_sum) + xc(i)
      SumSq + xc(i) * xc(i)
    Else
      ars(#ars_sum) + d
      SumSq + d * d
    EndIf
  Next i
  ars(#ars_range) = ars(#ars_max) - ars(#ars_min)
  ars(#ars_avg) = ars(#ars_sum) / nPts
  If nPts < 2     ; There are nPts samples. Samples >=2 for StdDev.
    ars(#ars_stddev) = 0
  Else
    ars(#ars_stddev) = (SumSq - nPts * ars(#ars_avg) * ars(#ars_avg)) / (nPts - 1)
    If ars(#ars_stddev) <= 0 
      ars(#ars_stddev) = #small1
    Else
      ars(#ars_stddev) = Sqr((SumSq - nPts * ars(#ars_avg) * ars(#ars_avg)) / (nPts - 1))
    EndIf
  EndIf
  ; avg, stddev are in linear at this point
  ars(#ars_pcstddev)    = ars(#ars_stddev) / (ars(#ars_avg) + #small1) * 100.0
  ars(#ars_mnsigma)     = ars(#ars_avg) - nSigma * ars(#ars_stddev)
  ars(#ars_pnsigma)     = ars(#ars_avg) + nSigma * ars(#ars_stddev)
  ars(#ars_pcrange)     = ars(#ars_range) / (ars(#ars_avg) + #small1) * 100.0
  If isLog = #arm_unitsLogPwr                  ; Convert back if necessary
    ars(#ars_pcstddev)  = UC_mW_dBm(ars(#ars_pcstddev))
    ars(#ars_mnsigma)   = UC_mW_dBm(ars(#ars_mnsigma))
    ars(#ars_pnsigma)   = UC_mW_dBm(ars(#ars_pnsigma))
    ars(#ars_pcrange)   = UC_dBm_LinMag(ars(#ars_range)) / (ars(#ars_avg) + #small1) * 100
    ars(#ars_pcrange)   = UC_mW_dBm(ars(#ars_pcrange))
    ars(#ars_avg)       = UC_mW_dBm(ars(#ars_avg))
    ars(#ars_stddev)    = UC_mW_dBm(ars(#ars_stddev))
  ElseIf isLog = #arm_unitsLog                 ; Convert back if necessary
    ars(#ars_pcstddev)  = UC_mW_dB(ars(#ars_pcstddev))
    ars(#ars_mnsigma)   = UC_mW_dB(ars(#ars_mnsigma))
    ars(#ars_pnsigma)   = UC_mW_dB(ars(#ars_pnsigma))
    ars(#ars_pcrange)   = UC_dB_LinMag(ars(#ars_range)) / (ars(#ars_avg) + #small1) * 100
    ars(#ars_pcrange)   = UC_mW_dB(ars(#ars_pcrange))
    ars(#ars_avg)       = UC_mW_dB(ars(#ars_avg))
    ars(#ars_stddev)    = UC_mW_dB(ars(#ars_stddev))
  EndIf
  ; TBD...
  ars(#ars_Skew) = #nNull
  ars(#ars_Kurt) = #nNull
  ars(#ars_Xmonotonic) = 0
  ars(#ars_Ymonotonic) = 0
  ProcedureReturn skipnull
EndProcedure
;-{ TEST
Dim X.d(0)
Dim Stats.d(0)
CT_LoadDataD(?Samples_D, X())
ML_arrStats(X(),Stats(),0)
Debug "; -- Stats[X()] --"
Debug "; nPts    = " + Str(Stats(#ars_nPts))
Debug "; noNulls = " + Str(Stats(#ars_SkipNull))
Debug "; Min     = " + StrD(Stats(#ars_min),2)
Debug "; Max     = " + StrD(Stats(#ars_max),2)
Debug "; Minp    = " + Str(Stats(#ars_minp))
Debug "; Maxp    = " + Str(Stats(#ars_maxp))
Debug "; Avg     = " + StrD(Stats(#ars_avg),2)
Debug "; Rng     = " + StrD(Stats(#ars_range),2)
Debug "; %Rng    = " + StrD(Stats(#ars_pcrange),2)
Debug "; Sum     = " + StrD(Stats(#ars_sum),2)
Debug "; Std     = " + StrD(Stats(#ars_stddev),2)
Debug "; -3Std   = " + StrD(Stats(#ars_mnsigma),2)
Debug "; +3Std   = " + StrD(Stats(#ars_pnsigma),2)
Debug "; %Std    = " + StrD(Stats(#ars_pcstddev),2)
ML_arrStats(X(),Stats(),1)
Debug "; -- Stats[X()], Nulls ignored --"
Debug "; nPts    = " + Str(Stats(#ars_nPts))
Debug "; noNulls = " + Str(Stats(#ars_SkipNull))
Debug "; Min     = " + StrD(Stats(#ars_min),2)
Debug "; Max     = " + StrD(Stats(#ars_max),2)
Debug "; Minp    = " + Str(Stats(#ars_minp))
Debug "; Maxp    = " + Str(Stats(#ars_maxp))
Debug "; Avg     = " + StrD(Stats(#ars_avg),2)
Debug "; Rng     = " + StrD(Stats(#ars_range),2)
Debug "; %Rng    = " + StrD(Stats(#ars_pcrange),2)
Debug "; Sum     = " + StrD(Stats(#ars_sum),2)
Debug "; Std     = " + StrD(Stats(#ars_stddev),2)
Debug "; -3Std   = " + StrD(Stats(#ars_mnsigma),2)
Debug "; +3Std   = " + StrD(Stats(#ars_pnsigma),2)
Debug "; %Std    = " + StrD(Stats(#ars_pcstddev),2)

DataSection
Samples_D:
  ; Append examples here...
  ;      Doubles
  Data.d 1      ; 0
  Data.d 1      ; 1
  Data.d 1      ; 2
  Data.d 1      ; 3
  Data.d 10     ; 4
  Data.d -999   ; 5 <-- Ignored if SkipNull=1
  Data.d 10     ; 6
  Data.d 1      ; 7
  Data.d -999   ; 8 <-- Ignored if SkipNull=1
  Data.d 1      ; 9
  Data.d #nNullSTOP
EndDataSection
;-}
; -- Stats[X()] --
; nPts    = 10
; noNulls = -999
; Min     = -999.00
; Max     = 10.00
; Minp    = 5
; Maxp    = 4
; Avg     = -197.20
; Rng     = 1009.00
; %Rng    = -511.66
; Sum     = -1972.00
; Std     = 422.60
; -3Std   = -1465.01
; +3Std   = 1070.61
; %Std    = -214.30
; -- Stats[X()], Nulls ignored --
; nPts    = 10
; noNulls = 8
; Min     = 1.00
; Max     = 10.00
; Minp    = 0
; Maxp    = 4
; Avg     = 3.25
; Rng     = 9.00
; %Rng    = 276.92
; Sum     = 26.00
; Std     = 4.17
; -3Std   = -9.25
; +3Std   = 15.75
; %Std    = 128.19
Last edited by skywalk on Thu Jul 12, 2012 6:23 am, edited 1 time in total.
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
DarkDragon
Addict
Addict
Posts: 2348
Joined: Mon Jun 02, 2003 9:16 am
Location: Germany
Contact:

Re: Statistics functions?

Post by DarkDragon »

I made a principal component analysis which contains a few statistical functions. Just the row echelon form has one bug, maybe I'll fix it when I have enough time: http://www.purebasic.fr/english/viewtop ... 67#p329267
bye,
Daniel
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3944
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Statistics functions?

Post by wilbert »

Probably of no use but just for fun some x86 ASM :wink:
( It might make a difference if you have a lot of data to process )
To make it work on x64 each edx should be replaced with rdx .
It's based on this explanation http://www.mathsisfun.com/data/standard-deviation.html

Code: Select all

Structure dArrayInfo
  Sum.d
  Min.d
  Max.d
  Avg.d
  PopulationVariance.d
  PopulationStdDev.d
  SampleVariance.d
  SampleStdDev.d
EndStructure

Procedure DoubleArrayInfo(Array Values.d(1), *ArrayInfo.dArrayInfo)
  Protected.i ArrAddr = @Values(0)
  Protected.i ArrLen = ArraySize(Values()) + 1
  !mov ecx, [p.v_ArrLen]
  !mov edx, [p.v_ArrAddr]
  !pxor xmm0, xmm0
  !cvtsi2sd xmm2, ecx       ; xmm2 = Array length as double
  !mov eax, 0x7f800000
  !movd xmm3, eax           ; xmm3 = +Infinity (used for Min) 
  !cvtss2sd xmm3, xmm3
  !pxor xmm4, xmm4
  !subsd xmm4, xmm3         ; xmm4 = -Infinity (used for Max)
  !dArrayInfoLoop1:
  !movsd xmm1, [edx]
  !minsd xmm3, xmm1
  !maxsd xmm4, xmm1
  !addsd xmm0, xmm1
  !add edx, 8
  !dec ecx
  !jnz dArrayInfoLoop1
  !mov edx, [p.p_ArrayInfo]
  !movsd [edx], xmm0
  !movsd [edx + 8], xmm3
  !movsd [edx + 16], xmm4
  !divsd xmm0, xmm2
  !movsd [edx + 24], xmm0
  !mov ecx, [p.v_ArrLen]
  !mov edx, [p.v_ArrAddr]
  !movsd xmm3, xmm0         ; xmm3 = Avg
  !lea eax, [ecx - 1]
  !cvtsi2sd xmm4, eax       ; xmm4 = Array length - 1
  !pxor xmm0, xmm0
  !dArrayInfoLoop2:
  !movsd xmm1, [edx]
  !subsd xmm1, xmm3
  !mulsd xmm1, xmm1
  !addsd xmm0, xmm1
  !add edx, 8
  !dec ecx
  !jnz dArrayInfoLoop2
  !mov edx, [p.p_ArrayInfo]
  !movsd xmm1, xmm0
  !divsd xmm0, xmm2
  !movsd [edx + 32], xmm0
  !sqrtsd xmm0, xmm0
  !movsd [edx + 40], xmm0
  !divsd xmm1, xmm4
  !movsd [edx + 48], xmm1
  !sqrtsd xmm1, xmm1
  !movsd [edx + 56], xmm1
EndProcedure

Dim A.d(4)
A(0) = 600
A(1) = 470
A(2) = 170
A(3) = 430
A(4) = 300

DoubleArrayInfo(A(), ArrayInfo.dArrayInfo)

Info.s = "Sum = " + StrD(ArrayInfo\Sum) + #LF$
Info   + "Min = " + StrD(ArrayInfo\Min) + #LF$
Info   + "Max = " + StrD(ArrayInfo\Max) + #LF$
Info   + "Avg = " + StrD(ArrayInfo\Avg) + #LF$
Info   + "Population variance = " + StrD(ArrayInfo\PopulationVariance) + #LF$
Info   + "Population standard deviation = " + StrD(ArrayInfo\PopulationStdDev) + #LF$
Info   + "Sample variance = " + StrD(ArrayInfo\SampleVariance) + #LF$
Info   + "Sample standard deviation = " + StrD(ArrayInfo\SampleStdDev)

MessageRequester("Info", Info)
User avatar
blueb
Addict
Addict
Posts: 1125
Joined: Sat Apr 26, 2003 2:15 pm
Location: Cuernavaca, Mexico

Re: Statistics functions?

Post by blueb »

Davy wrote:Is there any code available in the public domain for PB statistics functions? I haven't seen anything on the code archive site. e.g. mean, mode, variance, standard deviation, binomial distribution etc.

Thanks.
Checkout the "Rosetta Code" website. You'll find something at: http://rosettacode.org/wiki/Statistics/Basic#PureBasic

(Big thanks to Demivec and others for all their hard work) 8)

blueb
- It was too lonely at the top.

System : PB 6.21(x64) and Win 11 Pro (x64)
Hardware: AMD Ryzen 9 5900X w/64 gigs Ram, AMD RX 6950 XT Graphics w/16gigs Mem
Post Reply