Page 1 of 1

Statistics functions?

Posted: Wed Jul 11, 2012 3:30 pm
by Davy
Is there any code available in the public domain for PB statistics functions? I haven't seen anything on the code archive site. e.g. mean, mode, variance, standard deviation, binomial distribution etc.

Thanks.

Re: Statistics functions?

Posted: Wed Jul 11, 2012 3:33 pm
by gnasen
I havent seen something like this here until now. Otherwise it shouldnt be that hard to implement it, as long as its only discrete stochastic.

Re: Statistics functions?

Posted: Wed Jul 11, 2012 4:12 pm
by STARGĂ…TE
For float-lists I have something like this:

Code: Select all

Enumeration
	#Average_Arithmetic
	#Average_Geometric
	#Average_Harmonic
	#Average_Median
EndEnumeration

Procedure.f AverageList(List Float.f(), Mode.i=#Average_Arithmetic)
	Protected Average.f
	Select Mode
		Case #Average_Arithmetic
			Average = 0.0
			ForEach Float()
				Average + Float()
			Next
			Average = Average / ListSize(Float())
		Case #Average_Geometric
			Average = 1.0
			ForEach Float()
				Average * Float()
			Next
			Average = Pow(Average, 1/ListSize(Float()))
		Case #Average_Harmonic
			Average = 0.0
			ForEach Float()
				Average + 1/Float()
			Next
			Average = ListSize(Float()) / Average
		Case #Average_Median
			If ListSize(Float()) % 2
				SelectElement(Float(), (ListSize(Float())-1)/2)
				Average = Float()
			Else
				SelectElement(Float(), ListSize(Float())/2-1)
				Average = Float()/2
				NextElement(Float())
				Average + Float()/2
			EndIf
	EndSelect
	ProcedureReturn Average 
EndProcedure

NewList Test.f()

AddElement(Test()) : Test() = 1
AddElement(Test()) : Test() = 2
AddElement(Test()) : Test() = 3
AddElement(Test()) : Test() = 4
AddElement(Test()) : Test() = 7

Debug "Arithmetic: "+StrF(AverageList(Test(), #Average_Arithmetic))
Debug "Geometric:  "+StrF(AverageList(Test(), #Average_Geometric))
Debug "Harmonic:   "+StrF(AverageList(Test(), #Average_Harmonic))
Debug "Median:     "+StrF(AverageList(Test(), #Average_Median))

Re: Statistics functions?

Posted: Wed Jul 11, 2012 4:32 pm
by Little John

Code: Select all

Procedure MeanAndVariance (Array x.d(1), *mean.Double, *variance.Double, last=-1)
   ; -- Fast calculation of Arithmetic Mean and Estimated Population Variance
   ; (only for *quantitative* data)
   ; in : x() : array of measured data
   ;      last: last element that is taken into account
   ; out: *mean, *variance
   Protected i.i, n.i
   Protected.d sum=0, sum_q=0

   If last = -1
      last = ArraySize(x())
   EndIf

   For i = 0 To last
      sum + x(i)
      sum_q + x(i)*x(i)
   Next

   n = last + 1
   *mean\d = sum / n
   *variance\d = (sum_q - *mean\d*sum) / (n-1)
EndProcedure

StandardDeviation = Sqr(Variance)
I have written more procedures in PB, but currently I don't have time to check them again, and to add necessary explanations or to translate existing comments to proper English. Sorry.

Regards, Little John

Re: Statistics functions?

Posted: Wed Jul 11, 2012 6:31 pm
by Davy
Thanks Little John and STARGATE. It shouldn't involve too much work to write the procedures I need, but maybe I'll put in a request to have some stats routines added to PB for future versions. :)

Re: Statistics functions?

Posted: Wed Jul 11, 2012 6:34 pm
by skywalk
Some old code I use for stats of 1 dimensional arrays of Doubles and ignoring nulls if necessary.

Code: Select all

EnableExplicit
;-{ CONSTANTS
#small1     = 1e-16      ; small '1' to add to double calcs to round up or prevent overflows.
#nNull      = -999.00000000000000000
#nNullSTOP  = -999999.00000000000000
;- Array Math Operation Units
#arm_unitsNA      = 0
#arm_unitslinear  = 1
#arm_unitsLog     = 2
#arm_unitsLogPwr  = 3
;- Array Stats
#ars_nPts         = 0   ; nPts of array
#ars_min          = 1
#ars_max          = 2
#ars_range        = 3
#ars_sum          = 4
#ars_avg          = 5
#ars_stddev       = 6
#ars_SkipNull     = 7   ; nPts - nulls
#ars_pcstddev     = 8   ; %Stddev = Sigma/Avg*100
#ars_mnsigma      = 9   ; minus n Sigma
#ars_pnsigma      = 10  ; plus n Sigma
#ars_Skew         = 11  ; Skew of data set
#ars_Kurt         = 12  ; Kurtosis of data set
#ars_Xmonotonic   = 13
#ars_Ymonotonic   = 14
#ars_minp         = 15  ; Pointer to min
#ars_maxp         = 16  ; Pointer to max
#ars_pcrange      = 17  ; %range = range/Avg*100
#ars_nStats       = 18  ; Current number of Stats Fields
;-}
Macro UC_mW_dB(X_mW)
  ; Convert mW To dB
  Log10(X_mW) * 20.0
EndMacro

Macro UC_mW_dBm(X_mW)
  ; Convert mW To dBm
  Log10(X_mW) * 10.0
EndMacro

Macro UC_dB_LinMag(Y)
  Pow(10.0, ((y) / 20.0))
EndMacro

Macro UC_dBm_LinMag(Y)
  Pow(10.0, ((y) / 10.0))
EndMacro

Procedure.i arDelD(Array X.d(1), N.d=#nNull)
  ; Delete any N Values from the array, then redim the array
  Protected.i nPts
  nPts = ArraySize(x()) + 1
  If nPts > 1
    Protected.i r, k
    Protected Dim Xt.d(nPts - 1)
    k = 0
    For r = 0 To nPts - 1
      If X(r) <> N 
        Xt(k) = X(r)
        k = k + 1
      EndIf
    Next r
    ; Assign the reduced array to the calling array
    If k > 0
      ReDim Xt(k-1)
      ; copy array contents to X()
      CopyArray(Xt(), X())   ; Dim X() not req'd
    Else  ; no non-null pts found
      k = 1
      Dim X.d(0)
      X(0) = N
    EndIf
    ProcedureReturn k
  Else
    ProcedureReturn 0
  EndIf
EndProcedure

Procedure arFillD(Array x.d(1), nPts.i, N.d=#nNull)
  Protected.i r
  Dim x.d(nPts - 1)
  For r = 0 To nPts - 1
    x(r) = N
  Next r
EndProcedure

Procedure.i CT_LoadDataD(DataLabel.i, Array Y.D(1))
  ; REV:  110507, skywalk
  ;       Code tool to dynamically load sample data from a DataSection.
  ;       Change data type suffixes as required. ie; CT_LoadDataS
  Protected.i k
  ReDim Y.d(10)
  !MOV eax, [p.v_DataLabel]
  !MOV [PB_DataPointer], eax
  Repeat
    Read.d Y(k)
    If Y(k) <> #nNullSTOP ; End of List
      k + 1
    Else
      Break
    EndIf
    If (k % 10) = 0
      ReDim Y(k+10)
    EndIf
  ForEver
  ReDim Y(k-1)
  ProcedureReturn k
EndProcedure

Procedure.i ML_arrStats(Array x.d(1), Array arS.d(1), SkipNull.i=0, nSigma.d=3.0, isLog.i=0)
  ; REV:  040927, skywalk
  ;       Changed sub to function returning errcode if any, otherwise = 0
  ; REV:  110110, skywalk
  ;       Changed return value to skipnull, since PB does not return arrays()
  ;       arS(1) is passed ByRef, so edited contents will be seen globally.
  ; #ars_nPts         = 0   ; nPts of array
  ; #ars_min          = 1
  ; #ars_max          = 2
  ; #ars_range        = 3
  ; #ars_sum          = 4
  ; #ars_avg          = 5
  ; #ars_stddev       = 6
  ; #ars_SkipNull     = 7   ; nPts - nulls
  ; #ars_pcstddev     = 8   ; %Stddev = Sigma/Avg*100
  ; #ars_m3sigma      = 9   ; minus 3 Sigma
  ; #ars_p3sigma      = 10  ; plus 3 Sigma
  ; #ars_Skew         = 11  ; Skew of data set
  ; #ars_Kurt         = 12  ; Kurtosis of data set
  ; #ars_Xmonotonic   = 13
  ; #ars_Ymonotonic   = 14
  ; #ars_minp         = 15  ; Pointer to min
  ; #ars_maxp         = 16  ; Pointer to max
  ; #ars_pcrange      = 17  ; %range = range/Avg*100
  Protected.d SumSq, d
  Dim ars.d(#ars_nStats-1)
  arFillD(ars(), #ars_nStats) ; Pre-fill with nulls <> 0
  Protected.i i, nPts = ArraySize(X()) + 1
  ars(#ars_nPts) = nPts
  If SkipNull               ; delete all nulls 1st and then proceed w/non-null array
    Protected Dim Xn.d(nPts-1)
    CopyArray(x(), Xn())
    nPts = arDelD(Xn())     ; Del all nulls and recalc npts
    skipnull = nPts         ; Could = 0 if there were no actual nulls
    ars(#ars_SkipNull) = skipnull
  EndIf
  If isLog > #arm_unitslinear     ; Use Xn(), since log data will always ignore nulls.
    Dim xc.d(nPts - 1)            ; Use the Xc() convert array for log data
    If isLog = #arm_unitsLogPwr   ; Convert to linear before computing stats.
      For i = 0 To nPts - 1
        xc(i) = UC_dBm_LinMag(Xn(i))
      Next i
    ElseIf isLog = #arm_unitsLog  ; Convert to linear before computing stats.
      For i = 0 To nPts - 1
        xc(i) = UC_dB_LinMag(Xn(i))
      Next i
    EndIf
  EndIf
  If skipnull
    ars(#ars_min) = Xn(0)
  Else
    ars(#ars_min) = x(0)
  EndIf
  ars(#ars_minp) = 0
  ars(#ars_max) = ars(#ars_min)
  ars(#ars_sum) = 0
  SumSq = 0
  For i = 0 To nPts - 1
    If skipnull
      d = Xn(i)
    Else
      d = x(i)
    EndIf
    If d < ars(#ars_min)
      ars(#ars_min) = d
      ars(#ars_minp) = i
    EndIf
    If d > ars(#ars_max)
      ars(#ars_max) = d
      ars(#ars_maxp) = i
    EndIf
    If isLog > #arm_unitslinear
      ars(#ars_sum) + xc(i)
      SumSq + xc(i) * xc(i)
    Else
      ars(#ars_sum) + d
      SumSq + d * d
    EndIf
  Next i
  ars(#ars_range) = ars(#ars_max) - ars(#ars_min)
  ars(#ars_avg) = ars(#ars_sum) / nPts
  If nPts < 2     ; There are nPts samples. Samples >=2 for StdDev.
    ars(#ars_stddev) = 0
  Else
    ars(#ars_stddev) = (SumSq - nPts * ars(#ars_avg) * ars(#ars_avg)) / (nPts - 1)
    If ars(#ars_stddev) <= 0 
      ars(#ars_stddev) = #small1
    Else
      ars(#ars_stddev) = Sqr((SumSq - nPts * ars(#ars_avg) * ars(#ars_avg)) / (nPts - 1))
    EndIf
  EndIf
  ; avg, stddev are in linear at this point
  ars(#ars_pcstddev)    = ars(#ars_stddev) / (ars(#ars_avg) + #small1) * 100.0
  ars(#ars_mnsigma)     = ars(#ars_avg) - nSigma * ars(#ars_stddev)
  ars(#ars_pnsigma)     = ars(#ars_avg) + nSigma * ars(#ars_stddev)
  ars(#ars_pcrange)     = ars(#ars_range) / (ars(#ars_avg) + #small1) * 100.0
  If isLog = #arm_unitsLogPwr                  ; Convert back if necessary
    ars(#ars_pcstddev)  = UC_mW_dBm(ars(#ars_pcstddev))
    ars(#ars_mnsigma)   = UC_mW_dBm(ars(#ars_mnsigma))
    ars(#ars_pnsigma)   = UC_mW_dBm(ars(#ars_pnsigma))
    ars(#ars_pcrange)   = UC_dBm_LinMag(ars(#ars_range)) / (ars(#ars_avg) + #small1) * 100
    ars(#ars_pcrange)   = UC_mW_dBm(ars(#ars_pcrange))
    ars(#ars_avg)       = UC_mW_dBm(ars(#ars_avg))
    ars(#ars_stddev)    = UC_mW_dBm(ars(#ars_stddev))
  ElseIf isLog = #arm_unitsLog                 ; Convert back if necessary
    ars(#ars_pcstddev)  = UC_mW_dB(ars(#ars_pcstddev))
    ars(#ars_mnsigma)   = UC_mW_dB(ars(#ars_mnsigma))
    ars(#ars_pnsigma)   = UC_mW_dB(ars(#ars_pnsigma))
    ars(#ars_pcrange)   = UC_dB_LinMag(ars(#ars_range)) / (ars(#ars_avg) + #small1) * 100
    ars(#ars_pcrange)   = UC_mW_dB(ars(#ars_pcrange))
    ars(#ars_avg)       = UC_mW_dB(ars(#ars_avg))
    ars(#ars_stddev)    = UC_mW_dB(ars(#ars_stddev))
  EndIf
  ; TBD...
  ars(#ars_Skew) = #nNull
  ars(#ars_Kurt) = #nNull
  ars(#ars_Xmonotonic) = 0
  ars(#ars_Ymonotonic) = 0
  ProcedureReturn skipnull
EndProcedure
;-{ TEST
Dim X.d(0)
Dim Stats.d(0)
CT_LoadDataD(?Samples_D, X())
ML_arrStats(X(),Stats(),0)
Debug "; -- Stats[X()] --"
Debug "; nPts    = " + Str(Stats(#ars_nPts))
Debug "; noNulls = " + Str(Stats(#ars_SkipNull))
Debug "; Min     = " + StrD(Stats(#ars_min),2)
Debug "; Max     = " + StrD(Stats(#ars_max),2)
Debug "; Minp    = " + Str(Stats(#ars_minp))
Debug "; Maxp    = " + Str(Stats(#ars_maxp))
Debug "; Avg     = " + StrD(Stats(#ars_avg),2)
Debug "; Rng     = " + StrD(Stats(#ars_range),2)
Debug "; %Rng    = " + StrD(Stats(#ars_pcrange),2)
Debug "; Sum     = " + StrD(Stats(#ars_sum),2)
Debug "; Std     = " + StrD(Stats(#ars_stddev),2)
Debug "; -3Std   = " + StrD(Stats(#ars_mnsigma),2)
Debug "; +3Std   = " + StrD(Stats(#ars_pnsigma),2)
Debug "; %Std    = " + StrD(Stats(#ars_pcstddev),2)
ML_arrStats(X(),Stats(),1)
Debug "; -- Stats[X()], Nulls ignored --"
Debug "; nPts    = " + Str(Stats(#ars_nPts))
Debug "; noNulls = " + Str(Stats(#ars_SkipNull))
Debug "; Min     = " + StrD(Stats(#ars_min),2)
Debug "; Max     = " + StrD(Stats(#ars_max),2)
Debug "; Minp    = " + Str(Stats(#ars_minp))
Debug "; Maxp    = " + Str(Stats(#ars_maxp))
Debug "; Avg     = " + StrD(Stats(#ars_avg),2)
Debug "; Rng     = " + StrD(Stats(#ars_range),2)
Debug "; %Rng    = " + StrD(Stats(#ars_pcrange),2)
Debug "; Sum     = " + StrD(Stats(#ars_sum),2)
Debug "; Std     = " + StrD(Stats(#ars_stddev),2)
Debug "; -3Std   = " + StrD(Stats(#ars_mnsigma),2)
Debug "; +3Std   = " + StrD(Stats(#ars_pnsigma),2)
Debug "; %Std    = " + StrD(Stats(#ars_pcstddev),2)

DataSection
Samples_D:
  ; Append examples here...
  ;      Doubles
  Data.d 1      ; 0
  Data.d 1      ; 1
  Data.d 1      ; 2
  Data.d 1      ; 3
  Data.d 10     ; 4
  Data.d -999   ; 5 <-- Ignored if SkipNull=1
  Data.d 10     ; 6
  Data.d 1      ; 7
  Data.d -999   ; 8 <-- Ignored if SkipNull=1
  Data.d 1      ; 9
  Data.d #nNullSTOP
EndDataSection
;-}
; -- Stats[X()] --
; nPts    = 10
; noNulls = -999
; Min     = -999.00
; Max     = 10.00
; Minp    = 5
; Maxp    = 4
; Avg     = -197.20
; Rng     = 1009.00
; %Rng    = -511.66
; Sum     = -1972.00
; Std     = 422.60
; -3Std   = -1465.01
; +3Std   = 1070.61
; %Std    = -214.30
; -- Stats[X()], Nulls ignored --
; nPts    = 10
; noNulls = 8
; Min     = 1.00
; Max     = 10.00
; Minp    = 0
; Maxp    = 4
; Avg     = 3.25
; Rng     = 9.00
; %Rng    = 276.92
; Sum     = 26.00
; Std     = 4.17
; -3Std   = -9.25
; +3Std   = 15.75
; %Std    = 128.19

Re: Statistics functions?

Posted: Thu Jul 12, 2012 5:55 am
by DarkDragon
I made a principal component analysis which contains a few statistical functions. Just the row echelon form has one bug, maybe I'll fix it when I have enough time: http://www.purebasic.fr/english/viewtop ... 67#p329267

Re: Statistics functions?

Posted: Thu Jul 12, 2012 7:58 am
by wilbert
Probably of no use but just for fun some x86 ASM :wink:
( It might make a difference if you have a lot of data to process )
To make it work on x64 each edx should be replaced with rdx .
It's based on this explanation http://www.mathsisfun.com/data/standard-deviation.html

Code: Select all

Structure dArrayInfo
  Sum.d
  Min.d
  Max.d
  Avg.d
  PopulationVariance.d
  PopulationStdDev.d
  SampleVariance.d
  SampleStdDev.d
EndStructure

Procedure DoubleArrayInfo(Array Values.d(1), *ArrayInfo.dArrayInfo)
  Protected.i ArrAddr = @Values(0)
  Protected.i ArrLen = ArraySize(Values()) + 1
  !mov ecx, [p.v_ArrLen]
  !mov edx, [p.v_ArrAddr]
  !pxor xmm0, xmm0
  !cvtsi2sd xmm2, ecx       ; xmm2 = Array length as double
  !mov eax, 0x7f800000
  !movd xmm3, eax           ; xmm3 = +Infinity (used for Min) 
  !cvtss2sd xmm3, xmm3
  !pxor xmm4, xmm4
  !subsd xmm4, xmm3         ; xmm4 = -Infinity (used for Max)
  !dArrayInfoLoop1:
  !movsd xmm1, [edx]
  !minsd xmm3, xmm1
  !maxsd xmm4, xmm1
  !addsd xmm0, xmm1
  !add edx, 8
  !dec ecx
  !jnz dArrayInfoLoop1
  !mov edx, [p.p_ArrayInfo]
  !movsd [edx], xmm0
  !movsd [edx + 8], xmm3
  !movsd [edx + 16], xmm4
  !divsd xmm0, xmm2
  !movsd [edx + 24], xmm0
  !mov ecx, [p.v_ArrLen]
  !mov edx, [p.v_ArrAddr]
  !movsd xmm3, xmm0         ; xmm3 = Avg
  !lea eax, [ecx - 1]
  !cvtsi2sd xmm4, eax       ; xmm4 = Array length - 1
  !pxor xmm0, xmm0
  !dArrayInfoLoop2:
  !movsd xmm1, [edx]
  !subsd xmm1, xmm3
  !mulsd xmm1, xmm1
  !addsd xmm0, xmm1
  !add edx, 8
  !dec ecx
  !jnz dArrayInfoLoop2
  !mov edx, [p.p_ArrayInfo]
  !movsd xmm1, xmm0
  !divsd xmm0, xmm2
  !movsd [edx + 32], xmm0
  !sqrtsd xmm0, xmm0
  !movsd [edx + 40], xmm0
  !divsd xmm1, xmm4
  !movsd [edx + 48], xmm1
  !sqrtsd xmm1, xmm1
  !movsd [edx + 56], xmm1
EndProcedure

Dim A.d(4)
A(0) = 600
A(1) = 470
A(2) = 170
A(3) = 430
A(4) = 300

DoubleArrayInfo(A(), ArrayInfo.dArrayInfo)

Info.s = "Sum = " + StrD(ArrayInfo\Sum) + #LF$
Info   + "Min = " + StrD(ArrayInfo\Min) + #LF$
Info   + "Max = " + StrD(ArrayInfo\Max) + #LF$
Info   + "Avg = " + StrD(ArrayInfo\Avg) + #LF$
Info   + "Population variance = " + StrD(ArrayInfo\PopulationVariance) + #LF$
Info   + "Population standard deviation = " + StrD(ArrayInfo\PopulationStdDev) + #LF$
Info   + "Sample variance = " + StrD(ArrayInfo\SampleVariance) + #LF$
Info   + "Sample standard deviation = " + StrD(ArrayInfo\SampleStdDev)

MessageRequester("Info", Info)

Re: Statistics functions?

Posted: Thu Jul 12, 2012 2:39 pm
by blueb
Davy wrote:Is there any code available in the public domain for PB statistics functions? I haven't seen anything on the code archive site. e.g. mean, mode, variance, standard deviation, binomial distribution etc.

Thanks.
Checkout the "Rosetta Code" website. You'll find something at: http://rosettacode.org/wiki/Statistics/Basic#PureBasic

(Big thanks to Demivec and others for all their hard work) 8)

blueb