Having searched the forum and unable to find help calculating R-squared, I'm asking older hands on here if there any ready made additional Purebasic statistical functions/routines anywhere?
Alternatively, for the maths men;
If I have two arrays, projected.f(100) and actual.f(100) can anyone suggest the coding needed to calculate resultant R-sq from these two lists?
Thanks in anticipation.
Stef
R-Squared
- Kaeru Gaman
- Addict
- Posts: 4826
- Joined: Sun Mar 19, 2006 1:57 pm
- Location: Germany
since I didn't know what you ment, I searched Wiki.
for those interested: http://en.wikipedia.org/wiki/R-squared
for those interested: http://en.wikipedia.org/wiki/R-squared
oh... and have a nice day.
Maybe that implies I know what I meanKaeru Gaman wrote:since I didn't know what you ment,

As someone whose basic knowledge of maths is limited to adds, take-aways and share-by's this is a major part of the reason I asked.
. . . but someone has asked me to display the R-Sq figure for two lists of numbers, so thought I'd ask.
Stef
@stef
I don't know whether this will help you, but I have modified a curve fitting routine that I use to include the Coefficient of Determination R2:
I don't know whether this will help you, but I have modified a curve fitting routine that I use to include the Coefficient of Determination R2:
Code: Select all
; Chebyshev AKJ 04-Jun-08
; Curve fitting using Chebyshev (Tchebycheff) orthogonal polynomials
; Using the method shown in Biometrika for Statisticans pages 91..95
;
; Only the dependent (y) data values are given/used.
; A few missing (zero) data values are permitted and will be estimated.
; The independent (x) values are implicit, but must be equally spaced.
; The first and last points can be clamped i.e. fitted exactly.
#program$ = "Chebyshev"
#version$ = "1.4"
EnableExplicit
Declare.d Orthog(n, i, t)
Declare.d SumOfSquares(n, i)
Define t,i ; Independent variables
Define y.d, dy.d ; Calculated y
NewList miss.l() ; Values of t for which y=0 (i.e. missing y value)
Define retry = 0 ; Iteration count if values are clamped or missing
Define retries=5 ; Maximum iterations (this should be enough)
; Get the degree 0..6 of the polynomial to be fitted
; Determine whether the outermost points must be fitted exactly (clamped)
Define degree, clamped.b
Restore rawdata
Read degree ; 0..6 (negative if ends are clamped)
If degree<0
degree = -degree
retry = retries ; Iteration count
clamped = #True ; 1
Else
clamped = #False ; 0
EndIf
If degree>6: degree=6: EndIf
; Get the data point y values and recognise any missing (zero) ones
Define n ; Number of equally spaced data points
Read n ; Must be >=3 for 'missing values' routine to work !!!
If degree>=n: degree = n-1: EndIf
Dim y.d(n) ; Data points
For t=1 To n
Read y(t)
If y(t)=0.0
If (Not clamped) Or t<>0 Or t<>n ; Clamped points are not treated As missing
AddElement(miss()): miss()=t ; Remember missing point
EndIf
EndIf
Next t
; Estimate any missing (zero) values by linear interpolation
; Cannot currently handle consecutive missing values !!!
If CountList(miss())
ForEach miss()
t=miss()
Select t
Case 1: y(1) = y(2)*2.0-y(3)
Case n: y(n) = y(n-1)*2.0-y(n-2)
Default: y(t) = (y(t-1)+y(t+1))*0.5
EndSelect
Next miss()
retry = retries ; Iteration count
EndIf
; Precalculate sums of squares
Dim sumsquares.d(degree)
For i=0 To degree
sumsquares(i) = SumOfSquares(n,i)
Next i
; Estimate coefficients A(i) - before considering clamped/missing values
Dim A.d(degree)
For i=0 To degree
For t=1 To n
A(i) + y(t)*Orthog(n, i, t)
Next t
A(i)/sumsquares(i)
Next i
; Iterate any clamped or missing values
Define y1old.d, ynold.d ; Original raw y values for clamped points
Define y1.d, yn.d ; Adjusted raw values
If clamped: y1old=y(1): ynold=y(n): EndIf
While retry>0
If clamped
; Adjust the raw y so the fitted y approaches the original raw y
y1 = 0.0: yn = 0.0
For i=0 To degree
y1 + A(i)*Orthog(n, i, 1): yn + A(i)*Orthog(n, i, n)
Next i
y(1) = y1: y(n) = yn
; Calculate revised values for the coefficients
For i=0 To degree
A(i) + (y1old-y1)*Orthog(n, i, 1)/sumsquares(i)
A(i) + (ynold-yn)*Orthog(n, i, n)/sumsquares(i)
Next i
EndIf ; clamped
ForEach miss()
t = miss() ; There is a missing y for this t
; Estimate the missing y and save as the current value
y = 0.0
For i=0 To degree
y + A(i)*Orthog(n, i, t)
Next i
Swap y, y(t)
; Calculate revised values for the coefficients
For i=0 To degree
A(i) + (y(t)-y)*Orthog(n, i, t)/sumsquares(i)
Next i
Next miss()
; Loop if another iteration should be done
retry - 1
Wend ; retry
If clamped: y(1)=y1old: y(n)=ynold: EndIf ; Restore original values
; Get the total contribution of the data points
Define totalcontr.d=0.0
For t=1 To n
totalcontr + y(t)*y(t)
Next t
; Output the coefficients A(i) and their contributions
Debug ""
Debug "Coefficients A(i) and their sum of squares contributions"
Define contr.d, residcontr.d
residcontr = totalcontr
For i=0 To degree
contr = A(i)*A(i)*sumsquares(i)
residcontr - contr
Debug "A("+Str(i)+") = "+StrD(A(i),8)+" --> "+StrD(contr,2)
Next i
Debug "Resid contribution --> "+StrD(residcontr,6)
Debug "Total contribution --> "+StrD(totalcontr,2)
; Output raw and fitted y values
; and calculate the coefficient of determination R2
Debug ""
Debug "Raw and fitted values using polynomial of degree "+Str(degree)
If clamped: Debug "Clamped at both ends": EndIf
Define missing
Define f.d, SStot.d=0.0, SSerr.d=0.0, R2.d
If FirstElement(miss()): missing=miss(): Else: missing=0: EndIf
For t=1 To n
f = 0.0 ; Fitted y
For i=0 To degree
f + A(i)*Orthog(n, i, t)
Next i
y = y(t) ; Raw y
If t=missing
If NextElement(miss()): missing=miss(): Else: missing=0: EndIf
Debug Str(t)+" <null> "+StrD(f,2)
Else
Debug Str(t)+" "+StrD(y,2)+" "+StrD(f,2)
EndIf
SStot + y*y: SSerr + (y-f)*(y-f) ; Accumulate sums of squares
Next t
SStot - n*A(0)*A(0) ; N.B. A(0) = y average
R2 = 1.0-SSerr/SStot
Debug ""
Debug "SStot = "+StrD(SStot,6)
Debug "SSerr = "+StrD(SSerr,6) ; = residcontr (unless clamped)
Debug "R2 = "+StrD(R2,6)
; Windup
Debug ""
End
Procedure.d Orthog(n, i, t)
; Ordinates of points of Chebyshev orthogonal polynomials
; Implementation of table 47 Orthogonal Polynomials
; from the book: Biometrika Tables for Statisticians Volume 1
; In the table n=3(1)52 Number of data points
; In the table: i=0(1)6 Degree of Chebyshev polynomial
; t=1(1)n Data point identifier
; Warning: No check is made for 32-bit integer overflow !!!
Protected v.l, v2.l, n2.l
v = t+t-(n+1) ; 2x (where x is an integer iff t is odd)
v2=v*v: n2=n*n
Select i
Case 0: ProcedureReturn 1.0
Case 1: ProcedureReturn v/2.0
Case 2: ProcedureReturn (3*v2-n2+1)/12.0
Case 3: ProcedureReturn ((5*v2-3*n2+7)*v)/40.0
Case 4: ProcedureReturn ((35*v2-30*n2+130)*v2+3*(n2-1)*(n2-9))/560.0
Case 5: ProcedureReturn (((63*v2-70*n2+490)*v2+(15*n2-230)*n2+407)*v)/2016.0
Case 6: ProcedureReturn (((231*v2-315*n2+3255)*v2+21*((5*n2-110)*n2+329))*v2-5*(n2-1)*(n2-9)*(n2-25))/14784.0
EndSelect
ProcedureReturn 0.0
EndProcedure
Procedure.d SumOfSquares(n, i)
; Sums of squares of ordinate values for points on a Chebyshev polynomial of degree i
Protected t.l, thi.d, sumsquare.d=0.0
For t = 1 To n
thi = Orthog(n, i, t)
sumsquare + thi*thi
Next t
ProcedureReturn sumsquare
EndProcedure
DataSection
rawData:
Data.l -6 ; Degree of polynomial to be fitted 0..6 (-ve if ends are clamped)
Data.l 24 ; Number of equally-spaced points in the data set
; Average atmospheric temperature variation during December 1940
; in Armoedsvalkte, Bechuanaland - by Van der Reyden (1943)
Data.d 65.25, 69.37, 74.44, 79.00, 82.72, 85.97, 88.11, 89.67, 90.36, 90.35, 88.92, 87.81
Data.d 84.95, 80.41, 76.15, 74.07, 73.13, 71.90, 70.57, 69.08, 68.12, 67.09, 66.71, 65.88
; If the 6th and 18th values were unknown ...
; Data.d 65.25, 69.37, 74.44, 79.00, 82.72, 0, 88.11, 89.67, 90.36, 90.35, 88.92, 87.81
; Data.d 84.95, 80.41, 76.15, 74.07, 73.13, 0, 70.57, 69.08, 68.12, 67.09, 66.71, 65.88
EndDataSection
Anthony Jordan