R-Squared

Just starting out? Need help? Post your questions and find answers here.
stef
New User
New User
Posts: 7
Joined: Fri Apr 15, 2005 11:03 am
Location: Derbyshire, UK

R-Squared

Post by stef »

Having searched the forum and unable to find help calculating R-squared, I'm asking older hands on here if there any ready made additional Purebasic statistical functions/routines anywhere?

Alternatively, for the maths men;
If I have two arrays, projected.f(100) and actual.f(100) can anyone suggest the coding needed to calculate resultant R-sq from these two lists?

Thanks in anticipation.
Stef
User avatar
Kaeru Gaman
Addict
Addict
Posts: 4826
Joined: Sun Mar 19, 2006 1:57 pm
Location: Germany

Post by Kaeru Gaman »

since I didn't know what you ment, I searched Wiki.
for those interested: http://en.wikipedia.org/wiki/R-squared
oh... and have a nice day.
stef
New User
New User
Posts: 7
Joined: Fri Apr 15, 2005 11:03 am
Location: Derbyshire, UK

Post by stef »

Kaeru Gaman wrote:since I didn't know what you ment,
Maybe that implies I know what I mean :oops:

As someone whose basic knowledge of maths is limited to adds, take-aways and share-by's this is a major part of the reason I asked.

. . . but someone has asked me to display the R-Sq figure for two lists of numbers, so thought I'd ask.

Stef
akj
Enthusiast
Enthusiast
Posts: 668
Joined: Mon Jun 09, 2003 10:08 pm
Location: Nottingham

Post by akj »

@stef
I don't know whether this will help you, but I have modified a curve fitting routine that I use to include the Coefficient of Determination R2:

Code: Select all

; Chebyshev  AKJ  04-Jun-08
; Curve fitting using Chebyshev (Tchebycheff) orthogonal polynomials
; Using the method shown in Biometrika for Statisticans pages 91..95
;
; Only the dependent (y) data values are given/used.
; A few missing (zero) data values are permitted and will be estimated.
; The independent (x) values are implicit, but must be equally spaced.
; The first and last points can be clamped i.e. fitted exactly.

#program$ = "Chebyshev"
#version$ = "1.4"
EnableExplicit

Declare.d Orthog(n, i, t)
Declare.d SumOfSquares(n, i)

Define t,i ; Independent variables
Define y.d, dy.d ; Calculated y
NewList miss.l() ; Values of t for which y=0 (i.e. missing y value)
Define retry = 0 ; Iteration count if values are clamped or missing
Define retries=5 ; Maximum iterations (this should be enough)

; Get the degree 0..6 of the polynomial to be fitted
; Determine whether the outermost points must be fitted exactly (clamped)
Define degree, clamped.b
Restore rawdata
Read degree ; 0..6 (negative if ends are clamped)
If degree<0
  degree = -degree
  retry = retries ; Iteration count
  clamped = #True ; 1
Else
  clamped = #False ; 0
EndIf
If degree>6: degree=6: EndIf

; Get the data point y values and recognise any missing (zero) ones
Define n ; Number of equally spaced data points
Read n ; Must be >=3 for 'missing values' routine to work !!!
If degree>=n: degree = n-1: EndIf
Dim y.d(n) ; Data points
For t=1 To n
  Read y(t)
  If y(t)=0.0
    If (Not clamped) Or t<>0 Or t<>n ; Clamped points are not treated As missing
      AddElement(miss()): miss()=t ; Remember missing point
    EndIf
  EndIf
Next t

; Estimate any missing (zero) values by linear interpolation
; Cannot currently handle consecutive missing values !!!
If CountList(miss())
  ForEach miss()
    t=miss()
    Select t
      Case 1: y(1) = y(2)*2.0-y(3)
      Case n: y(n) = y(n-1)*2.0-y(n-2)
      Default: y(t) = (y(t-1)+y(t+1))*0.5
    EndSelect
  Next miss()
  retry = retries ; Iteration count
EndIf

; Precalculate sums of squares
Dim sumsquares.d(degree)
For i=0 To degree
  sumsquares(i) = SumOfSquares(n,i)
Next i

; Estimate coefficients A(i) - before considering clamped/missing values
Dim A.d(degree)
For i=0 To degree
  For t=1 To n
    A(i) + y(t)*Orthog(n, i, t)
  Next t
  A(i)/sumsquares(i)
Next i

; Iterate any clamped or missing values
Define y1old.d, ynold.d ; Original raw y values for clamped points
Define y1.d, yn.d ; Adjusted raw values
If clamped: y1old=y(1): ynold=y(n): EndIf
While retry>0
  If clamped
    ; Adjust the raw y so the fitted y approaches the original raw y
    y1 = 0.0: yn = 0.0
    For i=0 To degree
      y1 + A(i)*Orthog(n, i, 1): yn + A(i)*Orthog(n, i, n)
    Next i
    y(1) = y1: y(n) = yn
    ; Calculate revised values for the coefficients
    For i=0 To degree
      A(i) + (y1old-y1)*Orthog(n, i, 1)/sumsquares(i)
      A(i) + (ynold-yn)*Orthog(n, i, n)/sumsquares(i)
    Next i
  EndIf ; clamped
  ForEach miss()
    t = miss() ; There is a missing y for this t
    ; Estimate the missing y and save as the current value
    y = 0.0
    For i=0 To degree
      y + A(i)*Orthog(n, i, t)
    Next i
    Swap y, y(t)
    ; Calculate revised values for the coefficients
    For i=0 To degree
      A(i) + (y(t)-y)*Orthog(n, i, t)/sumsquares(i)
    Next i
  Next miss()
  ; Loop if another iteration should be done
  retry - 1
Wend ; retry
If clamped: y(1)=y1old: y(n)=ynold: EndIf ; Restore original values

; Get the total contribution of the data points
Define totalcontr.d=0.0
For t=1 To n
  totalcontr + y(t)*y(t)
Next t

; Output the coefficients A(i) and their contributions
Debug ""
Debug "Coefficients A(i) and their sum of squares contributions"
Define contr.d, residcontr.d
residcontr = totalcontr
For i=0 To degree
  contr = A(i)*A(i)*sumsquares(i)
  residcontr - contr
  Debug "A("+Str(i)+") = "+StrD(A(i),8)+"   --> "+StrD(contr,2)
Next i
Debug "Resid contribution   --> "+StrD(residcontr,6)
Debug "Total contribution   --> "+StrD(totalcontr,2)

; Output raw and fitted y values
; and calculate the coefficient of determination R2
Debug ""
Debug "Raw and fitted values using polynomial of degree "+Str(degree)
If clamped: Debug "Clamped at both ends": EndIf
Define missing
Define f.d, SStot.d=0.0, SSerr.d=0.0, R2.d
If FirstElement(miss()): missing=miss(): Else: missing=0: EndIf
For t=1 To n
  f = 0.0 ; Fitted y
  For i=0 To degree
    f + A(i)*Orthog(n, i, t)
  Next i
  y = y(t) ; Raw y
  If t=missing
    If NextElement(miss()): missing=miss(): Else: missing=0: EndIf
    Debug Str(t)+"   <null>   "+StrD(f,2)
  Else
    Debug Str(t)+"   "+StrD(y,2)+"   "+StrD(f,2)
  EndIf
  SStot + y*y: SSerr + (y-f)*(y-f) ; Accumulate sums of squares
Next t
SStot - n*A(0)*A(0) ; N.B. A(0) = y average
R2 = 1.0-SSerr/SStot
Debug ""
Debug "SStot  = "+StrD(SStot,6)
Debug "SSerr  = "+StrD(SSerr,6) ; = residcontr (unless clamped)
Debug "R2 = "+StrD(R2,6)

; Windup
Debug ""
End

Procedure.d Orthog(n, i, t)
; Ordinates of points of Chebyshev orthogonal polynomials
; Implementation of table 47 Orthogonal Polynomials
;   from the book:  Biometrika Tables for Statisticians Volume 1
; In the table  n=3(1)52  Number of data points
; In the table: i=0(1)6   Degree of Chebyshev polynomial
;               t=1(1)n   Data point identifier
; Warning:  No check is made for 32-bit integer overflow !!!
Protected v.l, v2.l, n2.l
v = t+t-(n+1) ; 2x (where x is an integer iff t is odd)
v2=v*v: n2=n*n
Select i
  Case 0: ProcedureReturn 1.0
  Case 1: ProcedureReturn v/2.0
  Case 2: ProcedureReturn (3*v2-n2+1)/12.0
  Case 3: ProcedureReturn ((5*v2-3*n2+7)*v)/40.0
  Case 4: ProcedureReturn ((35*v2-30*n2+130)*v2+3*(n2-1)*(n2-9))/560.0
  Case 5: ProcedureReturn (((63*v2-70*n2+490)*v2+(15*n2-230)*n2+407)*v)/2016.0
  Case 6: ProcedureReturn (((231*v2-315*n2+3255)*v2+21*((5*n2-110)*n2+329))*v2-5*(n2-1)*(n2-9)*(n2-25))/14784.0
EndSelect
ProcedureReturn 0.0
EndProcedure

Procedure.d SumOfSquares(n, i)
; Sums of squares of ordinate values for points on a Chebyshev polynomial of degree i
Protected t.l, thi.d, sumsquare.d=0.0
For t = 1 To n
  thi = Orthog(n, i, t)
  sumsquare + thi*thi
Next t
ProcedureReturn sumsquare
EndProcedure

DataSection
rawData:
Data.l -6 ; Degree of polynomial to be fitted 0..6 (-ve if ends are clamped)
Data.l 24 ; Number of equally-spaced points in the data set
; Average atmospheric temperature variation during December 1940
;   in Armoedsvalkte, Bechuanaland - by Van der Reyden (1943)
Data.d 65.25, 69.37, 74.44, 79.00, 82.72, 85.97, 88.11, 89.67, 90.36, 90.35, 88.92, 87.81
Data.d 84.95, 80.41, 76.15, 74.07, 73.13, 71.90, 70.57, 69.08, 68.12, 67.09, 66.71, 65.88
; If the 6th and 18th values were unknown ...
; Data.d 65.25, 69.37, 74.44, 79.00, 82.72, 0, 88.11, 89.67, 90.36, 90.35, 88.92, 87.81
; Data.d 84.95, 80.41, 76.15, 74.07, 73.13, 0, 70.57, 69.08, 68.12, 67.09, 66.71, 65.88
EndDataSection
Anthony Jordan
Post Reply