Sunday, May 17, 2020

var

Variance. A measure of spread. It's the expectation of the squared deviation of a random variable from its mean.

In the R docs, var is listed on the same page as cor (correlation) and cov (covariance).

sd is the square root of var. Variance is built from squared deviations because the raw deviations from the mean always sum to zero: the positive and negative deviations cancel, since the mean is computed from the same values. R's var divides the sum of squared deviations by n - 1, which makes it the sample variance.
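Here is a quick check of both claims, using the built-in rivers data. The variable names (dev, ss, manual_var) are only for illustration.

```r
x <- rivers                      # built-in dataset of river lengths
dev <- x - mean(x)               # signed deviations from the mean

sum(dev)                         # zero, up to floating-point rounding
ss <- sum(dev^2)                 # squaring removes the signs
manual_var <- ss / (length(x) - 1)

all.equal(manual_var, var(x))    # TRUE: var uses the n - 1 denominator
all.equal(sqrt(var(x)), sd(x))   # TRUE: sd is the square root of var
```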

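The three do not share an identical signature. All of them take x and y, plus a 'use' parameter that controls how missing values are handled. var additionally takes na.rm, while cor and cov take a 'method' parameter (pearson, kendall or spearman). The V listed on that page is not a method: it is the argument to cov2cor, which converts a covariance matrix into a correlation matrix.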
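A small sketch of these parameters in action. The vector x is made up for the example, and faithful is a built-in dataset.

```r
x <- c(1, 2, NA, 4)      # made-up data with one missing value
var(x)                   # NA: by default missing values propagate
var(x, na.rm = TRUE)     # variance of the remaining values 1, 2, 4

# cor and cov accept 'method'; "pearson" is the default
cor(faithful$eruptions, faithful$waiting, method = "spearman")

# cov2cor(V) rescales a covariance matrix V into a correlation matrix
all.equal(cov2cor(cov(faithful)), cor(faithful))   # TRUE
```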

R Example

> var(rivers)
[1] 243908.4

I may need to look into what variance is useful for on its own. Up to this point I've mostly used the standard deviation.

Friday, May 15, 2020

scale (z-score)

The R docs present this as a generic function (Scaling and Centering of Matrix-like Objects). In Stat 3743, Kern uses it. By default it centers each column on its mean and divides by that column's standard deviation, which gives a z-score for every element. I'm not sure yet how to get just one z-score for a particular value.

R Example

For example, precip (a named numeric vector, not a data.frame) provides rainfall data from 7 to 67 inches:

> range(precip)
[1]  7 67

And range(scale(precip)) provides the min and max z-scores for all the elements in precip:

> range(scale(precip))
[1] -2.034466  2.342971
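One possible way to pull out a single z-score, sketched under the assumption that the names on precip carry over as row names of the scaled matrix. The variables z and city, and the 50-inch value, are made up for the example.

```r
z <- scale(precip)                  # one-column matrix; row names are the cities
city <- names(which.min(precip))    # the driest city, picked only as an example

z[city, 1]                          # z-score looked up by name
(precip[city] - mean(precip)) / sd(precip)   # same value computed by hand

# z-score of a hypothetical 50 inches, which need not appear in the data
(50 - mean(precip)) / sd(precip)
```

The by-hand version is the more flexible of the two, since it works for any value and not only for elements already in the data.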

Wednesday, May 13, 2020

fivenum

A measure of spread and shape. Returns Tukey's five number summary for a dataset:


minimum: smallest value
lower hinge: ~25% value
median: middle value
upper hinge: ~75% value
maximum: greatest value

These are depicted pretty well with box plots. I'll link to that when I write it.

R Example

> fivenum(faithful$waiting)
[1] 43 58 76 82 96
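The "~" on the hinges matters: they are close to the quartiles but not always identical to what quantile returns with its default settings. A tiny made-up vector shows the difference.

```r
x <- c(1, 2, 3, 4)       # made-up data, small enough to check by hand

fivenum(x)               # 1.0 1.5 2.5 3.5 4.0  (hinges)
unname(quantile(x))      # 1.00 1.75 2.50 3.25 4.00  (default quartiles)
```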