Monday, September 14, 2020

stem

A stem and leaf plot is similar to a histogram. It displays numeric values in order, so you can gauge the shape of the distribution. It lies rotated 90° from a typical histogram. A vertical line separates the stems (the tens or hundreds place) from the leaves (the next digit down). The stems ascend from top to bottom, and the leaves ascend from left to right.

Unlike a histogram, a stem and leaf plot preserves the value of each data point within the graph, at least up to whatever rounding is used for the leaves.

Note: there is also a stemplot, which involves plotting a matrix of y values along an x axis. I don't know much about stemplots at this time.

R has the built-in stem function to create stem and leaf plots. It only requires a numeric vector of values. It has a couple of options, the most useful one being scale, which controls the plot length. The aplpack package also provides stem.leaf, which has a pile of options that can help you trim outliers, create back-to-back plots, and more. Worth a look if you are going to be using stem and leaf plots a lot.
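Here is a minimal sketch of the scale option, using the rivers dataset that ships with base R. The scale values are just ones I picked for illustration, and the aplpack call is guarded because that package may not be installed.

```r
# rivers is a numeric vector in base R's datasets package
stem(rivers)               # default, scale = 1
stem(rivers, scale = 2)    # a longer plot with finer rows
stem(rivers, scale = 0.5)  # a shorter plot with coarser rows

# stem() prints the plot and returns NULL invisibly
result <- stem(rivers, scale = 2)

# Optional: only runs if aplpack happens to be installed
if (requireNamespace("aplpack", quietly = TRUE)) {
  aplpack::stem.leaf(rivers)
}
```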

I remember going through many steps to create stem and leaf plots in my first statistics class. R makes it trivial.

R Example

I'm basing this on the topic in Kerns's IPSUR book. It uses the rivers dataset, which ships with R.

> stem(rivers)

The decimal point is 2 digit(s) to the right of the |

 0 | 4
 2 | 011223334555566667778888899900001111223333344455555666688888999
 4 | 111222333445566779001233344567
 6 | 000112233578012234468
 8 | 045790018
10 | 04507
12 | 1471
14 | 56
16 | 7
18 | 9
20 |
22 | 25
24 | 3
26 |
28 |
30 |
32 |
34 |
36 | 1




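The distribution is right skewed: most of the values bunch up in the first few rows, and a handful of very large values trail off toward the bottom.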
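A quick way to sanity check the skew is to compare the mean and the median. This is only a rough rule of thumb, not a definition of skewness: a long right tail tends to pull the mean above the median.

```r
# Compare the center measures of the rivers data
river_mean <- mean(rivers)
river_median <- median(rivers)

# TRUE suggests a long right tail (rule of thumb only)
river_mean > river_median

# The five-number summary plus the mean, for context
summary(rivers)
```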

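Read a data point by taking a stem from the left side of the vertical line and a leaf from the right side. The header says the decimal point is 2 digits to the right of the |, so the stems count hundreds and each leaf is a tens digit; the units digit is rounded away. Each row here also holds two stems: the row labeled 2 covers the 200s and then the 300s, which is why its leaves run up to 9 and then start again at 0. The lone leaf in the top row, "0 | 4", is the shortest river at 135 miles, displayed as roughly 140. The lone leaf in the bottom row, "36 | 1", is the longest at 3710 miles. And so on.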
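To check a reading against the raw data, you can rebuild the stems and leaves by hand. This is a sketch under my own assumptions: rounding half up to the nearest ten is my approximation of what the display shows, not a copy of stem's internals, and the variable names are made up for illustration.

```r
# Assumption: round each value half up to the nearest ten
tens <- floor(rivers / 10 + 0.5)

stems <- tens %/% 10    # hundreds
leaves <- tens %% 10    # tens digit, the leaf

# The plot above prints two stems per row, labeled with the even one
rows <- 2 * (stems %/% 2)

# Raw value next to the row label and leaf it should appear under
head(data.frame(value = rivers, row = rows, leaf = leaves))

# Number of leaves per row, to compare with the plot
table(rows)
```

The shortest and longest rivers are easy ones to verify by eye, since each sits alone in its row.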
