Thursday, October 29, 2020

mosaicplot

Mosaic plots let you view the relationship between two or more categorical variables. That's the official, universal definition. I still get mosaicplot and spineplot confused. The docs stress that while mosaic plots can show more than two variables, spine plots are limited to just two.
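To keep the two straight, here is a minimal sketch that draws both from the same data. It assumes the built-in HairEyeColor table; the name hair_eye is just mine. margin.table collapses the three-way table down to the two variables that a spine plot can take.

```r
# Collapse the built-in 3-way table (Hair x Eye x Sex) to Hair x Eye.
hair_eye <- margin.table(HairEyeColor, c(1, 2))

# Spine plot: takes a 2-way table, so two categorical variables only.
spineplot(hair_eye)

# Mosaic plot: works with the same two variables...
mosaicplot(hair_eye)

# ...and also with all three at once.
mosaicplot(HairEyeColor)
```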

If you research mosaic plots on the web, you will invariably find the Titanic survival analysis at the top of the heap. Mosaic plots are all about proportional boxes representing related categories, so when you read a mosaic plot, you find yourself drilling down into it, reading it X-Y-X-Y to discover what you want. You'll see what I mean in the R example below.

R Example

Let's just get on with it. This example creates the default mosaic plot for the HairEyeColor dataset. 

> mosaicplot(HairEyeColor)

It's all about Hair color (x axis) and Eye color (y axis). Eye color is subcategorized into four colors. Hair is also subcategorized into four colors, and each hair color is further subdivided by gender. The width or height of a segment indicates its share of the total in that direction.

As you look at the plot, you can see that people with blond hair are more likely to have blue eyes, and that a difference in gender does not seem to affect the tendency towards an eye color when the hair color is the same. So in this example, there are three variables being compared: eye color, hair color, and gender.
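If you want to check what the boxes seem to say, the proportions behind them can be computed directly. This is only a rough sketch using base R's margin.table and prop.table; the variable names are my own.

```r
# Hair x Eye counts, summed over Sex.
hair_eye <- margin.table(HairEyeColor, c(1, 2))

# Share of each eye color within each hair color (each row sums to 1).
eye_given_hair <- prop.table(hair_eye, margin = 1)
round(eye_given_hair, 2)

# Same idea for blond hair only, split by gender (each column sums to 1).
blond_by_sex <- prop.table(HairEyeColor["Blond", , ], margin = 2)
round(blond_by_sex, 2)
```

The first table is what the column heights in the plot represent. The second lets you compare the two genders side by side for a single hair color.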

I see two challenges with a mosaic plot. The first is that your audience needs some experience interpreting one before you present it. You'll be spending a lot of time explaining it otherwise. The other is that two people can come to different and equally valid conclusions, depending on the mosaic distribution (this hair/eye example doesn't really show that; you just need to remember to examine all angles of the graph for contradictions if you plan on using it to support something it seems to show).

Friday, October 16, 2020

bwplot

Available with the lattice package, bwplot (box and whisker plot) displays multiple box plots based on a category in the dataset. Lattice describes bwplot on the same page as its other common bivariate trellis plots, so you have to go down the page to find info on it. It has over 20 options, but you only need the dataset name and a formula that names the variable to plot and the grouping variable.

Plot comparisons are low in detail, but allow you to compare many items at a glance. In the case of bwplot, you are comparing the IQR of a dataset grouped by some category.

R Example

This example depicts the distribution of chicken weights, grouped by the type of feed. A comparison lets you judge which feed might be optimal to provide your hens.

> library(lattice)
> bwplot(~weight | feed, data = chickwts)



Looking at this, we might want to use sunflower, because it has the most predictable weights (the narrowest box). Or we might choose casein, because we may want to produce chickens with greater variance in size and weight.
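To back up the eyeballing, here is a small sketch that computes the spread per feed with base R. It assumes the same chickwts dataset; iqr_by_feed and median_by_feed are just names I picked.

```r
# IQR of weight for each feed type (the width of each box).
iqr_by_feed <- tapply(chickwts$weight, chickwts$feed, IQR)
sort(iqr_by_feed)

# Median weight for each feed type (the dot or line inside each box).
median_by_feed <- tapply(chickwts$weight, chickwts$feed, median)
sort(median_by_feed, decreasing = TRUE)
```

A small IQR means a more predictable weight, and a large one means more variety, so the two sorted lists together give you the same comparison as the plot, just in numbers.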

Thursday, October 8, 2020

boxplot

Boxplots are a fast way to judge spread, shape, and center. Also called box and whisker plots, they depict the five-number summary (minimum, lower quartile, median, upper quartile, maximum) in a structured shape.
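The five numbers themselves are easy to get in base R with fivenum. A quick sketch, assuming the built-in InsectSprays dataset and looking at spray A only (spray_a and five are my own names):

```r
# Counts for insecticide A only.
spray_a <- InsectSprays$count[InsectSprays$spray == "A"]

# Returns: minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(spray_a)
five
```

The hinges that fivenum returns are what boxplot uses for the edges of the box. They can differ slightly from the quartiles that quantile() reports.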

Use a boxplot when you want a visual summary of the data set, and you don't have (or don't want to use) many data points. I think the main negative thing about boxplots is that you may need to educate people about what they are seeing, as opposed to a distribution plot or bar chart, which are ubiquitous in public culture.

The boxplot function has a pile of options, but you only need a formula (here, the count column grouped by spray) and the data option to show something.

> boxplot(count ~ spray, data = InsectSprays, horizontal = TRUE)

Values are represented by the lines around each box.

min - leftmost whisker (the smallest value that isn't an outlier). Insecticides A and B have min values of about 7.

IQR - the box shape. Insecticide F has the widest IQR, and insecticide D has the narrowest.

median - the heavy line within the box

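max - the rightmost whisker (the largest value that isn't an outlier). For insecticide D, the median line sits on the upper edge of its IQR box, and the whisker extends only slightly past it.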

outliers - circles outside a box (C and D in this plot both have an outlier)
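If you'd rather read these values than squint at the plot, boxplot can return them instead of drawing. A minimal sketch, assuming the same InsectSprays call as above with plot = FALSE; the name bp is mine.

```r
# Compute the boxplot statistics without drawing anything.
bp <- boxplot(count ~ spray, data = InsectSprays, plot = FALSE)

# One column per spray. Rows, top to bottom:
# lower whisker, lower edge of box, median, upper edge of box, upper whisker.
colnames(bp$stats) <- bp$names
bp$stats

# Outlier values, labeled with the spray each one belongs to.
data.frame(spray = bp$names[bp$group], count = bp$out)
```

By default, a point counts as an outlier when it lies more than 1.5 times the box width beyond the box, which is why the whiskers stop short of those circles.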