Mosaic plots let you view the relationship between two or more categorical variables. That's the standard definition. I often confuse mosaicplot with spineplot. The docs stress that while mosaic plots can show more than two variables, spine plots are limited to just two.
If you research mosaic plots on the web, you will almost always find the Titanic survival analysis at the top of the heap. Mosaic plots are all about proportional boxes representing related categories, so when you read one, you find yourself drilling down into it, alternating between the x and y directions (X-Y-X-Y) to discover what you want. You'll see what I mean in the R example below.
R Example
Let's just get on with it. This example creates the default mosaic plot for the HairEyeColor dataset.
> mosaicplot(HairEyeColor)
It's all about Hair color (x axis) and Eye color (y axis). Each is split into four colors. Within each hair color, the tiles are further subdivided by gender. The size of a tile in a given direction (its width or its height) indicates its relative share of the total in that same direction, so a tile's area reflects how many people fall into that combination.
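To make the proportions concrete, here is a small sketch that computes the shares the tiles are drawn from. It uses only base R on the built-in HairEyeColor table; the variable names (hair_share, eye_given_hair) are my own, and the plot adds gaps between tiles, so the drawn sizes match these numbers only approximately.

```r
# Collapse the three-way table to hair color alone (summing over Eye and Sex).
hair_counts <- margin.table(HairEyeColor, 1)

# Share of all people in each hair color group.
# These shares drive the column widths along the x axis.
hair_share <- prop.table(hair_counts)
print(round(hair_share, 3))

# Collapse over Sex to get a Hair x Eye table, then take row proportions:
# within each hair color, the share of each eye color.
# These shares drive the tile heights inside each column.
hair_eye <- margin.table(HairEyeColor, c(1, 2))
eye_given_hair <- prop.table(hair_eye, 1)
print(round(eye_given_hair, 3))
```

Each row of eye_given_hair sums to 1, which is why every hair color column in the plot fills the full height regardless of how wide it is.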
As you look at the plot, you can see that people with blond hair are more likely to have blue eyes, and that, at a glance, gender does not appear to change the tendency towards a given eye color much when the hair color is the same. So in this example, there are three variables being compared: eye color, hair color, and gender.
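If you want to check a visual impression like this against the numbers, a sketch along these lines may help. It is base R only; blue_by_hair and eye_by_hair_sex are names I made up for illustration.

```r
# Share of blue eyes within each hair color (collapsing over Sex).
hair_eye <- margin.table(HairEyeColor, c(1, 2))
blue_by_hair <- prop.table(hair_eye, 1)[, "Blue"]
print(round(blue_by_hair, 3))

# Eye color distribution within every Hair x Sex combination.
# Margins 1 (Hair) and 3 (Sex) are held fixed, so each
# [hair, , sex] slice sums to 1.
eye_by_hair_sex <- prop.table(HairEyeColor, c(1, 3))

# Compare the two gender columns for one hair color, e.g. Blond.
print(round(eye_by_hair_sex["Blond", , ], 3))
```

Reading the two gender columns side by side for each hair color is a quick way to see how far the "gender doesn't matter" impression actually holds.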
I see two challenges with a mosaic plot. The first is that your audience needs some experience interpreting one before you present it; otherwise you'll spend a lot of time explaining it. The second is that two people can come to different and equally valid conclusions, depending on how the mosaic is laid out. This hair/eye example doesn't really show that, but if you plan on using a mosaic plot to support something it seems to show, remember to examine the graph from all angles for contradictions.
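One way to examine the same data from another angle is to change the order in which the variables are split. The sketch below draws the default plot and then a version that splits by gender first; the titles and the reordered variable are my own choices, not anything prescribed by the dataset.

```r
# Default order: Hair first, then Eye, then Sex.
mosaicplot(HairEyeColor, main = "Hair, then Eye, then Sex")

# Same data, but split by Sex first, using the formula interface.
mosaicplot(~ Sex + Hair + Eye, data = HairEyeColor,
           main = "Sex, then Hair, then Eye")

# An alternative: permute the table's dimensions and plot the result.
# aperm() reorders the dimensions without changing any counts.
reordered <- aperm(HairEyeColor, c(3, 1, 2))
mosaicplot(reordered, main = "Sex, then Hair, then Eye (via aperm)")
```

If a pattern you spotted in one ordering disappears or reverses in another, that is a hint to check the underlying counts before leaning on the plot.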