A spine plot is often described as a special case of the mosaic plot and a generalization of the stacked bar graph. Each vertical bar (I'll call it a 1st order category) is segmented along its height according to the relative proportions of the 2nd order categories. In addition, each bar's width represents the share of its category among the 1st order categories being examined. All of this is shared with the mosaic plot. Unlike the mosaics I've seen, spine plots also feature a numeric scale on the y axis.
Even though some scholarly papers show how to use a spine plot for more than two variables, I'm going to go out on a limb here and say: use a spine plot for just two.
When should you use a spine plot instead of a bar graph? I think it depends on how you want to make your case. A bar graph is usually used to compare quantities or proportions among similar or related categories. I often have a dozen categories along the x axis, with each bar split along the y axis into an arbitrary set of segments; that is, the bars don't all need the same number of segments. A spine plot will involve only a few 'bars,' and each will have the same set of divisions. (At least, this is what I'm going with until I find out otherwise.)
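To make the width and segment idea concrete, here is a minimal sketch of the two sets of proportions a spine plot encodes. The counts are made up purely for illustration, and the names `counts`, `widths` and `heights` are my own, not anything from a package.

```r
# Hypothetical counts, made up purely for illustration:
# rows are the 1st order categories (one bar each),
# columns are the 2nd order categories (segments within a bar).
counts <- as.table(matrix(
  c(30, 10,
    20, 40),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("A", "B"), outcome = c("yes", "no"))
))

# Bar widths: each group's share of the grand total.
widths <- prop.table(margin.table(counts, 1))

# Segment heights: the split of outcomes within each group (each row sums to 1).
heights <- prop.table(counts, 1)

print(widths)
print(heights)

# The plot itself: bars are the rows of the table, segments are the columns.
spineplot(counts)
```

Here group A holds 40 of the 100 observations, so its bar should take up about 40% of the x axis, and three quarters of that bar belongs to one outcome.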
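As a rough way to see the difference, the sketch below draws the same two-way table both ways. The built-in Titanic table is my own choice of stand-in for "a few bars, each with the same divisions", and `class_by_survival` is just an illustrative name.

```r
# Assumption: the built-in Titanic table stands in for "few bars, same divisions".
# Collapse the 4-way table to Class x Survived (dimensions 1 and 4).
class_by_survival <- margin.table(Titanic, c(1, 4))

# Two panels side by side; keep the old settings so they can be restored.
old_par <- par(mfrow = c(1, 2))

# Stacked bar graph: bar heights are raw counts and all bars are equally wide.
# barplot() stacks the rows of a matrix within each column, hence the transpose.
barplot(t(class_by_survival), legend.text = TRUE, main = "Stacked bar graph")

# Spine plot: every bar fills the full height, and the width carries the count.
spineplot(class_by_survival, main = "Spine plot")

par(old_par)
```

The bar graph makes it easy to compare counts; the spine plot makes it easy to compare the split within each bar.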
R Example
This example uses the UCBAdmissions dataset to show how applicants to UC Berkeley break down by admission decision and gender.
spineplot(xtabs(Freq ~ Admit + Gender, data = UCBAdmissions))
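What the plot shows:
- more people were rejected than accepted: the Rejected bar is wider than the Admitted one. The plot doesn't give the exact figure, but it eyeballs to about 2:3, or roughly 60% rejected.
- of those rejected, a somewhat higher percentage were male than female (about 55% to 45%). Note that this is the gender split within the Rejected bar, not the rejection rate among male applicants; to read rejection rates by gender, Gender would need to be the bars.
- of those accepted, a higher percentage were male than female.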
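Eyeballing only goes so far, so here is a small sketch for checking the readings against the actual counts. It uses `margin.table()` and `prop.table()` from base R; the variable names are my own.

```r
# Collapse the 3-way UCBAdmissions table over Dept, keeping Admit x Gender.
tab <- margin.table(UCBAdmissions, c(1, 2))

# Bar widths: share of all applicants who were admitted vs. rejected.
bar_widths <- prop.table(margin.table(tab, 1))
print(round(bar_widths, 3))

# Segments within each bar: gender split among the admitted and among the rejected.
segments <- prop.table(tab, 1)
print(round(segments, 3))

# A different question the plot above does not answer directly:
# the admit/reject split within each gender (each column sums to 1).
by_gender <- prop.table(tab, 2)
print(round(by_gender, 3))
```

The first two tables correspond to what the spine plot draws. The third conditions the other way around, which is what you would get from a spine plot with Gender as the bars.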