Thursday, November 12, 2020

spineplot

They say that a spine plot is a special case of the mosaic plot and a generalization of the stacked bar graph. Each vertical bar (I'll call it a 1st order category) is segmented along its height according to the relative proportions of the 2nd order categories. In addition, each bar has a width that represents the proportion of its category among the 1st order categories being examined. All of this is similar to the mosaic. Unlike the mosaics I've seen, spine plots also feature a numeric scale on the y axis.

Even though there are some scholarly papers out there that show how to use a spine plot for more than two variables, I'm going to go out on a limb here and say: use a spine plot for just two.

When should you use a spine plot instead of a bar graph? I think it depends on how you want to make your case. A bar graph is usually used to compare quantities or proportions among similar or related categories. I often have a dozen categories along the x axis, with each bar split along the y axis into an arbitrary number of segments. That is, the bars don't all need to have the same number of segments. A spine plot will involve only a few 'bars,' and each will have the same number of divisions. (At least, this is what I'm going with until I find out differently.)

R Example

This example uses the UCBAdmissions dataset to show how the genders are distributed among applicants for admission to UC Berkeley.

spineplot(xtabs(Freq ~ Admit + Gender, data = UCBAdmissions))


[depiction of UCBAdmissions dataset]

The plot shows:
  • more people were rejected than accepted: the Rejected bar is wider than the Admitted bar. We can't read off exactly how much, but it eyeballs to about 2:3 admitted to rejected, or roughly 60% rejected.
  • of those rejected, a higher percentage were male than female (about 55% to 45%).
  • of those accepted, a higher percentage were male than female.
Okay, that's nice. The chart makes it seem that men are overrepresented. But notice that it gives us no information on the individual totals.
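
The eyeballed figures in the list above can be checked numerically. This is only a minimal sketch, assuming the same cross-tabulation that was passed to spineplot; the names tab, widths and heights are my own, not anything the plot function exposes.

```r
# Collapse the 3-way UCBAdmissions table over Dept (same table as in the plot call).
tab <- xtabs(Freq ~ Admit + Gender, data = UCBAdmissions)

# Bar widths: share of all applicants in each Admit category.
widths <- prop.table(margin.table(tab, 1))

# Segment heights: gender split within each Admit category (each row sums to 1).
heights <- prop.table(tab, margin = 1)

round(widths, 3)
round(heights, 3)
```

The widths correspond to the x direction of the spine plot and the heights to the segments inside each bar, so the raw counts never appear in the picture.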

If we take the same data, and plot it with barplot, we can influence people in a completely different way:

barplot(xtabs(Freq ~ Admit + Gender, data = UCBAdmissions))


Okay, that's quite different. Though many more males were admitted, many more males applied. Even so, in these pooled totals a greater proportion of the males who applied were admitted (roughly 45%, against roughly 30% of the females). (Please don't try to read into this data why more males applied to Berkeley.)
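
To read the totals and the per-gender admission rates off the table rather than the picture, something like the following should do. It is a sketch under the same assumptions as the barplot call; applicants and rates are just illustrative names, and the legend is my addition so the two segments can be told apart.

```r
tab <- xtabs(Freq ~ Admit + Gender, data = UCBAdmissions)

# Applicants per gender: the total height of each bar.
applicants <- margin.table(tab, 2)

# Share of each gender that was admitted or rejected (each column sums to 1).
rates <- prop.table(tab, margin = 2)

applicants
round(rates, 3)

# Same stacked bar chart, with a legend taken from the row names (Admitted, Rejected).
barplot(tab, legend.text = TRUE, ylab = "Applicants")
```

Note that margin = 2 conditions on Gender, which is the question the bar chart answers; margin = 1 conditions on Admit, which is the question the spine plot answers.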

If you wanted to show how skewed the makeup of the admitted group is, you would use a spine plot. If you wanted to show how skewed the admission outcomes are for each gender, you would use a stacked bar chart.

Welcome to descriptive statistics.


