Friday, September 11, 2020

pareto

Pronounced pa RAY toe. A pareto diagram is similar to a bar chart, in that it depicts categorical data. Unlike a bar chart, it orders the bars in decreasing size. This scheme not only makes it easy for you to see which categories are most important, it also lets you compare their relative sizes quickly.

Often pareto diagrams include an ogive line, which tracks the cumulative number of values across the bars. With the addition of an ogive, you can quickly estimate when a given percentage of the total data has been allocated to the major categories.
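
To see what the ogive tracks, you can compute the cumulative values by hand in base R. This is only a sketch of the idea (the variable names are my own, and it does not use qcc): sort the category counts in decreasing order, then take running totals.

```r
# Count the states in each census division, largest category first
counts <- sort(table(state.division), decreasing = TRUE)

# Running totals: these are the values an ogive line passes through
cum_counts <- cumsum(counts)
cum_percent <- 100 * cum_counts / sum(counts)

# Collect the pieces in one table for inspection
pareto_table <- data.frame(
  Frequency   = as.vector(counts),
  Cum.Freq    = as.vector(cum_counts),
  Cum.Percent = as.vector(cum_percent),
  row.names   = names(counts)
)
print(pareto_table)

# A rough hand-built version of the chart: bars plus the cumulative line
mids <- barplot(counts, las = 2, ylim = c(0, sum(counts)), ylab = "Frequency")
lines(mids, cum_counts, type = "b")
```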

You need to load the qcc package to get pareto chart functionality.

R Example

This example is similar to the barplot example. It shows the number of states in each of the nine U.S. census divisions, along with the cumulative count. Along with the chart, R prints the table of values to the console.

library(qcc)
pareto.chart(table(state.division), ylab = "Frequency")

Pareto chart analysis for table(state.division)

                   Frequency Cum.Freq. Percentage Cum.Percent.
South Atlantic             8         8         16           16
Mountain                   8        16         16           32
West North Central         7        23         14           46
New England                6        29         12           58
East North Central         5        34         10           68
Pacific                    5        39         10           78
East South Central         4        43          8           86
West South Central         4        47          8           94
Middle Atlantic            3        50          6          100

Cumulative number of states within several regions

Tuesday, September 8, 2020

dotchart

The dotchart command provides a dot plot, a simple representation of frequency. It is part of the graphics package.

As with any dot plot, advantages include a graph you can quickly make by hand. Outliers are easy to identify, and you can gauge clusters, gaps, and histogram-like shapes. Most notably, dot plots preserve the numeric value of each data point (something lost with a bar chart, for example). See Dot plot (statistics) in Wikipedia.

Its use is limited to data sets with relatively few data points. With too many points, the graph becomes unreadable.

dotchart has 20 parameters, but the only one you must have is x -- a vector or matrix (table).
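
A small sketch of the matrix case, using the built-in VADeaths matrix (my choice of dataset, not one used elsewhere in this post). When x is a matrix, dotchart draws the values in groups, one group per column, and labels the dots with the row names.

```r
# VADeaths is a 5 x 4 numeric matrix of death rates that ships with R
str(VADeaths)

# x is the only argument supplied; everything else uses the defaults
dotchart(VADeaths)
```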

dotchart is a Variation on the Bar Chart

After poking around with this a bit, I discovered that dotchart is a specialized dot plot function. The R documentation lists this as Cleveland's Dot Plot. In Wikipedia, I learned that a Cleveland Dot Plot is "…an alternative to the bar chart, in which dots are used to depict the quantitative values (e.g. counts) associated with categorical variables." These are supposed to be easier to read and interpret than bar charts.
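
To compare the two styles, you can draw the same counts both ways. A minimal sketch, using the state.region counts from the barplot post (the variable name counts is mine):

```r
# Number of states in each of the four U.S. regions
counts <- table(state.region)

# Draw the bar chart and the Cleveland dot plot side by side
op <- par(mfrow = c(1, 2))
barplot(counts, main = "barplot")
dotchart(as.numeric(counts), labels = names(counts), main = "dotchart")
par(op)
```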

R Example

This first example plots the passenger miles flown on US airlines each year. The airmiles dataset is in the datasets package, which R loads by default, so no library call is needed.

> dotchart(airmiles)

Passenger miles on US airlines between 1937 and 1960

This example shows just one dot per category. The airmiles dataset lists passenger miles on US airlines between 1937 and 1960. In the structure, though, the years are stored as part of a time series (str reports Time-Series [1:24]), so there's no hard-wired column of years.

I was initially confused by dotchart, because the examples didn't show me how to label the dots. (The plot function ekes the years out automatically.) I found that labeling the points in dotchart requires creating another vector of labels to pair with the airmiles data. There are also dotchart2 and dotchart3 (in the Hmisc package), which may handle time series years more elegantly. So I'm going with just another example:


dotchart(mtcars$mpg, labels = row.names(mtcars), cex = .7,
        main = "Gas Mileage for Car Models",
        xlab = "Miles Per Gallon")



MPG for Common Cars
I think this version is easier to read than a bar chart, for sure. For labeling points on other kinds of plots, the calibrate::textxy function is an option.
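
Coming back to the airmiles labels: one way to pair each dot with its year, as far as I can tell, is to pull the years out of the time series with time() and pass them as labels. This is a sketch of that idea, not necessarily the neatest approach (the title and axis label are my own).

```r
# Extract the years from the time-series attributes
years <- as.integer(time(airmiles))

# Convert the series to a plain numeric vector and label each dot by year
dotchart(as.numeric(airmiles), labels = years, cex = 0.7,
         main = "US Airline Passenger Miles",
         xlab = "Passenger miles")
```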

Friday, September 4, 2020

barplot

Bar graphs or bar charts are created using barplot, and are frequency comparisons for categorical data (rather than numeric occurrences of a single variable as with histograms). Heights of bars are proportional to the frequency of occurrence of each category.

Use a bar graph when you want to compare the incidence of different but similar things: for example, the survival of different mammals on a flooded island, or the number of paper subscriptions in several city neighborhoods.

Bar graphs can be single bars per category. They can also be "doubled-up" -- two or more bars in the same category turning on a split in the data. An example might be paper subscriptions in city neighborhoods, split by gender. Bar graphs may also be "stacked," with each bar divided into a number of differently colored segments representing characteristics that may not appear in every bar in the chart. For example, paper subscriptions per neighborhood, with segments for morning, evening, and Sunday-only subscriptions. In some neighborhoods there may be no Sunday-only subscriptions, while in another neighborhood, everybody gets the morning edition.
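
Both layouts come from the same barplot call when you hand it a two-way table. Here is a hedged sketch using mtcars, since I don't have subscription data; cylinders stand in for the segments and gears for the categories.

```r
# Two-way table: rows are cylinder counts, columns are number of gears
counts <- table(mtcars$cyl, mtcars$gear)

op <- par(mfrow = c(1, 2))
# "Doubled-up" bars: one bar per cylinder count within each gear category
barplot(counts, beside = TRUE, legend.text = TRUE,
        xlab = "Gears", ylab = "Cars", main = "Grouped")
# Stacked bars: one bar per gear category, split into cylinder segments
barplot(counts, legend.text = TRUE,
        xlab = "Gears", ylab = "Cars", main = "Stacked")
par(op)
```

Note that mtcars has no 8-cylinder cars with 4 gears, so that segment is simply missing from its bar.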

R Example

This example plots the number of states which exist in the four main regions of the U.S.

barplot(table(state.region))

Number of States in Four U.S. Regions

Sunday, August 23, 2020

plot (Index plot)

This is a two-dimensional plot where the index (such as time) is the x variable, and the measured value is the y variable. Use an index plot when you want to plot ordered data against an increasing scale. Kern assumes that the increasing scale will be time.

You can use the generic plot function to create an index plot (RcmdrMisc::indexplot is also available). The plot function offers 9 ways to display the data, from points to steps, selected with the type parameter.

R Example

This example shows the changes in the level of Lake Huron from 1875 to 1972, using a single line ("l") display.

plot(LakeHuron, type = "l")

graph of Lake Huron levels
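
A quick way to get a feel for the display types is to draw the same series several times. This sketch loops over six of the type codes listed in the plot documentation (the panel layout is just my choice):

```r
# Six of the type codes accepted by plot
types <- c("p", "l", "b", "o", "h", "s")

# Draw Lake Huron once per type in a 2 x 3 grid
op <- par(mfrow = c(2, 3))
for (t in types) {
  plot(LakeHuron, type = t, main = paste0('type = "', t, '"'))
}
par(op)
```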


Thursday, August 20, 2020

hist

Create a histogram. The hist function is in the graphics package, which is part of base R. With hist, the only parameter you need (out of about 20) is x, a numeric vector of values.

There are other histogram functions available from other packages (lattice::histogram, for example). Regardless of which you choose, the advantages and disadvantages are the same.

Use a histogram when you want to view the frequency of occurrence of something across a set of sequential bins. Another way of saying it is that a histogram shows the distribution of a single variable: for example, silt depth from shore to 3', from 3' to 6', 6' to 9', and so on. It is distinguished from a bar graph, which is used to compare categories.

Histogram Weakness

You must take care determining the bin size for your histogram. A bin size that is less than optimal can provide a misleading depiction. With hist, experiment with the breaks parameter to adjust the graph for the best information display for your purposes.

R Example

This example shows the average annual precipitation for each of 70 cities in the U.S. and Puerto Rico.

hist(precip, breaks = 25)
histogram of precip in US cities

The breaks parameter asks hist for 25 bins. Note that if you count the bins, you don't get 25: when breaks is a single number, R treats it as a suggestion and picks "pretty" break points near that count. Anyway, 4 cities had less than 10 inches of rainfall, and one city had more than 65 inches.
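
You can check what R actually did with breaks by asking hist for its numbers instead of a picture. A sketch using only base R (the names h and h5 are mine); plot = FALSE returns the bin boundaries and counts without drawing anything.

```r
# Ask for about 25 bins, but only compute the histogram
h <- hist(precip, breaks = 25, plot = FALSE)
h$breaks          # the boundaries R settled on
length(h$counts)  # the number of bins actually used

# To force exact bins, pass a vector of boundaries instead of a single number
h5 <- hist(precip, breaks = seq(0, 70, by = 5), plot = FALSE)
h5$counts         # one count per 5-inch bin

# Counting directly is a handy cross-check on what the chart appears to show
sum(precip < 10)
sum(precip > 65)
```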

You can have hist show frequency (counts) or probability density. It just changes the left-hand scale.
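
The switch is the freq argument. A small sketch (the name hd is mine): with freq = FALSE the bar heights are densities, so the bar areas, not the heights, add up to 1.

```r
op <- par(mfrow = c(1, 2))
hist(precip, main = "Frequency")                    # counts on the y axis
hd <- hist(precip, freq = FALSE, main = "Density")  # densities on the y axis
par(op)

# Bar areas (height times bin width) sum to 1
sum(hd$density * diff(hd$breaks))
```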

Monday, August 17, 2020

stripchart (dot plot)

Also known as strip chart and strip plot. Call this function in the form:

stripchart(quantity ~ category, method = "stack", data = your_data_frame)

stripchart produces a one-dimensional scatter plot of univariate data, whether discrete or continuous. Data points are plotted above a single numeric scale. You can create dot plots by hand. Use them when the data set is small, and you want to identify outliers and clusters of values.

While the only required parameter for stripchart is x (the data to plot, given as a numeric vector, a list of vectors, or a formula), optional parameters let you set horizontal or vertical orientation, group data in categories, and add "jitter" to help keep values from superimposing.

R Example

This example plots the values from the standard rivers dataset, the lengths of 141 major rivers in North America.

stripchart(rivers, method = "jitter", xlab = "length")
dot plot of rivers dataset
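
The formula form and the optional parameters can be seen together in one call. A sketch using mtcars (my choice of data): miles per gallon grouped by number of cylinders, drawn vertically with jitter.

```r
# One strip per cylinder count; jitter spreads out overlapping points
stripchart(mpg ~ cyl, data = mtcars,
           method = "jitter", jitter = 0.1,
           vertical = TRUE, pch = 19,
           xlab = "Cylinders", ylab = "Miles per gallon")
```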

Thursday, July 23, 2020

On to Basic Graphs

In the previous posts, I've focused on the typical commands you use to report and analyze numeric and tabular data. For the next dozen or so posts I'm going to review graphing functions.

Because now we are getting to the meat of the deal. Visualizations. I've been on a mission to understand how to interpret graphs for many years -- it's why I stepped up to study stats in 2018. The reasons why people need to understand what graphs tell us are broadcast by anyone who knows anything about statistics. To our detriment, until we look into it more deeply we simply nod and ignore. 

I imagine that most of what people can be made to understand can be done using one or more of the graphs in this section.