Recently I’ve needed to plot compositional data by one or more groups. The data usually consist of a categorical variable (ordered or not) and a binary variable that distinguishes two groups, e.g., minority status or poverty (0/1). I was struggling to plot the categorical variable across the two groups so that the bars sum to 100% within each group. Let’s start with a simple example.

The data come from IPUMS. My data is (here) and the setup code is (here). Family structure is an exhaustive five-category grouping: (1) two adults, no working woman, (2) two adults with a working woman, (3) single woman not working, (4) single woman working, and (5) single male.

Let’s say we want to examine poverty status across family structure. Poverty is measured using the US Census official poverty measure. We want to compare the family structure composition of the poor with that of the non-poor.

Let’s begin with a descriptive barchart of family structure using the catplot package from Nicholas J. Cox.
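Since catplot is a user-written command, it has to be installed before the calls in this post will run. A minimal sketch, assuming you have internet access and install from SSC:

```stata
* Install catplot from SSC only if Stata cannot already find it
capture which catplot
if _rc ssc install catplot
```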

catplot fams, percent ///
ytitle("Percent") ///
title("Poverty and family structure") ///
subtitle("") ///
note("Source: IPUMS ACS") ///
ysize(3) blabel(bar, format(%9.1f))
graph export catplot1.png, replace

catplot1

Now let’s add the poverty status.

catplot povstat fams, percent asyvars ///
ytitle("Percent") ///
legend(label(1 "Non-poor") label(2 "Poor")) ///
title("Poverty and family structure") ///
subtitle("") ///
note("Source: IPUMS ACS") ///
ysize(3) blabel(bar, format(%9.1f))
graph export catplot2.png, replace

catplot2

This is not bad. However, the percentages are computed over the entire sample, so all of the bars together sum to 100%. We usually want compositions within each group so that we can compare across groups.
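As a quick numeric check of what "within group" means, a two-way tabulate with column percentages gives the distribution of family structure separately for each poverty group. This is only a sketch and assumes fams and povstat are set up as described above; the percentages are unweighted.

```stata
* Column percentages: distribution of fams within each povstat group
* Each povstat column sums to 100
tabulate fams povstat, column nofreq
```

These column percentages are the numbers the percent(povstat) option plots in the next graph.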

catplot povstat fams, percent(povstat)  ///
ytitle("Percent") ///
legend(label(1 "Non-poor") label(2 "Poor")) ///
title("Poverty and family structure") ///
subtitle("") ///
note("Source: IPUMS ACS") ///
ysize(3) blabel(bar, format(%9.1f))
graph export catplot3.png, replace

catplot3

Again, we have an improvement: the bars now sum to 100% within each poverty group. But seeing the 0 and 1 next to each other is still a burden on the viewer. Can we get rid of those and add some color to facilitate comparison? The option asyvars helps here.

catplot povstat fams, percent(povstat) asyvars ///
ytitle("Percent") ///
legend(label(1 "Non-poor") label(2 "Poor")) ///
title("Poverty and family structure") ///
subtitle("") ///
note("Source: IPUMS ACS") ///
ysize(3) blabel(bar, format(%9.1f))
graph export catplot4.png, replace

catplot4
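An alternative to relabeling the legend by hand is to attach value labels to povstat, which the legend should then pick up when asyvars is used. This is a sketch, not what I ran above: it assumes povstat is numeric, coded 0 = non-poor and 1 = poor, and has no value label attached yet. The label name povlbl is made up for illustration.

```stata
* Assumption: povstat is coded 0/1 and currently unlabeled
* povlbl is a hypothetical label name; label define errors if it already exists
label define povlbl 0 "Non-poor" 1 "Poor"
label values povstat povlbl

* Same graph as above, without the legend(label()) option
catplot povstat fams, percent(povstat) asyvars ///
    ytitle("Percent") ///
    blabel(bar, format(%9.1f))
```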

 
