ggplot(childcount1920USOR_long,
aes(x = state, y = percent, fill = race)) +
geom_bar(stat = "identity") +
scale_y_continuous(labels = scales::percent_format(scale = 1)) +
scale_fill_discrete(labels = c("American Indian")) +
labs(title = "Percentage of Child Count by Race in US & Oregon",
x = "State",
y = "Percentage",
fill = "Race") +
theme_minimal()
This is the second part of my data visualization journey. For this, I used the same dataset as in the previous post, which showed racial representation in Early Intervention (EI) across the U.S. I compared that data to racial representation in Oregon, using information from the same OSEP website. I combined data from 2013 to 2022 to get the most statistical power for later analysis.
First, I created a bar chart using ggplot()
with the geom_bar()
function.
Here is an example of how labeling can be completely inappropriate. However, I appreciate using the viridis color palette with scale_fill_viridis_d()
, as it helps make colors more accessible for people with color perception differences.
ggplot(childcount1920USOR_long, aes(x = state, y = percent, fill = race)) +
geom_bar(stat = "identity") +
scale_fill_viridis_d() +
scale_y_continuous(labels = percent_format(scale = 1)) +
labs(title = "Percentage of Child Count by Race in US & Oregon",
x = "State",
y = "Percentage",
fill = "Race") +
geom_text(aes(label = race),
position = position_stack(vjust = 0.5),
color = "black",
size = 4) +
theme_minimal()
I really enjoy how easy it is to play with different types of plots that ggplot()
offers.
ggplot(childcount1920USOR_long, aes(x = state, y = percent, fill = race)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_viridis_d() +
scale_y_continuous(labels = percent_format(scale = 1)) +
labs(title = "Percentage of Child Count by Race in US & Oregon",
x = "State",
y = "Percentage",
fill = "Race") +
theme_minimal()
So, I kept experimenting.
ggplot(childcount1920USOR_long,
aes(x = race,
y = percent,
fill = state)) +
geom_bar(stat = "identity",
position = position_dodge(width = 0.7)) +
scale_fill_viridis_d() +
scale_y_continuous(
labels = percent_format(scale = 1),
expand = expansion(mult = c(0, 0.05))) +
labs(title = "Percentage of Child Count by Race in US & Oregon",
x = "Race",
y = "Percentage",
fill = "State") +
geom_text(aes(label = scales::percent(percent / 100, accuracy = 0.1)),
position = position_dodge(width = 0.7),
vjust = -0.5,
color = "black",
size = 4) +
theme_minimal(base_size = 14) +
theme(legend.position = "bottom")
After making adjustments to show the data more clearly and accurately, here is the final version that shows the data best! Flipping the X and Y axes helps readers understand the categories more easily, and the data labels make it clearer to see the exact differences between the national data and Oregon’s data. I also assigned specific colors from the viridis
color pallette for the national and Oregon’s data to maintain consistency in future visualizations.
<- c(
viridis_colors "Oregon" = "#7ad151",
"US and Outlying Areas" = "#414487")
$race <- factor(childcount1920USOR_long$race, levels = sort(unique(childcount1920USOR_long$race)))
childcount1920USOR_long$race <- forcats::fct_rev(childcount1920USOR_long$race) childcount1920USOR_long
ggplot(childcount1920USOR_long,
aes(x = percent,
y = race,
fill = state)) +
geom_bar(stat = "identity",
position = position_dodge(width = 0.7)) +
scale_fill_manual(
values = viridis_colors,
breaks = c("US and Outlying Areas", "Oregon"),
labels = c("US and Outlying Areas", "Oregon")
+
) scale_x_continuous(
labels = percent_format(scale = 1),
expand = expansion(mult = c(0, 0.05))) +
labs(
title = "Representations in EI",
x = "Percentage",
y = "Race",
fill = NULL) +
geom_text(aes(
label = scales::percent(percent / 100, accuracy = 0.1)),
position = position_dodge(width = 0.7),
hjust = -0.2,
color = "black",
size = 4) +
theme_minimal(base_size = 14) +
theme(
legend.position = "bottom"
)