Visualizaing children served in EI

This is the second part of my data visualization journey. For this, I used the same dataset as in the previous post, which showed racial representation in Early Intervention (EI) across the U.S. I compared that data to racial representation in Oregon, using information from the same OSEP website. I combined data from 2013 to 2022 to get the most statistical power for later analysis.

First, I created a bar chart using ggplot() with the geom_bar() function.

ggplot(childcount1920USOR_long, 
       aes(x = state, y = percent, fill = race)) +
  geom_bar(stat = "identity") + 
  scale_y_continuous(labels = scales::percent_format(scale = 1)) + 
  scale_fill_discrete(labels = c("American Indian")) +
  labs(title = "Percentage of Child Count by Race in US & Oregon",
       x = "State",
       y = "Percentage",
       fill = "Race") +
  theme_minimal()

Here is an example of how labeling can be completely inappropriate. However, I appreciate using the viridis color palette with scale_fill_viridis_d(), as it helps make colors more accessible for people with color perception differences.

ggplot(childcount1920USOR_long, aes(x = state, y = percent, fill = race)) +
  geom_bar(stat = "identity") + 
  scale_fill_viridis_d() +  
  scale_y_continuous(labels = percent_format(scale = 1)) +
  labs(title = "Percentage of Child Count by Race in US & Oregon",
       x = "State",
       y = "Percentage",
       fill = "Race") +
  geom_text(aes(label = race), 
            position = position_stack(vjust = 0.5), 
            color = "black", 
            size = 4) +
  theme_minimal()

I really enjoy how easy it is to play with different types of plots that ggplot() offers.

ggplot(childcount1920USOR_long, aes(x = state, y = percent, fill = race)) +
  geom_bar(stat = "identity", position = "dodge") +  
  scale_fill_viridis_d() +  
  scale_y_continuous(labels = percent_format(scale = 1)) +  
  labs(title = "Percentage of Child Count by Race in US & Oregon",
       x = "State",
       y = "Percentage",
       fill = "Race") +
  theme_minimal()

So, I kept experimenting.

ggplot(childcount1920USOR_long, 
       aes(x = race, 
           y = percent, 
           fill = state)) +
  geom_bar(stat = "identity", 
           position = position_dodge(width = 0.7)) +  
  scale_fill_viridis_d() +  
  scale_y_continuous(
    labels = percent_format(scale = 1), 
    expand = expansion(mult = c(0, 0.05))) +
  labs(title = "Percentage of Child Count by Race in US & Oregon",
       x = "Race",
       y = "Percentage",
       fill = "State") +
  geom_text(aes(label = scales::percent(percent / 100, accuracy = 0.1)), 
            position = position_dodge(width = 0.7), 
            vjust = -0.5,  
            color = "black", 
            size = 4) +
  theme_minimal(base_size = 14) +
  theme(legend.position = "bottom")

After making adjustments to show the data more clearly and accurately, here is the final version that shows the data best! Flipping the X and Y axes helps readers understand the categories more easily, and the data labels make it clearer to see the exact differences between the national data and Oregon’s data. I also assigned specific colors from the viridis color pallette for the national and Oregon’s data to maintain consistency in future visualizations.

viridis_colors <- c(
  "Oregon" = "#7ad151",               
  "US and Outlying Areas" = "#414487")

childcount1920USOR_long$race <- factor(childcount1920USOR_long$race, levels = sort(unique(childcount1920USOR_long$race)))
childcount1920USOR_long$race <- forcats::fct_rev(childcount1920USOR_long$race)

ggplot(childcount1920USOR_long, 
       aes(x = percent, 
           y = race, 
           fill = state)) + 
  geom_bar(stat = "identity",
           position = position_dodge(width = 0.7)) +  
  scale_fill_manual(
    values = viridis_colors, 
    breaks = c("US and Outlying Areas", "Oregon"),         
    labels = c("US and Outlying Areas", "Oregon")          
  ) +
  scale_x_continuous(
    labels = percent_format(scale = 1), 
    expand = expansion(mult = c(0, 0.05))) +
  labs(
    title = "Representations in EI",
    x = "Percentage",
    y = "Race",
    fill = NULL) +
  geom_text(aes(
    label = scales::percent(percent / 100, accuracy = 0.1)), 
    position = position_dodge(width = 0.7), 
    hjust = -0.2,  
    color = "black", 
    size = 4) +  
  theme_minimal(base_size = 14) +
  theme(
    legend.position = "bottom"
  )