Story Recipe: Using Census migration data to find out where young adults are moving

Where to find the data, how to explore it, and questions to ask to reproduce the story for your community

Posted on: August 2, 2022

Data from the Migration Patterns website shows that young adults who grow up in St. Louis tend to remain there: a recent analysis found that 75% of people who lived in the area at age 16 still lived there at age 26.

The cliché about St. Louis is that when people first meet you, no matter how old you are, they will always ask, “Where did you go to high school?” Local high schools have their stereotypes, and the thinking is that where someone went to high school might reveal more about them than what’s on their diploma. But the question carries a hidden assumption: that the person went to high school somewhere in St. Louis.

As a transplant, I’ve always been fascinated by this idea. So when I learned the Census Bureau was publishing data that would show where people were moving from and to, I couldn’t wait to dig in.

The story we found

Our reporting showed that, indeed, people represented in the data largely stayed put in St. Louis: about 3 out of 4 who lived here at age 16 also lived here at age 26. The share varied by racial group. It was somewhat higher for Black residents, slightly lower for white residents, and lower still for Asian and Hispanic residents. However, as the story notes, the Latino and Asian populations of St. Louis have grown significantly since the period studied, so the data may not fully represent people living here now.

In addition to writing about the data, we chose to use two Sankey diagrams to show the population flow from and to St. Louis. These let the reader see people’s common origins and destinations, as well as the proportion of people who migrated to or from each place. To avoid making the charts too cluttered, we limited these to St. Louis itself, plus the 10 other most common origins or destinations.
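If you want to build similar charts, the cut we describe (the home zone plus the 10 most common other places) can be sketched in Python. The function name `sankey_nodes` and its input, a dict of place name to probability, are assumptions made for illustration; this is not necessarily how our charts were produced.

```python
def sankey_nodes(shares, home, top_n=10):
    """Pick the nodes for one Sankey chart.

    shares: dict mapping place name -> probability, e.g. pr_d_o values for
            people who grew up in `home` (hypothetical input shape).
    Returns (place, share) pairs: `home` first, then the top_n other places
    sorted from largest to smallest share.
    """
    others = sorted(
        ((place, share) for place, share in shares.items() if place != home),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [(home, shares.get(home, 0.0))] + others[:top_n]
```

Keying on zone names keeps the example readable, but if two zones in your data share a name, you would want to pair each name with its state.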

How you can analyze the data

The data is available at https://migrationpatterns.org. It combines Census, HUD, and federal tax information to compare where young American adults lived at ages 16 and 26. (The analysis covers children born from 1984 to 1992, people who are now in their 30s.) There’s an interactive map if you want to explore the data or pull out a few top-line numbers quickly. But if you want to dig in, scroll down to “Want to know more?” on that page and click the “download the data” link.

You’ll get four files, plus a data dictionary. We primarily used the od_pooled.csv file, which has data for the population as a whole. If you’re interested in looking at the data segmented by race or income, you’ll need the other files. To see what things look like where you live:

1. Open the od_pooled.csv file in Excel.

This file lists pairs of “commuting zones”—the geography used in this analysis. It was a bit unfamiliar to me, but IPUMS has more info if you want to look into it further.
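Excel is all you need for this recipe, but if you’d rather script it, here is a minimal Python sketch that reads the file with the standard library’s csv module. It assumes the file is UTF-8 encoded and that the probability columns are named pr_d_o and pr_o_d, as described in the steps below. The function names are hypothetical.

```python
import csv


def to_float(value):
    """Convert a CSV string to float, or None if it is blank or not a number."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def read_od_rows(lines):
    """Parse origin-destination CSV text (an open file or a list of lines).

    Each row becomes a dict keyed by the header row; the two probability
    columns used in this recipe are converted to floats.
    """
    rows = []
    for row in csv.DictReader(lines):
        for col in ("pr_d_o", "pr_o_d"):
            if col in row:
                row[col] = to_float(row[col])
        rows.append(row)
    return rows


def load_od_pooled(path="od_pooled.csv"):
    """Read the downloaded file from disk (the path is an assumption)."""
    with open(path, newline="", encoding="utf-8") as f:
        return read_od_rows(f)
```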

2. Filter the data for the commuting zone you’re interested in.

Let’s find out where people are migrating from. First filter for your state on the d_state_name column, then your commuting zone on the d_cz_name column.

How to do this: First turn on filters by selecting any cell that contains data. Then go to the Data tab and click the Filter button. You should see boxes with downward-pointing arrows in each cell of your header row. Click the box in the d_state_name cell (this should be cell F1 if the data starts at A1). In the dialogue box that appears, uncheck the box that says Select All, then scroll down and check the box for your state. You can also type the state’s name in the Search box. Now filter the d_cz_name column the same way: click the box with the arrow and choose the appropriate zone.

It might be helpful to use the interactive map to figure out the name of the zone you’re interested in. They’re often named for the largest city in them, but which counties are grouped together isn’t always intuitive. And beware, some of them do cross state lines.
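You can also search the zone names directly. This hypothetical helper lists every zone whose name contains some text, along with the state it is listed under, so you can confirm the exact spelling before filtering. It assumes rows shaped like those csv.DictReader produces, with the column names used in this recipe.

```python
def find_zones(rows, text, side="d"):
    """Return sorted (zone name, state) pairs whose zone name contains `text`.

    side="d" searches d_cz_name/d_state_name; side="o" searches the origin
    columns. The match ignores upper/lower case.
    """
    needle = text.lower()
    matches = set()
    for row in rows:
        name = row[f"{side}_cz_name"]
        if needle in name.lower():
            matches.add((name, row[f"{side}_state_name"]))
    return sorted(matches)
```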

3. Now you should have a list of different origin commuting zones where all the destinations are identical—the place you’re interested in.

You can create a new sheet called “to_[PLACE]” and copy and paste just these cells in.
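In code, steps 2 and 3 amount to a filter on the two destination columns. A minimal sketch, assuming each row is a dict keyed by column name (as csv.DictReader produces); the function name is hypothetical.

```python
def rows_to_place(rows, state, zone):
    """Keep rows whose destination matches `state` and `zone`.

    Mirrors the Excel filters on d_state_name and d_cz_name: every row that
    remains has the same destination, and only the origin varies.
    """
    return [
        row
        for row in rows
        if row["d_state_name"] == state and row["d_cz_name"] == zone
    ]
```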

4. Find the most common places where people moved from.

Now, working in your “to_[PLACE]” sheet, sort largest-to-smallest on the pr_o_d column.

How to do this: First highlight all your data by clicking any cell in the data and pressing COMMAND-A on a Mac (CTRL-A on Windows). Now, on the Data tab, click the Sort button. In the dialogue box that pops up, make sure that “My data has headers” is checked. Under Column, choose pr_o_d. Under Order, choose Largest to Smallest. Click OK.

This number represents the probability of someone being from the origin commuting zone, given that they are now living in the destination commuting zone. In other words, if you picked a 26-year-old at random from the place we’re analyzing here, what is the probability they were living in various other places at age 16? You can multiply these numbers by 100 and use them as percentages to say something like “X% of people who live in [destination] grew up in [origin].”
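Step 4 as a minimal Python sketch: sort the rows for your destination by pr_o_d and convert the top values to percentages. It assumes rows like those in your to_[PLACE] sheet, with values as numbers or numeric strings; the function name and the choice to round to one decimal place are our own.

```python
def top_origins(to_rows, n=10):
    """Sort a to_[PLACE] table by pr_o_d (largest first) and return the top n
    as (origin zone, origin state, percent) tuples.

    pr_o_d * 100 is the share of the destination's 26-year-olds who lived
    in that origin zone at age 16.
    """
    ranked = sorted(to_rows, key=lambda row: float(row["pr_o_d"]), reverse=True)
    return [
        (row["o_cz_name"], row["o_state_name"], round(float(row["pr_o_d"]) * 100, 1))
        for row in ranked[:n]
    ]
```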

5. Find the most common places where people moved to.

To find out where people who lived in your area at age 16 were a decade later, go back to the original od_pooled sheet and set up a new filter, this time on the o_state_name and o_cz_name columns.

How to do this: First clear your existing filter by going to the Data tab and clicking the small Clear button next to the Filter button. Then click the box with the downward-pointing arrow in the o_state_name cell (this should be cell C1 if the data starts at A1). In the dialogue box that appears, uncheck the box that says Select All, then scroll down and check the box for your state. You can also type the state’s name in the Search box. Now filter the o_cz_name column (probably cell B1) the same way: click the box with the arrow and choose the appropriate zone. You can create a new sheet called “from_[PLACE]” and copy and paste just these cells into it.

This time you’ll want to sort on the pr_d_o column. This represents the probability of someone living in the destination zone given that they grew up in the origin zone. In other words, if you picked someone at random who lived in your area at age 16, what is the probability they wound up in various other places at age 26? You can multiply these numbers by 100 and use them as percentages to say something like “X% of people who grew up in [origin] lived in [destination] at age 26.”
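Step 5 flips the filter to the origin columns and sorts on pr_d_o instead. A minimal sketch under the same assumptions as before (rows as dicts keyed by column name, values as numbers or numeric strings); the function name is hypothetical.

```python
def top_destinations(rows, state, zone, n=10):
    """Filter to rows whose origin is (`state`, `zone`), sort by pr_d_o
    (largest first) and return (destination zone, destination state, percent).

    pr_d_o * 100 is the share of people who lived in the origin zone at 16
    who lived in each destination at 26.
    """
    from_rows = [
        row
        for row in rows
        if row["o_state_name"] == state and row["o_cz_name"] == zone
    ]
    ranked = sorted(from_rows, key=lambda row: float(row["pr_d_o"]), reverse=True)
    return [
        (row["d_cz_name"], row["d_state_name"], round(float(row["pr_d_o"]) * 100, 1))
        for row in ranked[:n]
    ]
```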

6. If you want to analyze where people moved to/from at the state level, you can create a pivot table.

How to do this: Click the tab for the sheet that fits what you want to know (“from_[PLACE]” or “to_[PLACE]”). Highlight all the data in the sheet by clicking any cell in the data and pressing COMMAND-A on a Mac (CTRL-A on Windows). Now go to the Insert tab and choose PivotTable. In the dialogue box that pops up, click OK to accept the default options. This will create a new sheet for your PivotTable. Name this sheet something like “to_[PLACE]_pivot” or “from_[PLACE]_pivot”.

Which fields you need to use in your pivot table will depend on whether you used the “to” or “from” sheet. If you chose the “to_[PLACE]” sheet and want to aggregate by origin state, in the “PivotTable Fields” builder on the right, drag the o_state_name field into the box labeled “Rows”. Then drag the pr_o_d field into the box labeled “Values”. It should change to say “sum of pr_o_d”, which is what we want. You can sort this column by clicking one of the number values, then going to the Data tab and clicking the button with a letter “Z” on top of a letter “A” next to an arrow pointing down.

These numbers represent the probability of someone being from the origin state given that they are now living in the destination commuting zone. You can multiply the number by 100 and use it as a percentage to say something like “X% of the people who live in [destination] grew up in [origin state].”

If you chose the “from_[PLACE]” sheet and want to aggregate by destination state, put “d_state_name” in the rows and “pr_d_o” in the values.
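The pivot table boils down to summing a probability column by state. Here is a rough Python equivalent, assuming rows shaped like your to_[PLACE] or from_[PLACE] sheet; the function name is hypothetical, and the column pairings come from the two paragraphs above.

```python
from collections import defaultdict


def sum_by_state(place_rows, state_col, prob_col):
    """Sum a probability column by state, like the PivotTable in step 6.

    to_[PLACE] rows:   state_col="o_state_name", prob_col="pr_o_d"
    from_[PLACE] rows: state_col="d_state_name", prob_col="pr_d_o"
    Returns (state, total) pairs sorted from largest to smallest total.
    """
    totals = defaultdict(float)
    for row in place_rows:
        totals[row[state_col]] += float(row[prob_col])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```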

Note: As a quick sanity check that you matched up the right fields, the values for all the states should add up to something very close to 1.
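The same check can be scripted. This minimal sketch sums the probability column for one place and flags totals that stray from 1. Summing the per-state totals gives the same number as summing the rows directly, so it works either way. The function name and the 0.01 tolerance are assumptions, not values from the data documentation.

```python
import math


def check_sums_to_one(place_rows, prob_col, tolerance=0.01):
    """Return (looks_right, total) for one probability column.

    pr_o_d summed over a to_[PLACE] table, or pr_d_o summed over a
    from_[PLACE] table, should come out very close to 1. A total far from 1
    suggests the wrong column was paired with the sheet.
    """
    total = math.fsum(float(row[prob_col]) for row in place_rows)
    return abs(total - 1.0) <= tolerance, total
```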

Where to look for your story

One thing I immediately wanted to look at with this data is whether it’s true that a lot of people wind up in St. Louis as young adults after growing up here. It is, though this is also true of many other places, especially larger ones.

It’s also interesting to see where people are coming from and where they are going. St. Louis draws mostly from smaller surrounding communities rather than from far away. But among people who left (and didn’t return), larger cities like Denver, New York, and Dallas show up in the list.

Another opportunity is to dig into the data broken down by race, income, or both, which is available in the interactive tool as well as in the downloadable data. For example, among all people who grew up in St. Louis, Atlanta wasn’t in the top ten destinations. But for Black people who grew up in St. Louis, Atlanta was the third most likely place to be at age 26. And people whose parents’ income was in the top 20% were twice as likely to wind up in Chicago or New York as the average St. Louisan.

Be aware that the data files broken out by income, race, or both (everything other than the “pooled” file we used in the example above) are larger than Excel can handle by default. So if you’re using Excel, you may need to prune them in a text editor or with a tool like csvkit before importing them.
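If you don’t have csvkit handy, Python’s standard library can do similar pruning. This minimal sketch streams a large file row by row and writes out only the rows where your zone is the origin or the destination. It assumes the race and income files use the same o_cz_name, o_state_name, d_cz_name and d_state_name columns as od_pooled.csv (check the data dictionary) and that they are UTF-8 encoded. The function names are hypothetical.

```python
import csv


def keep_row(row, zone, state):
    """True if the row's origin or destination is (zone, state)."""
    return (zone, state) in (
        (row["o_cz_name"], row["o_state_name"]),
        (row["d_cz_name"], row["d_state_name"]),
    )


def prune_csv(src_path, dest_path, zone, state):
    """Copy the header plus matching rows from src_path to dest_path.

    Reads one row at a time, so the large file is never loaded into memory
    all at once. Returns the number of data rows written.
    """
    kept = 0
    with open(src_path, newline="", encoding="utf-8") as src, open(
        dest_path, "w", newline="", encoding="utf-8"
    ) as dest:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dest, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if keep_row(row, zone, state):
                writer.writerow(row)
                kept += 1
    return kept
```

For example, `prune_csv("downloaded_file.csv", "pruned.csv", "St. Louis", "Missouri")`, where the file names are placeholders and the zone and state must match the spelling used in the file exactly.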

Find more step-by-step data story recipes like this one. If you have questions about a story you’re working on, our free peer data review program is here to help.

Programs like these are part of the OpenNews community care package. If you’re using this story recipe, please let us know — we’d love to promote your work! If you’ve got a story recipe idea, we’d love to hear about it. Drop us a line at source@opennews.org.

Credits

Brent Jones

Brent Jones is St. Louis Public Radio’s data visual specialist. He does data analysis and visualization and produces digital special projects and newsroom tools. He’s also the newsroom’s drone pilot. Formerly, Brent worked at the St. Louis Beacon nonprofit news site from its inception in 2008 until it merged with the radio station in 2013.
- St. Louis Public Radio
- @brentajones
Eric Schmid

Eric Schmid covers Economic Development for St. Louis Public Radio. He’s primarily focused on examining policies and ideas to drive population and business growth throughout the St. Louis region. Eric came to the station through Report for America in 2019 and was tasked to develop STLPR’s coverage east of the Mississippi. Before joining St. Louis Public Radio, Eric held internships at Fox News Channel, NPR-affiliate WSHU Public Radio and AccuWeather. He graduated from Stony Brook University in New York with a degree in Journalism in 2018.
- St. Louis Public Radio
- @EricDSchmid
