Relative Size of Ivy League


Some data related to an article in The Washington Post about how press coverage of the Ivy League is disproportionate to the size of the Ivy League compared with the country at large.


John Goldin


December 13, 2023

In a recent article Philip Bump laid out some statistics to emphasize that press attention to the Ivy League is way out of proportion to the relative size of the Ivy League in the country as a whole. I have no disagreement with the point he’s making.

The purpose of this post is to provide some additional data relevant to this point. In the article, he links to a summary table from the National Center for Education Statistics (NCES). With a bit of digging one can get similar data for individual schools. Schools submit data to NCES as part of the Integrated Postsecondary Education Data System (IPEDS). There are surveys for enrollment, but for this post I will focus on IPEDS Completions, which includes counts of the number of bachelor’s degrees awarded by each school and the number of students receiving those degrees.

NCES offers a tool that allows one to export the detailed data as a custom dataset. It can be a bit painful to use that tool. Fortunately, a colleague from my old office had already exported the NCES Completions data for each year from 2011 through 2022. The dataset has a row for each individual school, so it’s easy to create a separate count for a subset that includes the Ivy League.

Using that dataset I identified the Ivy Plus schools. A side note: as someone who worked in the bowels of an Ivy League administration for 35+ years I was never asked for comparison data for just the eight schools that make up the Ivy League. We always added at least Stanford and MIT to the list. I’ve included those schools here as part of Ivy+. (Sometimes people identify Ivy+ as also including Chicago, Duke, Johns Hopkins, and perhaps some other private universities. The boundary can be blurry, but unless one is only talking about athletics, Stanford and MIT are always included as part of Ivy+.) Any shenanigans, innocent or otherwise, at MIT or Stanford are just as likely to end up in the New York Times as those at Harvard or Yale or other Ivy schools.

On to the data.

The table below shows the number of students who received bachelor’s degrees in each year from 2011 through 2022.

R code that creates the summary table
packages <-  c("tidyverse", "arrow", "gt")
# load each package quietly
my_lib <- function(x) {
  suppressPackageStartupMessages(library(x, character.only = TRUE))
}
invisible(lapply(packages, my_lib))

# open all parquet files.
# the parquet files were created by a colleague from CSV files downloaded from NCES
ds <- open_dataset('data/parquet/')

ivyplus_schools <- tibble::tribble(
  ~UNITID,           ~school,
  217156L,           "Brown",
  190150L,        "Columbia",
  190415L,         "Cornell",
  182670L,       "Dartmouth",
  166027L,         "Harvard",
  166683L,             "MIT",
  186131L,       "Princeton",
  243744L,        "Stanford",
  # 144050L, "Univ of Chicago",
  215062L,           "UPenn",
  130794L,            "Yale"
)
ivyplus_id <- ivyplus_schools$UNITID

degtotal <- ds |>
  # AWLEVEL == 5 is bachelor's degrees
  # CIPCODE == 99 is the total of all fields of study
  # MAJORNUM == 1 is needed to count students rather than degrees.
  # Students can have multiple bachelor's degrees. Only the first is counted.
  filter(AWLEVEL == 5, CIPCODE == 99, MAJORNUM == 1) |>
  # total per school and year, so UNITID is still available after collect()
  group_by(ipeds_year, UNITID) |>
  summarise(degrees = sum(CTOTALT)) |>
  collect() |>
  mutate(ivyplus = ifelse(UNITID %in% ivyplus_id, "Ivy+", "Other")) |>
  group_by(ipeds_year, ivyplus) |>
  summarise(degrees = sum(degrees), .groups = "drop")

# use pivot_wider to put Ivy+ and Other side by side, then calculate the percentage of Ivy+ degrees
degtotal <- degtotal |>
  pivot_wider(id_cols = ipeds_year, names_from = ivyplus, values_from = degrees) |>
  mutate(ivyplus_pct = `Ivy+` / (`Ivy+` + Other))

gt(degtotal |> ungroup()) |>
  tab_header(
    title = "Count of Students Awarded Bachelor's Degrees",
    subtitle = "Ivy+ vs. Other Institutions") |>
  tab_source_note(
    source_note = "Source: IPEDS Completions."
  ) |>
  cols_label(
    ipeds_year = "Year",
    `Ivy+` = "Ivy+",
    Other = "Other",
    ivyplus_pct = "Ivy+ %") |>
  # counts: whole numbers with thousands separators
  fmt_integer(
    columns = c(`Ivy+`, Other),
    rows = everything(),
    use_seps = TRUE,
    sep_mark = ","
  ) |>
  # share: the proportion is scaled to a percentage with two decimals
  fmt_percent(
    columns = ivyplus_pct,
    rows = everything(),
    decimals = 2,
    scale_values = TRUE
  )
Count of Students Awarded Bachelor's Degrees
Ivy+ vs. Other Institutions

Year    Ivy+      Other        Ivy+ %
2011    18,022    1,717,601    1.04%
2012    18,188    1,796,452    1.00%
2013    18,243    1,845,244    0.98%
2014    18,224    1,874,955    0.96%
2015    18,558    1,899,823    0.97%
2016    18,943    1,925,460    0.97%
2017    18,385    1,962,235    0.93%
2018    18,555    1,989,728    0.92%
2019    19,160    2,016,917    0.94%
2020    19,152    2,041,656    0.93%
2021    17,840    2,071,663    0.85%
2022    18,662    2,017,916    0.92%

Source: IPEDS Completions.

Ivy+ schools account for about 1% of the students awarded US bachelor’s degrees each year. And of course holders of bachelor’s degrees are only a subset of the total US population.
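
For anyone who wants to check the percentage column by hand, here is a minimal sketch in base R. It copies three rows of counts from the table and applies the same formula used for ivyplus_pct in the code, Ivy+ / (Ivy+ + Other). The data frame name counts and its column names are only illustrative and are not part of the original code.

```r
# Counts copied from three rows of the table (students awarded bachelor's degrees)
counts <- data.frame(
  year  = c(2011, 2016, 2022),
  ivy   = c(18022, 18943, 18662),
  other = c(1717601, 1925460, 2017916)
)

# Ivy+ share of all students, expressed as a percentage with two decimals
counts$ivy_pct <- round(100 * counts$ivy / (counts$ivy + counts$other), 2)

print(counts)
```

The resulting ivy_pct values should match the Ivy+ % column for those years.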

Note: The Washington Post article included a reference to an NCES summary table. That table reports a total for 2019-2020 of 2,060,808 bachelor’s degrees. In the table I show above, the total for 2019-2020 is 1.1% fewer. I don’t have an explanation for the discrepancy.
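
The size of the gap can be rechecked with a little arithmetic. The sketch below is not part of the original analysis. It adds the Ivy+ and Other columns for a given row of the table and compares the sum with the NCES figure. Which row corresponds to the 2019-2020 academic year is an assumption on my part, so both candidate rows (2019 and 2020) are included. The variable names are illustrative.

```r
# NCES summary-table figure for 2019-2020 quoted in the note
nces_total <- 2060808

# Ivy+ plus Other for the two candidate rows of the table
# (assumption: one of these ipeds_year values corresponds to 2019-2020)
table_total <- c(`2019` = 19160 + 2016917,
                 `2020` = 19152 + 2041656)

# Percentage by which each table total falls short of the NCES figure
pct_diff <- 100 * (nces_total - table_total) / nces_total

print(table_total)
print(round(pct_diff, 2))
```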

I compared the total count of bachelor’s degrees for Yale reported in the IPEDS dataset used here with the totals reported on the website for the Yale Office of Institutional Research. They agreed exactly. That’s not surprising because that office compiles the data that is submitted to IPEDS.