My experience

I’ve been working with educational data for over 12 years, and I’ve noticed a consistently strong relationship between the percentage of students in a school who are eligible for free or reduced-price lunch (FRL) and the percentage of students who are proficient on a state standardized assessment. This is one of the reasons I caution district leaders against setting goals (at either the school or district level) based solely on proficiency. I’m not sure how many people in education are familiar with this phenomenon, so I thought I’d illustrate it in this blog post using real data from the Minnesota Department of Education. While working on this post, I discovered something that surprised me.

The example

I downloaded enrollment and assessment data from the Minnesota Department of Education’s website (here). Specifically, I downloaded enrollment data (here) and MCA Math assessment data (here) for the 2017/18 school year. I then focused on schools in “public operating elementary & secondary independent districts” (district type = 1). After that, I combined the data across only the MCA-eligible grades for math (i.e., grades 3 through 8, and 11). Finally, I correlated the percentage of students eligible for FRL at each school with the percentage of students identified as proficient at that school.
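The preparation steps above can be sketched in Python with pandas. This is only a minimal illustration, not the code used for this post: the column names (school_id, district_type, grade, enrolled, frl_eligible, tested, proficient) and the tiny tables are made up, and the real downloaded files are laid out differently.

```python
import pandas as pd

# Hypothetical stand-ins for the enrollment and assessment downloads.
# Column names and values are assumptions, not the real file layout.
enrollment = pd.DataFrame({
    "school_id": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "district_type": [1, 1, 1, 1, 1, 1, 7, 7],
    "grade": [3, 4, 3, 4, 3, 4, 3, 4],
    "enrolled": [50, 50, 40, 60, 30, 30, 20, 20],
    "frl_eligible": [10, 10, 30, 40, 27, 27, 5, 5],
})
assessment = pd.DataFrame({
    "school_id": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "grade": [3, 4, 3, 4, 3, 4, 3, 4],
    "tested": [48, 50, 38, 58, 29, 30, 20, 19],
    "proficient": [40, 41, 20, 28, 9, 8, 15, 14],
})

MCA_MATH_GRADES = [3, 4, 5, 6, 7, 8, 11]


def school_level_rates(enrollment, assessment):
    """Sum grade-level counts to one row per school and compute percentages."""
    # Keep district type 1 and the grades that take the MCA math assessment
    enr = enrollment[
        (enrollment["district_type"] == 1)
        & enrollment["grade"].isin(MCA_MATH_GRADES)
    ]
    enr = enr.groupby("school_id")[["enrolled", "frl_eligible"]].sum()

    asm = assessment[assessment["grade"].isin(MCA_MATH_GRADES)]
    asm = asm.groupby("school_id")[["tested", "proficient"]].sum()

    # Keep only schools that appear in both files
    schools = enr.join(asm, how="inner")
    schools["pct_frl"] = 100 * schools["frl_eligible"] / schools["enrolled"]
    schools["pct_proficient"] = 100 * schools["proficient"] / schools["tested"]
    return schools


schools = school_level_rates(enrollment, assessment)

# Pearson correlation between the two school-level percentages
r = schools["pct_frl"].corr(schools["pct_proficient"])
print(schools[["pct_frl", "pct_proficient"]])
print("correlation:", r)
```

Summing counts across grades before computing percentages weights each grade by its size; averaging grade-level percentages instead would give slightly different numbers.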

Before getting into the results, keep in mind that a correlation can range between -1 and 1. When the correlation between two variables (e.g., percentage of students eligible for FRL and percentage of students proficient on the MCA math assessment) is -1, there is a perfect negative correlation: the values for the 1st and 2nd variable all fall exactly on a diagonal line that decreases from left to right. When the correlation is 1, there is a perfect positive correlation: the values all fall exactly on a diagonal line that increases from left to right. The closer a correlation is to 0 (e.g., -0.05 or 0.05), the weaker the relationship between the two variables.

Whether the correlation is -1 or 1, you can perfectly predict the value of one variable if you know the value of the other (e.g., if the correlation between math proficiency and percentage of students eligible for FRL is -1, you can perfectly predict math proficiency at a school if you know the percentage of students eligible for FRL at that school). The closer a correlation gets to -1 or 1, the better you can predict one variable’s value from the other variable’s value at that school. Here are examples of graphs showing (from left to right) two variables with a perfect positive correlation, no correlation, and a perfect negative correlation.
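The same three cases can also be checked numerically. The short sketch below uses NumPy and made-up values (nothing here comes from the Minnesota data) to produce a correlation of 1, a correlation of 0, and a correlation of -1.

```python
import numpy as np

x = np.arange(10, dtype=float)

# Points that fall exactly on a rising line: correlation of 1
r_pos = np.corrcoef(x, 2 * x + 1)[0, 1]

# Points that fall exactly on a falling line: correlation of -1
r_neg = np.corrcoef(x, -3 * x + 5)[0, 1]

# Hand-picked values with no upward or downward trend: correlation of 0
x_flat = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y_flat = np.array([3.0, 1.0, 4.0, 4.0, 1.0, 3.0])
r_zero = np.corrcoef(x_flat, y_flat)[0, 1]

print(round(r_pos, 6), round(r_zero, 6), round(r_neg, 6))
```

The slope of the line does not matter for the perfect cases: any straight line with a positive slope gives 1, and any straight line with a negative slope gives -1.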

My first step was to look at the data for all schools in the files I gathered and cleaned. The results from this more inclusive analysis are below.

All public schools with MCA math data

The first graph is based on 1102 schools and shows a negative correlation of -0.63.

Notice the general negative trend: as the percentage of students who are eligible for FRL increases, the percentage of students who met the standards on the MCA math assessment decreases. Although a correlation of -0.63 is quite strong, many dots/schools deviate from the diagonal line, which means there is some variability and some schools buck the general trend (i.e., some schools have relatively high proficiency rates despite high percentages of students eligible for FRL, and others have relatively low proficiency rates despite low percentages of students eligible for FRL).

Public schools with at least 100 students with MCA math data

After the above analysis, I wanted to see what would happen if I removed schools with “too few” students. For example, some of the schools in the above graph had as few as 11 students who took the MCA assessment, and percentages based on such small numbers are likely to bounce around more dramatically from year to year. So the next step was to select only schools that had at least 100 students with MCA math assessment data (across grades 3-8, and 11). These results are below.

The graph below is based on 925 schools and shows a negative correlation of -0.66.

After removing schools with fewer than 100 students who took the MCA math assessment, the remaining dots/schools get closer to the diagonal line. This is consistent with the fact that the negative correlation got stronger (i.e., from -0.63 to -0.66). I expected to see this because I thought that once I removed schools whose proficiency percentages were based on very small numbers of students, the remaining schools would follow the general pattern of a negative correlation more closely. However, I wanted to further explore the effect of restricting the sample to schools with larger and larger numbers of students who took the MCA math assessment. The next two sections present what I found after limiting the sample to schools with at least 200 students with MCA data, and then to those with at least 500 students with MCA data.
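Repeating the analysis at different minimum school sizes amounts to filtering and recomputing the correlation in a loop. The sketch below shows the mechanics on synthetic data only: the column names (tested, pct_frl, pct_proficient) and the simulated schools are assumptions for illustration, so the numbers it prints will not match the ones reported in this post.

```python
import numpy as np
import pandas as pd


def correlation_by_min_tested(schools, thresholds):
    """Recompute the FRL/proficiency correlation for each minimum size."""
    rows = []
    for min_tested in thresholds:
        subset = schools[schools["tested"] >= min_tested]
        rows.append({
            "min_tested": min_tested,
            "n_schools": len(subset),
            "r": subset["pct_frl"].corr(subset["pct_proficient"]),
        })
    return pd.DataFrame(rows)


# Synthetic schools (not real data): proficiency falls as FRL rises, and
# each school's observed rate includes sampling noise that is larger when
# fewer students are tested.
rng = np.random.default_rng(0)
n = 500
tested = rng.integers(11, 900, size=n)
pct_frl = rng.uniform(5, 95, size=n)
true_rate = np.clip(0.85 - 0.006 * pct_frl, 0.05, 0.95)
proficient = rng.binomial(tested, true_rate)

schools = pd.DataFrame({
    "tested": tested,
    "pct_frl": pct_frl,
    "pct_proficient": 100 * proficient / tested,
})

summary = correlation_by_min_tested(schools, [0, 100, 200, 500])
print(summary)
```

Keeping the number of schools next to each correlation makes it easy to see how much of the sample each threshold removes.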

Public schools with at least 200 students with MCA math data

The graph below is based on 644 schools and shows a negative correlation of -0.74.

As before, restricting the sample to schools with even more students with MCA math assessment data resulted in an even more negative correlation between the percentage of students eligible for FRL and the percentage of students who were proficient on the MCA math assessment (i.e. from -0.66 to -0.74).

Public schools with at least 500 students with MCA math data

Finally, I restricted the sample of schools to just those with at least 500 students who took the MCA math assessment. The resulting graph (below) is based on 141 schools and shows a negative correlation of -0.84.

Again, the negative correlation between FRL and MCA math proficiency at the school level increased in strength, this time from -0.74 to -0.84. This negative correlation is quite strong. It suggests that, for these schools, about 71% (i.e., (-0.84)^2 ≈ 0.71) of the variation in school math proficiency rates is explained by the percentage of students eligible for FRL.
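The 71% figure comes from squaring the correlation. As a quick check, the snippet below squares each of the four correlations reported in this post; the labels on the left are just shorthand for the four samples.

```python
# Correlations reported above, keyed by the minimum number of tested students
correlations = {
    "all schools": -0.63,
    "100+": -0.66,
    "200+": -0.74,
    "500+": -0.84,
}

# Squaring the correlation gives the share of variation in one variable
# that is accounted for by a straight-line relationship with the other
r_squared = {label: r ** 2 for label, r in correlations.items()}

for label, value in r_squared.items():
    print(f"{label}: r^2 = {value:.2f}")

# Caution: Python applies ** before the minus sign, so a bare literal
# such as -0.84 ** 2 means -(0.84 ** 2) and comes out negative.
wrong = -0.84 ** 2
```

Storing the correlation in a variable (or wrapping it in parentheses) before squaring avoids the sign problem.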

Are these results surprising? If you’re an educator or work as an administrator in education, what do you make of this pattern? Below is a link to a dashboard of the same data for you to interact with.

https://public.tableau.com/shared/TWT7HSZM2?:display_count=yes&:origin=viz_share_link