The Florida Standards Assessment (FSA) was launched in 2015 as part of the statewide accountability system, replacing its predecessor, the FCAT. The state mandates the FSA for students in grades 3–10, and scores are used for high-stakes purposes such as evaluating teachers, administrators, schools and districts. Other high-stakes uses of the FSA include potential student retention and graduation decisions. There is, however, little to no agreed-upon evidence that the test is an accurate measure of quality, so high-stakes decisions (including teacher and administrator job losses and school closings) are being made based on erroneous information.
The Validity of Standardized Tests
Standardized tests, in general, are not reliable sources for such high-stakes decisions. Decades of research support this conclusion:
- Why Standardized Tests Don’t Measure Education Quality, by W. James Popham, UCLA Emeritus Professor.
- A study from MIT, Harvard and Brown University indicated that high standardized test scores do not translate to better cognition.
- Research from a UCLA professor shows why standardized tests don’t measure educational quality.
- A Stanford University study shows how stereotypes prevent standardized tests from accurately measuring student performance.
- This article references a whitepaper from the Central Florida School Board Coalition about the misuse of standardized tests in Florida, including examples of how changing cut scores and introducing new calculations such as learning gains significantly affects outcomes.
- Research conducted by a professor at Arizona State University with a PhD in Educational Psychology, focusing on testing statistics and research. His interpretation is that there are many unaddressed reasons that standardized tests are not valid, and that it is therefore dangerous to place a large amount of weight or stakes on the tests, especially if they are not corroborated by other, more valid measures such as class grades.
- Research from a professor at Northwestern University, who argues that public policymakers are placing too much emphasis on standardized tests as a valid measure of student performance, and notes that many of the technical reasons the tests are not valid are often ignored when decisions are made about their use.
- Research from Washington State University tying parental income to standardized test scores.
- Researchers use census data to accurately predict test results, indicating a clear connection to factors outside of school control.
- A study conducted by several faculty members at U.S. universities indicating that standardized test scores have gender and ethnicity bias.
- A study from a professor at the University of Illinois concluded that college readiness decreases when schools focus on test scores.
- A nine-year study by the National Research Council concluded that the emphasis on testing does not significantly increase learning, but actually causes harm.
- Research from professors at Bates College indicates that SAT and ACT tests are not valid indicators of future success.
- A political scientist at the University of Massachusetts presents research about how the increase in standardized testing is driving parents away from their schools.
There is also mounting research indicating that standardized tests are not an accurate measurement of teacher quality.
The Validity of the FSA
If test scores are being used for high-stakes decisions, it stands to reason that the test itself should be proven valid. In 2015, the Florida Commissioner of Education testified that the FSA had been validated in Utah, where many of the test bank questions were created. When reports from that validation were requested, none surfaced. Legislators rightfully ordered an independent verification of the FSA to test its validity and ensure it is being used for appropriate purposes. The study was not truly independent: the company selected to conduct it, Alpine Testing Solutions, partnered with EdCount, which is a partner of the original test creator, AIR, and the project team included many AIR employees. Even so, the study brought forth several material discrepancies that call the validity of the test into question:
- Although one of the purported purposes of the Florida Standards (which the test is designed to measure) is to increase rigor in schools, the validity study noted that the administration of the FSA does not meet that qualification: “The evaluation team can reasonably state that the spring 2015 administration of the [Florida Standards Assessments] did not meet the normal rigor and standardization expected with a high-stakes assessment program like the FSA.” (Validity report, page 14)
- The study confirmed that the questions on the FSA are aligned to Utah standards, which may vary significantly from those in Florida. According to the executive summary: “the items were originally written to measure the Utah standards rather than the Florida standards. While alignment to Florida standards was confirmed for the majority of items reviewed via the item review study, many were not confirmed, usually because these items focused on slightly different content within the same anchor standards.” (Validity report, page 47, emphasis added)
- The report recommended that results from the computer-based FSA not be used as a sole factor in student-level consequences, such as retention or remediation, and yet they are being used for this purpose in many cases. In fact, there is no mention in the study that the test is valid for its intended purpose, which is to assist teachers and improve learning. (Validity report, page 20)
- The study was intended to review a full range of FSA tests, including 3rd through 10th grade English Language Arts (ELA), 3rd through 8th grade Math and several high school tests including Algebra 1 and 2. The report, however, left out 11 of the 17 exams the study was supposed to measure. (Validity report, page 7)
- The report itself indicates significant shortcomings of the study, notably: “There are some notable exceptions to the breadth of our conclusion for this study. Specifically, evidence was not available at the time of this study to be able to evaluate evidence of criterion, construct, and consequential validity. These are areas where more comprehensive studies have yet to be completed. Classification accuracy and consistency were not available as part of this review because achievement standards have not yet been set for the FSA.” (Validity study, page 19, emphasis added)
- Following are excerpts from an interview with Andrew Wiley, Chief Psychometrician for Alpine, who worked on the validity study: “Within that report, Wiley said, the reviewers did find enough data to support using the test results at an aggregate level, such as school grades. However, he cautioned, many unknowns about the impacts to individual schools and students exist. That means more study would be appropriate, as the report suggests” … “We need to talk about how the aggregate scores can be used,” he said. “It should be done with a lot of caution.” Wiley also shied from the idea of calling the test “valid.” “Validity is not a simple yes-no,” he said.
Given this research, along with anecdotal evidence from parents across the state (such as straight-A, honor-roll students being placed in remedial classes because of an FSA score), the FSA should not be used for high-stakes decisions.