Medical students around the country cheered and jeered this week’s announcement that the results of a much-feared compulsory exam known as Step 1 would cease to be reported with a three-digit score but would instead become pass/fail as early as 2022.

The announcement by the sponsors of the U.S. Medical Licensing Examination (USMLE) said the change was made in part to “address concerns about Step 1 scores impacting student well-being and medical education.”

Students usually take the seven-hour exam after the second year of medical school. It is the first in a series of three exams that doctors-to-be must pass to become licensed physicians in the U.S. Initially designed to gauge how well a medical student can apply scientific concepts to the practice of medicine, in recent years the scores were used by some residency programs to screen potential candidates.


Opinions are divided on reporting scores: Some think numeric scores make sense and that moving to pass/fail would be “root rot.” Others point out that striving for high scores has induced egregious stress and changed how medicine is taught.

Social media platforms lit up with the news. Within hours, submissions for First Opinion had begun arriving. Here are excerpts from four of them:


LaShyra Nolen: This could reinforce the hierarchy among med schools

Anna Goshua: Will making Step 1 pass/fail merely kick the stress can down the road?

Orly Nadell Farber: We need to realign learning with patient care

Max Jordan Nguemeni Tiako: A good move for those underrepresented in medicine

LaShyra Nolen

As a first-year medical student, I have already witnessed how the looming pressure of Step 1 disrupts the learning environment and leads to unbearable stress and anxiety. Some students elect to disengage from coursework and instead choose to commit arbitrary, clinically irrelevant facts to memory simply because they will be covered on the exam. Others spend hundreds of dollars on extra prep materials they believe will give them an edge on the exam. These costly test prep materials are yet another barrier for students from low-income backgrounds.

The exam also changes the structure of medical education. I often hear students use phrases like “teach to Step” when describing how their medical school curriculum is designed to maximize their performance on the exam. I also commonly hear my professors say things like, “I rarely see this with my patients, but just remember this detail for Step.”

However, I do worry that making the test pass/fail could reinforce the hierarchy that exists among medical schools. My friends at so-called low-tiered medical schools, international medical schools, and schools of osteopathic medicine have expressed great concern that they won’t be able to use their Step 1 scores as a way to make themselves more competitive for residency programs. A pass/fail system privileges students like me who attend higher ranked medical schools. Therefore, it will be imperative that residency directors institute fair, objective ways to avoid the biases of the existing system. While it will be challenging, it is an opportunity to reevaluate the validity of medical school rankings and perhaps redefine what qualities make a good doctor in the 21st century.

Most important, the change to pass/fail does nothing to resolve the racial and systemic inequities entrenched in the residency application process. With many medical schools’ transitions to pass/fail curricula, Step 1 was the last major objective measure that residency programs could use to evaluate candidates. When Step 1 scores disappear, more emphasis will be placed on research experience, publication, and subjective evaluations from students’ preceptors. Studies have shown that students of color tend to receive lower scores on subjective evaluations during their clinical years. For the Step 1 change to result in true improvement, we must address how these biases are baked into the evaluations of students from marginalized backgrounds.

LaShyra Nolen is a first-year student at Harvard Medical School.

Anna Goshua

As a second-year medical student gearing up to dedicate my time to studying for Step 1 this spring, I found the news bittersweet: I wish that this change, with the potential to lead to needed reforms of medical education, had happened earlier.

In becoming pass/fail, Step 1 is reverting to what it was meant to be from the get-go: a minimum competency assessment. There has never been a sound basis on which to conclude that a score of 240 means that a student will be a better clinician than one who scores 220. The use of Step 1 has also been problematic since it disadvantages groups of test-takers, including students who are underrepresented in medicine and women.

Despite the positives, this decision to make the test pass/fail is not popular with medical students. In a 2011 survey, only 26% of medical students agreed that Step 1 should be pass/fail; in the 2019 Invitational Conference on USMLE Scoring assessment, under half of respondents said they would support changes to numeric Step 1 scoring. Why? The removal of Step 1 scores, which have been continually ranked at the top of residency program directors’ priorities in evaluating candidates, raises the question of what will replace them. It seems natural to think that the search for another objective metric might lead residency program admissions committees to shift emphasis to Step 2.

But since Step 2 is typically taken in July at the end of the third year of med school, a poor result could be disastrous. Students who perform poorly on the exam would learn they aren’t competitive for their residencies of choice just a month or so before applications are due. These students would also have already arranged their sub-internships for fourth year.

Student wellness was given as one reason for making Step 1 pass/fail. How do we ensure that the stress it generates during the first two years of medical school isn’t simply being redirected toward the clinical years and Step 2? And if there is a move to make that test pass/fail as well, what other metrics should be used to equitably and accurately appraise a candidate’s abilities?

Anna Goshua is a second-year student at Stanford University School of Medicine.

Orly Nadell Farber

It’s been seven months since I took the Step 1 exam, and I am still hesitant to acknowledge just how unwell I felt while studying for it. But in the wake of the announcement, as my phone lit up with texts from classmates and my Twitter feed took a sharp turn from the Democratic primaries to the USMLE, I considered whether my “well-being” would have been better under a pass/fail rubric.

When I think back to the three months that I spent solely preparing for this one-shot, multiple-choice test, I see both the best and worst versions of myself.

During those months, I was the most determined and disciplined I have ever been. Stationed in a library with a friend sitting in a chair diagonal to mine, I studied from morning to night six days a week. My only breaks were for meals and coffee refills, phone calls home, and my much-savored run around the Stanford foothills each evening. During my final month of studying, even these became opportunities to cram clinical facts as I listened to lectures on cardiac disease while trotting up and down hot pavement and yellowing grass knolls.

My friend and I shared a study tactic that bonded us like a dark and embarrassing secret: We both recorded the time we spent studying. I used a handheld stopwatch, hitting the pause button with every bathroom break, text message, or internet scroll; she used a timer on her laptop. In just shy of 12 weeks, I had studied for 674 hours, 27 minutes, and 25 seconds. I was diligent and dedicated — and also a wreck.

The hardest part for me to recall is how often I cried, bursting into tears after scoring poorly on my weekly four-hour practice exams, or on days that I simply felt too fatigued to absorb any material. Studying for Step 1 brought all of my academic insecurities out to play.

And then, in the blink of an eye, I took the exam and moved on with my life. I started my clinical years of medical school, rotating with different specialties in the hospital. These rotations were the perfect antidote to my Step 1 experience. The diseases I had studied in textbooks jumped off the page and manifested as real people with real illnesses that I now get to help diagnose and treat.

I continue to prepare for tests — Step 2 is looming on the horizon — but I now learn alongside practicing medicine. As such, I now study with an excitement and curiosity I had rarely felt in my months of library isolation during Step 1 prep.

Perhaps Step 1 is a bit like giving birth: It was exceedingly difficult at the time, but afterward I fell in love with medicine and instantly forgot much of the pain I endured to get here.

I can’t predict how the change to reporting Step 1 scores will affect medical trainees, but my hope is that it will help us realign our learning with patient care — for our patients’ health and well-being as much as for our own.

Orly Nadell Farber is a third-year student at Stanford University School of Medicine and a former STAT intern.

Max Jordan Nguemeni Tiako

Reporting the scores of the Step 1 exam as pass/fail is a step in the right direction that I believe will contribute to alleviating the burden of standardized testing on medical students, especially those like me who are underrepresented in medicine.

I spent nine weeks reviewing 800 pages of minute details of basic science. I answered 6,000 practice questions. It was a time of heightened anxiety, social isolation, and misery. I had hit rough patches in medical school before the test, but I had never before doubted whether I could get through medical school like I did during this period. I worried I might fail and be exposed as a fraud, proving to whoever might have thought it before that I didn’t deserve a seat at Yale as a Black medical student. I began to regret my heavy involvement in extracurricular activities and service to my medical school.

Until Step 1 reared its ugly head, I had been immersed in a student-led committee focused on diversity, inclusion, and social justice at Yale. We evaluated the curriculum and lobbied the administration for changes in education, the admissions process, resource allocation for research, and more. I even started a podcast focusing on health disparities.

All that was possible because the Yale system deemphasizes the importance of tests and grades during medical education. I studied because I wanted to become the best doctor for my future patients, not because I was preparing for a test.

Then the reality of Step 1 hit me like a ton of bricks. If I wanted to try for a competitive specialty like dermatology, neurosurgery, or ophthalmology, it was either buckle down for the test or write off those dreams.

Using Step 1 scores to screen residency applications puts students who are underrepresented in medicine at a disadvantage. For instance, a 2019 study of orthopedic surgery residency programs (the specialty with the lowest percentage of underrepresented students among its trainees) showed that between 2005 and 2014, Black and Latinx applicants were accepted into residency programs at a significantly lower rate (61%) than white applicants (71%). An earlier study in internal medicine showed that when Step 1 scores were used to screen applicants for interviews, a significantly greater proportion of Black students were refused interviews.

The reality is that Step 1 is not precise, nor does it predict students’ performance as residents beyond a certain threshold. With a standard error of eight points, two applicants with scores as far as 15 points apart may not be meaningfully different and yet several programs use singular cutoff points as screening tools.

While underrepresented students are more likely to go into primary care, there is no evidence that we express a preference for primary care early in medical training. It is likely that the chances of successfully matching into specialties with higher average Step 1 scores continue to be affected by arbitrary cut-off scores, discouraging students from considering those specialties.

Max Jordan Nguemeni Tiako is a fourth-year student at Yale School of Medicine.

  • “scores as far as 15 points apart may not be meaningfully different and yet several programs use singular cutoff points as screening tools”

    So the solution is to make a 1 point difference decisive?

  • We have the same problem in PhD school. Everyone is dropping the GRE requirement, so there are fewer ways for students from institutions with low resources to distinguish themselves. People whose parents paid for their $200,000 college tuition congratulate themselves on dismantling the GRE because “poor kids can’t afford GRE tutoring”. In reality, some of us are cracking the books and doing quite well on our own, having little trouble coming up with the $120 test fee. The Harvard diploma, 4 years of research experience with a Nobel laureate, unpaid internships, and luxury volunteer work trips abroad are things I can’t compete with while working my way through college. A 2 hr test I can deal with, perhaps better than you!

  • As an osteopathic medical student, I am ambivalent about this change. On one hand, I am concerned about how DO students in the future will show how they rank against students from other schools academically when applying for residency (which is one of the biggest reasons I took Step). I also assume Step 2 will become the natural replacement for Step 1. While I’ve been told Step 2 is a better correlate for how students will actually perform as residents, I am glad to have finished all the hard work in second year when I had a lot of time to just study, rather than having to worry about it now when I’m already worried about applying to residency and getting letters of recommendation.

    On the other hand, I do think it’s true that in rotations I think to myself sometimes, “I don’t know if my score is good enough to be this kind of doctor.” I have heard similar thoughts from other students in my class. It does affect how we think about what kind of medicine to pursue, because we know some specialties may be essentially off limits. For myself, I am interested in a less competitive field, so it doesn’t affect me as much as it may affect others. I also know that many students who were interested in more competitive specialties knew beforehand when they were studying for boards and aimed for scores competitive for those specialties.

    I’m not sure how this change will affect others. I’m sure many people will feel relieved. Others, like myself, may feel uncertain and concerned about the implications of this change.

  • You know what would have been nice for this piece? Including an opinion from someone at a lower-tiered school, instead of the top tiers you included. All Harvard, Stanford, and Yale. Come on.

    • Agree completely. Pick students from someplace other than the top tiers to gauge how they may feel. Loved the insights from the students here – thank you for sharing! – but with a daughter headed to DO school soon, I’d love to get some varied perspectives on this topic.

  • There is no doubt that this is a monumental change for medical education. However, wouldn’t this opinion piece be better if the students interviewed were not all from top-tier medical schools, where a Step 1 score means less than it does for other students because they can leverage the name of their institution to get where they want to go? As with most things, it is more about who you know than what you know. Taking away three-digit scoring only cements this fact and will place more stress on Step 2 CK performance.

  • Step 1 reporting as “pass/fail” is a silly concept. Statistically, there is a huge range of scores, granted with varying degrees of accuracy as a representation of a student’s base of knowledge, but nonetheless indicating those with higher mastery of the material than others. It is completely legitimate to have some way of recording this hierarchy of achievement. Perhaps reporting the scores as quintiles, first through fifth, could be the sort of mechanism to show this hierarchy without ranking students by their actual numeric score. The bottom quintile, or whatever appropriate measure, could be used to designate those who scored “unsatisfactory,” which would be read as “fail.”

  • Pass/fail may be less stressful, but where is the evidence that a stress-free medical education makes for better doctors?

  • Is there any evidence students will study less given it’s pass/fail now, or that stress will go down?

  • This pass/fail concept dilutes healthy competition and breeds mediocrity. Everything in life is competitive: higher education entrance, job search, landing a partner, succeeding at work, etc. Why would med school be any less competitive than so much else in life?

    • I can agree with the concept of healthy competition; however, when there are true disparities that exist for underrepresented populations, it becomes more about equity and less about equality. Until systemic changes begin to truly give fair opportunities, I think it is necessary that changes be made.

  • What about students with results near the cutoff point between passing and failure? A binary pass/fail result is only reasonable if any given iteration of the Step I exam is a *perfect* measurement of every student’s achievement. That’s never been the case for any exam. In reality, all exams carry a degree of error, reported statistically as the standard error of measurement (SEM). With a binary result, students whose true performance level is anywhere near the cutoff point will get ironclad, apparently definitive Pass or Fail results when their result is likely to be different in any other iteration of the exam. How is this expression of random chance more acceptable? How will residency program directors know which applying students’ results are more representative of random chance than achievement? It isn’t, and they can’t.

    If we wish to have categorized rather than numerical Step I results, the only sensible option is to report (at minimum) a third category: students whose results fall within ±2 SEM of the cutoff point as “borderline” or “marginal”, and then more reliable Pass or Fail results for students outside that range. A binary system is hubris born of false expectations about exam precision. In other words, for some (unknown) proportion of students, a lie.
