The ‘Reading the Mind in the Eyes Test’ may be less robust than we thought

A new paper assesses the widely-used tool.

04 June 2025

By Emma Young

The Reading the Mind in the Eyes Test (RMET) has been used in more than 1,500 studies, has been translated into at least 39 languages, and is endorsed by the US National Institute of Mental Health as a "current best option" for assessing a person's ability to understand mental states, write the authors of a recent paper in the journal Assessment.

However, concerns have been raised recently over the validity of the test. Wendy C. Higgins at Macquarie University and colleagues now present the results of a further analysis that, they say, casts doubt on the test's usefulness. "Our findings contribute to a growing body of evidence questioning the reliability and validity of RMET scores," they argue.

The test involves viewing 36 black-and-white photographs of just the eye region of other people's faces and, for each one, identifying that person's state of mind from four options. Numerous studies have found that people with autism spectrum disorder perform less well on it than neurotypical people, and it is widely used in the process of diagnosing the condition.

Higgins and her colleagues analysed nine existing datasets of RMET scores, all from non-clinical samples (that is, people without a clinical diagnosis such as autism). The sample sizes ranged from 558 to 9,267. Most of the participants were white, and most were women.

The team used factor analysis to put a variety of questions to the data. A key one was: is this test actually assessing a single ability, namely the ability to read someone's mental state? The answer, they report, was 'no'. "A single factor model does not account well for RMET performance in any of the datasets," they write.
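
To make that key question concrete: a 'single factor' model says that one latent ability should explain the correlations among all 36 items, and how well it fits can be tested with standard structural equation modelling tools. Below is a minimal sketch in Python using the semopy package; the file name and column names are hypothetical, the authors' own software and estimators may well differ, and a rigorous analysis of binary right/wrong items would use estimators designed for categorical data.

```python
import pandas as pd
from semopy import Model, calc_stats

# Hypothetical wide-format data: one 0/1 (wrong/right) column per item.
# "rmet_items.csv" and the "item_" prefix are assumptions for this sketch.
data = pd.read_csv("rmet_items.csv")
items = [c for c in data.columns if c.startswith("item_")]

# One latent factor loading on every item: the 'single ability' hypothesis.
model = Model("mindreading =~ " + " + ".join(items))
model.fit(data)

# Conventional fit indices; values like a low CFI or a high RMSEA are the
# kind of evidence behind "does not account well for RMET performance".
print(calc_stats(model)[["chi2", "CFI", "RMSEA"]])
```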

As part of their investigation, they looked at whether, within individual datasets, participants' responses met the two criteria stipulated when the test was originally validated: at least 50% of participants should select the 'correct' target response to an item, and no more than 25% should select the same incorrect response.
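
These two criteria are simple per-item proportions, so they are easy to check against raw response data. Here is a minimal sketch in Python with pandas, using made-up responses and illustrative column names rather than anything from the published datasets:

```python
import pandas as pd

# Toy long-format data: one row per participant-item pair, recording the
# option chosen and the item's keyed target. All values here are invented.
responses = pd.DataFrame({
    "item":   [1, 1, 1, 1, 2, 2, 2, 2],
    "choice": ["jealous", "jealous", "panicked", "jealous",
               "bored", "bored", "bored", "aghast"],
    "target": ["jealous"] * 4 + ["bored"] * 4,
})

for item, grp in responses.groupby("item"):
    shares = grp["choice"].value_counts(normalize=True)
    target = grp["target"].iloc[0]
    target_share = shares.get(target, 0.0)
    worst_foil = shares.drop(target, errors="ignore").max()
    ok = target_share >= 0.50 and worst_foil <= 0.25  # the two original criteria
    print(f"item {item}: target {target_share:.0%}, "
          f"worst foil {worst_foil:.0%} -> {'pass' if ok else 'fail'}")
```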

They found that in four of the datasets, at least one item failed the first criterion. In all six datasets in which the team were able to assess the second criterion, between two and four items failed it. Five items failed one or both criteria across multiple datasets.

When it came to the individual items, there were eight 'easy' ones, which more than three quarters of participants answered correctly in each of the nine datasets. For the other items, however, there was sometimes substantial disagreement among participants about what the correct answer should be.

When there was consensus, it could often have been an artefact of the multiple-choice format of the test, the researchers argue. They point to earlier research in which participants were shown the photographs one at a time, without any mental state descriptors, and asked to come up with their own responses. That study found a lot of variability in the answers. "Notably, less than 10% of the participant-generated responses were similar in meaning to the 'correct' responses in the multiple-choice format of the test," Higgins and her colleagues note. Only about 40% of the responses in this 'free' version of the test even matched the valence (a positive versus a negative mental state) of the target response.

There were some limitations to this new work, including the fact that it was based only on data from people without a clinical diagnosis, and so did not include autistic participants. However, the results "lend additional support" to the argument that the RMET may not be a valid way of measuring social cognitive ability, the team concludes; that has possible broader implications for the more than 1,500 studies that have made use of this tool.

Read the paper in full:
Higgins, W. C., Savalei, V., Polito, V., & Ross, R. M. (2025). Reading the Mind in the Eyes Test Scores Demonstrate Poor Structural Properties in Nine Large Non-Clinical Samples. Assessment. https://doi.org/10.1177/10731911251328604
