Machine learning fails to identify depression based on neurobiology


Researchers have suggested that machine learning—using artificial intelligence to investigate a complex phenomenon—might be better at identifying which neurobiological measures are important and how to use them to predict psychiatric diagnoses. Unfortunately, so far, these attempts have landed with an accuracy little better than pure chance.

However, much previous research has used a single type of neurobiological measure (such as one type of brain scan) to try to predict psychiatric diagnoses.

In a new study, researchers tried something different. They created a machine learning algorithm to combine every conceivable neurobiological measure in order to predict depression.

But their results were just as underwhelming:

“Training and testing a total of 2.4 million ML models, we find accuracies for diagnostic classification between 48.1% and 62.0%,” they write.

For comparison, they note that the social/environmental variables of social support and childhood maltreatment each predict depression with greater than 70% accuracy. Combining social/environmental variables and including more than just those two might bring accuracy even higher.

Caption: This graph from the paper demonstrates that the accuracy of even the strongest machine learning algorithms based on all possible neurobiological information tops out at 62%, while environmental variables reach over 70% accuracy each.
In sum, they write, “Although multivariate neuroimaging markers increase predictive power compared to univariate analyses, single-subject classification—even under conditions of extensive, best-practice Machine Learning optimization in a large, harmonized sample of patients diagnosed using state-of-the-art clinical assessments—does not reach clinically relevant performance.”

The research was led by Nils R. Winter at the University of Münster, Germany. The article was uploaded before peer review to the preprint server On Twitter, Winter wrote about the study:

“The fact that we cannot find meaningful (univariate or multivariate) neurobiological differences on the level of the individual for one of the most prevalent mental disorders should give us pause.”

The previous study by this group resulted in a similar outcome. The researchers found that there were no individual-level differences in neurobiology between people with depression diagnoses and healthy controls.

In that study, the researchers wrote that “healthy and depressive participants are remarkably similar on the group level and virtually indistinguishable on the single-subject level across a comprehensive set of neuroimaging modalities.”
A Deeper Look at the Current Study

The current study by Winter’s group included 1,801 people from three groups: those who currently met the criteria for the depression diagnosis, those who had a history of depression, and healthy controls. They were recruited through the Marburg-Münster Affective Disorders Cohort Study (MACS) in Germany.

Previous studies have raised concerns about the reliability of diagnosis since patients listed as “depressed” are often defined based on screening questionnaires or other less reliable measures. The current study used the clinical standard of a DSM-IV Structured Clinical Interview to diagnose depression, ensuring that the diagnosis was as reliable as possible.

The neurobiological measures included various forms of structural, functional, and task-based MRI, as well as the polygenic risk score (a theoretical measure of complex genetic risk for depression).

The researchers note that there is no accepted theory linking depression to neurobiology:

“Since there is no established formal theory of the neurobiology of depression, it is uncertain which neuroimaging methods will be best suited to capture clinically relevant information.”

Their study confirmed this lack of connection between depression diagnosis and neurobiology since none of the neurobiological measures tested—even when combined—could reach a clinically relevant predictive value.

Thus, they write,

“How biological Precision Psychiatry can deliver more accurate individualized prediction to improve treatment and patient care remains a central open question at this point.”

Yet the social/environmental variables of social support and childhood maltreatment were each individually able to predict depression at greater than 70% accuracy.

The researchers write that the algorithms were slightly better at classifying people with severe, chronic depression who had been hospitalized and were on multiple medications. That is, the algorithms were not very good at identifying people that might be missed by a human clinician, but were slightly better at identifying the very group that is also easiest for humans to diagnose.

But what’s worse is that when the researchers restricted the analysis to just this group—supposedly the easiest for the algorithm to diagnose—it did not substantially increase accuracy:

“Our complementary subgroup analyses focusing on acutely and recurrently depressed patients, respectively, did not increase predictive performance,” they write.

One explanation they propose is that the diagnosis of “depression” is so wide that it doesn’t do a very good job of capturing individual experiences. Thus, they suggest that the depression diagnosis itself does not represent a single “mental illness” linked to neurobiology, but is rather a broad category holding a variety of experiences, mental states, and levels of functioning that vary widely between individuals.

They suggest that there might be a better way to categorize people based on their specific “symptoms” and levels of functioning. However, as they were unable to find such theoretical subcategories, this remains unproven.

Ultimately, they write,

“The complexity of the MDD phenotype might require a more comprehensive approach that incorporates interactions between neurobiology, the entire body, as well as the environment.”



Winter, N. R., Blanke, J., Leenings, R., Ernsting, J., Fisch, L., Sarink, K., . . . & Hahn, T. (2023). A Systematic Evaluation of Machine Learning-based Biomarkers for Major Depressive Disorder across Modalities. doi: (Link)


Editor’s Note: Part of MITUK’s core mission is to present a scientific critique of the existing paradigm of care. Each week we will be republishing Mad in America’s latest blog on the evidence supporting the need for radical change.

Previous articlePaternalism and psychiatry
Next articleHow psychiatry marginalizes those who contradict Western norms
Peter Simons was an academic researcher in psychology. Now, as a science writer, he tries to provide the layperson with a view into the sometimes inscrutable world of psychiatric research. As an editor for blogs and personal stories at Mad in America, he prizes the accounts of those with lived experience of the psychiatric system and shares alternatives to the biomedical model.