Achieving Better Health Outcomes with Accurate, Focused Analysis of Social Determinants of Health Data

Incorporating Social Determinants of Health Data in Practice

Setting the Stage

Clinical data has long driven decision-making for healthcare providers and insurers, but it tells only part of the story. Healthcare leaders who use external data on social determinants of health (SDOH) better understand the economic and environmental factors driving patient risk, better predict health outcomes, and offer more individualized care.

However, getting that data and finding a way to understand and utilize it has been a struggle. Most publicly available datasets have been too generalized to support productive analysis for a particular healthcare environment. A network of research, government, and commercial entities is now making more refined SDOH datasets available, and the potential for more effective interventions is growing because of it.

Comprehensive data and analytical tools that utilize SDOH data can better predict health outcomes, giving healthcare providers the opportunity to proactively address risk factors, improve outcomes for patients, and lower costs by utilizing holistic preventive care and treatment plans.

The Problem

When healthcare leaders can draw on actionable insight derived from robust analytics, outcomes improve and treatment becomes more cost effective. However, few organizations have the pieces in place to achieve the analysis that makes the difference for patients.

SDOH datasets offer tremendous value, but many healthcare provider networks and payers are not yet able to fully utilize this resource. Most organizations that have integrated SDOH analysis rely on publicly available datasets that do not provide adequate granularity for rich analysis.

Many of the most illuminating datasets for healthcare organizations can be identified and obtained only through experience and relationships within the SDOH ecosystem. Even if you know where to look, drawing value from these datasets depends on developing the advanced data science capabilities needed to reliably predict health issues and work toward improved outcomes.

The Approach

For our health system client, breaking through from problem to improved outcomes depended on an approach that utilized our partnerships within healthcare and research organizations to align clinical expertise and advanced analytical capabilities. The relationships we’ve built with government, nonprofit, and private organizations enable us to acquire more refined data that isn’t easily attainable and facilitate richer analysis.

Our experience tailoring proven reusable predictive machine learning models to specific health outcomes and our field-tested approaches reduce time to value and create specialized, accurate outputs.

Delivering the insights that our partnerships and analytical tools provide is a critical element of any project. We rely on a microservices approach to produce a solution that works within or across the client’s existing electronic medical record (EMR) system, enterprise data warehouse (EDW), or other systems.

Using SDOH to Reduce Infant Mortality

According to the CDC, the United States has an infant mortality rate of 5.7 deaths within the first year per 1,000 live births, ranking it 33rd out of the 36 member countries of the Organisation for Economic Co-operation and Development (OECD). Within the United States, Indiana has historically had a high infant mortality rate: 6.8 deaths within the first year per 1,000 live births in 2018. Our partnership with an Indiana healthcare system leveraged SDOH data to understand and address an infant mortality rate above the state average.

Identifying Social Features

From more than 60 possible social and clinical determinants of health, we worked with the client to outline 51 clinical and social features likely to contribute to infant mortality. We organized these features into six categories: socioeconomic conditions, education, maternal health, environmental factors, infant health, and accessibility. These features included individual patient information and aggregates for census tract, census block, ZIP code, and county of residence, depending on the dataset.

Mapping Social to Clinical Features

We mapped these potential factors to clinical data across the client’s electronic medical record (EMR) system, enterprise data warehouse (EDW), and nine carefully selected external datasets chosen for coverage of the relevant counties and ZIP codes, age of data, and number of features per data source.

A critical success factor was the ability to leverage relationships across the public, private, and government spaces at the local and national levels to secure more granular social determinant datasets. We partnered with two local nonprofit research organizations to help our client obtain census tract and block level data that illuminates trends in food insecurity and access to healthcare—refinement that provides much more value than data at the county or ZIP code level.
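To make the value of granularity concrete, here is a minimal sketch of attaching an SDOH aggregate to a patient using the finest geography for which data exists. The level names, the lookup tables, and the function `finest_available` are hypothetical illustrations, and the values are invented; this is not the client's pipeline.

```python
# Hypothetical geography levels, ordered from finest to coarsest.
GEO_LEVELS = ["census_block", "census_tract", "zip_code", "county"]


def finest_available(patient, feature_tables):
    """Return (level, value) from the finest geography that has data.

    patient:        dict mapping a geography level to the patient's area code
    feature_tables: dict mapping a geography level to {area code: value}
    """
    for level in GEO_LEVELS:
        key = patient.get(level)
        table = feature_tables.get(level, {})
        if key is not None and key in table:
            return level, table[key]
    return None, None  # no dataset covers this patient


# Invented values for illustration only: food insecurity rate by area.
food_insecurity = {
    "census_tract": {"T-001": 0.21},
    "county": {"C-A": 0.14, "C-B": 0.11},
}

patient_a = {"census_tract": "T-001", "zip_code": "Z-1", "county": "C-A"}
patient_b = {"census_tract": "T-999", "zip_code": "Z-2", "county": "C-B"}

print(finest_available(patient_a, food_insecurity))  # ('census_tract', 0.21)
print(finest_available(patient_b, food_insecurity))  # ('county', 0.11)
```

In this sketch, patient A receives a tract-level value while patient B falls back to the coarser county figure, which is the gap that more granular datasets are meant to close.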

Social and Clinical Determinants of Health Data Mind Map

Isolating Clinical Factors

Our data scientists created a survival analysis model and three submodels to identify the leading clinical and social factors driving infant mortality. Survival analysis models the expected time until one or more events occur. Segmenting the survival rate enabled us to test how it changes across groups and to estimate how it shifts with changes in features such as prenatal visits and economic class.
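As an illustration of the general technique, the sketch below computes a Kaplan-Meier survival curve in plain Python. Kaplan-Meier is one common survival estimator and is used here as an assumption; the text does not state which survival method was applied. The function name and the sample durations are hypothetical.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate as a list of (time, survival probability).

    durations: follow-up time for each subject
    observed:  True if the event occurred at that time, False if censored
    """
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    exits = Counter(durations)  # every subject leaves the risk set at its duration
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the share of at-risk subjects who survive past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Invented follow-up data for illustration only (not patient data).
sample_durations = [2, 3, 3, 5, 8, 8]
sample_observed = [True, True, False, True, False, False]

for time, prob in kaplan_meier(sample_durations, sample_observed):
    print(f"t={time}: S(t)={prob:.3f}")
```

Running the same function separately on each segment, for example mothers with few versus many prenatal visits, gives one curve per group that can then be compared.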

Through different iterations of model building, we worked with our client to isolate the clinical factors most correlated with infant mortality within the first year after childbirth. Among those clinical factors were maternal hypertension, whether the infant’s delivery was preterm, and the number of prenatal visits a mother attended.

Infant Mortality Risk Factors by Level of Granularity

Quantifying Statistical Impact

After identifying the most correlated clinical features in our initial survival analysis model, we built three logistic regression submodels to identify which social features were driving three of the clinical features: hypertension, preterm birth, and number of prenatal visits. By using logistic regression models, we quantified the statistical impact of an SDOH feature on the likelihood of a clinical feature being present. For instance, every 1% increase in the unemployment rate in a mother’s community increases her likelihood of having a preterm delivery by 1.2%, and premature birth had the highest impact on infant death.
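The sketch below shows how a logistic regression coefficient is commonly read as an effect size: a one-unit increase in a feature multiplies the odds of the outcome by exp(coefficient). The intercept and coefficient are hypothetical values chosen for illustration, not the fitted model, and the 1.2% figure above may be expressed on a different scale than odds.

```python
import math


def sigmoid(z):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(coefficient, delta=1.0):
    """Multiplicative change in odds when the feature rises by `delta`."""
    return math.exp(coefficient * delta)


def predicted_probability(intercept, coefficient, x):
    """Probability from a one-feature logistic regression."""
    return sigmoid(intercept + coefficient * x)


# Hypothetical parameters: x is the community unemployment rate in percent.
INTERCEPT = -2.2
COEFFICIENT = 0.012

ratio = odds_ratio(COEFFICIENT)
print(f"odds ratio per 1-point increase: {ratio:.4f}")
print(f"P(preterm) at 5% unemployment: {predicted_probability(INTERCEPT, COEFFICIENT, 5):.4f}")
print(f"P(preterm) at 6% unemployment: {predicted_probability(INTERCEPT, COEFFICIENT, 6):.4f}")
```

With a coefficient of 0.012, the odds rise by roughly 1.2% per unit of the feature; the change in probability depends on the baseline, which is why the two are worth distinguishing.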

The Outcome: Tailored Treatment Plans

Understanding the social factors correlated with preterm delivery, among other clinical features, practitioners can better tailor treatment plans to include wraparound services and interventions that go beyond just treating the clinical conditions.

Turning Data into Better Outcomes for Congestive Heart Failure Patients

Our SDOH and infant mortality project is repeatable and applicable to a wide range of health conditions. Congestive heart failure (CHF) is one of the most common reasons for hospitalization of U.S. citizens over age 65 and a high priority for health leaders. According to the CDC, CHF cost the nation an estimated $30.7 billion in 2012, including the cost of healthcare services, medicines to treat heart failure, and missed days of work. The CDC also reported that, in 2018, heart failure was mentioned on 379,800 death certificates (13.4%).

Predicting a Diagnosis

Working with a health system client, we identified possible social and clinical determinants thought to be highly correlated with CHF, mapped those to potential data sources, and created two machine learning models: one targeting the likelihood of an initial diagnosis and one targeting the likelihood of readmission following an initial diagnosis. Successive iterations of model building showed that a logistic regression model with strict outlier treatment effectively predicted a patient’s primary diagnosis of CHF, delivering a 93.52% accuracy rate and a 94.31% precision rate. The model not only predicted a CHF diagnosis but also identified the top ten clinical and social features driving a patient’s diagnosis.

We built a second logistic regression model using weighted recurring visits to predict the likelihood of a recurrent visit to the hospital following a patient’s diagnosis with CHF, which resulted in an accuracy of 73.0% and a recall of 99.0%. (Recall is the number of true positives predicted by the model, divided by the sum of true positives and false negatives.)
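For readers less familiar with these metrics, the sketch below computes accuracy, precision, and recall from confusion-matrix counts. The counts are invented solely to show how a model can pair very high recall with moderate accuracy; they are not the client's confusion matrix, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, and recall from confusion-matrix counts.

    tp: true positives    fp: false positives
    tn: true negatives    fn: false negatives
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are right
        "precision": tp / (tp + fp),     # share of predicted positives that are real
        "recall": tp / (tp + fn),        # share of real positives that are caught
    }


# Invented counts for illustration only.
metrics = classification_metrics(tp=99, fp=53, tn=47, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this example the model misses almost no true readmissions (high recall) at the cost of many false alarms, which lowers accuracy and precision. That trade-off is often acceptable when missing an at-risk patient is the costlier error.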

The first machine learning model revealed that the features most correlated with CHF include chronic kidney diagnosis, COPD diagnosis, behavioral health, substance abuse, and financial class. In the second model, the highest risk factor associated with a recurrent visit after a CHF diagnosis was obesity, followed by behavioral health and then financial class.

Tailoring Programs to Risk Factors

Because we built a logistic regression model, we can see which factors positively impact a patient’s likelihood of a CHF diagnosis and the features that drive down that likelihood. Whether the patient spoke English well or came from a rural community were among the social features with negative correlation to readmission. One of the key challenges identified in the model-building process was a need for the client to better identify where recurring visits are happening and to understand the length of stay, discharge disposition, and insurance.

Correlating social features with diagnoses and recurring hospital visits provides practitioners insight to better tailor treatment and outreach plans.

Moving from Data to Action

Even the most enlightening analyses have negligible impact if the data does not reach the right people in the right format to drive decision-making. In the infant mortality use case, our machine learning model provided the client with a survival rate score, confidence index, and the high-risk clinical and social determinant features driving those scores.

For each patient, our models provide a listing of the clinical and social features most correlated with a patient’s risk of adverse health outcomes. This provides actionable information that can be integrated into an existing EMR solution or displayed through external reports and dashboards.
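One simple way to produce such a per-patient listing, assuming a linear model such as logistic regression, is to multiply each coefficient by the patient's feature value and rank the results by magnitude. The sketch below does that; the feature names, coefficients, and patient values are hypothetical, and the actual scoring method is not described in this document.

```python
def top_drivers(coefficients, patient_features, k=3):
    """Rank features by the size of their contribution to the log-odds.

    coefficients:     dict of feature name -> model coefficient
    patient_features: dict of feature name -> this patient's value
    Returns the k (name, contribution) pairs with the largest magnitude.
    A positive contribution raises predicted risk; a negative one lowers it.
    """
    contributions = {
        name: weight * patient_features.get(name, 0.0)
        for name, weight in coefficients.items()
    }
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Hypothetical coefficients and patient record for illustration only.
example_coefficients = {
    "preterm_delivery": 1.8,
    "maternal_hypertension": 0.9,
    "prenatal_visits": -0.15,
    "community_unemployment_rate": 0.05,
}
example_patient = {
    "preterm_delivery": 1,
    "maternal_hypertension": 0,
    "prenatal_visits": 4,
    "community_unemployment_rate": 8.0,
}

for name, contribution in top_drivers(example_coefficients, example_patient):
    direction = "raises" if contribution > 0 else "lowers"
    print(f"{name}: {direction} risk ({contribution:+.2f})")
```

A ranked list like this can be written to an EMR field or rendered on a dashboard next to the patient's overall score.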

Sample Health Practitioner Infant Mortality Data Dashboard View Available to Users


Drawing from SDOH datasets that correlate to the health conditions of primary concern, healthcare organizations can more accurately predict patient risk and significantly improve outcomes with efficient interventions. Our predictive machine learning models drive improved patient outcomes through tools like these:

  • Real-time dashboards frontline clinicians can use during patient intake or examination. Insight into societal factors driving health risks places clinicians in a better position to provide more holistic treatment and care plans.
  • CRM (customer relationship management) systems and marketing systems enabling healthcare payers to identify their most at-risk members and reach them with tailored outreach campaigns. Preventive healthcare efforts lower costs for the healthcare payer in the long run.
  • EMR charts available to healthcare practitioners during maternal and infant health screenings, as well as prenatal and postnatal visits, to better inform practitioners of the factors that could place the infant at risk.
  • Case management systems enabling care coordinators in accountable care organizations (ACOs) to use data-driven insights to prioritize their case load and leverage more targeted and effective interventions.

These capabilities have moved the needle across a variety of infant and maternal health and CHF initiatives.

  • Incorporating this tool into an OB Navigator program identified high-risk mothers more accurately and efficiently and paired each of them with a nurse for a two-year period, giving the nurse significant information about the individual to support more tailored interventions.
  • Using a similar approach, SDOH analysis enabled a health system to identify high-risk mothers and partner with a third-party nonprofit to provide life coaches and nurses via televisits.
  • Leveraging data from these models and working with their state’s Women, Infants, and Children program, another client identified individuals at a high risk of CHF to create targeted outreach and enrollment.

Comprehensive analytical tools that utilize social determinants of health datasets bring undeniable impact—to patients and to the health systems and payers who support them.