Healing by the Numbers: MySQL Dive into Hospital Efficiency

When my mom had her knee replacement surgery a few months ago, she got excellent care during her time in the hospital.

And after seeing what hospital care is like from the patient and patient advocate’s perspective, I was curious what it looks like from the hospital administration’s perspective. Running a hospital has to be an enormous logistical challenge.

And it got me thinking about how data analysis could help hospitals solve problems like these:

🤔 You can’t know for certain how many patients you’ll have at any one time, but you have to have beds, equipment, and staff available for unexpected emergencies.

🤔 Each patient requires lab work, treatments, and round-the-clock care from a variety of departments and specialties.

🤔 You have to ensure that care is equitable for people of all ages and races.

🤔 You have to keep accurate statistics on literally everything that happens — admissions, procedures, labs, insurance, and more.

🤔 And then there’s the financial aspect. You have to ensure that patients receive the care they need while keeping the hospital afloat financially.

In order to get a clear picture, I decided to focus in on one specific diagnosis. I honed in on diabetes, the 7th highest cause of death in the US.

The UC Irvine Machine Learning Repository provides a thorough data set covering ten years of clinical care at 130 US hospitals that focuses on people hospitalized for diabetes.

Here’s a quick summary of the data:

Total procedures: 136339

Total number of patients: 71518

Average time in hospital: 4.4 days

Distinct Medical Specialties: 73

Using data to make sure beds are always available

During 2020, we saw what happens when hospitals have more patients than beds. And while that situation is rare, hospitals need to use data to create systems to make sure they have beds available when people need them.

Since hospitals have a limited number of beds, it’s important to make sure patients don’t stay any longer than medically necessary. So I wanted to get an idea of how long patients are staying in the hospital.

For this calculation, I wanted to create a visual representation that a hospital administrator could evaluate at a glance. Since SQL is not a data visualization tool, I had to outsmart it a bit. So I used one * to represent every 200 patients and created a histogram using the following SQL code:

This resulted in a histogram that effectively communicates how long patients stay in the hospital. While it’s not fancy, this easy-to-read visualization paints a quick and accurate picture.

Using Data to Track Procedures

For both hospital efficiency and the well-being of the patients, it’s important that patients receive only the necessary procedures.

If I were a hospital administrator, I’d be curious to see which of the 73 specialties tended to perform the most procedures. So I wrote the following SQL code to figure out the average number of procedures for the top 10 specialties:

This resulted in the following list that ranged from 2.53 to 4 procedures per patient per hospital stay.

But if I was thinking like a hospital director, I’d want to hone in on the specialties that are performing the most procedures overall. This way we could make sure that they’re focusing on only what is necessary for patient welfare.

I realized it’s possible that the above data includes specialties with high per-patient averages that don’t see very many patients. 

It would make more sense to drill the data down to the specialties with high patient numbers and a high procedure count. By helping these departments reduce their numbers, it could make a big difference in hospital efficiency and cut down on procedures where the risk-to-benefit ratio is high.

So I added a column listing the actual count of procedures for each specialty and I filtered the results based on specialties that performed over 50 procedures and had an average of above 2.5. This would provide a solid list of specialties to focus on.

Checking to See if Race is a Factor

Over the past couple years, I’ve been hearing about how a patient’s race can influence the medical care they receive. So I wanted to look at this data for any obvious racial bias. Since all patients require lab testing, this seemed like it would be a good indicator.

Did the number of lab tests vary between patients depending on their race? 

I wanted to see the race, the average number of procedures, and the total procedures. So I wrote the following SQL code::

When grouped by race, the averages ranged from 40.86 to 44.08 procedures. So looking at the lab data, I don’t see an obvious gap based on race

How length of stay correlates to number of lab procedures

It seems logical that the longer someone spends in the hospital, the more labs they’ll have. But I wanted to check the data and make sure. So I wrote the following SQL code:

The results turned out exactly as I would have expected.

Hospital Success Stories

The business aspect of the hospital is important — it can’t stay open without it. But the dedicated healthcare professionals are there to make a difference.

So I thought it would be good to see what the data could show me about patient success stories.

To figure this out, I looked for patients who came in with an emergency, but were able to go home in fewer days than the average stay. I did this by writing a subquery in my SQL code:

This query returned 33,684 results. So there were 33,684 emergency treatment success stories. I was curious about the total number of emergency cases so I did a COUNT and found out that there were 53,990. So about 62% of emergency cases were discharged in less time than the average stay.

Using CONCAT to Present Data

A hospital administrator will want to be able to access and understand the data to inform decisions and measure the impact the hospital is making in the community.

Of course visualizations are great for this. But what if she wanted specific info on each patient in a written format? This line of thinking led to using the CONCAT function to create a summary for each individual patient.

The following SQL code uses the data in two tables (health and demographics) to create a patient summary in sentence form.

This SQL query returned the following results:

Analysis Summary

This analysis just scratches the surface of the treasure trove of information available in this data set. But spending some time using MySQL for analysis gave me some interesting insights:

✅ Patient hospital stays vary, but many patients are there for 3 days or less.

✅ The specialty that performed the most procedures per patient was Thoracic Surgery with an average of 3.5 procedures per patient.

✅ Based on the average number of lab procedures per patient, race does not appear to be a factor in care.

✅ As anticipated, the number of lab procedures correlates to the length of stay.

✅ 62% of emergency cases were discharged in less time than the average stay.

Let’s Connect

If you have questions, would like to talk data, or have an opportunity you think I’d be a fit for, I’d love to chat.

I’ve been spending lot of time digging into data. If you’d like to see a business-related data analytics project I did in Excel, check out this one.

Previous
Previous

My Audiobook Obsession SQL Project

Next
Next

SQL Analysis of World Bank's Battle Against Poverty