Text Analytics for Healthcare: From Unstructured Data to Valuable Insights

Healthcare Analytics

Chris St. Jeor



Analytics is essential to the healthcare industry. It provides actionable, data-driven insights that support efficient use of resources, faster decision-making, and improved patient outcomes. Yet roughly 80% of medical data is unstructured and remains underutilized (1). Unstructured medical claims data, such as clinical notes or physicians’ progress notes, contains free text, abbreviations, and medical jargon, which makes it difficult to analyze and interpret and leaves potentially valuable insights hidden (1). However, with the help of text mining and natural language processing (NLP) techniques, unstructured medical claims data can be transformed into a structured format, allowing healthcare providers to identify patterns and trends, improve patient care, and reduce costs.
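
To make "transformed into a structured format" more concrete, here is a minimal Python sketch that pulls a few fields out of a free-text note using simple pattern matching. The sample note, the abbreviation map, and the function name `structure_note` are hypothetical and exist only for illustration; a real pipeline would more likely use a clinical NLP toolkit and a curated medical vocabulary than hand-written rules like these.

```python
import re

# Hypothetical abbreviation map (illustration only); a real system would
# rely on a curated clinical vocabulary.
ABBREVIATIONS = {
    "htn": "hypertension",
    "dm2": "type 2 diabetes",
    "ckd": "chronic kidney disease",
}

# Matches simple dosage mentions such as "Metformin 500 mg".
MEDICATION_PATTERN = re.compile(r"\b([a-z]+)\s+(\d+)\s*mg\b", re.IGNORECASE)


def structure_note(note: str) -> dict:
    """Turn one free-text note into a small structured record."""
    # Lowercase and split into alphanumeric tokens so abbreviations can be looked up.
    tokens = re.findall(r"[a-z0-9]+", note.lower())
    conditions = sorted({ABBREVIATIONS[t] for t in tokens if t in ABBREVIATIONS})
    medications = [
        {"name": name.lower(), "dose_mg": int(dose)}
        for name, dose in MEDICATION_PATTERN.findall(note)
    ]
    return {"conditions": conditions, "medications": medications}


record = structure_note("Pt w/ HTN and DM2. Continue Metformin 500 mg daily.")
print(record)
```

Once notes are reduced to records like this, they can be filtered, counted, and joined with coded claims fields in the same way as any other structured data.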

Text analytics is a powerful tool that can help healthcare providers extract valuable insights from medical claims data. Medical claims data contains a wealth of information about patient care, including demographics, diagnoses, treatments, and medications. However, since a large portion of this data is unstructured and difficult to analyze, it is challenging to identify patterns and trends that could be used to align valuable resources and make data-informed decisions that would improve patient outcomes and result in cost savings.  

Using Text Analytics to Identify Healthcare Patterns & Trends

Unlike manual analysis, text analytics can quickly and efficiently analyze unstructured data to help identify patterns and trends. These insights can help healthcare providers make data-driven decisions, including identifying best practices, developing treatment protocols, and anticipating potential problems, all of which lead to improved patient outcomes. Analyzing data from a cohort of patients with positive outcomes can reveal commonalities across their care, which providers can then use to identify best practices and develop treatment protocols. Providers can also combine unstructured patient data with demographic data to determine whether there are regional variations. Certain procedures or diagnoses may be more prevalent in some geographic areas due to environmental or demographic factors, and this knowledge can inform decision-making and treatment protocols at the regional level.
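
As a rough illustration of the regional comparison described above, the following Python sketch computes the share of notes in each region that mention a given term. The records, region names, and the function `mention_rate_by_region` are made up for this example, and plain substring matching stands in for whatever text analytics a provider would actually use.

```python
from collections import Counter

# Hypothetical (region, note) pairs; the values are illustrative only.
records = [
    ("Northeast", "Patient reports wheezing; asthma exacerbation."),
    ("Northeast", "Follow-up for asthma, inhaler refilled."),
    ("Southwest", "Routine visit, no complaints."),
    ("Southwest", "Asthma well controlled."),
    ("Southwest", "Knee pain after fall."),
]


def mention_rate_by_region(rows, term):
    """Return, per region, the fraction of notes that mention `term`."""
    totals = Counter()
    hits = Counter()
    for region, note in rows:
        totals[region] += 1
        # Case-insensitive substring match; deliberately simplistic.
        if term.lower() in note.lower():
            hits[region] += 1
    return {region: hits[region] / totals[region] for region in totals}


rates = mention_rate_by_region(records, "asthma")
for region, rate in sorted(rates.items()):
    print(f"{region}: {rate:.0%} of notes mention asthma")
```

A gap between regions in output like this would only be a starting point; it would still need to be checked against population size, coding practices, and other confounders before informing a regional protocol.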

Text Mining As a Step in Patient Care Improvement

Text mining electronic medical records (EMR), in conjunction with medical claims data, prescription data, and image data, can help identify gaps in patient care, such as non-compliance with treatment plans or missed or delayed diagnoses. Understanding where these gaps are allows healthcare providers to make better decisions that improve patient care and minimize adverse events. In addition, healthcare providers can use insights from text analytics to develop more effective treatment plans and reduce the risk of missed or delayed diagnoses.

By utilizing text analytics, healthcare providers can more efficiently identify patients at risk for certain conditions or those who may benefit from additional follow-up care. Identifying adverse effects in high-risk patients and creating personalized disease risk predictions can result in improved patient care (3). A more in-depth understanding of the data can allow for earlier interventions, more targeted treatment, or prevention programs. Injuries due to falls are the most common adverse event reported in hospitals, with almost 1 million patient falls occurring in US hospitals each year, resulting in 250,000 injuries and roughly 11,000 deaths (7). Falls are a common and costly source of preventable injury. By text mining unstructured EMR data, one research team identified information that was not originally coded in the administrative data but may be relevant to fall-related injuries. Their goal was to validate and enhance the administrative data and to identify factors that could reduce falls (3).
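
The idea of finding fall-related information that is present in the notes but missing from the coded data can be sketched in a few lines of Python. This is not the cited research team's method. The encounter records, the `coded_fall` field, and the keyword pattern are assumptions made for illustration, and the pattern is deliberately naive: it does not handle negation such as "denies falls".

```python
import re

# Naive keyword pattern for fall mentions (assumption for illustration;
# it does not handle negation or non-injury uses of the word).
FALL_PATTERN = re.compile(r"\b(fell|fall|falls|fallen)\b", re.IGNORECASE)

# Hypothetical encounters: a coded flag from administrative data plus a free-text note.
encounters = [
    {"id": "E1", "coded_fall": True, "note": "Patient fell while walking to bathroom."},
    {"id": "E2", "coded_fall": False, "note": "Found on floor after fall from bed; hip pain."},
    {"id": "E3", "coded_fall": False, "note": "Ambulating independently, no complaints."},
]


def uncoded_fall_mentions(rows):
    """Return ids of encounters whose note mentions a fall that was not coded."""
    return [
        row["id"]
        for row in rows
        if not row["coded_fall"] and FALL_PATTERN.search(row["note"])
    ]


flagged = uncoded_fall_mentions(encounters)
print(flagged)
```

Encounters flagged this way would be candidates for manual review rather than confirmed falls.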

How Data Analysis Reduces Inefficiencies & Costs

Our healthcare system is a complicated web of millions of decisions made each day, which is bound to lead to some inefficiencies and redundancies. By using text mining to analyze these decisions across a wide spectrum of patients and caregivers, we can find areas that can be improved to save money and time and enhance patient outcomes. Text mining has the potential to predict hospital readmission, providing insights into steps the healthcare team can take to reduce or eliminate a patient’s need to be readmitted (2). For example, one retrospective analysis used EMR text mining to identify patients with key psychosocial factors predictive of 30-day hospital readmissions (6). The researchers appended these text-mined psychosocial factors to the LACE (Length of stay, Acuity of the admission, Comorbidity of the patient, and Emergency department use) Index for Readmission to improve the prediction of readmission risk. Adding these factors to the LACE Index improved the area under the receiver operating characteristic curve (AUROC) of the readmission prediction by 8.46% for geriatric patients, 6.99% for the general hospital population, and 6.64% for frequent admitters (6). This study demonstrates that incorporating text-mined psychosocial data from EMR clinical notes into predictive modeling can improve the accuracy of readmission risk prediction. Better risk prediction can in turn reduce the overall cost of healthcare while improving the quality of care provided to patients.
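
For readers unfamiliar with AUROC, the sketch below shows how a baseline risk score and a score augmented with text-mined flags could be compared. It does not reproduce the cited study. The patient data are toy values, the function `auroc` is a plain pairwise implementation written for this example, and the rule of adding 2 points per psychosocial flag is an arbitrary assumption, not the way the researchers combined their features with the LACE Index.

```python
def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen readmitted patient
    has a higher score than a randomly chosen non-readmitted one
    (ties count as half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = readmitted within 30 days, 0 = not readmitted.
readmitted = [1, 0, 1, 0, 0, 1]
lace_score = [12, 10, 9, 9, 5, 11]
# Hypothetical count of text-mined psychosocial factors per patient.
psychosocial_flags = [1, 0, 1, 1, 0, 0]

# Assumed combination rule for illustration: 2 extra points per flag.
augmented_score = [s + 2 * f for s, f in zip(lace_score, psychosocial_flags)]

baseline_auc = auroc(lace_score, readmitted)
augmented_auc = auroc(augmented_score, readmitted)
print(f"baseline AUROC:  {baseline_auc:.3f}")
print(f"augmented AUROC: {augmented_auc:.3f}")
```

On real data, the comparison would be made on held-out patients, and the weights for the text-mined factors would be learned by a model rather than fixed by hand.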

Text Analytics & Diabetes Treatment: A Powerful Example 

When used by healthcare providers, text analytics is a powerful tool that leads to valuable insights. For example, diabetes is one of the two most common causes of chronic kidney disease (CKD) (4). Patients with CKD experience poor outcomes and high costs. Medicare patients with CKD have an annual mortality rate twice that of Medicare patients without the disease. Almost 90% of patients with CKD progress to end-stage renal disease, which requires hemodialysis and costs Medicare over $80,000 per patient annually (5).

A healthcare provider interested in improving outcomes for patients with diabetes can use text analytics to extract insights from a large cohort of these patients. First, they can use text mining and NLP algorithms to extract relevant information from the medical claims data, such as patient demographics, diagnosis and treatment codes, pharmaceutical data, comorbidity data, physician’s notes, and patient outcome data. By analyzing data from a large cohort of patients, the provider can identify commonalities in the care provided to those with better control over their diabetes and fewer complications. For example, they could identify patients with a specific treatment protocol in combination with a specific medication who exhibit greater medication adherence. The healthcare provider can use these insights to monitor patient outcomes, develop a more effective treatment protocol for patients with diabetes, and make data-driven, efficient adjustments to the protocol. This can improve patient outcomes and reduce costs by ensuring patients manage their diabetes effectively and by helping to prevent CKD. Utilizing these insights and making data-driven decisions can drastically improve patient outcomes and result in major cost savings for both the patient and the healthcare industry.
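
The cohort comparison described above can be sketched as a simple grouping step once the relevant fields have been extracted. In the Python example below, the cohort records, the protocol labels, and the function `adherence_by_group` are hypothetical, and the `adherent` flag is assumed to have already been derived from pharmacy data or text-mined notes.

```python
from collections import defaultdict

# Hypothetical cohort after extraction; values are illustrative only.
cohort = [
    {"protocol": "A", "medication": "metformin", "adherent": True},
    {"protocol": "A", "medication": "metformin", "adherent": True},
    {"protocol": "A", "medication": "insulin", "adherent": False},
    {"protocol": "B", "medication": "metformin", "adherent": False},
    {"protocol": "B", "medication": "metformin", "adherent": True},
]


def adherence_by_group(rows):
    """Adherence rate for each (protocol, medication) combination."""
    counts = defaultdict(lambda: [0, 0])  # group -> [adherent count, total count]
    for row in rows:
        key = (row["protocol"], row["medication"])
        counts[key][0] += int(row["adherent"])
        counts[key][1] += 1
    return {key: adherent / total for key, (adherent, total) in counts.items()}


rates = adherence_by_group(cohort)
for (protocol, medication), rate in sorted(rates.items()):
    print(f"protocol {protocol} + {medication}: {rate:.0%} adherent")
```

With a cohort this small the rates mean nothing; the point is only the shape of the analysis, which on real data would also account for group sizes and patient mix.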

By tapping into highly underutilized unstructured medical claims data, healthcare providers can analyze patterns and trends in care and identify areas for improvement, resulting in improved patient outcomes and reduced costs. By taking control of their patient data, using reliable data sources, and applying analytic tools like text analytics, healthcare providers can transform their operations by making data-driven decisions, reducing costs, and improving the lives of their patients.


  1. Kong HJ. Managing Unstructured Big Data in Healthcare System. Healthc Inform Res. 2019 Jan;25(1):1-2. doi: 10.4258/hir.2019.25.1.1. Epub 2019 Jan 31. PMID: 30788175; PMCID: PMC6372467.
  2. Warchol SJ, Monestime JP, Mayer RW, Chien WW. Strategies to Reduce Hospital Readmission Rates in a Non-Medicaid-Expansion State. Perspect Health Inf Manag. 2019 Jul 1;16(Summer):1a. PMID: 31423116; PMCID: PMC6669363.
  3. Islam MS, Hasan MM, Wang X, Germack HD, Noor-E-Alam M. A Systematic Review on Healthcare Analytics: Application and Theoretical Perspective of Data Mining. Healthcare (Basel). 2018 May 23;6(2):54. doi: 10.3390/healthcare6020054. PMID: 29882866; PMCID: PMC6023432.
  4. Mayo Clinic. (n.d.). Chronic kidney disease. Retrieved March 3, 2023.
  5. Liu HH, Zhao S. Savings Opportunity from Improved CKD Care Management. J Am Soc Nephrol. 2018 Nov;29(11):2612-2615. doi: 10.1681/ASN.2017121276.
  6. Goh KH, Wang L, Yeow AYK, Ding YY, Au LSY, Poh HMN, Li K, Yeow JJL, Tan GYH. Prediction of Readmission in Geriatric Patients From Clinical Notes: Retrospective Text Mining Study. J Med Internet Res. 2021 Oct 19;23(10):e26486. doi: 10.2196/26486. PMID: 34665149; PMCID: PMC8564665.
  7. LeLaurin JH, Shorr RI. Preventing Falls in Hospitalized Patients: State of the Science. Clin Geriatr Med. 2019 May;35(2):273-283. doi: 10.1016/j.cger.2019.01.007. Epub 2019 Mar 1. PMID: 30929888; PMCID: PMC6446937.
