EHRNoteRetrieval
Medical data is highly fragmented and unstructured, with clinicians spending over half of their working hours on documentation. Clinical notes make up a substantial portion of this data, and structuring them has been a longstanding, unresolved challenge.
Structuring clinical notes requires deep medical knowledge and significant clinical experience because the notes rely heavily on abbreviations and contain subtle distinctions and inconsistent terminology. In addition, normalization and resolution methods vary widely among medical professionals. Our initial focus is on structuring discharge summaries from the MIMIC-IV dataset, with plans to extend the approach to other clinical notes and datasets. Success in this effort could open up opportunities across many research fields, including clinical and AI-based studies. For instance, structured clinical notes could support the generation of large QA datasets and pave the way for truly multimodal AI applications in medicine.
This research builds on our team’s previous work, EHRNoteQA, an LLM benchmark built from LLM and clinical evaluations.