Extracting Semi-Structured Data Using LLMs

A successful collaboration with smartHEALTH EDIH

For Ergobyte and Galen Medicines, access to accurate and structured pharmaceutical information is critical. To keep Galen comprehensive and up to date, Ergobyte must continuously process vast amounts of complex data, much of which is buried in PDF documents. Recognizing the limitations of traditional methods, Ergobyte partnered with the eHealth Lab of INAB CERTH under the smartHEALTH EDIH initiative to explore potential solutions.

The Challenge

Ergobyte created and continuously updates Galen Medicines, the most popular and comprehensive online reference for all matters regarding human pharmacology. The service is highly valued by health professionals, scientists, researchers and patients in Greece and Cyprus.

One of the most essential sources of information for Galen Medicines is the Summary of Product Characteristics (SPC), the document that describes each medicine's properties (composition, indications, contraindications, side effects, pharmacodynamics, and pharmacokinetics) and its officially approved conditions of use. Published by the European Medicines Agency (EMA), the SPC is available only in PDF format, which makes extracting and processing its data extremely difficult.

Ergobyte brought the problem to the eHealth Lab of INAB CERTH, in the context of smartHEALTH EDIH. The lab undertook to explore innovative approaches based on AI and large language models (LLMs) for effectively extracting semi-structured data from SPC documents.

The Solution

The INAB researchers investigated a number of approaches, including natural language processing (NLP) techniques and large language models (LLMs). Their efforts resulted in the design and development of a functional prototype that allows end users to upload screenshots from SPC PDFs and receive the information they contain in structured form. The tool uses the LLM's API to communicate with the model and return the processed data.
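The general shape of such a screenshot-to-structure flow can be illustrated with a minimal Python sketch. It is not the prototype's actual code: the article does not name the model, the API, or the output schema, so the request layout, the `SPC_FIELDS` keys (borrowed from the SPC properties listed earlier), and the `build_request` and `parse_reply` helpers are all assumptions made for illustration. The network call is deliberately left out.

```python
import base64
import json
from dataclasses import dataclass, field

# Hypothetical field names, based on the SPC properties listed in the article.
SPC_FIELDS = (
    "composition",
    "indications",
    "contraindications",
    "side_effects",
    "pharmacodynamics",
    "pharmacokinetics",
)


@dataclass
class SpcRecord:
    """Structured view of one SPC screenshot (illustrative schema)."""
    fields: dict = field(default_factory=dict)
    missing: list = field(default_factory=list)


def build_request(image_bytes: bytes) -> dict:
    """Build a provider-agnostic request body (assumed shape, not a real API)."""
    prompt = (
        "Extract the following sections from this SPC screenshot and reply "
        "with a single JSON object using exactly these keys: "
        + ", ".join(SPC_FIELDS)
        + ". Use null for sections that are not visible."
    )
    return {
        "prompt": prompt,
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
    }


def parse_reply(reply_text: str) -> SpcRecord:
    """Parse the model's JSON reply, tolerating extra text around the object."""
    start, end = reply_text.find("{"), reply_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(reply_text[start:end + 1])
    record = SpcRecord()
    for name in SPC_FIELDS:
        value = data.get(name)
        if value is None or value == "":
            record.missing.append(name)  # section absent or empty in the reply
        else:
            record.fields[name] = value
    return record


if __name__ == "__main__":
    request = build_request(b"fake screenshot bytes")
    # A real client would send `request` to the chosen LLM API at this point.
    reply = 'Here is the data: {"composition": "Paracetamol 500 mg", "indications": null}'
    record = parse_reply(reply)
    print(record.fields)
    print(record.missing)
```

Tracking the sections the model did not return, rather than silently dropping them, gives the human expert a clear list of what still needs to be checked against the original PDF.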

The prototype speeds up the work of Ergobyte's experts in integrating data into the Galen Medicines knowledge base. Thanks to the work of smartHEALTH and the eHealth Lab, Ergobyte gained access to advanced know-how and cutting-edge AI technologies. Overall, the project enhanced Ergobyte's expertise and laid the foundations for the technological advancement of the company's products and services.

About smartHEALTH European Digital Innovation Hub

The European Digital Innovation Hub for Smart Health: Precision Medicine and Innovative E-health Services (smartHEALTH) is a one-stop-shop where SMEs, startups, mid-caps, and the public sector can get help to improve business and production processes, products and services by means of digital technology. The hub offers expertise in the areas of precision medicine, cancer, medical image analysis, public sector digital transformation and infrastructure.

Contact
21, Aristotelous str., 54624
Thessaloniki, Greece
Tel. (+30) 2310 288434
E-mail: info@ergobyte.gr
C.R.N.: 59258404000
ISO 27001