What is Medical Data Labelling?
One of the most important inputs to AI product development is Data Labelling. Whether it is a self-driving car that learns to brake or an AI Medical Device that alerts clinicians to a risk of stroke, every supervised learning algorithm has humans in the background who have labelled the data that trains the model. Medical Data Labelling is a specialised area that is becoming fundamental to Medical AI projects.
Data Labelling is a vast field covering every kind of data, e.g. text, audio, or images. Within the medical domain, a growing area is medical image analysis, which involves labelling images such as X-rays, MRIs, or CTs, a task that calls for radiological or anatomical expertise. Experts isolate pixels in each image and categorise the resulting areas according to the precise use case.
For instance, labelling a brain tumour in an MRI helps identify exactly where the tumour site is, what its dimensions are, and what structures are shown in the scan. These labels can then be fed into machine learning algorithms that learn to detect such findings in other scans. Doing so could save radiologists time, reduce human error and, in more complex cases, save lives or improve quality of life through faster diagnosis.
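As a rough illustration of how a label yields a location and dimensions, the sketch below measures the labelled region in a single 2D slice. The function name `lesion_extent_mm`, the toy mask, and the 0.5 mm pixel spacing are all assumptions made for this example; real scans carry their own spacing in the image metadata.

```python
from typing import Dict, List, Tuple

Mask2D = List[List[int]]  # rows of 0/1 values; 1 marks a pixel labelled as tumour


def lesion_extent_mm(mask: Mask2D, spacing_mm: Tuple[float, float]) -> Dict[str, object]:
    """Return bounding box, size and area of the labelled region in one slice.

    spacing_mm is (row spacing, column spacing) in millimetres per pixel.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        raise ValueError("mask contains no labelled pixels")

    r0, r1, c0, c1 = min(rows), max(rows), min(cols), max(cols)
    pixel_count = sum(1 for row in mask for value in row if value)
    return {
        "bbox": (r0, c0, r1, c1),  # top, left, bottom, right (inclusive)
        "height_mm": (r1 - r0 + 1) * spacing_mm[0],
        "width_mm": (c1 - c0 + 1) * spacing_mm[1],
        "area_mm2": pixel_count * spacing_mm[0] * spacing_mm[1],
    }


# Toy 4x5 mask standing in for one labelled MRI slice (hypothetical data).
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
extent = lesion_extent_mm(mask, spacing_mm=(0.5, 0.5))
print(extent)
```

The same idea extends to 3D by adding a slice axis and a slice thickness, which turns the area into a volume.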
Why do we need Medical Data Labelling?
Outsourcing medical data labelling alleviates significant pain points in medical image analysis. Companies that start their AI projects with data that is already labelled can spend more resources on developing the algorithm. Labelling medical data requires highly specialised skill sets; certain use cases even require neuro-radiologists to label images. Hiring staff with this expertise can be prohibitively expensive or can slow down product development.
Medical Data Labelling is most critical for three types of AI projects:
- AI models for diagnosing medical conditions – developing software that can detect tumours, stroke, or other conditions by reading medical images such as MRIs or CTs. More sophisticated algorithms quantify severity or provide analytics that support decision-making. Here, accurate labelling of the parts of the image that show the medical condition is critical.
- AI models for early indicators of health problems – developing software that identifies anatomical problems which predict the likelihood of future medical conditions. For example, a patient may undergo a CT scan for another reason, but there may be incidental findings that point to a risk of a heart or brain problem in the future. In these use cases, highly accurate segmentation of anatomical structures is very important.
- Surgical planning – labelled medical data can be used to build 3D models for surgical planning and can serve as the medium for augmented reality during surgical procedures. This in turn helps surgeons save time, see better and, above all, make procedures less invasive, as they do not have to cut deeper to get a clearer view. Outcomes of surgical interventions can also be simulated using computer models, which can improve the success rate of surgeries.
Numerous software packages exist for labelling, including free and open-source tools such as 3D Slicer and ITK-SNAP. These tools allow fully manual labelling or semi-automated labelling, where only a few pixels need to be labelled by hand and the labels then grow to cover similar regions. There are currently no tools that can fully automate labelling of all anatomical structures because of the huge variation in modalities, medical devices, patient anatomy, and image acquisition methods. This means that the labelling process requires a large amount of human input and manual effort.
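The "labels grow to cover similar regions" idea can be sketched as a simple seeded region-growing routine. This is a minimal illustration under stated assumptions, not the algorithm used by 3D Slicer or ITK-SNAP: the function `grow_region`, the intensity `tolerance`, and the toy image are all hypothetical.

```python
from collections import deque
from typing import List, Set, Tuple

Image = List[List[int]]   # 2D grid of pixel intensities
Pixel = Tuple[int, int]   # (row, column)


def grow_region(image: Image, seed: Pixel, tolerance: int) -> Set[Pixel]:
    """Grow a label from one seed pixel to connected pixels of similar intensity.

    A neighbour joins the region if it touches the region (4-connectivity)
    and its intensity is within `tolerance` of the seed's intensity.
    """
    height, width = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    labelled = {seed}
    queue = deque([seed])

    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < height and 0 <= nc < width):
                continue  # outside the image
            if (nr, nc) in labelled:
                continue  # already part of the region
            if abs(image[nr][nc] - seed_value) <= tolerance:
                labelled.add((nr, nc))
                queue.append((nr, nc))
    return labelled


# Toy image: a dark structure on the left, a bright one on the right.
image = [
    [10, 12, 90, 95],
    [11, 13, 92, 94],
    [10, 50, 91, 12],
]
region = grow_region(image, seed=(0, 0), tolerance=5)
print(sorted(region))
```

Note that the pixel at the bottom right has a similar intensity to the seed but is not connected to it, so it stays unlabelled. This is one reason a human still has to review and correct semi-automated results.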
The labelling process starts from raw medical data such as X-rays, CT, MRI, ultrasound, and other modalities, typically stored in DICOM, NIfTI, or video format. Once labelling is complete, the result can be exported in various file formats, e.g. NIfTI or NRRD, which can then be used to build machine learning pipelines.
What are the challenges in Medical Data Labelling?
Unlike data labelling in other domains such as self-driving cars, where anyone with good attention to detail and basic computer skills can perform the task, medical data labelling requires a very specific skill set. To be effective, a labeller needs an unusual combination of up-to-date medical expertise and working knowledge of labelling software.
Some companies struggle to develop in-house labelling teams because medical images are complex and each use case is specialised. Unlike most other datasets for AI, some medical images are three-dimensional: the volume has x, y, and z axes and is viewed in three anatomical planes (sagittal, coronal, and axial). One MRI or CT scan can contain hundreds of images (or slices), so labelling the data for even a single patient can be a complex process.
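To make the three planes concrete, the sketch below pulls one axial, one coronal, and one sagittal slice out of a small 3D volume. It assumes the volume is indexed as `volume[z][y][x]`, with z running head to foot, y front to back, and x left to right; real files can use other axis orders, so this convention and the helper names are assumptions for illustration only.

```python
from typing import List

Slice = List[List[int]]
Volume = List[Slice]  # assumed index order: volume[z][y][x]


def axial_slice(volume: Volume, z: int) -> Slice:
    """Fix z: a horizontal cross-section with rows along y and columns along x."""
    return [row[:] for row in volume[z]]


def coronal_slice(volume: Volume, y: int) -> Slice:
    """Fix y: a front-facing cross-section with rows along z and columns along x."""
    return [plane[y][:] for plane in volume]


def sagittal_slice(volume: Volume, x: int) -> Slice:
    """Fix x: a side-on cross-section with rows along z and columns along y."""
    return [[row[x] for row in plane] for plane in volume]


# Toy volume of 2 slices x 3 rows x 4 columns; each voxel encodes its own
# position as 100*z + 10*y + x so the slices are easy to check by eye.
volume = [
    [[100 * z + 10 * y + x for x in range(4)] for y in range(3)]
    for z in range(2)
]

print(axial_slice(volume, 1))
print(coronal_slice(volume, 2))
print(sagittal_slice(volume, 3))
```

A labeller typically has to keep a structure consistent across all three views, which is part of why a single patient's scan takes so much effort.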
Medical professionals are usually trained to focus on specific areas of a scan rather than on every single pixel, but labelling requires pixel-level attention. An error in the labelling process can lead to inaccurate training of the AI model and, consequently, a failure to achieve better clinical outcomes.
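One way to put a number on pixel-level labelling error is to compare a labeller's mask against a reference mask. The article does not name a metric; the sketch below uses the Dice coefficient, a widely used overlap measure, as an assumed example. The masks and the function name `dice_coefficient` are hypothetical.

```python
from typing import List

Mask2D = List[List[int]]  # rows of 0/1 values; 1 marks a labelled pixel


def dice_coefficient(mask_a: Mask2D, mask_b: Mask2D) -> float:
    """Return 2*|A and B| / (|A| + |B|): 1.0 is perfect overlap, 0.0 is none."""
    same_shape = len(mask_a) == len(mask_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(mask_a, mask_b)
    )
    if not same_shape:
        raise ValueError("masks must have the same shape")

    size_a = size_b = overlap = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            size_a += 1 if a else 0
            size_b += 1 if b else 0
            overlap += 1 if (a and b) else 0

    if size_a + size_b == 0:
        return 1.0  # both masks empty: treated here as full agreement
    return 2 * overlap / (size_a + size_b)


# Reference label and a second label that misses one pixel (toy data).
reference = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
candidate = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
score = dice_coefficient(reference, candidate)
print(round(score, 3))
```

Even one missed pixel lowers the score on a small structure, which shows why fine anatomical detail demands more care than a quick visual read of the scan.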
Who is working on it?
There are many companies performing Data Labelling, but few specialise in medical images: the highly specialised skill set, network, and infrastructure required raise the barriers to entry. Given the high need for accuracy and quality, some companies are reluctant to use data labelling companies that lack specialist medical knowledge. Labelata, a Switzerland-based company that specialises in medical data, has had a lot of success with clients due to its specialisation and medical expertise.
What sets Labelata apart in performing Medical Data Labelling?
- Specialisation in Medicine – Unlike more general data labelling companies, Labelata specialises in medical images, with sub-teams dedicated to different parts of the body and imaging modalities (e.g. CT, MRI). This ensures that highly specialised datasets can be curated at scale.
- Network of Experts – Labelata’s network of experts combines training in segmentation tools with very strong anatomy and radiology knowledge.
- Investment in Training – Labelata invests a lot of time in training its staff to ensure they can deliver high-quality segmentations.
- Robust Quality Assurance – Labelata has a very strict assessment and quality assurance process to ensure clients’ requirements are met.
In a short space of time, Labelata has been able to build trust and perform segmentation faster than in-house teams. Clients like working with Labelata as it helps them kick-start or push forward projects without sacrificing quality.
Previous publication (LinkedIn post): https://www.linkedin.com/feed/update/urn:li:activity:6781253257940484098
This article was co-authored by Taskeen Fatima, one of Labelata’s segmentation experts.