Tell us how you came to be the CEO at Datature.
My co-founder and I started Datature in 2019 because we realized that the lack of structure throughout the development of machine learning capabilities in many companies causes teams to move slowly. This problem is especially aggravating for folks working with computer vision, where infrastructure, labelling, and deployment come into the picture.
The best teams work collaboratively. However, the existing ML development process excludes domain experts and stakeholders because the technical barrier to entry is so high. We wanted a way to condense the industry’s best practices into an end-to-end platform and present it in a form that democratizes access to these complex machine learning processes and algorithms. Since then, I have been working on Datature full-time to build an MLOps platform, hoping to streamline the way industrial companies, researchers, and startups develop their computer vision capabilities.
What are some of the common challenges customers approach Datature with?
We see two distinct groups of customers that approach us. Firstly, we have users such as startups, research labs, and companies that have just started their foray into machine learning. They have perhaps done some initial groundwork but face challenges transforming raw data into machine learning models or APIs. The main issue for them is getting started.
These challenges are usually caused by difficulties in setting up the entire pipeline, lack of experience using ML tools, and/or having no technical people on board.
Secondly, we have seasoned users who are part of machine learning teams and have developed early prototypes of their products, but face issues with the repeatability of the process.
These customers often wish to improve existing processes or ensure that laborious tasks such as annotating data and evaluating models can be done collaboratively to save time and reduce data silos.
Can you share emerging trends seen in MLOps?
More AI leaders today have a better understanding of the MLOps lifecycle and its importance. An emerging trend we see today is that there are far more conversations among engineers, and far more intent from leadership, around adopting proper MLOps lifecycle practices within organizations.
This is a huge win for AI practitioners around the world as the scene collectively acknowledges the issues and limitations around dated ML engineering practices. On the technical front, the MLOps community has emphasized the importance of detecting and overcoming model drift and ‘fairness’ issues. There is still no catch-all approach to overcoming these, and we have been noticing an explosion of new tools, papers, and discussions on tackling them as part of the MLOps lifecycle. We believe that this is still an early area within MLOps practices and an important one that even we, at Datature, are trying to address as part of our platform.
Where do you see the biggest areas of improvement for data annotations & labeling?
Great question! I’ve found that over the past few years, outsourcing data labelling tasks to external labelling companies like Scale, LabelBox, Mechanical Turk, etc. has become popular. However, even with a market that is saturated, this method for addressing humongous datasets remains inaccessible to many teams.
With issues surrounding data privacy (and accountability), the cost of labour, and the general difficulty the annotation workforce has in understanding complex domain-specific tasks (e.g. identifying material defects amongst 80 different classes or highlighting complex lesions in medical imagery), not everyone has been able to leverage this external workforce.
We believe that model-assisted labelling and AI-guided annotations will significantly improve this entire process for everyone. They enable domain experts to label data at breakneck speed and teams to perform labelling 10-20x faster, internally.
At Datature, we even have our own AI-guided tool, IntelliBrush, which uses a combination of deep learning and image analysis to help users draw complex masks with just a few clicks!
What are some of the distinctive features of Datature that differentiate you from your competitors?
To begin with, Datature covers the entire model building process, from end to end. We aim to be an abstraction layer between raw data and machine learning APIs. Our platform allows users to curate datasets, annotate images, generate augmentations, train models and deploy them as APIs, all without the need to code. Our first goal is to democratize access to machine learning for non-ML engineers, researchers, and companies.
Additionally, most teams spend 70-80% of their time labelling image data. As mentioned above, our AI-guided brush, IntelliBrush, is changing the game for users looking to obtain labelled data quickly. Datature IntelliBrush is one of the fastest and most accurate labelling tools in the scene. The best part of it all? It’s industry-agnostic and works right out of the box.
We use a workflow editor to help users express their intended data pipeline and output model architectures. As opposed to AutoML-based solutions, we believe that experienced developers can benefit from the convenience and retain configuration freedom this way.
Finally, we believe in making model inspection much more streamlined. We released an open-source tool, Portal, that helps developers and stakeholders run through scenarios, test thresholds, and inspect performance without wrangling code. This has helped our users showcase performance to clients and stakeholders, and some of them even ship their products with Portal! At Datature, we believe it is our responsibility to bring the user end-to-end, from raw data to model deployments. Our key differentiator is that we believe in helping users build the process, and not just the model!
What breakthroughs in the AI/ML space are you most looking forward to from a technology perspective?
I’m particularly excited about breakthroughs in the realm of image, audio and text synthesis with deep-learning methods. I can’t help but be impressed with the level of creativity and ingenuity when it comes to projects people are building with GANs. From technical purposes such as creating synthetic images and writing code (GitHub Copilot) to artistic purposes such as generating 3D models for games based on images, this vertical of study within ML never ceases to amaze!
What are you learning right now?
I’m currently dabbling with Streamlit, trying to make some of my older experiments come to life with fancy sliders and aesthetics.
Do you have some final thoughts?
MLOps is still in its nascent stages and what’s important is for teams to share their experiences, pain points, and thoughts on the future of these current sets of practices. Akin to DevOps, it is never too early to get your team and company on the right track with MLOps tools and practices.
P.S. For any AITechTrend readers who’d like early access to IntelliBrush (this is the first interview in which we have mentioned this new tool), feel free to email me at firstname.lastname@example.org!
Keechin Short Bio
Keechin Goh is the Co-Founder and CEO of Datature, where he is developing the core platform, Nexus, and other open-source projects such as Portal that enable teams to robustly build breakthrough AI projects from end to end. Passionate about building #NoCode products that make complex processes accessible, Keechin has won multiple hackathons doing so, including the annual Rolls-Royce Data Challenge. He is a 3x back-to-back hackathon champion and has launched two products that were both subsequently featured on ProductHunt.
Datature is an end-to-end MLOps platform that helps companies and researchers build highly robust computer vision models. Datature’s platform provides a suite of tools, including AI-assisted labelling, a workflow editor, GPU orchestration, model transfer learning, and managed deployment, to help developers and researchers develop ML capabilities and pipelines 10x faster.
Datature’s mission is to democratize access to complex machine-learning setups and processes so that developers can focus on building breakthrough projects on a level playing field.