Cost-Efficient, Scalable & Accurate Annotations
For many problems, we need to formulate a ground truth: a dataset that describes what the right answers are. This ground truth is in turn used to train your models and evaluate their effectiveness. Often, the ground truth is so important to our clients that they see it as the moat around their business, so nurturing and guiding its growth is essential. When we help our customers build their ground truth, we do so in a highly prescribed and intentional manner. You can read about our annotation gathering service here to gain a better understanding of the problem.
Often, once we've imported your data into our database by writing a custom transformer for your problem, most, if not all, of the data is not yet annotated. For example, we might start off by importing millions of your documents, images, videos or business records, but we won't yet know exactly which labels or annotations you wish to extract from the content. Making your AI/ML models perform well across many different scenarios requires gathering annotations, often in large volumes, that are representative of those scenarios. Our typical approach is to first define a set of seed annotations, usually after a number of high-level business discussions with you. We then expose our Mayetrix Annotator tool to some trusted users, who provide annotations through a simple web UI. The tool can flexibly and quickly search through a potentially large dataset because we've indexed the data into an elastic, highly scalable search system. Once we have gathered enough annotations from these trusted humans, we can begin leveraging less trusted, or even untrusted, crowds to generate labels for us.
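To make the search-driven part of this workflow concrete, here is a minimal sketch of pulling unannotated candidate documents for human review. It assumes an Elasticsearch-style index; the index name, field names and query are hypothetical placeholders for illustration, not the actual Mayetrix schema.

```python
# Sketch: fetch unannotated documents matching a keyword so trusted
# annotators can review them. Index and field names are hypothetical.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumption: a local cluster

def candidates_for_label(keyword: str, size: int = 20) -> list[dict]:
    """Return unannotated documents matching `keyword`, for human review."""
    resp = es.search(
        index="documents",  # hypothetical index name
        query={
            "bool": {
                "must": {"match": {"text": keyword}},
                # skip documents that already carry an annotation
                "must_not": {"exists": {"field": "annotation"}},
            }
        },
        size=size,
    )
    return [hit["_source"] for hit in resp["hits"]["hits"]]
```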
You may, for example, have a natural UI integration point, such as your mobile or web application; in that case, we might leverage the Mayetrix Annotation APIs so annotations can be provided by your users in the context of your application. Because your trusted humans have already provided a gold standard, we are able to determine how to most effectively map untrusted crowd output onto a trusted ground truth.
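One common way to perform that mapping, sketched below under the assumption of simple categorical labels, is to score each crowd worker against the gold standard and weight their votes by that accuracy. The data structures and function names are illustrative, not the Mayetrix API.

```python
# Sketch: accuracy-weighted voting, anchored by gold-standard items.
from collections import defaultdict

def worker_accuracy(labels, gold):
    """labels: {(worker, item): label}; gold: {item: true_label}."""
    correct, total = defaultdict(int), defaultdict(int)
    for (worker, item), label in labels.items():
        if item in gold:
            total[worker] += 1
            correct[worker] += int(label == gold[item])
    # Workers with no gold overlap fall back to 0.5 (a coin flip).
    return {w: correct[w] / total[w] for w in total}

def aggregate(labels, accuracy):
    """Pick, per item, the label with the highest accuracy-weighted vote."""
    votes = defaultdict(lambda: defaultdict(float))
    for (worker, item), label in labels.items():
        votes[item][label] += accuracy.get(worker, 0.5)
    return {item: max(opts, key=opts.get) for item, opts in votes.items()}
```

More sophisticated schemes (e.g. Dawid-Skene style estimation) follow the same pattern: overlap with a trusted gold standard anchors an estimate of each worker's reliability.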
Finally, we usually leverage the models themselves to guide the human labelling process. The Mayetrix Annotator tool enables us to steer annotators toward areas where the model is underperforming. In this way, we establish an active learning feedback loop, powered by the Mayetrix, which ensures the AI moat around your business keeps growing and keeps producing the most effective models.
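As an illustration of that feedback loop, a standard uncertainty-sampling step (a generic sketch, not the Mayetrix internals) ranks unlabelled items by model confidence and queues the least confident ones for annotation:

```python
# Sketch: pick the items the current model is least sure about.
import numpy as np

def uncertainty_ranking(model, X_unlabelled, top_k=50):
    """Indices of the top_k items the model is least confident about.

    `model` is any classifier exposing predict_proba (scikit-learn style).
    """
    probs = model.predict_proba(X_unlabelled)  # shape: (n_items, n_classes)
    confidence = probs.max(axis=1)             # probability of the top class
    return np.argsort(confidence)[:top_k]      # least confident first

# Each iteration: queue these items for annotation, fold the new labels
# into the training data, retrain, and repeat.
```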