Datasaur Receives More Funding to Optimize Its Data Labeling Platform

Datasaur, a startup developing a data labeling platform, has announced new funding of $1 million, equivalent to 14.2 billion Rupiah. The funding is part of the same round as its previous investment from GDP Venture. Several angel investors took part, among them Calvin French-Owen, Co-Founder & CTO of Segment.

The fresh funds will be used to improve the platform's capabilities, including minimizing bias in text labeling. Data labeling has become one of the most crucial processes in developing artificial intelligence (AI) based services, particularly in natural language processing (NLP).

Datasaur develops tools that help data labeling workers be more productive and efficient. The tools also aim to improve data privacy and security, which matters because most data labeling is outsourced.

“Basically, we now handle all kinds of NLP tasks, including entity recognition, parts of speech, document labeling, coreference resolution, and dependency parsing. We are building intelligence into the system to make the labeling process more efficient and accurate, and to allow companies to manage their data labeling teams through a simple platform,” Datasaur’s Founder & CEO, Ivan Lee, told DailySocial.

Ivan Lee (middle) and Datasaur team / Datasaur

Currently, the Datasaur team is participating in Y Combinator's accelerator program as part of the Winter 2020 batch in San Francisco. The company is based in California and Indonesia.

NLP becomes the most widely implemented AI technology in Indonesia

AI is becoming more popular as services that automate business processes emerge. One of the most widely used products is the chatbot: corporations are busy adopting it to provide automatic replies to customer messages. Among them are BCA (chatbot name: Vira), Telkomsel (Veronika), BNI (Love), and others.

Behind chatbot technology is a variety of AI tools, one of the most significant being NLP. Its function is to make the computer system understand the language and context of what the user writes. In practice, current chatbot products still have many shortcomings, the most fundamental being a limited understanding of vocabulary. As a result, the service still feels very rigid, not as natural as a conversation between humans.

Advantages and challenges for chatbot implementation for business

One use of labeled data is to train the machine (the concept known as machine learning) to understand language better, by classifying words into predefined groups. One scenario, for example, is continuously learning new words used by users.
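As a rough illustration of what "classifying words into predefined groups" can look like in practice, the sketch below turns labeled word spans into per-word tags using the common BIO scheme (B = beginning of a group, I = inside, O = outside any group). The sentence, the label names, and the `spans_to_bio` helper are all hypothetical, invented for this example; this is not Datasaur's format or API.

```python
from typing import List, Tuple

# A labeled span: (first token index, end index exclusive, label).
# Label names here are illustrative assumptions, not a real schema.
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert labeled token spans into one BIO tag per token."""
    tags = ["O"] * len(tokens)  # tokens outside every span stay "O"
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the span
    return tags


# Hypothetical sentence a labeler might annotate.
tokens = ["Bank", "BCA", "named", "its", "chatbot", "Vira"]
spans = [(0, 2, "ORG"), (5, 6, "PRODUCT")]

tags = spans_to_bio(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Token and tag pairs like these are a typical input for training an entity recognition model, although the exact format varies by tool.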

“Despite all the hype, AI is a technology that is still being developed. Many companies are looking for best practices in their labeling process. The first-generation solution is to outsource all the labeling work. Many companies are building a ‘Mechanical Turk’, but for AI,” Ivan explained.

He continued, “We now see companies recognizing that high-quality data is one of the most valuable assets for building and improving AI models. Datasaur is here as the next-generation solution: we build software to improve best practices in data labeling and to help companies develop their AI workflows.”

As the technology develops, the market for AI-based products will continue to grow. Research projects that its global value will reach US$390 billion by 2025. In data labeling itself, several other services operate globally besides Datasaur, such as Labelbox and Cloudfactory, and Google Cloud is also releasing a beta version of its AI Data Labeling Services.

Data labeling implementation scheme

Example of data labeling process in Datasaur / Datasaur

Understanding input data opens up many possibilities. Based on existing case studies, Datasaur helps companies with tasks such as understanding contract documents, transcribing customer service conversations, analyzing product reviews, and detecting fake news.

“Our software has been used by the Indonesian government to detect and flag suspected fake news articles. A case study with one of our clients shows a 70% increase in labeling efficiency after adopting the Datasaur platform, and we still have more room to improve,” Ivan said.

Currently, the data labeling platform is used across various business verticals, from financial technology, health, and customer service to social media and chatbots.

Revision to the previous article: this is not follow-on funding; it is still part of the seed round, the same one as the earlier investment from GDP Venture.


Original article is in Indonesian, translated by Kristin Siagian