The content factory of machine learning | Blog | Genpact

The content factory of machine learning

Machine learning takes more than good AI. You need people to work the data.

  • Vikram Mahidhar

    Former Global Head of Digital Transformation

Published

11/14/2018

In any artificial intelligence (AI) or machine learning project, 80 to 90% of the work centers on preparing the data used to train the machine. Staff go to work in something similar to a “content factory.”

In the content factory, workers are not breaking a sweat operating heavy machinery, but rather using data engineering skills and domain knowledge of industry and business to give shape and context to datasets. You need both data engineering and domain experts to realize machine learning's full potential. Here is what they do:

Gather the data and map it out

Sometimes called “data wrangling,” data engineering is all about pulling together data and molding it into something digestible. After all, data can be structured, unstructured, or ambiguous. The first step is to get the information out of multiple systems so that you can then build models that map the data's behavior.
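As a rough illustration of that first step, here is a minimal Python sketch that pulls rows from two systems into one shared shape. Everything in it is an assumption made for the example: the two sources (a CRM export and a billing export), their field names, and the helper functions are hypothetical, not part of any particular product.

```python
def normalize_crm(row):
    """Map a hypothetical CRM export row onto a shared schema."""
    return {
        "customer_id": str(row["CustomerID"]),
        "email": row["Email"].strip().lower(),
        "source": "crm",
    }


def normalize_billing(row):
    """Map a hypothetical billing export row onto the same schema."""
    return {
        "customer_id": str(row["cust_id"]),
        "email": row["contact_email"].strip().lower(),
        "source": "billing",
    }


def gather(crm_rows, billing_rows):
    """Merge both sources into one dict keyed by customer_id,
    recording which systems mention each customer."""
    merged = {}
    records = [normalize_crm(r) for r in crm_rows]
    records += [normalize_billing(r) for r in billing_rows]
    for record in records:
        entry = merged.setdefault(
            record["customer_id"], {"email": record["email"], "sources": []}
        )
        entry["sources"].append(record["source"])
    return merged


# Illustrative rows only; field names are assumptions.
crm_rows = [{"CustomerID": 1, "Email": " A@Example.com "}]
billing_rows = [
    {"cust_id": "1", "contact_email": "a@example.com"},
    {"cust_id": "2", "contact_email": "b@example.com"},
]
customers = gather(crm_rows, billing_rows)
```

Once every system's rows share one shape and one key, it becomes much easier to see which customers appear where before any modeling starts.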

Find gaps in the information and fill them

You have to explore the data to find out whether it is complete and can serve the project goals, as there can be gaps in the information. For example, TV production companies need to figure out when to air ads based on time and audience. They typically have a sheet of data and tools that say which TV show to run an ad on and how long it should air, but demographic and regional information may be missing. With that information, a company could use machine learning to determine which ads are the most effective by location and better target future placements.
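One simple way to spot gaps like these is to count how often each field is empty. The sketch below is a minimal example in plain Python; the ad-placement rows and field names (`show`, `duration_sec`, `region`, `age_group`) are made up for illustration.

```python
from collections import Counter


def missing_field_rates(records, required_fields):
    """Return, for each required field, the fraction of records
    in which that field is absent, None, or an empty string."""
    missing = Counter()
    for record in records:
        for field in required_fields:
            value = record.get(field)
            if value is None or value == "":
                missing[field] += 1
    total = len(records)
    return {
        field: (missing[field] / total if total else 0.0)
        for field in required_fields
    }


# Hypothetical ad-placement sheet: scheduling data is present,
# but regional and demographic columns are patchy.
ad_placements = [
    {"show": "Evening News", "duration_sec": 30, "region": "Northeast", "age_group": None},
    {"show": "Cooking Hour", "duration_sec": 15, "region": "", "age_group": None},
    {"show": "Late Movie", "duration_sec": 30, "region": "West", "age_group": "25-34"},
    {"show": "Morning Show", "duration_sec": 20, "region": None, "age_group": None},
]

rates = missing_field_rates(
    ad_placements, ["show", "duration_sec", "region", "age_group"]
)
```

In this made-up sheet, the scheduling fields are fully populated while `region` and `age_group` are frequently blank, which is the kind of gap you would need to fill before targeting ads by location.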

Show the machine the meaning

Data enrichment is possibly the most complex and important piece of machine learning. It takes domain experts who understand how the business works to label the data with context and give it meaning.

For instance, an amusement park can use an online chatbot to interact with prospective visitors looking for quick information like admission prices. While today's bots can deliver simple, scripted answers, most lack the ability to have heartfelt conversations. To make empathy possible, chatbots have to be able to interpret messages and have a clear goal, like converting a prospect to a customer. A domain expert can provide an idea of the types of sentiments that humanize the process and turn interactions into revenue.

Don't let the machine run amok

A machine can now string together business rules and recommendations. But how do you know it did the job right? You can't unless you check it. You need governance, meaning supervision by domain experts, to see that the machines are connecting the dots correctly. (This is human oversight of the results, which is distinct from “supervised learning” in the technical sense of training on labeled data.) Experts can look over the rules and manually validate results.

For example, automotive insurance companies can train a system on millions of images of previous accidents to simplify their payout process. The trained model can then evaluate the severity of damage in future accidents and recommend a payout amount. But someone should sit down and review the machine's recommendation. Otherwise, it might determine a car is totaled when it's actually repairable.
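One common way to put that review step into practice is to route risky recommendations to a person instead of approving them automatically. The sketch below assumes a hypothetical `PayoutRecommendation` record and made-up thresholds; a real insurer would set its own rules.

```python
from dataclasses import dataclass


@dataclass
class PayoutRecommendation:
    claim_id: str
    verdict: str       # "repairable" or "totaled" (assumed labels)
    payout: float      # recommended payout amount
    confidence: float  # model score between 0 and 1 (assumed to exist)


def needs_human_review(rec, min_confidence=0.9, payout_limit=10_000.0):
    """Flag a recommendation for expert review if it is a total-loss
    verdict, a low-confidence call, or an unusually large payout."""
    return (
        rec.verdict == "totaled"
        or rec.confidence < min_confidence
        or rec.payout > payout_limit
    )


def route(recommendations):
    """Split recommendations into auto-approved and review queues."""
    auto_approved, review_queue = [], []
    for rec in recommendations:
        if needs_human_review(rec):
            review_queue.append(rec)
        else:
            auto_approved.append(rec)
    return auto_approved, review_queue


# Illustrative claims only.
recommendations = [
    PayoutRecommendation("C-1", "repairable", 1200.0, 0.97),
    PayoutRecommendation("C-2", "totaled", 8500.0, 0.95),
    PayoutRecommendation("C-3", "repairable", 3000.0, 0.62),
]
auto_approved, review_queue = route(recommendations)
```

Here every “totaled” verdict goes to a person no matter how confident the model is, since that is the mistake the example above warns about.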

Look for bias in data

Just like people, AI is prone to unconscious bias, stemming from imbalanced datasets or interactions over time. If left unchecked, biases can hurt your business and customers, ultimately defeating the purpose of turning to machine learning in the first place.

For instance, a company can use historical data to build a conversational AI model. But what if its past conversation records come disproportionately from female customers? In that case, the model will be biased toward female customers. (Similar biases would come up if the records came mostly from male customers.) You need to review your data to remove existing biases, make sure the samples are representative, and run ongoing reviews to prevent new biases.
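A first, very basic check for this kind of imbalance is to measure how each group is represented in the training records. The sketch below is only a starting point under stated assumptions: the `gender` field, the sample records, and the 30% floor are all hypothetical, and a balanced dataset does not by itself guarantee an unbiased model.

```python
from collections import Counter


def group_shares(records, attribute):
    """Return each group's share of the records for one attribute."""
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def underrepresented(shares, min_share):
    """List the groups whose share falls below the chosen floor."""
    return sorted(group for group, share in shares.items() if share < min_share)


# Hypothetical conversation log: 8 of 10 records are from female customers.
conversations = (
    [{"gender": "female"} for _ in range(8)]
    + [{"gender": "male"} for _ in range(2)]
)

shares = group_shares(conversations, "gender")
flagged = underrepresented(shares, min_share=0.3)
```

Running a check like this on a schedule, not just once, is one way to support the ongoing reviews mentioned above.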

All of these aspects of getting a machine learning project off the ground involve a lot of human work from people with the data engineering skills and domain expertise to enrich the data. So, while AI and machine learning can help you address your organization's toughest problems, people in the loop are vital to make it all work.
