Digital Technology
Aug 23, 2017

Defining computational linguistics for the modern business

Read the following two sentences:

  • “This house is in an undesirable location.”
  • “This undesirable house is in this location.”

These two sentences use virtually the same words. However, because they are structured differently, they have completely different meanings: in the first, the location is undesirable; in the second, the house is. Working out meaning from the arrangement of words has long been an ability reserved for the human mind. Now, through breakthroughs in natural language processing (NLP), and computational linguistics in particular, that is no longer the case.
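To make the point concrete, here is a toy sketch in Python. It assumes a tiny hand-written part-of-speech table (`LEXICON`, hypothetical) and one deliberately simplistic English rule: an adjective modifies the first noun that follows it. Real systems rely on full grammatical parsers. The sketch only shows that the two sentences differ by a single word, while their structure attaches "undesirable" to different nouns.

```python
# Hypothetical, hand-written part-of-speech table covering only these two sentences.
LEXICON = {
    "this": "DET", "an": "DET",
    "house": "NOUN", "location": "NOUN",
    "undesirable": "ADJ",
    "is": "VERB", "in": "ADP",
}


def tokenize(sentence: str) -> list[str]:
    # Lowercase and drop the final period; enough for these example sentences.
    return sentence.lower().rstrip(".").split()


def adjective_attachments(sentence: str) -> list[tuple[str, str]]:
    """Pair each adjective with the first noun after it (a simplistic English rule)."""
    tokens = tokenize(sentence)
    pairs = []
    for i, word in enumerate(tokens):
        if LEXICON.get(word) == "ADJ":
            for later in tokens[i + 1:]:
                if LEXICON.get(later) == "NOUN":
                    pairs.append((word, later))
                    break
    return pairs


s1 = "This house is in an undesirable location."
s2 = "This undesirable house is in this location."

print(set(tokenize(s1)) ^ set(tokenize(s2)))  # {'an'}: the only word not shared
print(adjective_attachments(s1))  # [('undesirable', 'location')]
print(adjective_attachments(s2))  # [('undesirable', 'house')]
```

A bag-of-words comparison would treat these sentences as near-duplicates. Only the structural step reveals which thing is undesirable.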

What is computational linguistics?

Computational linguistics is the ability of a computing engine to read a document and extract data from it, or infer knowledge from it, based on its linguistic structure. This is no different from how a person reads a PDF, a Microsoft Word document, or a webpage to draw out the information they need. The engine simply does it faster and at a much larger scale.

Computational linguistics extends the applicability of NLP to making risk, operational, and performance decisions. You can learn more about the evolution of computational linguistics in a recent article I wrote for

Real-world business examples

In commercial lending, banks must understand and manage the risks of lending to small and mid-sized businesses, based on each borrower's overall financial health. This has traditionally meant reading through mountains of balance sheets and financial statements, then calculating risk scores that are aggregated across the portfolio. When done manually, the process is slow, inefficient, and prone to human error.

With computational linguistics, banks can extract data from thousands of balance sheets and put it into a structured format, even when the sheets are in multiple languages or follow different accounting standards. Once the data is structured, banks can slice and dice it and calculate the risk score or other portfolio parameters. And because the process is automated, it runs very fast, and banks can have greater confidence in the accuracy of their data.
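The extract, structure, and score flow described above can be sketched in a few lines of Python. This is an illustrative sketch under loudly stated assumptions, not the method any particular platform uses. The documents, the multilingual label table (`LABEL_MAP`), and the "leverage" ratio standing in for a risk score are all hypothetical. The sketch also assumes each line item appears as `label: amount` with whole-unit amounts.

```python
import re

# Hypothetical label table: maps line-item labels in English, German, and French
# to one canonical field name. A real system would need far broader coverage.
LABEL_MAP = {
    "total assets": "total_assets",
    "summe aktiva": "total_assets",
    "total de l'actif": "total_assets",
    "total liabilities": "total_liabilities",
    "verbindlichkeiten": "total_liabilities",
    "total des dettes": "total_liabilities",
}

# Hypothetical raw text from three borrowers' balance sheets.
DOCUMENTS = {
    "borrower_a": "Total assets: 1,250,000\nTotal liabilities: 500,000",
    "borrower_b": "Summe Aktiva: 2.000.000\nVerbindlichkeiten: 1.500.000",
    "borrower_c": "Total de l'actif: 800 000\nTotal des dettes: 200 000",
}


def parse_amount(text: str) -> int:
    """Keep only the digits, so 1,250,000 / 1.250.000 / 1 250 000 all parse alike.
    Assumes whole-unit amounts with no decimal part."""
    return int(re.sub(r"\D", "", text))


def extract(document: str) -> dict[str, int]:
    """Turn 'label: amount' lines into a structured record keyed by canonical field."""
    record = {}
    for line in document.splitlines():
        if ":" not in line:
            continue
        label, amount = line.split(":", 1)
        field = LABEL_MAP.get(label.strip().lower())
        if field is not None:
            record[field] = parse_amount(amount)
    return record


def leverage(record: dict[str, int]) -> float:
    """Hypothetical stand-in for a risk score: liabilities as a share of assets."""
    return record["total_liabilities"] / record["total_assets"]


records = {name: extract(text) for name, text in DOCUMENTS.items()}
scores = {name: leverage(rec) for name, rec in records.items()}
portfolio_average = sum(scores.values()) / len(scores)

print(records["borrower_b"])  # {'total_assets': 2000000, 'total_liabilities': 1500000}
print(scores)  # {'borrower_a': 0.4, 'borrower_b': 0.75, 'borrower_c': 0.25}
print(round(portfolio_average, 3))  # 0.467
```

Mapping differently worded labels to one canonical field is what makes statements in different languages comparable. The ratio and the simple average stand in for whatever scoring and aggregation model a bank actually uses.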

Beyond commercial lending, computational linguistics can be applied in numerous other industries and business functions. For instance, many companies have to perform contract reconciliations, which involve reviewing thousands of invoices and matching them against contracts. Computational linguistics can “read” all the contracts and invoices, pull them into a usable, structured format, and reconcile them to identify any overpayments.
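Once contracts and invoices are in a structured format, the reconciliation step itself is straightforward. Here is a minimal Python sketch. It assumes each invoice has already been extracted with a contract ID, a quantity, and a billed amount, and that each contract carries an agreed unit price. These record shapes, the sample data, and the one-cent tolerance are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Contract:
    contract_id: str
    unit_price: Decimal  # agreed price per unit (hypothetical field)


@dataclass
class Invoice:
    invoice_id: str
    contract_id: str
    quantity: int
    amount_billed: Decimal


def find_discrepancies(contracts, invoices, tolerance=Decimal("0.01")):
    """Flag invoices billed above the contracted price, or with no matching contract."""
    contracts_by_id = {c.contract_id: c for c in contracts}
    flagged = []
    for inv in invoices:
        contract = contracts_by_id.get(inv.contract_id)
        if contract is None:
            flagged.append((inv.invoice_id, "no matching contract", None))
            continue
        expected = contract.unit_price * inv.quantity
        overpayment = inv.amount_billed - expected
        if overpayment > tolerance:
            flagged.append((inv.invoice_id, "overbilled", overpayment))
    return flagged


contracts = [Contract("C-1", Decimal("10.00")), Contract("C-2", Decimal("25.50"))]
invoices = [
    Invoice("INV-1", "C-1", 100, Decimal("1000.00")),
    Invoice("INV-2", "C-2", 40, Decimal("1100.00")),
    Invoice("INV-3", "C-9", 5, Decimal("50.00")),
]
for row in find_discrepancies(contracts, invoices):
    print(row)
# ('INV-2', 'overbilled', Decimal('80.00'))
# ('INV-3', 'no matching contract', None)
```

`Decimal` is used instead of floats so that currency arithmetic does not pick up binary rounding error. The hard part in practice is the upstream extraction, not this matching loop.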

In wealth management, investment firms can use computational linguistics to review complex custodial statements containing hundreds of transactions across hedge funds, derivatives, and specialized funds. This enables the firms to extract all the relevant asset and trade information and convert it into automated performance reports. As a result, a reporting process that could take up to 90 days can be completed overnight.
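As an illustration of the step from extracted trades to a report, the following Python sketch nets transactions into positions and values them by asset class. It is a simplification under stated assumptions. The transaction rows, the end-of-period `PRICES` table, and the field names are hypothetical. Each position is valued as quantity times price, which ignores details such as derivative contract multipliers.

```python
from collections import defaultdict

# Hypothetical rows already extracted from a custodial statement.
TRANSACTIONS = [
    {"asset": "Fund A", "asset_class": "hedge fund", "side": "BUY", "quantity": 100},
    {"asset": "Fund A", "asset_class": "hedge fund", "side": "SELL", "quantity": 30},
    {"asset": "Option X", "asset_class": "derivative", "side": "BUY", "quantity": 10},
    {"asset": "Fund B", "asset_class": "specialized fund", "side": "BUY", "quantity": 50},
]

# Hypothetical end-of-period price per unit for each asset.
PRICES = {"Fund A": 120.0, "Option X": 4.5, "Fund B": 80.0}


def value_by_asset_class(transactions, prices):
    """Net buys and sells into positions, then sum market value per asset class."""
    net_quantity = defaultdict(int)
    asset_class = {}
    for t in transactions:
        sign = 1 if t["side"] == "BUY" else -1
        net_quantity[t["asset"]] += sign * t["quantity"]
        asset_class[t["asset"]] = t["asset_class"]

    report = defaultdict(float)
    for asset, quantity in net_quantity.items():
        report[asset_class[asset]] += quantity * prices[asset]
    return dict(report)


print(value_by_asset_class(TRANSACTIONS, PRICES))
# {'hedge fund': 8400.0, 'derivative': 45.0, 'specialized fund': 4000.0}
```

A real performance report would also need opening values, cash flows, and fees to compute returns. The sketch shows only the aggregation that structured data makes possible.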

4 essentials to success with computational linguistics

If the examples above sound appealing, keep in mind that there are four essentials to success with computational linguistics to weigh before investing heavily in the technology: scale, speed, accuracy, and traceability.

  1. Scale: The technology should scale, whether it's working with hundreds of documents or thousands.
  2. Speed: It has to be able to review a high volume of documents and extract relevant information fast.
  3. Accuracy: The extracted data has to be accurate, especially if it will be used to make business-critical decisions. It's not enough to be 60-70% accurate; accuracy needs to reach 98-99%.
  4. Traceability: Companies subject to audits have to be able to track and trace how they arrived at their ultimate decision. In commercial lending, for instance, a small footnote could have a dramatic impact on the risk score. Banks need to be able to drill down and pinpoint key data sources.
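Accuracy and traceability can both be checked if every extracted value carries its provenance. Here is a minimal Python sketch. The `ExtractedValue` record, the document names, and the sample values are hypothetical. Accuracy is measured simply as the share of fields that match a human reviewer's answer. The missed footnote in the sample shows how a single error pulls accuracy well below a 98% target, and how the stored source lets an auditor find it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedValue:
    """One extracted data point plus the provenance needed to trace it back."""
    field: str
    value: str
    source_doc: str  # hypothetical document identifier
    page: int
    snippet: str     # the text span the value was read from


def field_accuracy(extracted, reviewed):
    """Share of extracted values that match a human reviewer's answer for the same field."""
    correct = sum(1 for v in extracted if reviewed.get((v.source_doc, v.field)) == v.value)
    return correct / len(extracted)


def trace(extracted, field):
    """List where every value for a given field came from."""
    return [f"{v.source_doc} p.{v.page}: {v.snippet}" for v in extracted if v.field == field]


extracted = [
    ExtractedValue("total_assets", "1250000", "borrower_a.pdf", 3, "Total assets: 1,250,000"),
    ExtractedValue("total_liabilities", "500000", "borrower_a.pdf", 3, "Total liabilities: 500,000"),
    ExtractedValue("contingent_liabilities", "0", "borrower_a.pdf", 14, "Note 12: guarantees of 300,000"),
]
reviewed = {
    ("borrower_a.pdf", "total_assets"): "1250000",
    ("borrower_a.pdf", "total_liabilities"): "500000",
    ("borrower_a.pdf", "contingent_liabilities"): "300000",
}

accuracy = field_accuracy(extracted, reviewed)
print(round(accuracy, 3), accuracy >= 0.98)  # 0.667 False
print(trace(extracted, "contingent_liabilities"))
# ['borrower_a.pdf p.14: Note 12: guarantees of 300,000']
```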

Computational linguistics' strength lies in its ability to read documents based on the linguistic structure of their content, rather than requiring large amounts of pre-existing data. It doesn't need a database filled with millions of documents to understand the next one. Because of its scalability, speed, accuracy, and traceability, computational linguistics can be a powerful tool for businesses looking to drive greater operational efficiency, data availability, and smarter decision-making.

About the author

Sanjay Srivastava

Chief Digital Officer

Sanjay Srivastava is Chief Digital Officer at Genpact, where he runs Genpact’s growing Digital business, overseeing the Genpact Cora platform and all Digital products and services.