Jun 01, 2017

There is an octopus in my data lake

Let's face it: data has become a cornerstone of our everyday lives. We generate huge amounts of it in our personal lives, from our social media posts to the readings produced by intelligent appliances in our homes through the internet of things. Data is the lifeblood of many businesses and, when used appropriately, their most vital asset. Businesses in every industry generate or use massive amounts of data in all forms and from many different sources. The struggle is how to organize this data for better access and insight.

Data management has evolved from data warehouses to data lakes to data swamps, and yet we are still drowning in our data. Massive data warehouses accumulate data from everywhere. Fresh data is continuously gathered and centralized, but this always raises questions about the proper sourcing, cleansing, timeliness, validity, accuracy and relevance of that data. The result has been huge investments in storage, software, networking, and other computing infrastructure over the years. Unfortunately, the return on these investments, in the form of meaningful insights that can guide smarter execution, has been underwhelming.

The time has come to do things differently. What we need is an octopus, or rather an "octopus architecture". Why an octopus, you might wonder? The brain power of an octopus is highly decentralized: its neurons are distributed across its body, including its arms, yet they work in unison. With sufficient neuronal connections, this lets the octopus seek and integrate multiple perspectives for greater insight and agility.

Let's look at how an octopus architecture can be built in three phases, and at the benefits each phase brings.

Infographic: Octopus Architecture

Phase 1

Imagine that your data continues to lie in its native form, at the source where it is generated. Let's call this a data lake. You likely have many such data lakes. The goal of this first phase is to effectively collect localized data from these lakes and report on it without having to store it in a centralized repository.

The octopus extends its tentacles into each of these lakes and pulls information at high speed, only when it is needed. Because the data is read where it was generated, its provenance, purity and timeliness are preserved, and being able to trace it back to the source is a clear advantage. In this architecture, only data that is relevant to a larger analysis is extracted up to the centralized brain of the octopus. Much of the data can be processed locally for simple visualization and insight generation.

How does this differ from a traditional data warehousing model?

  • Most of the data is not transferred to a central repository; it stays in its lake
  • Only the sliver of relevant data is extracted, pulled dynamically into the centralized brain at the time it is needed
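To make the Phase 1 idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Tentacle` class, the `central_query` function, the `_source` tag and the sample plant data are hypothetical names, not part of any product or library. Each tentacle reads its lake in place and hands back only the records the brain asked for, tagged with where they came from.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

Record = dict  # one row of data, kept as a plain dict for this sketch


@dataclass
class Tentacle:
    """Hypothetical connector to one data lake; the data stays at its source."""
    source: str
    read_lake: Callable[[], Iterable[Record]]

    def pull(self, is_relevant: Callable[[Record], bool]) -> Iterator[Record]:
        # Scan the lake where it lives and yield only the relevant sliver,
        # tagging each record with its source so it stays traceable.
        for record in self.read_lake():
            if is_relevant(record):
                yield {**record, "_source": self.source}


def central_query(tentacles, is_relevant):
    """The 'brain': gathers only the relevant records, at query time."""
    results = []
    for tentacle in tentacles:
        results.extend(tentacle.pull(is_relevant))
    return results


# Two illustrative lakes, here just in-memory lists standing in for real stores.
plant_a = [{"sensor": "temp", "value": 71}, {"sensor": "temp", "value": 104}]
plant_b = [{"sensor": "temp", "value": 98}, {"sensor": "vibration", "value": 3}]

tentacles = [
    Tentacle("plant_a", lambda: plant_a),
    Tentacle("plant_b", lambda: plant_b),
]

# Ask every lake for hot temperature readings only.
hot = central_query(
    tentacles, lambda r: r["sensor"] == "temp" and r["value"] > 95
)
```

Nothing is copied to a central store ahead of time. The two lakes keep all four records, and the brain ends up holding only the two that matched the question.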

Phase 2

The brain power of an octopus is decentralized. Phase 2 builds up localized intelligence in the tentacle attached to a specific data lake, thereby decentralizing the brain power used for data extraction and analysis. This intelligence is built up over time through machine learning. The tentacle brain power of the octopus architecture entails the following:

  • Intelligent data manipulation/extraction: This intelligence includes the ability to determine which data is relevant to a particular query. It also covers how the data should be extracted, manipulated or translated, so that only data that can be readily used is sent to the centralized brain;
  • Edge analytics/computing: Part of the intelligence is the ability to analyze data at the tentacle node itself. Rather than transmitting raw data to the centralized brain, the tentacle transmits computed results;
  • Model use cases: Localized intelligence at the extremities, coordinated with the intelligence of the centralized brain, learns to apply the right models and visualizations to the appropriate kinds of data for the business question being asked. This may include normalizing data for variances in product, seasonality, or other factors;
  • Artificial intelligence and machine learning: Over time, processing accelerates. What has been learned from historical executions speeds up the processing and interpretation of fresh data, and the actions applied to it.
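The edge analytics point in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `summarize_at_edge`, `combine_at_brain` and the sample readings are hypothetical, and a real tentacle would likely compute much richer results. The point is that each tentacle reduces its raw readings to a small summary, and the brain combines summaries without ever seeing the raw data.

```python
from statistics import mean


def summarize_at_edge(readings):
    """Runs at the tentacle: reduce raw readings to a small summary."""
    return {
        "count": len(readings),
        "mean": mean(readings),
        "max": max(readings),
    }


def combine_at_brain(summaries):
    """Runs at the brain: merge per-lake summaries into one overall view."""
    total = sum(s["count"] for s in summaries)
    # Weight each lake's mean by its record count to get the overall mean.
    overall_mean = sum(s["mean"] * s["count"] for s in summaries) / total
    return {
        "count": total,
        "mean": overall_mean,
        "max": max(s["max"] for s in summaries),
    }


# Illustrative raw readings that never leave their lakes.
lake_a = [10.0, 20.0, 30.0]
lake_b = [40.0]

summary = combine_at_brain(
    [summarize_at_edge(lake_a), summarize_at_edge(lake_b)]
)
```

Count, mean and max were chosen here because they can be merged exactly from per-lake summaries. Statistics such as a median cannot be combined this way, which is one reason to decide up front which computed results a tentacle should send.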

Phase 3

Going beyond data extraction and analysis, Phase 3 extends the distributed brain power of the octopus architecture to incorporate intelligence at the extremities for determining appropriate actions.

Consider an industrial use case in which the octopus architecture manages aircraft engines deployed across the globe. A determined action may be to prevent an aircraft from taking off because the intelligent analysis in the tentacle found a high probability of engine failure.

In a use case that monitors financial crime, a determined action may be to detect, flag, and/or block accounts where money laundering transactions may be under way.
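As a rough sketch of how a tentacle might turn a local analysis into an action, the Python below maps a risk score to one of three outcomes. The `Action` names, the `decide_action` function and both thresholds are assumptions made up for this illustration; in practice the score would come from a trained model and the thresholds from the business and its regulators.

```python
from enum import Enum


class Action(Enum):
    CLEAR = "clear"
    FLAG = "flag for review"
    BLOCK = "block"


def decide_action(risk, flag_at=0.5, block_at=0.9):
    """Runs at the tentacle: choose an action from a locally computed risk.

    `risk` is a probability between 0 and 1, for example the estimated
    chance of engine failure or of a transaction being money laundering.
    The default thresholds are illustrative placeholders only.
    """
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be a probability between 0 and 1")
    if risk >= block_at:
        return Action.BLOCK   # e.g. hold the aircraft, freeze the account
    if risk >= flag_at:
        return Action.FLAG    # e.g. send to a human for review
    return Action.CLEAR


# Illustrative scores for three engines or accounts.
decisions = [decide_action(r) for r in (0.12, 0.63, 0.95)]
```

Because the decision is made where the data lives, the action can be taken right away, and only the outcome needs to travel back to the centralized brain.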

Vast benefits of an Octopus Architecture

The highly distributed but coordinated intelligence of such an architecture addresses many of the limitations of a traditional data warehouse architecture. Because data is read at its source when it is needed, the data and the insights drawn from it can be more accurate, relevant and timely. Large-scale data transmissions and reservoirs are minimized. Intelligence is applied at the source, where it often provides greater insight, meaning and agility. Architectures of this type can bring substantial benefits in compliance, safety, regulatory responsiveness, revenue generation, and loss prevention. The use cases span many applications and industries, and are particularly relevant in scenarios involving the internet of things.

There is a reason I chose the octopus as the creature for this analogy, and that choice is also backed by data. After all, an octopus is often described as having nine brains (a central brain plus a cluster of neurons in each of its eight arms) and three hearts. Now wouldn't you want this evolved creature at work in your business?

About the author

Rohit Tandon

Business Leader, Analytics