- Technical paper
Artificial intelligence (AI) and machine learning (ML) on the cloud are quickly becoming mainstream. Today's volatile, uncertain, complex, and ambiguous business and economic environment demands a shift to data-driven decision-making. But this shift comes with challenges – a need for robust lineage, governance, and risk mitigation tactics.
The evolution of AI and ML lifecycle management
Though the typical stages of a data science lifecycle have not changed much since the early 2000s, the ecosystem hosting their components has transformed considerably.
Today, expectations of the artificial intelligence and machine learning lifecycle are different (figure 1). Enterprises need a real-time continuous integration/continuous delivery/continuous testing (CI/CD/CT) pipeline. They are grappling with a complicated network of systems that generate more data than ever before. This is due to proliferating data sources and associated data storage and management systems, advancements in computing power, and near-real-time interconnected channels.
Figure 1: The stages of AI and ML lifecycle management
Each stage requires separate personas with specialized skills to effectively translate business requirements into technical specifications (figure 2) for successful implementation and monitoring.
In the past, a traditional data pipeline was developed in a limited number of systems. Now, real-time, live systems call for integration of data sources and simultaneous data processing and analysis to feed into business intelligence (BI) reports, dashboards, and applications required for decision-making at speed.
Figure 2: Requirements of AI and ML lifecycle management
Leading your data science and ML projects
The evolution of the data ecosystem from on-premises storage and compute options to cloud-native applications poses several challenges to enterprises. On the one hand, ML development is an experimental, exploratory process. On the other, deployment requires delivering consistent results that are secure and fail-proof in production systems. The typical activities of modern AI and ML lifecycle management are visible below (figure 3).
Figure 3: Cloud AI and ML lifecycle management activities
Our three-phase approach (figure 4) enables a sustained, results-driven shift toward AI and ML on the cloud.
Figure 4: The three phases of the AI and ML cloud journey
Planning and strategizing underpin successful cloud journeys for every organization. Not all workloads must migrate to the cloud, and organizations must clarify their reasons for such migrations to avoid cost, schedule, and performance overruns. The six stages of planning are:
- Define business objectives, performance metrics, and KPIs to monitor the ML models framed from the business objective
- Develop a strategy and change management roadmap to align the organization's people, processes, and technology for healthy adoption rates
- Delineate roles and responsibilities of the skills required for effective functioning of the project, such as those of program and project leader, industry experts, ML and technical architects, data engineers, algorithm developers, and ML and DevOps engineers
- Assess the current and future-state cloud platform to design solution architecture and the data pipeline in line with policies, regulations, and required cloud services
- Choose an appropriate cloud architecture, whether hybrid or multi-cloud, as suited to the specific business needs of the project
- Plan steps and activities along with timelines and phases for IT provisioning and strengthen relevant cloud skills in parallel
Bringing it to life: Transforming invoicing
A global healthcare organization wanted to transform its invoice processing. A detailed assessment of customer pain points revealed the need for a low/no-touch invoice processing strategy. By gathering raw data from 3.6 million+ invoice lines and building a data and ML pipeline on the cloud, Genpact's ML algorithms predicted the probability of invoices likely to be paid late with 87% accuracy using customer segmentation variables that influence customer payment behavior. With these insights, past due invoices decreased from 20–25% to less than 12%.
When migrating to a cloud platform, consider the following:
- Deploy the complete data pipeline and data preprocessing using cloud services as outlined in the plan stage
- Connect data sources, and configure data ingestion, data transformation, data storage, and existing ML projects (training and testing pipelines) on the cloud
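The ingest-transform-train flow described above can be sketched in a cloud-agnostic way. This is a minimal illustration, not a prescribed implementation: the synthetic data stands in for records read from cloud object storage or a warehouse, and the feature and label definitions are hypothetical.

```python
# Sketch of a training pipeline: ingest -> transform -> train/test.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

def ingest():
    # Stand-in for reading raw records from a cloud data source
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
    return X, y

X, y = ingest()
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Chaining transformation and model ensures the same preprocessing is
# applied identically during training and serving
pipe = Pipeline([("scale", StandardScaler()),
                 ("model", LogisticRegression())])
pipe.fit(X_train, y_train)
accuracy = pipe.score(X_test, y_test)
```

In a real migration, the `ingest` step would be replaced by the configured cloud ingestion service, while the chained pipeline object remains the unit that moves between training and deployment.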
With the shift to the cloud, teams need to adapt to developing, testing, and training models on cloud services and resources, ensuring scalability, optimal resource utilization, and cost control. The following steps ensure a successful strategy for living in the cloud.
- Develop the ML model and choose the best model based on quantitative and qualitative measures, ensuring reproducibility through version control of data and models along with parameters in the ML system
- Deploy the chosen model to production
- Integrate with the required output such as BI dashboards, custom applications, or third-party APIs
Optimization and model monitoring are essential for maintaining a feedback loop from the deployed model back to model building. ML and DevOps engineers must set up a model monitoring metrics stack and automate monitoring in real time to ensure that models remain relevant in the context of the most recent production data. Three broad categories of metrics must be monitored:
- Stability metrics to capture data distribution shifts in production vis-à-vis training data
- Performance metrics to identify concept shifts in data and track the change in relationship between independent and dependent variables in production
- Operations metrics to identify ML system health issues such as IO/memory/CPU usage, disk utilization, ML endpoint calls, and latency
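One common stability metric for the first category is the population stability index (PSI), which compares a feature's distribution in production against its distribution in the training data. The following is a minimal sketch; the 0.2 threshold is a widely used rule of thumb, not a value from this paper, and the simulated data is purely illustrative.

```python
# Sketch of a stability metric: population stability index (PSI)
# between training and production distributions of one feature.
import numpy as np

def psi(expected, actual, bins=10):
    """PSI above ~0.2 is a common rule-of-thumb signal of significant drift."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor each bucket proportion to avoid log(0)
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(1)
train_feature = rng.normal(0.0, 1.0, 10_000)
stable_prod = rng.normal(0.0, 1.0, 10_000)    # same distribution
shifted_prod = rng.normal(0.5, 1.0, 10_000)   # mean shift in production

psi_stable = psi(train_feature, stable_prod)
psi_shifted = psi(train_feature, shifted_prod)
```

In an automated monitoring stack, a metric like this would be computed on a schedule for each model input, with threshold breaches raising alerts that trigger the resource and model adjustments described below.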
Resources must be optimized based on the operations metrics, and models must be adjusted based on the stability and performance metrics. This is the only way to ensure the consistency and robustness of the ML system and build trust.
Bringing it to life: Transforming customer service
A global provider of scientific products and services lacked integrated, automated visibility into order and customer information for its customer service team. Genpact built an intelligent solution on a cloud pipeline to categorize incoming emails using ML and natural language processing to route them to appropriate systems or workstreams. This underpinned an intuitive workflow for the customer service team that expedited the resolution of customer issues and accelerated revenue growth.
Making the next move
Creating an ML model that works well is only one part of delivering integrated ML solutions. The challenges of operationalizing ML models require a prudent approach – one that helps data scientists, supports a robust data pipeline, and ensures secure, reproducible, monitored, and trustworthy ML models. The approaches outlined here will ensure cloud, ML, and AI can effectively help build data-driven organizations.
This paper is authored by Sreekanth Menon, AI/ML leader, and Megha Sinha, augmented intelligence leader, Genpact.