Challenge
Big headaches from complex systems
The client, which supports franchises in oncology, rare disease, immunology, and neuroscience, relied on three separate commercial data ecosystems across six franchises. This fragmented approach created significant business hurdles:
Siloed infrastructure: Duplicative vendor contracts and point solutions scattered data pipelines across all six franchises
Operational inefficiencies: Redundant infrastructure and manual data operations created unsustainable operating expenses
Rigid field operations: Legacy systems could not support dynamic, real-time changes to field force alignments, causing weeks of lag time
Lack of governance: Without a unified data layer, the company struggled with inconsistent data sources and lacked a scalable foundation for cross-franchise analytics or future AI initiatives
Solution
One powerful platform, endless possibilities
To eliminate these operational roadblocks, we designed and implemented a unified commercial data and AI platform on AWS. We established a centralized data lake on Amazon Simple Storage Service (Amazon S3) with a raw data zone to consolidate data across franchises, enabling scalable storage and decoupled processing. This modern data foundation brought all six franchises into a single governed ecosystem.
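As an illustration of how a raw data zone can keep franchise data consolidated yet separable, the sketch below builds partition-style object keys. It is a minimal example under stated assumptions: the `raw/` prefix, the `franchise=`/`source=`/`ingest_date=` partition names, and the `RawObject` and `raw_zone_key` helpers are all hypothetical and are not taken from the client's implementation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RawObject:
    """Describes one incoming file (all field values here are hypothetical)."""
    franchise: str      # e.g. "oncology"
    source: str         # e.g. "syndicated_sales"
    ingest_date: date   # date the file landed
    filename: str       # original file name


def raw_zone_key(obj: RawObject) -> str:
    """Build a partition-style object key under an assumed raw/ prefix."""
    return (
        f"raw/franchise={obj.franchise}/source={obj.source}/"
        f"ingest_date={obj.ingest_date.isoformat()}/{obj.filename}"
    )


example = RawObject("oncology", "syndicated_sales", date(2024, 1, 15), "sales.csv")
print(raw_zone_key(example))
```

Keeping franchise and source in the key path is one way to let downstream jobs read a single franchise or a single feed without scanning the whole lake; the actual layout used on the platform may differ.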
Our strategic approach included:
Source-agnostic ingestion: We built a flexible ingestion layer on AWS using AWS Transfer Family for secure file-based ingestion and Amazon AppFlow for SaaS data integration, enabling seamless onboarding of syndicated and third-party data sources
Dynamic targeting engine: We developed an AWS-native engine to enable real-time, ad-hoc field alignment changes across territories, significantly reducing the need for manual intervention
Reusable data supply chain: We implemented a scalable data processing framework using AWS Glue Data Catalog for metadata management, AWS Lambda for lightweight transformations, and Amazon EMR Serverless for large-scale data processing, orchestrated through Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
Lakehouse analytics on AWS: We used Amazon Redshift with external tables to enable a lakehouse architecture, allowing direct querying of data in Amazon S3 alongside curated warehouse datasets
Workflow orchestration: We orchestrated end-to-end workflows using Amazon MWAA, enabling reliable pipeline scheduling, dependency management, and operational visibility
AI-ready architecture: We designed the platform to support advanced analytics and future machine learning use cases by providing a governed, scalable data foundation integrated with AWS-native analytics services
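The dependency management mentioned in the list above can be illustrated without Airflow itself. The sketch below uses Python's standard-library `graphlib` to derive a valid run order from declared stage dependencies, which is the core idea behind a workflow orchestrator's scheduling. The stage names and the `execution_order` helper are hypothetical; the platform's real workflows would be defined as Airflow DAGs in Amazon MWAA.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline stages. Each key runs only after every stage
# in its set has finished (the set lists the stage's predecessors).
PIPELINE = {
    "ingest_files": set(),
    "ingest_saas": set(),
    "update_catalog": {"ingest_files", "ingest_saas"},
    "transform": {"update_catalog"},
    "publish_to_warehouse": {"transform"},
}


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid run order.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(PIPELINE)
print(order)
```

Declaring dependencies rather than hard-coding a sequence means a new source can be added as one more entry without rewriting the rest of the pipeline, which is the property the reusable data supply chain relies on.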
Impact
Real wins that make a difference
The modernization effort delivered immediate, measurable ROI and transformed how the commercial, market access, and field teams operate. By rationalizing vendors and decommissioning legacy systems, the new platform effectively funded its own development.
Key results include:
Massive consolidation: Merged the fragmented data pipelines into a centralized AWS data platform built on Amazon S3 and Amazon Redshift, reducing duplication and simplifying operations
Real-time agility: Cut the weeks-long lag in field alignment changes, enabling faster territory changes and dynamic targeting
Faster time to insight: Let business users query centralized data through Amazon Redshift and business intelligence (BI) tools such as Tableau, removing delays caused by fragmented systems
Accelerated scalability: Reduced the time required to onboard new franchises onto the platform
Future-proofed innovation: Established a clear roadmap for AI innovation, including customer journey analytics, chatbots, and advanced engagement models
50% cost reduction: Cut data operations and vendor operating expenses by 50%