Data Architecture and Maturity
Target state architecture for Data Centric Organization
Defining the role of data in the organization on its path to becoming a Data Centric Organization
Transitioning from static reporting to AI industrialization and data-driven decision making
Customer 360
Capture complete customer data across all touchpoints
Real-time Availability
Provide data in real-time for operational decisions
Democratize Data Access
Enable self-service data access for business users
Guiding Principles
Core principles guiding technical decisions
Business Driven
Data scope and functionality must be driven by business needs
Data Freshness
Data must be updated in near real-time for front-end applications
Raw Data Retention
Store all raw data at low cost for future processing
Robust Governance
Strict governance on definitions, quality, and security
Scalability
Platform capable of scaling on demand
Standardization
Standardize critical data attributes across the enterprise
Reference Architecture
Data flow from source to consumption
Layer 1: Data Sources
Internal
External & Unstructured
Layer 2: Data Ingestion
Batch Processing
For historical data or large volumes not needed urgently
Real-time CDC
Required for core systems to ensure ODS data is always fresh
Streaming (Event Hub/Kafka)
Real-time event processing (IoT, Clickstream)
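As a minimal sketch of how real-time CDC can keep an ODS table fresh, the following applies an ordered list of change events to an in-memory table. The event shape (`op`, `key`, `row`) and the function name `apply_cdc_events` are assumptions made for illustration; they do not correspond to any specific CDC tool.

```python
from typing import Any

Row = dict[str, Any]


def apply_cdc_events(ods: dict[str, Row], events: list[dict[str, Any]]) -> dict[str, Row]:
    """Apply change events, in order, to a copy of an ODS table keyed by primary key.

    Hypothetical event shape: {"op": "insert" | "update" | "delete", "key": ..., "row": {...}}
    """
    table = {key: dict(row) for key, row in ods.items()}
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Upsert: merge the changed columns over the existing row, if any.
            table[key] = {**table.get(key, {}), **event["row"]}
        elif op == "delete":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown CDC operation: {op}")
    return table


# Illustrative data only.
ods = {"P-001": {"status": "active", "premium": 100}}
events = [
    {"op": "update", "key": "P-001", "row": {"premium": 120}},
    {"op": "insert", "key": "P-002", "row": {"status": "pending", "premium": 80}},
    {"op": "delete", "key": "P-001", "row": {}},
]
result = apply_cdc_events(ods, events)
```

Applying events strictly in source order is what keeps the target consistent; a production pipeline would also need to handle retries and out-of-order delivery, which this sketch leaves out.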
Layer 3: Data Platform
Raw Data Hub (Data Lake)
Store in native format
Curated & Consumption Hub
Current transactional data, supports front-end APIs, low latency (<15 min)
Historical storage, serves management reporting
Aggregated data by subject area
Aggregated metrics serving multiple reports/analytics, foundation for AI
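The low-latency target of under 15 minutes can be monitored with a simple freshness check. The sketch below assumes each table exposes a last-updated timestamp; the names `FRESHNESS_SLA` and `is_fresh` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed SLA, taken from the "<15 min" latency target for current transactional data.
FRESHNESS_SLA = timedelta(minutes=15)


def is_fresh(last_updated: datetime, now: datetime, sla: timedelta = FRESHNESS_SLA) -> bool:
    """Return True when the data was refreshed within the SLA window."""
    return now - last_updated < sla


# Illustrative timestamps only.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
recent = now - timedelta(minutes=5)
stale = now - timedelta(minutes=20)
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test.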
Layer 4: Analytics & Consumption
Business Intelligence
Self-service Dashboard (Tableau/PowerBI) for business users
Advanced Analytics
Sandbox environment for Data Scientists to run ML and Predictive Analytics models
API Layer
Provide data back to front-office applications
Key Capabilities
Critical functional blocks to build
Customer 360 (C360)
Unify customer data from multiple sources and de-duplicate it to create a Single Source of Truth
Data Science Lab (AI/MLOps)
Integrated development environment (IDE), visualization tools, and scalable compute (Spark cluster) for AI model training
Self-Service Data
Enable business users to access Data Marts and Business Glossary without full IT dependency
Unstructured Data Processing
Process images (OCR) and voice recordings (voice-to-text) for claims and customer service
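To illustrate the de-duplication step behind the Customer 360 capability, here is a minimal sketch that merges records sharing a normalized email address into one golden record. Matching on email alone is an assumption chosen for brevity; real matching usually combines several attributes and fuzzy rules. The field names and the function `build_golden_records` are hypothetical.

```python
from typing import Any

Record = dict[str, Any]


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so equivalent addresses compare equal."""
    return email.strip().lower()


def build_golden_records(records: list[Record]) -> list[Record]:
    """Merge records with the same normalized email, keeping the first non-empty value per field."""
    golden: dict[str, Record] = {}
    for record in records:
        key = normalize_email(record["email"])
        merged = golden.setdefault(key, {"email": key, "sources": []})
        merged["sources"].append(record["source"])
        for field, value in record.items():
            if field in ("email", "source"):
                continue
            # Fill a field only when it is still missing or empty in the golden record.
            if value and not merged.get(field):
                merged[field] = value
    return list(golden.values())


# Illustrative data only.
records = [
    {"source": "crm", "email": "An.Nguyen@Example.com ", "name": "An Nguyen", "phone": ""},
    {"source": "claims", "email": "an.nguyen@example.com", "name": "", "phone": "0901234567"},
    {"source": "web", "email": "binh@example.com", "name": "Binh", "phone": ""},
]
golden = build_golden_records(records)
```

Keeping the list of contributing `sources` on each golden record preserves a basic trace of where the unified values came from.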
Governance & Security
Layer covering the entire architecture
Metadata Management
- Data Dictionary
- Data Lineage
Data Quality
- Completeness
- Accuracy
- Consistency
- Critical Data Elements (CDE)
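A completeness score for Critical Data Elements can be computed as the share of rows in which each element is populated. The sketch below assumes rows are plain dictionaries and treats `None` and the empty string as missing; the function names and sample fields are hypothetical.

```python
from typing import Any


def completeness(rows: list[dict[str, Any]], field: str) -> float:
    """Share of rows where `field` is present and not empty (0.0 for an empty input)."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


def cde_report(rows: list[dict[str, Any]], cde_fields: list[str]) -> dict[str, float]:
    """Completeness per Critical Data Element."""
    return {field: completeness(rows, field) for field in cde_fields}


# Illustrative data only.
rows = [
    {"customer_id": "C1", "birth_date": "1990-01-01"},
    {"customer_id": "C2", "birth_date": ""},
    {"customer_id": "C3", "birth_date": None},
    {"customer_id": "C4", "birth_date": "1985-06-30"},
]
report = cde_report(rows, ["customer_id", "birth_date"])
```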
Data Management
- Data Profiling
- Master Data Management
- Reference Data Management
- Data Observability
- FinOps
Security
- Encryption
Data encryption at rest and in motion
- Access Control
Role-based access control (RBAC) and Row Level Security
- Data Masking
For sensitive data (PII)
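The access-control and masking rules above can be combined at query time. The following sketch filters rows by the branches a user may see (row-level security) and masks PII fields unless the user holds a privileged role (RBAC). The role name `pii_reader`, the `branch` attribute, and the set `PII_FIELDS` are assumptions made for illustration.

```python
from typing import Any

PII_FIELDS = {"phone", "national_id"}  # hypothetical list of sensitive columns


def mask_value(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def visible_rows(rows: list[dict[str, Any]], user: dict[str, Any]) -> list[dict[str, Any]]:
    """Return only the rows the user may see, masking PII unless the role allows it."""
    result = []
    for row in rows:
        if row["branch"] not in user["branches"]:
            continue  # row-level security: skip rows outside the user's branches
        if "pii_reader" in user["roles"]:
            result.append(dict(row))
        else:
            result.append(
                {k: mask_value(v) if k in PII_FIELDS else v for k, v in row.items()}
            )
    return result


# Illustrative data only.
rows = [
    {"branch": "HN", "name": "An", "phone": "0901234567"},
    {"branch": "HCM", "name": "Binh", "phone": "0912345678"},
]
agent = {"roles": {"agent"}, "branches": {"HN"}}
officer = {"roles": {"pii_reader"}, "branches": {"HN", "HCM"}}
```

In practice these rules are usually enforced by the platform's governance layer rather than in application code; the sketch only shows the logic being enforced.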
Technology Stack
Recommended technologies based on a cloud-native approach
| Layer | Recommended Technology |
|---|---|
| Ingestion | EMR |
| Storage | Delta Lake |
| Processing | EMR / Spark |
| Analytics/AI | SageMaker / Databricks |
| Visualization | Power BI |
| Governance | Unity Catalog / Purview |