Data Architecture and Maturity

Target-state architecture for a Data Centric Organization

Overview & Vision

Defining the role of data as the organization moves toward becoming a Data Centric Organization

Transitioning from static reporting to AI industrialization and data-driven decision making

Customer 360

Capture complete customer data across all touchpoints

Real-time Availability

Provide data in real-time for operational decisions

Democratize Data Access

Enable self-service data access for business users

Guiding Principles

Core principles guiding technical decisions

Business Driven

Data scope and functionality must be driven by business needs

Data Freshness

Data must be updated in near real-time for front-end applications

Raw Data Retention

Store all raw data at low cost for future processing

Robust Governance

Strict governance on definitions, quality, and security

Scalability

Platform capable of scaling on demand

Standardization

Standardize critical data attributes across the enterprise

Reference Architecture

Data flow from source to consumption

Layer 1: Data Sources

Internal

Policy Admin Systems (PAS), Claims, Billing, Distribution

External & Unstructured

Social Media, Web Logs, Partner Data, Images/Voice

Layer 2: Data Ingestion

Batch Processing

For historical data or large volumes not needed urgently

Real-time CDC

Required for core systems to ensure ODS data is always fresh
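As a minimal sketch of how CDC keeps the ODS fresh, the snippet below applies a stream of change events to an in-memory table. The `ChangeEvent` shape, the `seq` field, and the policy keys are hypothetical; real CDC tools emit richer envelopes, and the sequence check is one assumed way to make replays harmless.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ChangeEvent:
    """Hypothetical change record captured from a source system log."""
    op: str                               # "insert", "update", or "delete"
    key: str                              # primary key of the changed row
    seq: int                              # position in the source change log
    row: Optional[Dict[str, Any]] = None  # full row image; None for deletes


def apply_change(ods: Dict[str, Dict[str, Any]],
                 last_seq: Dict[str, int],
                 event: ChangeEvent) -> bool:
    """Apply one event to the ODS; return False if it was stale or a replay."""
    if event.seq <= last_seq.get(event.key, -1):
        return False  # already applied a newer (or the same) change for this key
    if event.op == "delete":
        ods.pop(event.key, None)
    elif event.op in ("insert", "update"):
        ods[event.key] = dict(event.row or {})
    else:
        raise ValueError(f"unknown operation: {event.op}")
    last_seq[event.key] = event.seq
    return True


# Usage with made-up policy rows (assumption: one event per row change).
ods: Dict[str, Dict[str, Any]] = {}
last_seq: Dict[str, int] = {}
events = [
    ChangeEvent("insert", "P-1", 1, {"status": "active"}),
    ChangeEvent("update", "P-1", 2, {"status": "lapsed"}),
    ChangeEvent("insert", "P-1", 1, {"status": "active"}),  # replayed duplicate
    ChangeEvent("insert", "P-2", 3, {"status": "active"}),
    ChangeEvent("delete", "P-2", 4),
]
results = [apply_change(ods, last_seq, e) for e in events]
```

Tracking the last applied sequence per key makes the apply step idempotent, so a redelivered batch does not roll the ODS back to an older state.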

Streaming (Event Hub/Kafka)

Real-time event processing (IoT, Clickstream)

Layer 3: Data Platform

Raw Data Hub (Data Lake)

Store in native format

Structured, Semi-structured, Unstructured

Curated & Consumption Hub

ODS (Customer 360)

Current transactional data; supports front-end APIs with low data latency (under 15 minutes)
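The 15-minute latency target can be monitored with a simple freshness check. The sketch below is illustrative only: `is_fresh`, `MAX_LAG`, and the idea of comparing against a last-sync timestamp are assumptions, not part of the platform design.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold taken from the "<15 min" target for the ODS.
MAX_LAG = timedelta(minutes=15)


def is_fresh(last_synced_at: datetime, now: datetime,
             max_lag: timedelta = MAX_LAG) -> bool:
    """Return True when the table was synced strictly less than max_lag ago."""
    return (now - last_synced_at) < max_lag


# Usage with fixed timestamps so the result does not depend on the wall clock.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = is_fresh(now - timedelta(minutes=5), now)    # synced 5 minutes ago
stale = is_fresh(now - timedelta(minutes=20), now)   # synced 20 minutes ago
```

A check like this would typically run per ODS table and feed an alert when any table falls behind.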

EDW

Historical storage, serves management reporting

Data Marts

Aggregated data by subject area

Sales, Finance, Marketing

Metrics (Platinum Layer)

Aggregated metrics serving multiple reports/analytics, foundation for AI

Layer 4: Analytics & Consumption

Business Intelligence

Self-service Dashboard (Tableau/PowerBI) for business users

Advanced Analytics

Sandbox environment for Data Scientists to run ML and Predictive Analytics models

API Layer

Provide data back to front-office applications

Key Capabilities

Critical functional blocks to build

Customer 360 (C360)

Unify customer data from multiple sources and de-duplicate it to create a Single Source of Truth
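A minimal sketch of the de-duplication step, assuming records are matched on a normalized email address and merged by keeping the first non-empty value per field. The field names, the match key, and the survivorship rule are all hypothetical; production matching usually combines several attributes and fuzzy rules.

```python
from typing import Dict, List


def normalize_email(email: str) -> str:
    """Assumed match key: trimmed, lower-cased email address."""
    return email.strip().lower()


def build_golden_records(records: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Group source records by match key and merge each group into one record."""
    golden: Dict[str, Dict[str, object]] = {}
    for rec in records:
        key = normalize_email(rec["email"])
        merged = golden.setdefault(key, {"email": key, "sources": []})
        merged["sources"].append(rec["source"])
        for field, value in rec.items():
            if field in ("email", "source"):
                continue
            # Survivorship rule (assumption): first non-empty value wins.
            if value and not merged.get(field):
                merged[field] = value
    return list(golden.values())


# Usage with made-up records from three internal systems.
records = [
    {"source": "PAS", "email": " Ann@Example.com", "name": "Ann Lee", "phone": ""},
    {"source": "Claims", "email": "ann@example.com", "name": "", "phone": "555-0100"},
    {"source": "Billing", "email": "bob@example.com", "name": "Bob Tran", "phone": ""},
]
golden = build_golden_records(records)
```

Keeping the list of contributing sources on each golden record makes it possible to trace a unified customer back to the systems it came from.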

Data Science Lab (AI/MLOps)

Integrated environment (IDE), visualization tools, and powerful computing (Spark Cluster) for AI model training

Self-Service Data

Enable business users to access Data Marts and the Business Glossary without depending on IT for every request

Unstructured Data Processing

Ability to process images (OCR) and voice (speech-to-text) for claims and customer service

Governance & Security

Layer covering the entire architecture

Metadata Management

  • Data Dictionary
  • Data Lineage
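Data lineage can be pictured as a graph from each dataset to its direct inputs. The sketch below is a simplified illustration; the dataset names in `LINEAGE` are hypothetical labels loosely based on the layers described earlier, and a real catalog would track lineage at column level as well.

```python
from typing import Dict, List, Set

# Hypothetical lineage map: dataset -> datasets it is built from.
LINEAGE: Dict[str, List[str]] = {
    "raw_hub": ["pas", "claims"],
    "ods": ["raw_hub"],
    "edw": ["raw_hub"],
    "sales_mart": ["edw"],
}


def upstream(dataset: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Return every dataset that feeds `dataset`, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(lineage.get(dataset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(lineage.get(node, []))
    return seen


# Usage: which datasets could have caused a problem in the sales mart?
sales_inputs = upstream("sales_mart", LINEAGE)
source_inputs = upstream("pas", LINEAGE)  # a source system has no upstream
```

The same traversal run in the opposite direction answers the impact question: which downstream reports are affected when a source changes.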

Data Quality

  • Completeness
  • Accuracy
  • Consistency
  • Critical Data Elements (CDE)
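Two of the dimensions above, completeness and consistency, lend themselves to simple ratio metrics over Critical Data Elements. The sketch below assumes hypothetical policy rows and an illustrative rule (end date not before start date); accuracy is left out because it needs a trusted reference to compare against.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]


def completeness(rows: List[Row], field: str) -> float:
    """Share of rows where the field is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def consistency(rows: List[Row], rule: Callable[[Row], bool]) -> float:
    """Share of rows that satisfy a cross-field business rule."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if rule(r)) / len(rows)


# Made-up policy rows; dates are ISO strings, so string comparison is ordered.
policies: List[Row] = [
    {"dob": "1980-02-01", "start": "2023-01-01", "end": "2024-01-01"},
    {"dob": None, "start": "2023-03-01", "end": "2024-03-01"},
    {"dob": "1975-07-15", "start": "2023-06-01", "end": "2023-05-01"},
    {"dob": "1990-11-30", "start": "2023-09-01", "end": "2024-09-01"},
]

dob_completeness = completeness(policies, "dob")
date_consistency = consistency(policies, lambda r: r["end"] >= r["start"])
```

Scores like these can be tracked per Critical Data Element over time, with thresholds agreed with the business owners of each element.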

Data Management

  • Data Profiling
  • Master Data Management
  • Reference Data Management
  • Data Observability
  • FinOps

Security

  • Encryption

    Data encryption at rest and in motion

  • Access Control

    Role-based access control (RBAC) and Row Level Security

  • Data Masking

    For sensitive data (PII)
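The access control and masking items above can be combined in a single read path. The sketch below is a simplified illustration, not a platform API: the role name `pii_reader`, the `regions` attribute, and the `PII_FIELDS` set are all assumptions, and encryption is out of scope here.

```python
from typing import Dict, List, Set

PII_FIELDS: Set[str] = {"national_id", "phone"}  # assumed sensitive columns


def mask_value(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def apply_policies(rows: List[Dict[str, str]], user: Dict[str, Set[str]]) -> List[Dict[str, str]]:
    """Filter rows by the user's regions, then mask PII unless the role allows it."""
    # Row Level Security: only rows in regions the user is entitled to see.
    visible_rows = [r for r in rows if r["region"] in user["regions"]]
    # RBAC: the hypothetical "pii_reader" role sees unmasked values.
    if "pii_reader" in user["roles"]:
        return [dict(r) for r in visible_rows]
    return [
        {k: mask_value(v) if k in PII_FIELDS else v for k, v in r.items()}
        for r in visible_rows
    ]


# Usage with made-up customers and two hypothetical users.
rows = [
    {"customer_id": "C1", "region": "north", "national_id": "123456789"},
    {"customer_id": "C2", "region": "south", "national_id": "987654321"},
]
agent = {"roles": {"agent"}, "regions": {"north"}}
officer = {"roles": {"pii_reader"}, "regions": {"north", "south"}}

agent_view = apply_policies(rows, agent)
officer_view = apply_policies(rows, officer)
```

In practice these rules would be enforced by the platform's governance layer rather than in application code, so that every consumption path applies the same policy.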

Technology Stack

Recommended technologies based on Cloud-native approach

Layer            Recommended Technology
Ingestion        EMR
Storage          Delta Lake
Processing       EMR / Spark
Analytics/AI     SageMaker / Databricks
Visualization    Power BI
Governance       Unity Catalog / Purview