Data Replication for Reporting
Overview
Section titled “Overview”This document explains why and how we replicate master data from app_master_db to app_core_db to support enterprise reporting tools while maintaining microservices architecture principles.
The Problem: Microservices vs. Legacy Reporting Tools
Section titled “The Problem: Microservices vs. Legacy Reporting Tools”Architecture Constraint
Section titled “Architecture Constraint”The system has the following constraints:
-
Separate Databases: Databases are separated following the Bounded Context principle
app_master_db: Stores Master Data/Reference Data (e.g.,labelstable)app_core_db: Stores Transactional Data for resources (e.g.,resourcestable)
-
Reporting Tool Limitation: Jasper Reports has technical limitations
- Requires direct SQL queries via JDBC connection
- Cannot call HTTP APIs
- Cannot JOIN across databases without Foreign Data Wrapper (FDW)
Business Requirement
Section titled “Business Requirement”When generating resource reports (e.g., Resource Card), we need to display:
- Full resource name with label (e.g., “Mr. John Smith”)
- The
resourcestable stores onlylabel_code(e.g., ‘MR’, ‘MS’) - Full label data (
label,display_name) is stored inapp_master_db.labels
Problem: Reporting tools cannot JOIN across databases directly.
The Solution: Read Model Replication (CQRS Pattern)
Section titled “The Solution: Read Model Replication (CQRS Pattern)”What We’re Doing
Section titled “What We’re Doing”We replicate the labels table from app_master_db into app_core_db by creating a read-only lookup table named labels_lookup.
graph LR
A[app_master_db<br/>labels table<br/>Source of Truth] -->|Replicate| B[app_core_db<br/>labels_lookup<br/>Read Model]
C[Reporting Tool] -->|SQL JOIN| B
D[resources table] -->|label_code| B
style A fill:#FFB6C1
style B fill:#90EE90
style C fill:#87CEEB
Example: Report Query (Now Possible)
Section titled “Example: Report Query (Now Possible)”-- ✅ Query that works after having labels_lookupSELECT r.reference_number, CONCAT(l.label, ' ', r.first_name, ' ', r.last_name) AS full_name, r.birth_date, r.national_idFROM resources rLEFT JOIN labels_lookup l ON r.label_code = l.codeWHERE r.id = :resourceId;Result:
reference_number | full_name | birth_date | national_id-----------------+-------------------+-------------+-------------REF001 | Mr. John Smith | 1985-06-15 | 1234567890REF002 | Mrs. Jane Doe | 1990-03-22 | 9876543210Why This is NOT an Anti-Pattern
Section titled “Why This is NOT an Anti-Pattern”❌ Common Misconception
Section titled “❌ Common Misconception”“Data Duplication is always an anti-pattern that must be avoided!”
Answer: In a Monolithic System → Yes, it’s an anti-pattern. But in Microservices → This is a standard pattern called CQRS Read Model.
✅ Valid Reasons According to DDD and Microservices Principles
Section titled “✅ Valid Reasons According to DDD and Microservices Principles”1. CQRS Pattern (Command Query Responsibility Segregation)
Section titled “1. CQRS Pattern (Command Query Responsibility Segregation)”- Command Side (Write): Master Data Service is the only one that writes label data
- Query Side (Read): Core Service has a read model (
labels_lookup) for complex queries
graph TD
A[User Request: Update Label] --> B[Master Data Service]
B -->|Write| C[(app_master_db<br/>labels)]
B -->|Emit Event| D[Event Bus<br/>RabbitMQ/Redis]
D -->|Consume Event| E[Core Service]
E -->|Update Read Model| F[(app_core_db<br/>labels_lookup)]
G[Report Request] --> F
style C fill:#FFB6C1
style F fill:#90EE90
style D fill:#FFE4B5
2. Master Data Characteristics
Section titled “2. Master Data Characteristics”Master Data has special characteristics that make it suitable for replication:
| Characteristic | Why It’s Suitable for Replication |
|---|---|
| Rarely Changes | Labels don’t change often (maybe a few times per year) |
| Small Dataset | Not much data (typically < 100 records) |
| Read-Heavy | Read very frequently (every time a resource report is generated) |
| Stable Schema | Table structure doesn’t change |
3. CAP Theorem Trade-Off
Section titled “3. CAP Theorem Trade-Off”In a Distributed System, we must choose between:
- C (Consistency): Data must be correct everywhere immediately
- A (Availability): System must always be available
- P (Partition Tolerance): Must work even if network fails
Our Decision:
Choose: Availability + Partition ToleranceTrade-off: Eventual Consistency (accept data may not be synchronized for < 1 second)Rationale: For Reporting/Analytics, we don’t need real-time consistency because:
- If a label changes, showing it in reports 1-2 seconds later doesn’t affect operations
- But if the Reporting Service crashes (not Available), we cannot generate reports at all
4. Service Autonomy (Service Independence)
Section titled “4. Service Autonomy (Service Independence)”graph TD
subgraph "❌ Without Read Model"
A1[Report Request] --> B1[Core Service]
B1 -->|API Call| C1[Master Data Service]
C1 -->|Return Data| B1
B1 -->|Generate Report| A1
end
subgraph "✅ With Read Model"
A2[Report Request] --> B2[Core Service]
B2 -->|SQL JOIN| C2[(labels_lookup<br/>Local)]
C2 -->|Fast Response| B2
B2 -->|Generate Report| A2
end
D[Master Data Service Down] -.->|Impact| C1
D -.->|No Impact| C2
style C1 fill:#FFB6C1
style C2 fill:#90EE90
Benefits:
- Core Service doesn’t depend on Master Data Service during report generation
- If Master Service crashes, Core Service can still generate reports (High Availability)
5. Industry Standard Pattern
Section titled “5. Industry Standard Pattern”Leading companies use this pattern:
| Company | Use Case |
|---|---|
| Netflix | User profile data is replicated to Recommendation Service |
| Amazon | Product catalog is replicated to Search Service (Elasticsearch) |
| Uber | Driver/Rider data is replicated to Analytics/Reporting DB |
Comparison of Approaches
Section titled “Comparison of Approaches”Option 1: Data Replication (Read Model) ✅ RECOMMENDED
Section titled “Option 1: Data Replication (Read Model) ✅ RECOMMENDED”-- Architectureapp_master_db.labels (Master) ↓ Replicate (Event-Driven)app_core_db.labels_lookup (Read Model) ↑ JOIN by Reporting Toolapp_core_db.resources| Pros | Cons |
|---|---|
| ✅ Simple SQL JOIN in reports | ⚠️ Eventual consistency (delay < 1 second) |
| ✅ No runtime dependency | ⚠️ Increased storage (minimal for master data) |
| ✅ Fast reads (local data) | ⚠️ Requires sync logic |
| ✅ Follows CQRS pattern | |
| ✅ High availability |
Summary: ✅ Suitable for Enterprise Reporting
Option 2: PostgreSQL Foreign Data Wrapper (FDW) ⚠️
Section titled “Option 2: PostgreSQL Foreign Data Wrapper (FDW) ⚠️”-- SetupCREATE EXTENSION postgres_fdw;
CREATE SERVER master_dbFOREIGN DATA WRAPPER postgres_fdwOPTIONS (host 'master-db-host', dbname 'app_master_db');
CREATE FOREIGN TABLE labels_remote ( code VARCHAR(10), label VARCHAR(50), display_name VARCHAR(50))SERVER master_dbOPTIONS (schema_name 'public', table_name 'labels');
-- QuerySELECT r.*, l.labelFROM resources rLEFT JOIN labels_remote l ON r.label_code = l.code;| Pros | Cons |
|---|---|
| ✅ No data duplication | ❌ Network latency (cross-DB query) |
| ✅ Always fresh data | ❌ Tight coupling (needs DB credentials) |
| ❌ Security risk (DB-to-DB connection) | |
| ❌ Single point of failure | |
| ❌ Complex setup & monitoring |
Summary: ⚠️ Not recommended for Production
Option 3: JSONB Denormalization ❌
Section titled “Option 3: JSONB Denormalization ❌”-- Store full label in resources tableALTER TABLE resourcesADD COLUMN label_data JSONB;
-- Example data-- { "code": "MR", "label": "Mr.", "display_name": "Mister" }
-- QuerySELECT reference_number, label_data->>'label' AS label, first_nameFROM resources;| Pros | Cons |
|---|---|
| ✅ No separate table | ❌ Storage bloat (duplicate data in every row) |
| ✅ Fast reads | ❌ Difficult to update when labels change |
| ❌ Poor indexing performance | |
| ❌ Data inconsistency risk |
Summary: ❌ Avoid for Master Data
Option 4: API Composition at Report Time ❌
Section titled “Option 4: API Composition at Report Time ❌”// Pseudo-codeconst resource = await fetchResource(resourceId);const label = await masterDataService.getLabel(resource.label_code);const report = generateReport({ ...resource, label });| Pros | Cons |
|---|---|
| ✅ No data duplication | ❌ Jasper doesn’t support HTTP calls |
| ✅ Always fresh data | ❌ Need to write custom code layer |
| ❌ Performance overhead (N+1 queries) | |
| ❌ Runtime dependency |
Summary: ❌ Not feasible with Jasper Reports
Implementation: Event-Driven Replication
Section titled “Implementation: Event-Driven Replication”System Architecture
Section titled “System Architecture”graph TD
A[User: Update Label] --> B[Master Data Service]
B -->|1. Save to DB| C[(app_master_db<br/>labels)]
B -->|2. Emit Event| D[RabbitMQ/Redis<br/>Event Bus]
D -->|3. label.created| E[Core Service<br/>Event Handler]
D -->|3. label.updated| E
D -->|3. label.deleted| E
E -->|4. Sync to Read Model| F[(app_core_db<br/>labels_lookup)]
G[Cron Job<br/>Daily 2 AM] -.->|Fallback Full Sync| F
H[Reporting Tool] -->|5. SQL Query with JOIN| F
style C fill:#FFB6C1
style F fill:#90EE90
style D fill:#FFE4B5
style G fill:#FFA500
Strategy: Primary + Fallback
Section titled “Strategy: Primary + Fallback”We use 2 strategies together:
-
Primary: Real-Time Event-Driven Sync
- Latency: < 1 second
- Use Case: Normal operations
-
Fallback: Daily Full Sync (Cron Job)
- Schedule: Every day at 2 AM
- Use Case: Recover from missed events, network issues
Implementation Code Examples
Section titled “Implementation Code Examples”Step 1: Create Read Model Entity
Section titled “Step 1: Create Read Model Entity”import { Column, Entity, PrimaryColumn, UpdateDateColumn } from 'typeorm';
import { AppDatabases } from '@lib/common/enum/app-databases.enum';
/** * Read Model (Replicated from Master Data Service) * * Purpose: Local lookup table for reporting tools to JOIN with resources * * Key Differences from Master Table: * - Read-only (no created_by, updated_by) * - No soft delete (sync handles deletions) * - Only essential columns needed for reporting */@Entity({ name: 'labels_lookup', database: AppDatabases.APP_CORE,})export class LabelLookup { @PrimaryColumn({ type: 'varchar', length: 10, comment: 'Label code' }) code: string; // e.g., 'MR', 'MS', 'DR'
@Column({ type: 'varchar', length: 50, comment: 'Display label' }) label: string; // e.g., 'Mr.', 'Mrs.', 'Dr.'
@Column({ type: 'varchar', length: 50, comment: 'Full display name' }) display_name: string; // e.g., 'Mister', 'Mistress', 'Doctor'
@UpdateDateColumn({ type: 'timestamptz', comment: 'Last sync time' }) updated_at: Date;}Step 2: Database Migration
Section titled “Step 2: Database Migration”import { MigrationInterface, QueryRunner } from 'typeorm';
export class CreateLabelsLookupTable1759221182217 implements MigrationInterface { name = 'CreateLabelsLookupTable1759221182217';
public async up(queryRunner: QueryRunner): Promise<void> { // Step 1: Create lookup table await queryRunner.query(` CREATE TABLE "labels_lookup" ( "code" VARCHAR(10) PRIMARY KEY, "label" VARCHAR(50) NOT NULL, "display_name" VARCHAR(50) NOT NULL, "updated_at" TIMESTAMPTZ DEFAULT NOW(), CONSTRAINT "chk_labels_lookup_code" CHECK (LENGTH("code") > 0) ); `);
// Step 2: Add index for fast JOINs await queryRunner.query(` CREATE INDEX "idx_labels_lookup_code" ON "labels_lookup" ("code"); `);
// Step 3: Add table comment await queryRunner.query(` COMMENT ON TABLE "labels_lookup" IS 'Read model replicated from app_master_db.labels for reporting tools'; `); }
public async down(queryRunner: QueryRunner): Promise<void> { await queryRunner.query(`DROP INDEX IF EXISTS "idx_labels_lookup_code";`); await queryRunner.query(`DROP TABLE IF EXISTS "labels_lookup";`); }}Step 3: Master Data Service — Emit Events
Section titled “Step 3: Master Data Service — Emit Events”import { Inject, Injectable } from '@nestjs/common';import { ClientProxy } from '@nestjs/microservices';import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { AppMicroservice } from '@lib/common/enum/app-microservice.enum';import { AppDatabases } from '@lib/common/enum/app-databases.enum';import { LogsService } from '@lib/common/modules/log/logs.service';
import { CreateLabelDTO } from '../dto/create-label.dto';import { UpdateLabelDTO } from '../dto/update-label.dto';import { Label } from '../entities/label.entity';
@Injectable()export class LabelsService { constructor( private readonly logger: LogsService, @InjectRepository(Label, AppDatabases.APP_MASTER) private readonly labelRepository: Repository<Label>, @Inject(AppMicroservice.MasterData.name) private readonly eventBus: ClientProxy, ) {}
async createLabel(createDTO: CreateLabelDTO, userId: string): Promise<Label> { // 1. Save to master database const newLabel = await this.labelRepository.save({ ...createDTO, created_by: userId, updated_by: userId, });
this.logger.log(`Created label: ${newLabel.code}`);
// 2. Emit domain event for other services to consume this.eventBus.emit('label.created', { event: 'label.created', payload: { id: newLabel.id, code: newLabel.code, label: newLabel.label, display_name: newLabel.display_name, }, metadata: { timestamp: new Date().toISOString(), source: 'masterdata-service', userId, }, });
return newLabel; }
async updateLabel(code: string, updateDTO: UpdateLabelDTO, userId: string): Promise<Label> { // 1. Update in master database await this.labelRepository.update( { code }, { ...updateDTO, updated_by: userId, updated_at: new Date() }, );
const updatedLabel = await this.labelRepository.findOne({ where: { code } });
this.logger.log(`Updated label: ${code}`);
// 2. Emit domain event this.eventBus.emit('label.updated', { event: 'label.updated', payload: { id: updatedLabel.id, code: updatedLabel.code, label: updatedLabel.label, display_name: updatedLabel.display_name, }, metadata: { timestamp: new Date().toISOString(), source: 'masterdata-service', userId, }, });
return updatedLabel; }
async deleteLabel(code: string, userId: string): Promise<void> { await this.labelRepository.softDelete({ code });
this.logger.log(`Deleted label: ${code}`);
this.eventBus.emit('label.deleted', { event: 'label.deleted', payload: { code }, metadata: { timestamp: new Date().toISOString(), source: 'masterdata-service', userId, }, }); }
async findAllActive(): Promise<Label[]> { return this.labelRepository.find({ where: { deleted_at: null }, order: { code: 'ASC' }, }); }}Step 4: Core Service — Event Handlers
Section titled “Step 4: Core Service — Event Handlers”import { Controller } from '@nestjs/common';import { EventPattern, Payload } from '@nestjs/microservices';
import { LogsService } from '@lib/common/modules/log/logs.service';
import { LabelSyncService } from '../services/label-sync.service';
@Controller()export class LabelSyncController { constructor( private readonly logger: LogsService, private readonly labelSyncService: LabelSyncService, ) {}
@EventPattern('label.created') async handleLabelCreated(@Payload() event: any) { const { payload, metadata } = event;
this.logger.log({ message: `Received label.created event for code: ${payload.code}`, context: { event: 'label.created', code: payload.code, source: metadata.source }, });
try { await this.labelSyncService.syncLabel(payload); this.logger.log({ message: `Synced label: ${payload.code}`, context: { code: payload.code } }); } catch (error) { this.logger.error({ message: `Failed to sync label: ${payload.code}`, context: { code: payload.code, error: error.message }, }); // Don't throw — allow event to be acknowledged // Cron job will fix inconsistencies } }
@EventPattern('label.updated') async handleLabelUpdated(@Payload() event: any) { const { payload } = event;
try { await this.labelSyncService.syncLabel(payload); } catch (error) { this.logger.error({ message: `Failed to update label: ${payload.code}`, context: { code: payload.code, error: error.message }, }); } }
@EventPattern('label.deleted') async handleLabelDeleted(@Payload() event: any) { const { payload } = event;
try { await this.labelSyncService.deleteLabel(payload.code); } catch (error) { this.logger.error({ message: `Failed to delete label: ${payload.code}`, context: { code: payload.code, error: error.message }, }); } }}Step 5: Core Service — Sync Logic
Section titled “Step 5: Core Service — Sync Logic”import { Injectable } from '@nestjs/common';import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { AppDatabases } from '@lib/common/enum/app-databases.enum';import { LogsService } from '@lib/common/modules/log/logs.service';
import { LabelLookup } from '../entities/label-lookup.entity';
@Injectable()export class LabelSyncService { constructor( private readonly logger: LogsService, @InjectRepository(LabelLookup, AppDatabases.APP_CORE) private readonly labelLookupRepo: Repository<LabelLookup>, ) {}
/** * Sync label data to local read model (upsert pattern) */ async syncLabel(labelData: any): Promise<void> { const existing = await this.labelLookupRepo.findOne({ where: { code: labelData.code }, });
if (existing) { await this.labelLookupRepo.update( { code: labelData.code }, { label: labelData.label, display_name: labelData.display_name, updated_at: new Date() }, );
this.logger.log({ message: `Updated label in lookup table: ${labelData.code}`, context: { action: 'LABEL_SYNC_UPDATE', code: labelData.code }, }); } else { await this.labelLookupRepo.save({ code: labelData.code, label: labelData.label, display_name: labelData.display_name, });
this.logger.log({ message: `Inserted new label in lookup table: ${labelData.code}`, context: { action: 'LABEL_SYNC_INSERT', code: labelData.code }, }); } }
async deleteLabel(code: string): Promise<void> { await this.labelLookupRepo.delete({ code });
this.logger.log({ message: `Deleted label from lookup table: ${code}`, context: { action: 'LABEL_SYNC_DELETE', code }, }); }}Step 6: Fallback Cron Job (Daily Full Sync)
Section titled “Step 6: Fallback Cron Job (Daily Full Sync)”import { Inject, Injectable } from '@nestjs/common';import { ClientProxy } from '@nestjs/microservices';import { Cron, CronExpression } from '@nestjs/schedule';
import { AppMicroservice } from '@lib/common/enum/app-microservice.enum';import { LogsService } from '@lib/common/modules/log/logs.service';import { MicroserviceClientService } from '@lib/common/services/microservice-client.service';
import { LabelSyncService } from './label-sync.service';
/** * Cron Job for Full Sync (Fallback Strategy) * * Purpose: Recover from missed events, network issues, or service downtime * Schedule: Every day at 2 AM (low traffic period) */@Injectable()export class LabelSyncCronService { constructor( private readonly logger: LogsService, private readonly labelSyncService: LabelSyncService, private readonly microserviceClient: MicroserviceClientService, @Inject(AppMicroservice.MasterData.name) private readonly masterDataClient: ClientProxy, ) {}
@Cron(CronExpression.EVERY_DAY_AT_2AM, { name: 'full-sync-labels', timeZone: 'UTC', }) async fullSyncLabels() { const startTime = Date.now();
this.logger.log({ message: 'Starting full label sync (cron job)', context: { action: 'CRON_FULL_SYNC_START', schedule: 'Daily 2 AM UTC' }, });
try { const allLabels = await this.microserviceClient.sendWithContext( this.logger, this.masterDataClient, { cmd: AppMicroservice.MasterData.cmd.GetAllActiveLabels }, {}, [], );
for (const label of allLabels) { await this.labelSyncService.syncLabel(label); }
const duration = Date.now() - startTime;
this.logger.log({ message: `Full label sync completed`, context: { action: 'CRON_FULL_SYNC_COMPLETE', total: allLabels.length, duration_ms: duration, }, }); } catch (error) { this.logger.error({ message: 'Full label sync failed', context: { action: 'CRON_FULL_SYNC_ERROR', error: error.message }, }); } }}Key Design Decisions
Section titled “Key Design Decisions”Why code is the Primary Key (not UUID)
Section titled “Why code is the Primary Key (not UUID)”The lookup table uses code (VARCHAR) as its primary key instead of UUID:
-- ✅ Fast JOIN using code (string match, short value)SELECT r.first_name, l.labelFROM resources rLEFT JOIN labels_lookup l ON r.label_code = l.code;
-- ❌ What we'd need without code as PK:-- resources would need to store the master UUID instead of a short code-- This makes the schema harder to understand and debugBenefits of code as PK:
- Resources store a human-readable
label_codelike'MR'instead of a UUID - JOINs are on a small, indexed string — very fast
- No UUID lookup step required
Why No Soft Delete in Lookup Table
Section titled “Why No Soft Delete in Lookup Table”The master table uses deleted_at for soft deletes. The lookup table uses hard deletes triggered by the label.deleted event:
// When master service soft-deletes a label, it emits label.deleted// Core service hard-deletes from lookup tableawait this.labelLookupRepo.delete({ code });This keeps the lookup table lean — it only contains currently active records.
Related Documentation
Section titled “Related Documentation”- Database Architecture Overview — Three-database architecture and lookup table patterns
- PostgreSQL 18 Streaming Replication Setup — Physical replication for HA