Skip to content

Data Replication for Reporting

This document explains why and how we replicate master data from app_master_db to app_core_db to support enterprise reporting tools while maintaining microservices architecture principles.

The Problem: Microservices vs. Legacy Reporting Tools

Section titled “The Problem: Microservices vs. Legacy Reporting Tools”

The system has the following constraints:

  1. Separate Databases: Databases are separated following the Bounded Context principle

    • app_master_db: Stores Master Data/Reference Data (e.g., labels table)
    • app_core_db: Stores Transactional Data for resources (e.g., resources table)
  2. Reporting Tool Limitation: Jasper Reports has technical limitations

    • Requires direct SQL queries via JDBC connection
    • Cannot call HTTP APIs
    • Cannot JOIN across databases without Foreign Data Wrapper (FDW)

When generating resource reports (e.g., Resource Card), we need to display:

  • Full resource name with label (e.g., “Mr. John Smith”)
  • The resources table stores only label_code (e.g., ‘MR’, ‘MS’)
  • Full label data (label, display_name) is stored in app_master_db.labels

Problem: Reporting tools cannot JOIN across databases directly.


The Solution: Read Model Replication (CQRS Pattern)

Section titled “The Solution: Read Model Replication (CQRS Pattern)”

We replicate the labels table from app_master_db into app_core_db by creating a read-only lookup table named labels_lookup.

graph LR
    A[app_master_db<br/>labels table<br/>Source of Truth] -->|Replicate| B[app_core_db<br/>labels_lookup<br/>Read Model]
    C[Reporting Tool] -->|SQL JOIN| B
    D[resources table] -->|label_code| B

    style A fill:#FFB6C1
    style B fill:#90EE90
    style C fill:#87CEEB
-- ✅ Query that works after having labels_lookup
SELECT
r.reference_number,
CONCAT(l.label, ' ', r.first_name, ' ', r.last_name) AS full_name,
r.birth_date,
r.national_id
FROM resources r
LEFT JOIN labels_lookup l ON r.label_code = l.code
WHERE r.id = :resourceId;

Result:

reference_number | full_name | birth_date | national_id
-----------------+-------------------+-------------+-------------
REF001 | Mr. John Smith | 1985-06-15 | 1234567890
REF002 | Mrs. Jane Doe | 1990-03-22 | 9876543210

“Data Duplication is always an anti-pattern that must be avoided!”

Answer: In a Monolithic System → Yes, it’s an anti-pattern. But in Microservices → This is a standard pattern called CQRS Read Model.

✅ Valid Reasons According to DDD and Microservices Principles

Section titled “✅ Valid Reasons According to DDD and Microservices Principles”

1. CQRS Pattern (Command Query Responsibility Segregation)

Section titled “1. CQRS Pattern (Command Query Responsibility Segregation)”
  • Command Side (Write): Master Data Service is the only one that writes label data
  • Query Side (Read): Core Service has a read model (labels_lookup) for complex queries
graph TD
    A[User Request: Update Label] --> B[Master Data Service]
    B -->|Write| C[(app_master_db<br/>labels)]
    B -->|Emit Event| D[Event Bus<br/>RabbitMQ/Redis]
    D -->|Consume Event| E[Core Service]
    E -->|Update Read Model| F[(app_core_db<br/>labels_lookup)]

    G[Report Request] --> F

    style C fill:#FFB6C1
    style F fill:#90EE90
    style D fill:#FFE4B5

Master Data has special characteristics that make it suitable for replication:

CharacteristicWhy It’s Suitable for Replication
Rarely ChangesLabels don’t change often (maybe a few times per year)
Small DatasetNot much data (typically < 100 records)
Read-HeavyRead very frequently (every time a resource report is generated)
Stable SchemaTable structure doesn’t change

In a Distributed System, we must choose between:

  • C (Consistency): Data must be correct everywhere immediately
  • A (Availability): System must always be available
  • P (Partition Tolerance): Must work even if network fails

Our Decision:

Choose: Availability + Partition Tolerance
Trade-off: Eventual Consistency (accept data may not be synchronized for < 1 second)

Rationale: For Reporting/Analytics, we don’t need real-time consistency because:

  • If a label changes, showing it in reports 1-2 seconds later doesn’t affect operations
  • But if the Reporting Service crashes (not Available), we cannot generate reports at all

4. Service Autonomy (Service Independence)

Section titled “4. Service Autonomy (Service Independence)”
graph TD
    subgraph "❌ Without Read Model"
        A1[Report Request] --> B1[Core Service]
        B1 -->|API Call| C1[Master Data Service]
        C1 -->|Return Data| B1
        B1 -->|Generate Report| A1
    end

    subgraph "✅ With Read Model"
        A2[Report Request] --> B2[Core Service]
        B2 -->|SQL JOIN| C2[(labels_lookup<br/>Local)]
        C2 -->|Fast Response| B2
        B2 -->|Generate Report| A2
    end

    D[Master Data Service Down] -.->|Impact| C1
    D -.->|No Impact| C2

    style C1 fill:#FFB6C1
    style C2 fill:#90EE90

Benefits:

  • Core Service doesn’t depend on Master Data Service during report generation
  • If Master Service crashes, Core Service can still generate reports (High Availability)

Leading companies use this pattern:

CompanyUse Case
NetflixUser profile data is replicated to Recommendation Service
AmazonProduct catalog is replicated to Search Service (Elasticsearch)
UberDriver/Rider data is replicated to Analytics/Reporting DB

Section titled “Option 1: Data Replication (Read Model) ✅ RECOMMENDED”
-- Architecture
app_master_db.labels (Master)
Replicate (Event-Driven)
app_core_db.labels_lookup (Read Model)
JOIN by Reporting Tool
app_core_db.resources
ProsCons
✅ Simple SQL JOIN in reports⚠️ Eventual consistency (delay < 1 second)
✅ No runtime dependency⚠️ Increased storage (minimal for master data)
✅ Fast reads (local data)⚠️ Requires sync logic
✅ Follows CQRS pattern
✅ High availability

Summary: ✅ Suitable for Enterprise Reporting


Option 2: PostgreSQL Foreign Data Wrapper (FDW) ⚠️

Section titled “Option 2: PostgreSQL Foreign Data Wrapper (FDW) ⚠️”
-- Setup
CREATE EXTENSION postgres_fdw;
CREATE SERVER master_db
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'master-db-host', dbname 'app_master_db');
CREATE FOREIGN TABLE labels_remote (
code VARCHAR(10),
label VARCHAR(50),
display_name VARCHAR(50)
)
SERVER master_db
OPTIONS (schema_name 'public', table_name 'labels');
-- Query
SELECT r.*, l.label
FROM resources r
LEFT JOIN labels_remote l ON r.label_code = l.code;
ProsCons
✅ No data duplication❌ Network latency (cross-DB query)
✅ Always fresh data❌ Tight coupling (needs DB credentials)
❌ Security risk (DB-to-DB connection)
❌ Single point of failure
❌ Complex setup & monitoring

Summary: ⚠️ Not recommended for Production


-- Store full label in resources table
ALTER TABLE resources
ADD COLUMN label_data JSONB;
-- Example data
-- { "code": "MR", "label": "Mr.", "display_name": "Mister" }
-- Query
SELECT
reference_number,
label_data->>'label' AS label,
first_name
FROM resources;
ProsCons
✅ No separate table❌ Storage bloat (duplicate data in every row)
✅ Fast reads❌ Difficult to update when labels change
❌ Poor indexing performance
❌ Data inconsistency risk

Summary: ❌ Avoid for Master Data


Option 4: API Composition at Report Time ❌

Section titled “Option 4: API Composition at Report Time ❌”
// Pseudo-code
const resource = await fetchResource(resourceId);
const label = await masterDataService.getLabel(resource.label_code);
const report = generateReport({ ...resource, label });
ProsCons
✅ No data duplication❌ Jasper doesn’t support HTTP calls
✅ Always fresh data❌ Need to write custom code layer
❌ Performance overhead (N+1 queries)
❌ Runtime dependency

Summary: ❌ Not feasible with Jasper Reports


graph TD
    A[User: Update Label] --> B[Master Data Service]
    B -->|1. Save to DB| C[(app_master_db<br/>labels)]
    B -->|2. Emit Event| D[RabbitMQ/Redis<br/>Event Bus]

    D -->|3. label.created| E[Core Service<br/>Event Handler]
    D -->|3. label.updated| E
    D -->|3. label.deleted| E

    E -->|4. Sync to Read Model| F[(app_core_db<br/>labels_lookup)]

    G[Cron Job<br/>Daily 2 AM] -.->|Fallback Full Sync| F

    H[Reporting Tool] -->|5. SQL Query with JOIN| F

    style C fill:#FFB6C1
    style F fill:#90EE90
    style D fill:#FFE4B5
    style G fill:#FFA500

We use 2 strategies together:

  1. Primary: Real-Time Event-Driven Sync

    • Latency: < 1 second
    • Use Case: Normal operations
  2. Fallback: Daily Full Sync (Cron Job)

    • Schedule: Every day at 2 AM
    • Use Case: Recover from missed events, network issues

apps/data-owner-bc/src/modules/label-sync/entities/label-lookup.entity.ts
import { Column, Entity, PrimaryColumn, UpdateDateColumn } from 'typeorm';
import { AppDatabases } from '@lib/common/enum/app-databases.enum';
/**
* Read Model (Replicated from Master Data Service)
*
* Purpose: Local lookup table for reporting tools to JOIN with resources
*
* Key Differences from Master Table:
* - Read-only (no created_by, updated_by)
* - No soft delete (sync handles deletions)
* - Only essential columns needed for reporting
*/
@Entity({
name: 'labels_lookup',
database: AppDatabases.APP_CORE,
})
export class LabelLookup {
@PrimaryColumn({ type: 'varchar', length: 10, comment: 'Label code' })
code: string; // e.g., 'MR', 'MS', 'DR'
@Column({ type: 'varchar', length: 50, comment: 'Display label' })
label: string; // e.g., 'Mr.', 'Mrs.', 'Dr.'
@Column({ type: 'varchar', length: 50, comment: 'Full display name' })
display_name: string; // e.g., 'Mister', 'Mistress', 'Doctor'
@UpdateDateColumn({ type: 'timestamptz', comment: 'Last sync time' })
updated_at: Date;
}
libs/database/src/migrations/app_core_db/1759221182217-CreateLabelsLookupTable.ts
import { MigrationInterface, QueryRunner } from 'typeorm';
export class CreateLabelsLookupTable1759221182217 implements MigrationInterface {
name = 'CreateLabelsLookupTable1759221182217';
public async up(queryRunner: QueryRunner): Promise<void> {
// Step 1: Create lookup table
await queryRunner.query(`
CREATE TABLE "labels_lookup" (
"code" VARCHAR(10) PRIMARY KEY,
"label" VARCHAR(50) NOT NULL,
"display_name" VARCHAR(50) NOT NULL,
"updated_at" TIMESTAMPTZ DEFAULT NOW(),
CONSTRAINT "chk_labels_lookup_code" CHECK (LENGTH("code") > 0)
);
`);
// Step 2: Add index for fast JOINs
await queryRunner.query(`
CREATE INDEX "idx_labels_lookup_code"
ON "labels_lookup" ("code");
`);
// Step 3: Add table comment
await queryRunner.query(`
COMMENT ON TABLE "labels_lookup" IS
'Read model replicated from app_master_db.labels for reporting tools';
`);
}
public async down(queryRunner: QueryRunner): Promise<void> {
await queryRunner.query(`DROP INDEX IF EXISTS "idx_labels_lookup_code";`);
await queryRunner.query(`DROP TABLE IF EXISTS "labels_lookup";`);
}
}

Step 3: Master Data Service — Emit Events

Section titled “Step 3: Master Data Service — Emit Events”
apps/masterdata/src/modules/label/labels.service.ts
import { Inject, Injectable } from '@nestjs/common';
import { ClientProxy } from '@nestjs/microservices';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { AppMicroservice } from '@lib/common/enum/app-microservice.enum';
import { AppDatabases } from '@lib/common/enum/app-databases.enum';
import { LogsService } from '@lib/common/modules/log/logs.service';
import { CreateLabelDTO } from '../dto/create-label.dto';
import { UpdateLabelDTO } from '../dto/update-label.dto';
import { Label } from '../entities/label.entity';
@Injectable()
export class LabelsService {
constructor(
private readonly logger: LogsService,
@InjectRepository(Label, AppDatabases.APP_MASTER)
private readonly labelRepository: Repository<Label>,
@Inject(AppMicroservice.MasterData.name)
private readonly eventBus: ClientProxy,
) {}
async createLabel(createDTO: CreateLabelDTO, userId: string): Promise<Label> {
// 1. Save to master database
const newLabel = await this.labelRepository.save({
...createDTO,
created_by: userId,
updated_by: userId,
});
this.logger.log(`Created label: ${newLabel.code}`);
// 2. Emit domain event for other services to consume
this.eventBus.emit('label.created', {
event: 'label.created',
payload: {
id: newLabel.id,
code: newLabel.code,
label: newLabel.label,
display_name: newLabel.display_name,
},
metadata: {
timestamp: new Date().toISOString(),
source: 'masterdata-service',
userId,
},
});
return newLabel;
}
async updateLabel(code: string, updateDTO: UpdateLabelDTO, userId: string): Promise<Label> {
// 1. Update in master database
await this.labelRepository.update(
{ code },
{ ...updateDTO, updated_by: userId, updated_at: new Date() },
);
const updatedLabel = await this.labelRepository.findOne({ where: { code } });
this.logger.log(`Updated label: ${code}`);
// 2. Emit domain event
this.eventBus.emit('label.updated', {
event: 'label.updated',
payload: {
id: updatedLabel.id,
code: updatedLabel.code,
label: updatedLabel.label,
display_name: updatedLabel.display_name,
},
metadata: {
timestamp: new Date().toISOString(),
source: 'masterdata-service',
userId,
},
});
return updatedLabel;
}
async deleteLabel(code: string, userId: string): Promise<void> {
await this.labelRepository.softDelete({ code });
this.logger.log(`Deleted label: ${code}`);
this.eventBus.emit('label.deleted', {
event: 'label.deleted',
payload: { code },
metadata: {
timestamp: new Date().toISOString(),
source: 'masterdata-service',
userId,
},
});
}
async findAllActive(): Promise<Label[]> {
return this.labelRepository.find({
where: { deleted_at: null },
order: { code: 'ASC' },
});
}
}
apps/data-owner-bc/src/modules/label-sync/label-sync.controller.ts
import { Controller } from '@nestjs/common';
import { EventPattern, Payload } from '@nestjs/microservices';
import { LogsService } from '@lib/common/modules/log/logs.service';
import { LabelSyncService } from '../services/label-sync.service';
@Controller()
export class LabelSyncController {
constructor(
private readonly logger: LogsService,
private readonly labelSyncService: LabelSyncService,
) {}
@EventPattern('label.created')
async handleLabelCreated(@Payload() event: any) {
const { payload, metadata } = event;
this.logger.log({
message: `Received label.created event for code: ${payload.code}`,
context: { event: 'label.created', code: payload.code, source: metadata.source },
});
try {
await this.labelSyncService.syncLabel(payload);
this.logger.log({ message: `Synced label: ${payload.code}`, context: { code: payload.code } });
} catch (error) {
this.logger.error({
message: `Failed to sync label: ${payload.code}`,
context: { code: payload.code, error: error.message },
});
// Don't throw — allow event to be acknowledged
// Cron job will fix inconsistencies
}
}
@EventPattern('label.updated')
async handleLabelUpdated(@Payload() event: any) {
const { payload } = event;
try {
await this.labelSyncService.syncLabel(payload);
} catch (error) {
this.logger.error({
message: `Failed to update label: ${payload.code}`,
context: { code: payload.code, error: error.message },
});
}
}
@EventPattern('label.deleted')
async handleLabelDeleted(@Payload() event: any) {
const { payload } = event;
try {
await this.labelSyncService.deleteLabel(payload.code);
} catch (error) {
this.logger.error({
message: `Failed to delete label: ${payload.code}`,
context: { code: payload.code, error: error.message },
});
}
}
}
apps/data-owner-bc/src/modules/label-sync/services/label-sync.service.ts
import { Injectable } from '@nestjs/common';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { AppDatabases } from '@lib/common/enum/app-databases.enum';
import { LogsService } from '@lib/common/modules/log/logs.service';
import { LabelLookup } from '../entities/label-lookup.entity';
@Injectable()
export class LabelSyncService {
constructor(
private readonly logger: LogsService,
@InjectRepository(LabelLookup, AppDatabases.APP_CORE)
private readonly labelLookupRepo: Repository<LabelLookup>,
) {}
/**
* Sync label data to local read model (upsert pattern)
*/
async syncLabel(labelData: any): Promise<void> {
const existing = await this.labelLookupRepo.findOne({
where: { code: labelData.code },
});
if (existing) {
await this.labelLookupRepo.update(
{ code: labelData.code },
{ label: labelData.label, display_name: labelData.display_name, updated_at: new Date() },
);
this.logger.log({
message: `Updated label in lookup table: ${labelData.code}`,
context: { action: 'LABEL_SYNC_UPDATE', code: labelData.code },
});
} else {
await this.labelLookupRepo.save({
code: labelData.code,
label: labelData.label,
display_name: labelData.display_name,
});
this.logger.log({
message: `Inserted new label in lookup table: ${labelData.code}`,
context: { action: 'LABEL_SYNC_INSERT', code: labelData.code },
});
}
}
async deleteLabel(code: string): Promise<void> {
await this.labelLookupRepo.delete({ code });
this.logger.log({
message: `Deleted label from lookup table: ${code}`,
context: { action: 'LABEL_SYNC_DELETE', code },
});
}
}

Step 6: Fallback Cron Job (Daily Full Sync)

Section titled “Step 6: Fallback Cron Job (Daily Full Sync)”
apps/data-owner-bc/src/modules/label-sync/services/label-sync-cron.service.ts
import { Inject, Injectable } from '@nestjs/common';
import { ClientProxy } from '@nestjs/microservices';
import { Cron, CronExpression } from '@nestjs/schedule';
import { AppMicroservice } from '@lib/common/enum/app-microservice.enum';
import { LogsService } from '@lib/common/modules/log/logs.service';
import { MicroserviceClientService } from '@lib/common/services/microservice-client.service';
import { LabelSyncService } from './label-sync.service';
/**
* Cron Job for Full Sync (Fallback Strategy)
*
* Purpose: Recover from missed events, network issues, or service downtime
* Schedule: Every day at 2 AM (low traffic period)
*/
@Injectable()
export class LabelSyncCronService {
constructor(
private readonly logger: LogsService,
private readonly labelSyncService: LabelSyncService,
private readonly microserviceClient: MicroserviceClientService,
@Inject(AppMicroservice.MasterData.name)
private readonly masterDataClient: ClientProxy,
) {}
@Cron(CronExpression.EVERY_DAY_AT_2AM, {
name: 'full-sync-labels',
timeZone: 'UTC',
})
async fullSyncLabels() {
const startTime = Date.now();
this.logger.log({
message: 'Starting full label sync (cron job)',
context: { action: 'CRON_FULL_SYNC_START', schedule: 'Daily 2 AM UTC' },
});
try {
const allLabels = await this.microserviceClient.sendWithContext(
this.logger,
this.masterDataClient,
{ cmd: AppMicroservice.MasterData.cmd.GetAllActiveLabels },
{},
[],
);
for (const label of allLabels) {
await this.labelSyncService.syncLabel(label);
}
const duration = Date.now() - startTime;
this.logger.log({
message: `Full label sync completed`,
context: {
action: 'CRON_FULL_SYNC_COMPLETE',
total: allLabels.length,
duration_ms: duration,
},
});
} catch (error) {
this.logger.error({
message: 'Full label sync failed',
context: { action: 'CRON_FULL_SYNC_ERROR', error: error.message },
});
}
}
}

The lookup table uses code (VARCHAR) as its primary key instead of UUID:

-- ✅ Fast JOIN using code (string match, short value)
SELECT r.first_name, l.label
FROM resources r
LEFT JOIN labels_lookup l ON r.label_code = l.code;
-- ❌ What we'd need without code as PK:
-- resources would need to store the master UUID instead of a short code
-- This makes the schema harder to understand and debug

Benefits of code as PK:

  • Resources store a human-readable label_code like 'MR' instead of a UUID
  • JOINs are on a small, indexed string — very fast
  • No UUID lookup step required

The master table uses deleted_at for soft deletes. The lookup table uses hard deletes triggered by the label.deleted event:

// When master service soft-deletes a label, it emits label.deleted
// Core service hard-deletes from lookup table
await this.labelLookupRepo.delete({ code });

This keeps the lookup table lean — it only contains currently active records.