Hybrid AI Architecture (RBS + RAG)

Overview

An Enterprise Information System (EIS) requires absolute certainty for critical operational decisions. While a pure Ollama + RAG system excels at explaining domain criteria, a purely generative AI approach is fundamentally unsuitable for deterministic resource allocation tasks.

This guide introduces the Hybrid AI Architecture: combining Rule-Based Systems (RBS) with RAG to ensure that critical system queries always receive precise, database-backed answers while non-critical queries benefit from contextual AI explanations.

The Core Problem: Why Pure RAG Fails for Deterministic Queries

Scenario: The Hallucination Risk

An operator asks: “Are there any available slots in Zone A?”

With Pure RAG System:

LLM Response: "Based on standard allocation guidelines, Zone A typically
has 10 slots for high-priority resources. However, I cannot verify
current availability from your system."

❌ WRONG: The LLM has no access to live data. Even when it hedges, as above, it may:
- Hallucinate a count that doesn't match reality
- Provide outdated information from training data
- Report a slot as available when it is already occupied

With Hybrid RBS + RAG System:

Rule-Based System: "Query classified as HIGH_CONTROL_RBS.
Bypassing RAG. Executing: SELECT COUNT(*) FROM slots
WHERE status = 'available' AND zone = 'A'"

Response: 3 slots available in Zone A
✅ CORRECT: Deterministic, real-time, database-backed

Why This Architecture Matters

Aspect	Pure RAG	Hybrid RBS + RAG
Example query: “Are there slots available in Zone A?”	LLM guesses based on patterns	RBS queries database directly
Risk level	HIGH: can cause operational errors	LOW: source of truth is the database
Response time (typical)	~500–2000ms (LLM generation)	~10–50ms (database query)
Data accuracy (illustrative)	70–90% (generative)	99.99% (deterministic)
Auditability	Poor (generated text is hard to trace to a source)	Clear (traceable, loggable)

Core Principles: The Hybrid Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Client Request                            │
└────────────────┬────────────────────────────────────────────┘
                 │
         ┌───────▼────────┐
         │   Classifier   │ ◄─ Detects query intent
         │    Router      │
         └────┬───┬───┬───┘
             │   │   │
    ┌────────┘   │   └────────────┐
    │            │                 │
    │    ┌───────▼────────┐        │
    │    │   LOW_CONTROL  │        │
    │    │   RAG_EXPLAIN  │        │
    │    └────────┬───────┘        │
    │             │                │
    ▼             ▼                ▼
┌──────┐   ┌────────────┐   ┌──────────────┐
│ RBS  │   │ Embedding  │   │ Other Routes │
│Layer │   │ Service    │   │              │
└──┬───┘   └────┬───────┘   └──────┬───────┘
   │            │                  │
   │      ┌─────▼─────┐            │
   │      │ pgvector  │            │
   │      │ Semantic  │            │
   │      │ Search    │            │
   │      └─────┬─────┘            │
   │            │                  │
   │      ┌─────▼─────┐            │
   │      │  Ollama   │            │
   │      │  Gemma    │            │
   │      │  3:4b     │            │
   │      └─────┬─────┘            │
   │            │                  │
   └────────┬───┴──────────────────┘
            │
      ┌─────▼──────┐
      │ Formatted  │
      │  Response  │
      └─────┬──────┘
            │
      ┌─────▼──────────┐
      │ Client         │
      └────────────────┘

Core Architecture: The Rule-Based Classifier and Engine

1. The Query Classifier (Intent Detection)

The Query Classifier analyzes incoming requests and categorizes them into two paths:

HIGH_CONTROL_RBS Queries

These queries demand absolute accuracy and real-time data. The RBS layer handles them entirely, bypassing RAG completely.

Characteristics:

- Asking about current system state (available slots, equipment status, queue position)
- Requesting numerical facts (count, capacity, occupancy)
- Checking boolean conditions (is X available? is Y active?)
- No explanation or reasoning needed: just facts

Keywords that trigger HIGH_CONTROL_RBS:

available, free, status, count, occupied, allocated, assigned,
current, now, real-time, how many, are there, is there,
capacity, utilization, queue, waiting, active, inactive

LOW_CONTROL_RAG Queries

These queries benefit from contextual explanation and background reasoning. RAG is appropriate here.

Characteristics:

- Asking why something happens (for example, "explain the slot assignment policy")
- Requesting guidance (best practice for resource scheduling)
- Seeking context (understanding allocation constraints)
- Needing a natural-language explanation

Keywords that trigger LOW_CONTROL_RAG:

explain, why, how, understand, guideline, policy, procedure,
best practice, recommendation, reason, background, context,
help, advice, suggest
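
The two keyword lists can be exercised without any framework. The sketch below is illustrative rather than the service's actual implementation: `containsPhrase` and `classifyQuery` are hypothetical names, the lists are abbreviated, and it assumes whole-word matching so that a keyword such as `now` does not fire inside `know`. Unmatched queries default to the RBS path, mirroring the fail-safe described in this guide.

```typescript
type QueryControlLevel = 'HIGH_CONTROL_RBS' | 'LOW_CONTROL_RAG';

// Abbreviated keyword lists taken from the two categories above.
const HIGH_CONTROL_KEYWORDS = ['available', 'status', 'count', 'how many', 'are there', 'now', 'queue'];
const LOW_CONTROL_KEYWORDS = ['explain', 'why', 'policy', 'guideline', 'best practice', 'advice'];

// True when `phrase` occurs in `text` as whole words, not as part of a longer word.
function containsPhrase(text: string, phrase: string): boolean {
    const escaped = phrase.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    return new RegExp(`\\b${escaped}\\b`, 'i').test(text);
}

// High-control keywords are checked first; anything unmatched defaults to the RBS path.
function classifyQuery(query: string): QueryControlLevel {
    if (HIGH_CONTROL_KEYWORDS.some((kw) => containsPhrase(query, kw))) return 'HIGH_CONTROL_RBS';
    if (LOW_CONTROL_KEYWORDS.some((kw) => containsPhrase(query, kw))) return 'LOW_CONTROL_RAG';
    return 'HIGH_CONTROL_RBS';
}

console.log(classifyQuery('Are there any available slots in Zone A?')); // HIGH_CONTROL_RBS
console.log(classifyQuery('Explain the slot assignment policy'));       // LOW_CONTROL_RAG
console.log(classifyQuery('I know the policy, but why?'));              // LOW_CONTROL_RAG
```

The third query shows why whole-word matching matters: with plain substring matching, `know` would trigger the `now` keyword and send an explanatory question down the RBS path.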

2. The Rule Engine Service

The Rule Engine is a deterministic, non-generative service that:

- Executes predefined business rules stored in the system
- Queries the operational database directly for authoritative data
- Enforces allocation constraints and capacity limits
- Returns structured data, not free-text responses

Example Rule Engine Components

import { Injectable } from '@nestjs/common';
import { IsNull, Repository } from 'typeorm';
import { InjectRepository } from '@nestjs/typeorm';

import { Slot } from '../entities/slot.entity';
import { Equipment } from '../entities/equipment.entity';
import { AppDatabases } from '@lib/common/enum/app-databases.enum';

/**
 * Rule Engine Service
 *
 * Purpose: Single source of truth for all deterministic resource queries.
 * This service NEVER uses AI/LLM. It only executes defined business rules.
 */
@Injectable()
export class RuleEngineService {
    constructor(
        @InjectRepository(Slot, AppDatabases.APP_CORE)
        private readonly slotRepository: Repository<Slot>,
        @InjectRepository(Equipment, AppDatabases.APP_CORE)
        private readonly equipmentRepository: Repository<Equipment>,
    ) {}

    /**
     * Rule: Get Available Slots in Zone
     * Input: Zone name
     * Output: Count of available slots, slot IDs, occupancy rate
     * Returns: Structured data (NOT prose)
     */
    async getAvailableSlots(zoneName: string): Promise<{
        count: number;
        slotIds: string[];
        totalCapacity: number;
        occupancyRate: number; // percentage
    }> {
        // Step 1: Query database directly (no AI involved).
        // Capacity counts every non-deleted slot in the zone; the available
        // slots are a subset of it, so the two need separate queries.
        const [slots, totalCount] = await Promise.all([
            this.slotRepository.find({
                where: { zone: zoneName, status: 'available', deleted_at: IsNull() },
            }),
            this.slotRepository.count({
                where: { zone: zoneName, deleted_at: IsNull() },
            }),
        ]);

        // Step 2: Calculate metrics (deterministic math).
        // An empty zone reports 0% occupancy instead of NaN.
        const slotIds = slots.map((slot) => slot.id);
        const occupancyRate =
            totalCount === 0 ? 0 : ((totalCount - slots.length) / totalCount) * 100;

        // Step 3: Return structured response
        return {
            count: slots.length,
            slotIds,
            totalCapacity: totalCount,
            occupancyRate: Math.round(occupancyRate),
        };
    }

    /**
     * Rule: Get Equipment Status
     * Input: Equipment type, Zone (optional)
     * Output: Available count, Status breakdown, Location details
     */
    async getEquipmentStatus(
        equipmentType: string,
        zoneName?: string,
    ): Promise<{
        available: number;
        inUse: number;
        maintenance: number;
        total: number;
        statusBreakdown: Record<string, number>;
        locations: Array<{ zoneName: string; count: number }>;
    }> {
        const query = this.equipmentRepository
            .createQueryBuilder('equipment')
            .where('equipment.equipment_type = :equipmentType', { equipmentType });

        if (zoneName) {
            query.andWhere('equipment.zone = :zoneName', { zoneName });
        }

        const equipment = await query.getMany();

        const statusBreakdown = equipment.reduce(
            (acc, e) => {
                acc[e.status] = (acc[e.status] || 0) + 1;
                return acc;
            },
            {} as Record<string, number>,
        );

        const locations = equipment.reduce(
            (acc, e) => {
                const existing = acc.find((l) => l.zoneName === e.zone);
                if (existing) {
                    existing.count++;
                } else {
                    acc.push({ zoneName: e.zone, count: 1 });
                }
                return acc;
            },
            [] as Array<{ zoneName: string; count: number }>,
        );

        return {
            available: statusBreakdown['available'] || 0,
            inUse: statusBreakdown['in_use'] || 0,
            maintenance: statusBreakdown['maintenance'] || 0,
            total: equipment.length,
            statusBreakdown,
            locations,
        };
    }

    /**
     * Rule: Check Allocation Policy Constraints
     * Input: Resource priority level, Resource type
     * Output: Allowed/Denied, Reason code (for audit)
     */
    async validateAllocationPolicy(
        resourceLevel: 'critical' | 'high' | 'medium' | 'low',
        resourceType: 'zone_a_slot' | 'standard_slot' | 'specialized_equipment',
    ): Promise<{
        allowed: boolean;
        reasonCode: string;
        message: string;
    }> {
        // Define allocation rules (business logic, NOT AI)
        const rules: Record<string, Record<string, boolean>> = {
            critical: {
                zone_a_slot: true,
                standard_slot: true,
                specialized_equipment: true,
            },
            high: {
                zone_a_slot: false,
                standard_slot: true,
                specialized_equipment: false,
            },
            medium: {
                zone_a_slot: false,
                standard_slot: true,
                specialized_equipment: false,
            },
            low: {
                zone_a_slot: false,
                standard_slot: true,
                specialized_equipment: false,
            },
        };

        const allowed = rules[resourceLevel]?.[resourceType] || false;

        return {
            allowed,
            reasonCode: allowed ? 'ALLOCATION_ALLOWED' : 'ALLOCATION_DENIED',
            message: allowed
                ? `Level '${resourceLevel}' can be allocated '${resourceType}'`
                : `Level '${resourceLevel}' cannot be allocated '${resourceType}'. Policy violation.`,
        };
    }

    /**
     * Rule: Calculate Optimal Slot Assignment
     * Input: Resource ID, Available slots
     * Output: Recommended slot ID, Score, Rationale code
     *
     * Uses deterministic scoring — NOT ML prediction
     */
    async calculateOptimalSlotAssignment(
        resourceId: string,
        availableSlotIds: string[],
    ): Promise<{
        recommendedSlotId: string | null;
        score: number; // 0-100
        rationaleCode: string;
    }> {
        if (availableSlotIds.length === 0) {
            return {
                recommendedSlotId: null,
                score: 0,
                rationaleCode: 'NO_AVAILABLE_SLOTS',
            };
        }

        const slotScores = await Promise.all(
            availableSlotIds.map(async (slotId) => {
                const slot = await this.slotRepository.findOne({ where: { id: slotId } });
                // Unknown slot IDs are dropped below so they can never be recommended
                if (slot === null) return null;

                // Scoring criteria (deterministic, not ML):
                // 1. Proximity to intake zone (+20)
                // 2. Recent usage pattern (+15)
                // 3. Compatible equipment in zone (+25)
                // 4. Maintenance status (+20)
                // 5. Privacy/isolation level match (+20)

                let score = 0;
                score += slot.proximity_score || 0;
                score += slot.availability_score || 0;
                score += slot.equipment_compatibility || 0;
                score += slot.maintenance_score || 0;
                score += slot.isolation_match_score || 0;

                return { slotId, score };
            }),
        );

        const candidates = slotScores.filter(
            (entry): entry is { slotId: string; score: number } => entry !== null,
        );

        if (candidates.length === 0) {
            return {
                recommendedSlotId: null,
                score: 0,
                rationaleCode: 'NO_AVAILABLE_SLOTS',
            };
        }

        // Ties keep the earliest candidate, so the result is stable for a given input order
        const best = candidates.reduce((max, curr) => (curr.score > max.score ? curr : max));

        return {
            recommendedSlotId: best.slotId,
            score: best.score,
            rationaleCode: 'OPTIMAL_ASSIGNMENT',
        };
    }
}
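
The deterministic math inside the rule engine can also be isolated as a pure function, which makes it easy to unit-test without a database. The following is a minimal sketch, not part of the service above: `SlotRow` and `summarizeZone` are hypothetical names, and it assumes that total capacity means every slot in the zone regardless of status.

```typescript
// Hypothetical minimal row shape; the real Slot entity has more fields.
interface SlotRow {
    id: string;
    status: 'available' | 'occupied' | 'maintenance';
}

interface ZoneSummary {
    count: number;
    slotIds: string[];
    totalCapacity: number;
    occupancyRate: number; // whole-number percentage
}

// Derives availability metrics from every slot in a zone.
function summarizeZone(slots: SlotRow[]): ZoneSummary {
    const available = slots.filter((s) => s.status === 'available');
    const totalCapacity = slots.length;
    // Guard against an empty zone so the rate is 0 rather than NaN.
    const occupancyRate =
        totalCapacity === 0
            ? 0
            : Math.round(((totalCapacity - available.length) / totalCapacity) * 100);
    return {
        count: available.length,
        slotIds: available.map((s) => s.id),
        totalCapacity,
        occupancyRate,
    };
}

// Ten slots, of which the first three are available.
const zoneA: SlotRow[] = Array.from({ length: 10 }, (_, i): SlotRow => ({
    id: `slot-${101 + i}`,
    status: i < 3 ? 'available' : 'occupied',
}));

console.log(summarizeZone(zoneA));
```

For this input the summary reports 3 available slots out of a capacity of 10, which is an occupancy rate of 70.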

Classifier Implementation: The Intent Router

Query Classifier Service

import { Injectable } from '@nestjs/common';
import { LogsService } from '@lib/common/modules/log/logs.service';

export type QueryControlLevel = 'HIGH_CONTROL_RBS' | 'LOW_CONTROL_RAG';

/**
 * Query Classifier Service
 *
 * Determines whether a query should be handled by:
 * - HIGH_CONTROL_RBS: Rule-Based System (deterministic, database-backed)
 * - LOW_CONTROL_RAG: RAG System (explanatory, contextual)
 *
 * This is the CRITICAL entry point for hybrid AI safety.
 */
@Injectable()
export class QueryClassifierService {
    private readonly highControlKeywords = [
        // State queries
        'available', 'free', 'occupied', 'status',
        // Count queries
        'count', 'how many', 'how much', 'number of',
        // Existence queries (bare 'is' / 'are' are omitted: they match almost every question)
        'are there', 'is there',
        // Real-time queries
        'current', 'now', 'real-time', 'this moment',
        // Allocation queries
        'allocated', 'assigned', 'location of', 'where is',
        // Queue queries
        'queue', 'waiting', 'next in line', 'waiting list',
    ];

    private readonly lowControlKeywords = [
        'explain', 'why', 'how', 'understand',
        'guideline', 'policy', 'procedure', 'best practice',
        'recommendation', 'reason', 'background',
        'help', 'advice', 'suggest', 'should',
    ];

    constructor(private logger: LogsService) {}

    /**
     * Main classification method.
     * Returns the control level for a given query.
     */
    classify(query: string): QueryControlLevel {
        const lowerQuery = query.toLowerCase().trim();

        // Step 1: Detect HIGH_CONTROL_RBS keywords (take priority)
        const hasHighControl = this.highControlKeywords.some((keyword) =>
            this.matchesKeyword(lowerQuery, keyword),
        );

        if (hasHighControl) {
            this.logger.log({ action: 'QUERY_CLASSIFIED', controlLevel: 'HIGH_CONTROL_RBS' });
            return 'HIGH_CONTROL_RBS';
        }

        // Step 2: Detect LOW_CONTROL_RAG keywords
        const hasLowControl = this.lowControlKeywords.some((keyword) =>
            this.matchesKeyword(lowerQuery, keyword),
        );

        if (hasLowControl) {
            this.logger.log({ action: 'QUERY_CLASSIFIED', controlLevel: 'LOW_CONTROL_RAG' });
            return 'LOW_CONTROL_RAG';
        }

        // Step 3: Default to HIGH_CONTROL_RBS (fail-safe for ambiguous queries)
        // Better to be over-cautious and use RBS than risk hallucination.
        this.logger.log({ action: 'QUERY_CLASSIFIED', controlLevel: 'HIGH_CONTROL_RBS', reason: 'default_to_rbs' });
        return 'HIGH_CONTROL_RBS';
    }

    /**
     * Returns a confidence score (0-100) for a classification result.
     * Each matched keyword adds 25 points, capped at 100.
     */
    getConfidence(query: string, controlLevel: QueryControlLevel): number {
        const lowerQuery = query.toLowerCase();
        const keywords =
            controlLevel === 'HIGH_CONTROL_RBS'
                ? this.highControlKeywords
                : this.lowControlKeywords;

        const matches = keywords.filter((kw) => this.matchesKeyword(lowerQuery, kw)).length;
        return Math.min(100, matches * 25);
    }

    /**
     * Whole-word match. A plain substring test would let "now" fire
     * inside "know" and "free" inside "freeze".
     */
    private matchesKeyword(lowerQuery: string, keyword: string): boolean {
        const escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        return new RegExp(`\\b${escaped}\\b`).test(lowerQuery);
    }
}

RBS Workflow: The Orchestrator

Resource Status Service

import { Injectable } from '@nestjs/common';
import { LogsService } from '@lib/common/modules/log/logs.service';
import { QueryClassifierService } from './query-classifier.service';
import { RuleEngineService } from './rule-engine.service';
import { AssessmentExplainersService } from '../../../modules/assessment/services/assessment-explainers.service';

interface ResourceStatusResponse {
    controlLevel: 'HIGH_CONTROL_RBS' | 'LOW_CONTROL_RAG';
    query: string;
    response: unknown;
    source: 'RBS' | 'RAG' | 'HYBRID';
    timestamp: Date;
    processingTimeMs: number;
}

/**
 * Resource Status Service
 *
 * Orchestrates the hybrid approach:
 * 1. Classify query intent
 * 2. Route to appropriate system (RBS or RAG)
 * 3. Return typed response
 */
@Injectable()
export class ResourceStatusService {
    constructor(
        private classifier: QueryClassifierService,
        private ruleEngine: RuleEngineService,
        private ragExplainer: AssessmentExplainersService,
        private logger: LogsService,
    ) {}

    async queryResourceStatus(userQuery: string): Promise<ResourceStatusResponse> {
        const startTime = Date.now();

        // STEP 1: Classify the query
        const controlLevel = this.classifier.classify(userQuery);
        const confidence = this.classifier.getConfidence(userQuery, controlLevel);

        this.logger.log({ action: 'RESOURCE_QUERY_START', controlLevel, confidence });

        // STEP 2: Route based on control level
        let response: unknown;
        let source: 'RBS' | 'RAG' | 'HYBRID';

        if (controlLevel === 'HIGH_CONTROL_RBS') {
            // HIGH CONTROL: Use RBS only — no AI/LLM involved
            response = await this.executeRbsQuery(userQuery);
            source = 'RBS';
        } else {
            // LOW CONTROL: Use RAG for explanation
            response = await this.ragExplainer.explain(userQuery);
            source = 'RAG';
        }

        const processingTimeMs = Date.now() - startTime;
        this.logger.log({ action: 'RESOURCE_QUERY_COMPLETE', controlLevel, source, processingTimeMs });

        return {
            controlLevel,
            query: userQuery,
            response,
            source,
            timestamp: new Date(),
            processingTimeMs,
        };
    }

    /**
     * Execute RBS query (deterministic path).
     * Parses query intent and calls the appropriate rule engine method.
     */
    private async executeRbsQuery(userQuery: string): Promise<unknown> {
        const query = userQuery.toLowerCase();

        if (query.includes('available') && query.includes('slot')) {
            const zoneMatch = userQuery.match(/zone\s+(\w+)/i);
            const zoneName = zoneMatch ? zoneMatch[1] : 'general';
            return this.ruleEngine.getAvailableSlots(zoneName);
        }

        if (query.includes('equipment') && query.includes('status')) {
            // The type is expected directly before the word "equipment",
            // e.g. "status of specialized equipment"
            const typeMatch = query.match(/(\w+)\s+equipment/);
            const equipmentType = typeMatch ? typeMatch[1] : 'standard';
            return this.ruleEngine.getEquipmentStatus(equipmentType);
        }

        if (query.includes('allocat') && query.includes('policy')) {
            return {
                message: 'Use a specific allocation query with resource level',
                example: 'Can we allocate Zone A slot to a critical resource?',
            };
        }

        return {
            error: 'QUERY_NOT_RECOGNIZED',
            message: 'This RBS query could not be parsed. Please rephrase.',
            hint: 'Try: "How many slots are available in Zone A?" or "What is the status of specialized equipment?"',
        };
    }
}
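
The parsing step of the RBS path can be separated from the database calls, so that intent detection is testable on its own. The sketch below is one possible shape, not the service's actual code: `RbsIntent` and `parseRbsIntent` are hypothetical names, and it assumes the zone follows the word "zone" and the equipment type is the word directly before "equipment", as in the hint text above.

```typescript
// Hypothetical intent shape: one variant per rule-engine method, plus an explicit "unknown".
type RbsIntent =
    | { kind: 'available_slots'; zoneName: string }
    | { kind: 'equipment_status'; equipmentType: string }
    | { kind: 'unknown' };

function parseRbsIntent(userQuery: string): RbsIntent {
    const query = userQuery.toLowerCase();

    if (query.includes('available') && query.includes('slot')) {
        // Match against the original string so the zone keeps its casing ("A", not "a").
        const zone = userQuery.match(/zone\s+(\w+)/i);
        return { kind: 'available_slots', zoneName: zone ? zone[1] : 'general' };
    }

    if (query.includes('equipment') && query.includes('status')) {
        // Assumption: the type is the word directly before "equipment".
        const type = query.match(/(\w+)\s+equipment/);
        return { kind: 'equipment_status', equipmentType: type ? type[1] : 'standard' };
    }

    return { kind: 'unknown' };
}

console.log(parseRbsIntent('How many slots are available in Zone A?'));
console.log(parseRbsIntent('What is the status of specialized equipment?'));
console.log(parseRbsIntent('Tell me a joke'));
```

A caller can then `switch` on `kind` and invoke the matching rule-engine method, with the `unknown` branch returning the `QUERY_NOT_RECOGNIZED` payload.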

API Endpoints

Resource Status Endpoint

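import { BadRequestException, Controller, Get, Query, UseGuards } from '@nestjs/common';
import { ApiTags } from '@nestjs/swagger';
// ResourceType, RequirePermission, and AuthGuard are project-specific;
// import them from wherever the codebase defines them, along with
// ResourceStatusService and RuleEngineService from the sections above.

@ResourceType('resource-status')
@ApiTags('Resource Management')
@Controller('resources')
@UseGuards(AuthGuard)
export class ResourceStatusController {
    constructor(
        private resourceStatusService: ResourceStatusService,
        private ruleEngine: RuleEngineService,
    ) {}

    /**
     * Query Resources (Hybrid AI Endpoint)
     *
     * Intelligent routing:
     * - "How many slots are available in Zone A?" → RBS (3 slots)
     * - "Explain the slot allocation policy" → RAG (contextual explanation)
     */
    @Get('status')
    @RequirePermission('resource:query')
    async queryResourceStatus(@Query('query') query: string) {
        // A missing or blank query would otherwise fail inside the classifier with a 500
        if (!query || !query.trim()) {
            throw new BadRequestException('Query parameter "query" is required');
        }
        return this.resourceStatusService.queryResourceStatus(query);
    }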

    /**
     * Get Available Slots (Direct RBS — maximum performance)
     */
    @Get('slots/available')
    @RequirePermission('resource:query')
    async getAvailableSlots(@Query('zone') zoneName: string = 'general') {
        return this.ruleEngine.getAvailableSlots(zoneName);
    }

    /**
     * Get Equipment Status (Direct RBS)
     */
    @Get('equipment/status')
    @RequirePermission('resource:query')
    async getEquipmentStatus(
        @Query('type') equipmentType: string,
        @Query('zone') zoneName?: string,
    ) {
        return this.ruleEngine.getEquipmentStatus(equipmentType, zoneName);
    }
}

Example API response (HIGH_CONTROL_RBS):

{
    "status": { "code": 200000, "message": "Request Succeeded" },
    "data": {
        "type": "resource-status",
        "attributes": {
            "control_level": "HIGH_CONTROL_RBS",
            "query": "How many slots are available in Zone A?",
            "response": {
                "count": 3,
                "slot_ids": ["slot-101", "slot-102", "slot-103"],
                "total_capacity": 10,
                "occupancy_rate": 70
            },
            "source": "RBS",
            "processing_time_ms": 25
        }
    }
}
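
The rule engine returns camelCase fields (`slotIds`, `totalCapacity`), while the response above uses snake_case attributes. The guide does not show where that conversion happens. One plausible approach, sketched here on the assumption that a response interceptor performs it, is a recursive key mapper. `toSnakeCase` and `toSnakeCaseKeys` are hypothetical helpers.

```typescript
// Converts "totalCapacity" to "total_capacity".
function toSnakeCase(key: string): string {
    return key.replace(/[A-Z]/g, (ch) => `_${ch.toLowerCase()}`);
}

// Recursively renames object keys; primitives and Date values pass through unchanged.
function toSnakeCaseKeys(value: unknown): unknown {
    if (Array.isArray(value)) {
        return value.map((item) => toSnakeCaseKeys(item));
    }
    if (value !== null && typeof value === 'object' && !(value instanceof Date)) {
        const source = value as Record<string, unknown>;
        const result: Record<string, unknown> = {};
        for (const key of Object.keys(source)) {
            result[toSnakeCase(key)] = toSnakeCaseKeys(source[key]);
        }
        return result;
    }
    return value;
}

const engineResult = {
    count: 3,
    slotIds: ['slot-101', 'slot-102', 'slot-103'],
    totalCapacity: 10,
    occupancyRate: 70,
};

console.log(JSON.stringify(toSnakeCaseKeys(engineResult)));
```

Only keys are renamed; string values such as the slot IDs are left untouched.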

Complete Request Flow: Sequence Diagram

This diagram shows the critical difference between HIGH_CONTROL_RBS queries (which bypass all AI) and LOW_CONTROL_RAG queries:

sequenceDiagram
    autonumber
    actor User as Operations Staff
    participant API as API Gateway
    participant Auth as AuthGuard
    participant Classifier as Query Classifier
    participant RBS as Rule Engine (RBS)
    participant DB as PostgreSQL (Core)
    participant RAG as RAG Explainer
    participant Embed as Embedding Service
    participant Ollama as Ollama (LLM)
    participant Transform as Transform Interceptor

    Note over User,Transform: SCENARIO 1 — HIGH_CONTROL_RBS Query

    User->>+API: GET /resources/status?query=How many slots are available in Zone A?
    API->>+Auth: Validate JWT & Permissions
    Auth-->>-API: Auth OK
    API->>+Classifier: classify("How many slots are available in Zone A?")
    Classifier->>Classifier: Detect keywords: "how many", "available" → HIGH_CONTROL
    Classifier-->>-API: HIGH_CONTROL_RBS

    Note over Classifier,DB: CRITICAL: AI/LLM is BYPASSED — query goes directly to database

    API->>+RBS: executeRbsQuery(...)
    RBS->>+DB: SELECT COUNT(*) FROM slots WHERE zone='A' AND status='available'
    DB-->>-RBS: 3 available slots
    RBS-->>-API: { count: 3, slotIds: [...], totalCapacity: 10 }

    API->>Transform: Format response
    Transform-->>API: JSON:API formatted response
    API-->>User: HTTP 200 { "response": { "count": 3, ... }, "source": "RBS" }

    Note over User: Staff receives EXACT data from database — NOT AI-generated

    Note over User,Transform: SCENARIO 2 — LOW_CONTROL_RAG Query

    User->>+API: GET /resources/status?query=Explain slot allocation policy
    API->>+Auth: Validate JWT & Permissions
    Auth-->>-API: Auth OK
    API->>+Classifier: classify("Explain slot allocation policy")
    Classifier->>Classifier: Detect keywords: "explain", "policy" → LOW_CONTROL
    Classifier-->>-API: LOW_CONTROL_RAG

    Note over Classifier,Ollama: This query is SUITABLE for AI — no critical data at risk

    API->>+RAG: explain("Explain slot allocation policy")
    RAG->>+Embed: getEmbedding("slot allocation policy")
    Embed-->>-RAG: [0.234, -0.567, ...]
    RAG->>DB: SELECT content FROM guidelines ORDER BY embedding <=> $1 LIMIT 3
    DB-->>RAG: Similar documents found
    RAG->>+Ollama: POST /v1/chat/completions (prompt with context)
    Ollama-->>-RAG: "The slot allocation policy prioritizes..."
    RAG-->>-API: Explanation text

    API->>Transform: Format response
    Transform-->>API: JSON:API formatted response
    API-->>User: HTTP 200 { "response": "The policy prioritizes...", "source": "RAG" }

    Note over User: Staff receives AI-generated explanation — suitable for guidance

    Note over RBS,DB: RBS Path: ~10–50ms | Deterministic, no hallucination
    Note over RAG,Ollama: RAG Path: ~500–2000ms | AI-enhanced, for non-critical queries
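
The semantic search step in Scenario 2 depends on how the embedding is passed to pgvector. The sketch below builds the query text and parameters only; it does not open a database connection. The table and column names (`guidelines`, `content`, `embedding`) come from the diagram, while `toVectorLiteral` and `buildGuidelineSearch` are hypothetical helpers. pgvector's `<=>` is a cosine distance operator, so it is used for ordering rather than as a filter.

```typescript
// pgvector accepts a vector parameter in its text form: '[0.1,0.2,0.3]'.
function toVectorLiteral(embedding: number[]): string {
    if (embedding.length === 0 || embedding.some((x) => !Number.isFinite(x))) {
        throw new Error('Embedding must be a non-empty array of finite numbers');
    }
    return `[${embedding.join(',')}]`;
}

// Query text plus positional parameters.
interface SqlQuery {
    text: string;
    values: unknown[];
}

// Nearest-neighbour search: smallest cosine distance first.
function buildGuidelineSearch(embedding: number[], limit = 3): SqlQuery {
    return {
        text: 'SELECT content FROM guidelines ORDER BY embedding <=> $1::vector LIMIT $2',
        values: [toVectorLiteral(embedding), limit],
    };
}

console.log(buildGuidelineSearch([0.234, -0.567]));
```

The `{ text, values }` shape is the one node-postgres accepts as a query config, so the result could be handed to a client's `query` method if that driver is in use.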

Error Handling and Fallback Strategies

Key Principles

Failure	Behavior
RBS database timeout	Throw `ServiceUnavailableException` — never fall back to AI for HIGH_CONTROL queries
Embedding service down	Fall back to keyword search (if implemented), or return `503`
Ollama server down	Return `503` with message “Explanation service temporarily unavailable”
Unrecognized query	Default to `HIGH_CONTROL_RBS` (fail-safe — better cautious than hallucinating)

// In ResourceStatusService: never silently fall back from RBS to RAG
if (controlLevel === 'HIGH_CONTROL_RBS') {
    // If this throws, let it propagate — do NOT catch and reroute to RAG
    response = await this.executeRbsQuery(userQuery);
    source = 'RBS';
}
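
The failure table can also be encoded as data, so the "never fall back to AI" rule is enforced in one place instead of being scattered across catch blocks. This is a sketch under stated assumptions: `Failure`, `FailureDecision`, and `decideOnFailure` are hypothetical names, and every message except the Ollama one quoted in the table is a placeholder.

```typescript
type Failure = 'RBS_DB_TIMEOUT' | 'EMBEDDING_DOWN' | 'OLLAMA_DOWN';

interface FailureDecision {
    httpStatus: number;
    message: string;
    fallback: 'none' | 'keyword_search';
}

// Maps each failure from the table to a response decision.
// No branch ever reroutes a query to RAG.
function decideOnFailure(failure: Failure, keywordSearchImplemented = false): FailureDecision {
    switch (failure) {
        case 'RBS_DB_TIMEOUT':
            // High-control path: surface the outage, never substitute an AI answer.
            return { httpStatus: 503, message: 'Resource data temporarily unavailable', fallback: 'none' };
        case 'EMBEDDING_DOWN':
            // Keyword search is only an option if the deployment actually implements it.
            return keywordSearchImplemented
                ? { httpStatus: 200, message: 'Served from keyword search', fallback: 'keyword_search' }
                : { httpStatus: 503, message: 'Explanation service temporarily unavailable', fallback: 'none' };
        case 'OLLAMA_DOWN':
            return { httpStatus: 503, message: 'Explanation service temporarily unavailable', fallback: 'none' };
    }
}

console.log(decideOnFailure('RBS_DB_TIMEOUT'));
console.log(decideOnFailure('EMBEDDING_DOWN', true));
```

Because the `fallback` type has no RAG option, a reroute from the RBS path to the LLM cannot be expressed without changing the type.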

Related guides:

- AI Integration (Ollama + RAG): full RAG pipeline with pgvector and Ollama
- Rule-Based Decision Engine: IF-THEN rule engine with safety validation and inventory checks