Hybrid AI Architecture (RBS + RAG)
Overview
An Enterprise Information System (EIS) requires absolute certainty for critical operational decisions. While a pure Ollama + RAG system excels at explaining domain criteria, a purely generative AI approach is fundamentally unsuitable for deterministic resource allocation tasks.
This guide introduces the Hybrid AI Architecture: combining Rule-Based Systems (RBS) with RAG to ensure that critical system queries always receive precise, database-backed answers while non-critical queries benefit from contextual AI explanations.
The Core Problem: Why Pure RAG Fails for Deterministic Queries
Scenario: The Hallucination Risk
An operator asks: "Are there any available slots in Zone A?"
With Pure RAG System:
LLM Response: "Based on standard allocation guidelines, Zone A typically has 10 slots for high-priority resources. However, I cannot verify current availability from your system."
❌ WRONG: The LLM may:
- Hallucinate a count that doesn't match reality
- Provide outdated information from training data
- Report a slot as available when it is already occupied
With Hybrid RBS + RAG System:
Rule-Based System: "Query classified as HIGH_CONTROL_RBS. Bypassing RAG. Executing: SELECT COUNT(*) FROM slots WHERE status = 'available' AND zone = 'A'"
Response: 3 slots available in Zone A
✅ CORRECT: Deterministic, real-time, database-backed
Why This Architecture Matters
| Aspect | Pure RAG | Hybrid RBS + RAG |
|---|---|---|
| Query: "Are there slots available in Zone A?" | LLM guesses based on patterns | RBS queries database directly |
| Risk Level | HIGH: can cause operational errors | LOW: source of truth is the database |
| Response Time | ~500–2000 ms (LLM generation) | ~10–50 ms (database query) |
| Data Accuracy | 70–90% (generative) | 99.99% (deterministic) |
| Auditability | Questionable (non-auditable) | Clear (traceable, loggable) |
Core Principles: The Hybrid Architecture
Client Request
      |
      v
Classifier / Router   <-- Detects query intent
      |
      +--> RBS Layer ------------------------------+
      |                                            |
      +--> LOW_CONTROL (RAG_EXPLAIN)               |
      |         |                                  |
      |         v                                  |
      |    Embedding Service                       |
      |         |                                  |
      |         v                                  |
      |    pgvector Semantic Search                |
      |         |                                  |
      |         v                                  |
      |    Ollama (Gemma 3:4b) --------------------+
      |                                            |
      +--> Other Routes ---------------------------+
                                                   |
                                                   v
                                          Formatted Response
                                                   |
                                                   v
                                                Client
Core Architecture: The Rule-Based Classifier and Engine
1. The Query Classifier (Intent Detection)
The Query Classifier analyzes incoming requests and categorizes them into two paths:
HIGH_CONTROL_RBS Queries
These queries demand absolute accuracy and real-time data. The RBS layer handles them entirely, bypassing RAG completely.
Characteristics:
- Asking about current system state (available slots, equipment status, queue position)
- Requesting numerical facts (count, capacity, occupancy)
- Checking boolean conditions (is X available? is Y active?)
- No explanation or reasoning needed, just facts
Keywords that trigger HIGH_CONTROL_RBS:
available, free, status, count, occupied, allocated, assigned, current, now, real-time, how many, are there, is there, capacity, utilization, queue, waiting, active, inactive
LOW_CONTROL_RAG Queries
These queries benefit from contextual explanation and background reasoning. RAG is appropriate here.
Characteristics:
- Asking why something is done (for example, explaining the slot assignment policy)
- Requesting guidance (best practice for resource scheduling)
- Seeking context (understanding allocation constraints)
- Needs natural language explanation
Keywords that trigger LOW_CONTROL_RAG:
explain, why, how, understand, guideline, policy, procedure, best practice, recommendation, reason, background, context, help, advice, suggest
2. The Rule Engine Service
The Rule Engine is a deterministic, non-generative service that:
- Executes predefined business rules stored in the system
- Queries the operational database directly for authoritative data
- Enforces allocation constraints and capacity limits
- Returns structured data, not free-text responses
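As a minimal sketch of what "structured data, not free text" can look like, the standalone rule below checks a capacity limit against an in-memory snapshot and returns a typed result with a reason code. The `ZoneSnapshot` type, the `checkCapacityRule` function, and the reason codes are hypothetical names chosen for illustration; a real rule would read its inputs from the operational database rather than from a literal.

```typescript
// Hypothetical snapshot of a zone's state; a real system would load this from the database.
interface ZoneSnapshot {
  zone: string;
  totalCapacity: number;
  occupied: number;
}

// Structured rule outcome: machine-readable fields only, no generated prose.
interface CapacityRuleResult {
  allowed: boolean;
  reasonCode: 'CAPACITY_OK' | 'CAPACITY_EXCEEDED' | 'INVALID_REQUEST';
  remaining: number;
}

// Deterministic rule: the same inputs always produce the same result.
function checkCapacityRule(snapshot: ZoneSnapshot, requested: number): CapacityRuleResult {
  const remaining = Math.max(0, snapshot.totalCapacity - snapshot.occupied);

  // Reject malformed requests before applying the capacity limit.
  if (!Number.isInteger(requested) || requested <= 0) {
    return { allowed: false, reasonCode: 'INVALID_REQUEST', remaining };
  }
  if (requested > remaining) {
    return { allowed: false, reasonCode: 'CAPACITY_EXCEEDED', remaining };
  }
  return { allowed: true, reasonCode: 'CAPACITY_OK', remaining };
}

const zoneA: ZoneSnapshot = { zone: 'A', totalCapacity: 10, occupied: 7 };
console.log(checkCapacityRule(zoneA, 2)); // allowed: true, reasonCode: 'CAPACITY_OK', remaining: 3
console.log(checkCapacityRule(zoneA, 5)); // allowed: false, reasonCode: 'CAPACITY_EXCEEDED', remaining: 3
```

Because the result carries a reason code rather than a sentence, it can be logged for audit and rendered by the client without any interpretation step.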
Example Rule Engine Components
import { Injectable } from '@nestjs/common';
import { IsNull, Repository } from 'typeorm';
import { InjectRepository } from '@nestjs/typeorm';

import { Slot } from '../entities/slot.entity';
import { Equipment } from '../entities/equipment.entity';
import { AppDatabases } from '@lib/common/enum/app-databases.enum';

/**
 * Rule Engine Service
 *
 * Purpose: Single source of truth for all deterministic resource queries.
 * This service NEVER uses AI/LLM. It only executes defined business rules.
 */
@Injectable()
export class RuleEngineService {
  constructor(
    @InjectRepository(Slot, AppDatabases.APP_CORE)
    private readonly slotRepository: Repository<Slot>,
    @InjectRepository(Equipment, AppDatabases.APP_CORE)
    private readonly equipmentRepository: Repository<Equipment>,
  ) {}

  /**
   * Rule: Get Available Slots in Zone
   * Input: Zone name
   * Output: Count of available slots, slot IDs, occupancy rate
   * Returns: Structured data (NOT prose)
   */
  async getAvailableSlots(zoneName: string): Promise<{
    count: number;
    slotIds: string[];
    totalCapacity: number;
    occupancyRate: number; // percentage
  }> {
    // Step 1: Query database directly (no AI involved).
    // Total capacity must count ALL slots in the zone, not only the available
    // ones; otherwise the occupancy rate would always be 0.
    const [slots, totalCount] = await Promise.all([
      this.slotRepository.find({
        where: { zone: zoneName, status: 'available', deleted_at: IsNull() },
      }),
      this.slotRepository.count({
        where: { zone: zoneName, deleted_at: IsNull() },
      }),
    ]);

    // Step 2: Calculate metrics (deterministic math), guarding against an empty zone
    const slotIds = slots.map((slot) => slot.id);
    const occupancyRate =
      totalCount > 0 ? ((totalCount - slots.length) / totalCount) * 100 : 0;

    // Step 3: Return structured response
    return {
      count: slots.length,
      slotIds,
      totalCapacity: totalCount,
      occupancyRate: Math.round(occupancyRate),
    };
  }
  /**
   * Rule: Get Equipment Status
   * Input: Equipment type, Zone (optional)
   * Output: Available count, Status breakdown, Location details
   */
  async getEquipmentStatus(
    equipmentType: string,
    zoneName?: string,
  ): Promise<{
    available: number;
    inUse: number;
    maintenance: number;
    total: number;
    statusBreakdown: Record<string, number>;
    locations: Array<{ zoneName: string; count: number }>;
  }> {
    const query = this.equipmentRepository
      .createQueryBuilder('equipment')
      .where('equipment.equipment_type = :equipmentType', { equipmentType });

    if (zoneName) {
      query.andWhere('equipment.zone = :zoneName', { zoneName });
    }

    const equipment = await query.getMany();

    // Count equipment per status value
    const statusBreakdown = equipment.reduce(
      (acc, e) => {
        acc[e.status] = (acc[e.status] || 0) + 1;
        return acc;
      },
      {} as Record<string, number>,
    );

    // Count equipment per zone
    const locations = equipment.reduce(
      (acc, e) => {
        const existing = acc.find((l) => l.zoneName === e.zone);
        if (existing) {
          existing.count++;
        } else {
          acc.push({ zoneName: e.zone, count: 1 });
        }
        return acc;
      },
      [] as Array<{ zoneName: string; count: number }>,
    );

    return {
      available: statusBreakdown['available'] || 0,
      inUse: statusBreakdown['in_use'] || 0,
      maintenance: statusBreakdown['maintenance'] || 0,
      total: equipment.length,
      statusBreakdown,
      locations,
    };
  }
  /**
   * Rule: Check Allocation Policy Constraints
   * Input: Resource priority level, Resource type
   * Output: Allowed/Denied, Reason code (for audit)
   */
  async validateAllocationPolicy(
    resourceLevel: 'critical' | 'high' | 'medium' | 'low',
    resourceType: 'zone_a_slot' | 'standard_slot' | 'specialized_equipment',
  ): Promise<{
    allowed: boolean;
    reasonCode: string;
    message: string;
  }> {
    // Define allocation rules (business logic, NOT AI)
    const rules: Record<string, Record<string, boolean>> = {
      critical: { zone_a_slot: true, standard_slot: true, specialized_equipment: true },
      high: { zone_a_slot: false, standard_slot: true, specialized_equipment: false },
      medium: { zone_a_slot: false, standard_slot: true, specialized_equipment: false },
      low: { zone_a_slot: false, standard_slot: true, specialized_equipment: false },
    };

    // Unknown level/type combinations are denied by default
    const allowed = rules[resourceLevel]?.[resourceType] ?? false;

    return {
      allowed,
      reasonCode: allowed ? 'ALLOCATION_ALLOWED' : 'ALLOCATION_DENIED',
      message: allowed
        ? `Level '${resourceLevel}' can be allocated '${resourceType}'`
        : `Level '${resourceLevel}' cannot be allocated '${resourceType}'. Policy violation.`,
    };
  }
  /**
   * Rule: Calculate Optimal Slot Assignment
   * Input: Resource ID, Available slots
   * Output: Recommended slot ID, Score, Rationale code
   *
   * Uses deterministic scoring, NOT ML prediction
   */
  async calculateOptimalSlotAssignment(
    resourceId: string,
    availableSlotIds: string[],
  ): Promise<{
    recommendedSlotId: string | null;
    score: number; // 0-100
    rationaleCode: string;
  }> {
    if (availableSlotIds.length === 0) {
      return {
        recommendedSlotId: null,
        score: 0,
        rationaleCode: 'NO_AVAILABLE_SLOTS',
      };
    }

    const slotScores = await Promise.all(
      availableSlotIds.map(async (slotId) => {
        const slot = await this.slotRepository.findOne({ where: { id: slotId } });
        if (slot === null) return { slotId, score: 0 };

        // Scoring criteria (deterministic, not ML):
        // 1. Proximity to intake zone (+20)
        // 2. Recent usage pattern (+15)
        // 3. Compatible equipment in zone (+25)
        // 4. Maintenance status (+20)
        // 5. Privacy/isolation level match (+20)
        let score = 0;
        score += slot.proximity_score || 0;
        score += slot.availability_score || 0;
        score += slot.equipment_compatibility || 0;
        score += slot.maintenance_score || 0;
        score += slot.isolation_match_score || 0;

        return { slotId, score };
      }),
    );

    // Pick the highest-scoring slot (the list is non-empty at this point)
    const best = slotScores.reduce((max, curr) => (curr.score > max.score ? curr : max));

    return {
      recommendedSlotId: best.slotId,
      score: best.score,
      rationaleCode: 'OPTIMAL_ASSIGNMENT',
    };
  }
}
Classifier Implementation: The Intent Router
Query Classifier Service
import { Injectable } from '@nestjs/common';
import { LogsService } from '@lib/common/modules/log/logs.service';

export type QueryControlLevel = 'HIGH_CONTROL_RBS' | 'LOW_CONTROL_RAG';

/**
 * Query Classifier Service
 *
 * Determines whether a query should be handled by:
 * - HIGH_CONTROL_RBS: Rule-Based System (deterministic, database-backed)
 * - LOW_CONTROL_RAG: RAG System (explanatory, contextual)
 *
 * This is the CRITICAL entry point for hybrid AI safety.
 */
@Injectable()
export class QueryClassifierService {
  // Note: 'is' and 'are' are very broad. Any query that contains them as
  // whole words is routed to RBS, because HIGH_CONTROL keywords take priority.
  private readonly highControlKeywords = [
    // State queries
    'available', 'free', 'occupied', 'status',
    // Count queries
    'count', 'how many', 'how much', 'number of',
    // Existence queries
    'are there', 'is there', 'is', 'are',
    // Real-time queries
    'current', 'now', 'real-time', 'this moment',
    // Allocation queries
    'allocated', 'assigned', 'location of', 'where is',
    // Queue queries
    'queue', 'waiting', 'next in line', 'waiting list',
  ];

  private readonly lowControlKeywords = [
    'explain', 'why', 'how', 'understand', 'guideline', 'policy', 'procedure',
    'best practice', 'recommendation', 'reason', 'background', 'help', 'advice',
    'suggest', 'should',
  ];

  constructor(private logger: LogsService) {}

  /**
   * Matches a keyword at a word boundary instead of as a raw substring, so
   * that 'is' does not match "this" and 'now' does not match "know".
   * Keywords of three characters or fewer must match a whole word; longer
   * ones may carry a suffix ('explain' still matches "explained").
   */
  private matchesKeyword(text: string, keyword: string): boolean {
    const escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    const wordEnd = keyword.length <= 3 ? '\\b' : '';
    return new RegExp(`\\b${escaped}${wordEnd}`).test(text);
  }

  /**
   * Main classification method.
   * Returns the control level for a given query.
   */
  classify(query: string): QueryControlLevel {
    const lowerQuery = query.toLowerCase().trim();

    // Step 1: Detect HIGH_CONTROL_RBS keywords (take priority)
    const hasHighControl = this.highControlKeywords.some((keyword) =>
      this.matchesKeyword(lowerQuery, keyword),
    );

    if (hasHighControl) {
      this.logger.log({ action: 'QUERY_CLASSIFIED', controlLevel: 'HIGH_CONTROL_RBS' });
      return 'HIGH_CONTROL_RBS';
    }

    // Step 2: Detect LOW_CONTROL_RAG keywords
    const hasLowControl = this.lowControlKeywords.some((keyword) =>
      this.matchesKeyword(lowerQuery, keyword),
    );

    if (hasLowControl) {
      this.logger.log({ action: 'QUERY_CLASSIFIED', controlLevel: 'LOW_CONTROL_RAG' });
      return 'LOW_CONTROL_RAG';
    }

    // Step 3: Default to HIGH_CONTROL_RBS (fail-safe for ambiguous queries)
    // Better to be over-cautious and use RBS than risk hallucination.
    this.logger.log({
      action: 'QUERY_CLASSIFIED',
      controlLevel: 'HIGH_CONTROL_RBS',
      reason: 'default_to_rbs',
    });
    return 'HIGH_CONTROL_RBS';
  }

  /**
   * Returns a confidence score (0-100) for a classification result:
   * 25 points per matched keyword, capped at 100.
   */
  getConfidence(query: string, controlLevel: QueryControlLevel): number {
    const lowerQuery = query.toLowerCase();
    const keywords =
      controlLevel === 'HIGH_CONTROL_RBS' ? this.highControlKeywords : this.lowControlKeywords;

    const matches = keywords.filter((kw) => this.matchesKeyword(lowerQuery, kw)).length;
    return Math.min(100, matches * 25);
  }
}
RBS Workflow: The Orchestrator
Resource Status Service
import { Injectable } from '@nestjs/common';
import { LogsService } from '@lib/common/modules/log/logs.service';
import { QueryClassifierService } from './query-classifier.service';
import { RuleEngineService } from './rule-engine.service';
import { AssessmentExplainersService } from '../../../modules/assessment/services/assessment-explainers.service';

interface ResourceStatusResponse {
  controlLevel: 'HIGH_CONTROL_RBS' | 'LOW_CONTROL_RAG';
  query: string;
  response: unknown;
  source: 'RBS' | 'RAG' | 'HYBRID';
  timestamp: Date;
  processingTimeMs: number;
}

/**
 * Resource Status Service
 *
 * Orchestrates the hybrid approach:
 * 1. Classify query intent
 * 2. Route to appropriate system (RBS or RAG)
 * 3. Return typed response
 */
@Injectable()
export class ResourceStatusService {
  constructor(
    private classifier: QueryClassifierService,
    private ruleEngine: RuleEngineService,
    private ragExplainer: AssessmentExplainersService,
    private logger: LogsService,
  ) {}
  async queryResourceStatus(userQuery: string): Promise<ResourceStatusResponse> {
    const startTime = Date.now();

    // STEP 1: Classify the query
    const controlLevel = this.classifier.classify(userQuery);
    const confidence = this.classifier.getConfidence(userQuery, controlLevel);

    this.logger.log({ action: 'RESOURCE_QUERY_START', controlLevel, confidence });

    // STEP 2: Route based on control level
    let response: unknown;
    let source: 'RBS' | 'RAG' | 'HYBRID';

    if (controlLevel === 'HIGH_CONTROL_RBS') {
      // HIGH CONTROL: Use RBS only, no AI/LLM involved
      response = await this.executeRbsQuery(userQuery);
      source = 'RBS';
    } else {
      // LOW CONTROL: Use RAG for explanation
      response = await this.ragExplainer.explain(userQuery);
      source = 'RAG';
    }

    const processingTimeMs = Date.now() - startTime;
    this.logger.log({ action: 'RESOURCE_QUERY_COMPLETE', controlLevel, source, processingTimeMs });

    return {
      controlLevel,
      query: userQuery,
      response,
      source,
      timestamp: new Date(),
      processingTimeMs,
    };
  }
  /**
   * Execute RBS query (deterministic path).
   * Parses query intent and calls the appropriate rule engine method.
   */
  private async executeRbsQuery(userQuery: string): Promise<unknown> {
    const query = userQuery.toLowerCase();

    if (query.includes('available') && query.includes('slot')) {
      // Extract the zone name, e.g. "Zone A" -> "A"; fall back to 'general'
      const zoneMatch = userQuery.match(/zone\s+(\w+)/i);
      const zoneName = zoneMatch ? zoneMatch[1] : 'general';
      return this.ruleEngine.getAvailableSlots(zoneName);
    }

    if (query.includes('equipment') && query.includes('status')) {
      // Takes the word following "equipment" as the type; falls back to 'standard'
      const typeMatch = userQuery.match(/equipment\s+(\w+)/i);
      const equipmentType = typeMatch ? typeMatch[1] : 'standard';
      return this.ruleEngine.getEquipmentStatus(equipmentType);
    }

    if (query.includes('allocat') && query.includes('policy')) {
      return {
        message: 'Use a specific allocation query with resource level',
        example: 'Can we allocate Zone A slot to a critical resource?',
      };
    }

    return {
      error: 'QUERY_NOT_RECOGNIZED',
      message: 'This RBS query could not be parsed. Please rephrase.',
      hint: 'Try: "How many slots are available in Zone A?" or "What is the status of specialized equipment?"',
    };
  }
}
API Endpoints
Resource Status Endpoint
import { Controller, Get, Query, UseGuards } from '@nestjs/common';
import { ApiTags } from '@nestjs/swagger';
// ResourceType, RequirePermission, AuthGuard, ResourceStatusService, and
// RuleEngineService are project-specific; import them from their own modules.

@ResourceType('resource-status')
@ApiTags('Resource Management')
@Controller('resources')
@UseGuards(AuthGuard)
export class ResourceStatusController {
  constructor(
    private resourceStatusService: ResourceStatusService,
    private ruleEngine: RuleEngineService,
  ) {}

  /**
   * Query Resources (Hybrid AI Endpoint)
   *
   * Intelligent routing:
   * - "How many slots are available in Zone A?" → RBS (3 slots)
   * - "Explain the slot allocation policy" → RAG (contextual explanation)
   */
  @Get('status')
  @RequirePermission('resource:query')
  async queryResourceStatus(@Query('query') query: string) {
    return this.resourceStatusService.queryResourceStatus(query);
  }

  /**
   * Get Available Slots (Direct RBS, maximum performance)
   */
  @Get('slots/available')
  @RequirePermission('resource:query')
  async getAvailableSlots(@Query('zone') zoneName: string = 'general') {
    return this.ruleEngine.getAvailableSlots(zoneName);
  }

  /**
   * Get Equipment Status (Direct RBS)
   */
  @Get('equipment/status')
  @RequirePermission('resource:query')
  async getEquipmentStatus(
    @Query('type') equipmentType: string,
    @Query('zone') zoneName?: string,
  ) {
    return this.ruleEngine.getEquipmentStatus(equipmentType, zoneName);
  }
}
Example API response (HIGH_CONTROL_RBS):
{
  "status": { "code": 200000, "message": "Request Succeeded" },
  "data": {
    "type": "resource-status",
    "attributes": {
      "control_level": "HIGH_CONTROL_RBS",
      "query": "How many slots are available in Zone A?",
      "response": {
        "count": 3,
        "slot_ids": ["slot-101", "slot-102", "slot-103"],
        "total_capacity": 10,
        "occupancy_rate": 70
      },
      "source": "RBS",
      "processing_time_ms": 25
    }
  }
}
Complete Request Flow: Sequence Diagram
This diagram shows the critical difference between HIGH_CONTROL_RBS queries (which bypass all AI) and LOW_CONTROL_RAG queries:
sequenceDiagram
autonumber
actor User as Operations Staff
participant API as API Gateway
participant Auth as AuthGuard
participant Classifier as Query Classifier
participant RBS as Rule Engine (RBS)
participant DB as PostgreSQL (Core)
participant RAG as RAG Explainer
participant Embed as Embedding Service
participant Ollama as Ollama (LLM)
participant Transform as Transform Interceptor
Note over User,Transform: SCENARIO 1 - HIGH_CONTROL_RBS Query
User->>+API: GET /resources/status?query=How many slots are available in Zone A?
API->>+Auth: Validate JWT & Permissions
Auth-->>-API: Auth OK
API->>+Classifier: classify("How many slots are available in Zone A?")
Classifier->>Classifier: Detect keywords: "how many", "available" → HIGH_CONTROL
Classifier-->>-API: HIGH_CONTROL_RBS
Note over Classifier,DB: CRITICAL: AI/LLM is BYPASSED, query goes directly to database
API->>+RBS: executeRbsQuery(...)
RBS->>+DB: SELECT COUNT(*) FROM slots WHERE zone='A' AND status='available'
DB-->>-RBS: 3 available slots
RBS-->>-API: { count: 3, slotIds: [...], totalCapacity: 10 }
API->>Transform: Format response
Transform-->>API: JSON:API formatted response
API-->>User: HTTP 200 { "response": { "count": 3, ... }, "source": "RBS" }
Note over User: Staff receives EXACT data from database, NOT AI-generated
Note over User,Transform: SCENARIO 2 - LOW_CONTROL_RAG Query
User->>+API: GET /resources/status?query=Explain slot allocation policy
API->>+Auth: Validate JWT & Permissions
Auth-->>-API: Auth OK
API->>+Classifier: classify("Explain slot allocation policy")
Classifier->>Classifier: Detect keywords: "explain", "policy" → LOW_CONTROL
Classifier-->>-API: LOW_CONTROL_RAG
Note over Classifier,Ollama: This query is SUITABLE for AI, no critical data at risk
API->>+RAG: explain("Explain slot allocation policy")
RAG->>+Embed: getEmbedding("slot allocation policy")
Embed-->>-RAG: [0.234, -0.567, ...]
RAG->>DB: SELECT content FROM guidelines ORDER BY embedding <=> $1 LIMIT 3
DB-->>RAG: Similar documents found
RAG->>+Ollama: POST /v1/chat/completions (prompt with context)
Ollama-->>-RAG: "The slot allocation policy prioritizes..."
RAG-->>-API: Explanation text
API->>Transform: Format response
Transform-->>API: JSON:API formatted response
API-->>User: HTTP 200 { "response": "The policy prioritizes...", "source": "RAG" }
Note over User: Staff receives AI-generated explanation, suitable for guidance
Note over RBS,DB: RBS Path: ~25–50ms | Deterministic, no hallucination
Note over RAG,Ollama: RAG Path: ~500–2000ms | AI-enhanced, for non-critical queries
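From the caller's side, the two scenarios differ in the `source` field and in the shape of `response`. The following is a minimal client-side sketch, assuming the snake_case attribute names from the example API response; the `ResourceStatusAttributes` and `SlotAvailability` types and the `renderResult` and `isSlotAvailability` helpers are hypothetical and not part of the service code above.

```typescript
// Hypothetical shapes, modeled on the example API response in this guide.
interface SlotAvailability {
  count: number;
  slot_ids: string[];
  total_capacity: number;
  occupancy_rate: number;
}

interface ResourceStatusAttributes {
  control_level: 'HIGH_CONTROL_RBS' | 'LOW_CONTROL_RAG';
  query: string;
  response: unknown;
  source: 'RBS' | 'RAG' | 'HYBRID';
  processing_time_ms: number;
}

// Runtime type guard: `response` is untyped, so verify its shape before using it.
function isSlotAvailability(value: unknown): value is SlotAvailability {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.count === 'number' &&
    Array.isArray(v.slot_ids) &&
    typeof v.total_capacity === 'number' &&
    typeof v.occupancy_rate === 'number'
  );
}

// Label each answer by origin so staff can tell database facts from AI guidance.
function renderResult(attrs: ResourceStatusAttributes): string {
  if (attrs.source === 'RBS' && isSlotAvailability(attrs.response)) {
    const r = attrs.response;
    return `${r.count} of ${r.total_capacity} slots available (database value)`;
  }
  if (attrs.source === 'RAG' && typeof attrs.response === 'string') {
    return `AI-generated explanation (guidance only): ${attrs.response}`;
  }
  return 'Unrecognized response shape';
}

console.log(
  renderResult({
    control_level: 'LOW_CONTROL_RAG',
    query: 'Explain slot allocation policy',
    response: 'The slot allocation policy prioritizes...',
    source: 'RAG',
    processing_time_ms: 900,
  }),
);
```

Checking `source` before trusting the payload keeps the distinction between deterministic and generated answers visible all the way to the user interface.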
Error Handling and Fallback Strategies
Key Principles
| Failure | Behavior |
|---|---|
| RBS database timeout | Throw ServiceUnavailableException; never fall back to AI for HIGH_CONTROL queries |
| Embedding service down | Fall back to keyword search (if implemented), or return 503 |
| Ollama server down | Return 503 with message "Explanation service temporarily unavailable" |
| Unrecognized query | Default to HIGH_CONTROL_RBS (fail-safe: better cautious than hallucinating) |
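The failure policy in the table can be sketched in a framework-independent way. This is an illustration under stated assumptions, not the service's actual implementation: `ServiceUnavailableError` stands in for the framework's 503 exception, and `routeWithFailurePolicy` and the `Handlers` interface are hypothetical names. The optional keyword-search fallback for the embedding service is left out.

```typescript
type ControlLevel = 'HIGH_CONTROL_RBS' | 'LOW_CONTROL_RAG';

// Hypothetical error type standing in for a framework's 503 exception.
class ServiceUnavailableError extends Error {
  readonly statusCode = 503;
  constructor(message: string) {
    super(message);
    this.name = 'ServiceUnavailableError';
  }
}

// Hypothetical handler pair: one deterministic lookup, one generative explainer.
interface Handlers {
  rbs: (query: string) => Promise<unknown>;
  rag: (query: string) => Promise<string>;
}

async function routeWithFailurePolicy(
  level: ControlLevel,
  query: string,
  handlers: Handlers,
): Promise<{ source: 'RBS' | 'RAG'; response: unknown }> {
  if (level === 'HIGH_CONTROL_RBS') {
    try {
      return { source: 'RBS', response: await handlers.rbs(query) };
    } catch {
      // Deliberately no reroute to RAG: a generated answer must never
      // stand in for a failed database lookup.
      throw new ServiceUnavailableError('Resource data temporarily unavailable');
    }
  }

  try {
    return { source: 'RAG', response: await handlers.rag(query) };
  } catch {
    // Embedding or LLM failure: report unavailability instead of guessing.
    throw new ServiceUnavailableError('Explanation service temporarily unavailable');
  }
}
```

The key design choice is that the two branches never cross: a failure on the RBS path surfaces as an error to the caller, while a failure on the RAG path only costs an explanation.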
// In ResourceStatusService: never silently fall back from RBS to RAG
if (controlLevel === 'HIGH_CONTROL_RBS') {
  // If this throws, let it propagate. Do NOT catch and reroute to RAG.
  response = await this.executeRbsQuery(userQuery);
  source = 'RBS';
}
Related Documentation
- AI Integration (Ollama + RAG): Full RAG pipeline with pgvector and Ollama
- Rule-Based Decision Engine: IF-THEN rule engine with safety validation and inventory checks