system_design/specifications/Failure_Handling_Model.md

# Failure Handling Model Specification

**Document Type:** Normative System Specification

**Scope:** Sensor Hub (Sub-Hub) Fault Detection, Classification, and Recovery

**Traceability:** SR-DIAG-001 through SR-DIAG-011, SR-SYS-002, SR-SYS-004

## 1. Purpose

This document defines the fault taxonomy, escalation rules, recovery behaviors, and integration with the system state machine. All components SHALL adhere to this failure handling model.

## 2. Fault Taxonomy

### 2.1 Severity Levels

| Severity | Code | Description | State Impact | Recovery Behavior |
|----------|------|-------------|--------------|-------------------|
| **INFO** | `DIAG_SEV_INFO` | Informational event, no action required | None | Log only |
| **WARNING** | `DIAG_SEV_WARNING` | Non-critical fault, degraded operation | `RUNNING` → `WARNING` | Continue with reduced functionality |
| **ERROR** | `DIAG_SEV_ERROR` | Critical fault, feature disabled | Feature-specific | Feature isolation, retry logic |
| **FATAL** | `DIAG_SEV_FATAL` | System-critical fault, core functionality disabled | `RUNNING` → `FAULT` | Controlled teardown, recovery attempt |

### 2.2 Fault Categories

| Category | Description | Examples | Typical Severity |
|----------|-------------|----------|------------------|
| **SENSOR** | Sensor hardware or communication failure | Disconnection, out-of-range, non-responsive | WARNING (single), ERROR (multiple), FATAL (all) |
| **COMMUNICATION** | Network or protocol failure | Link loss, timeout, authentication failure | WARNING (temporary), ERROR (persistent), FATAL (critical) |
| **STORAGE** | Persistence or storage medium failure | SD card failure, NVM corruption, write failure | WARNING (degraded), ERROR (persistent), FATAL (critical) |
| **SECURITY** | Security violation or authentication failure | Secure boot failure, key corruption, unauthorized access | FATAL (always) |
| **SYSTEM** | System resource or configuration failure | Memory exhaustion, task failure, configuration error | ERROR (recoverable), FATAL (unrecoverable) |
| **OTA** | Firmware update failure | Validation failure, transfer error, flash error | ERROR (retry), FATAL (rollback) |
| **CALIBRATION** | Calibration or machine constants (MC) failure | Invalid MC, calibration error, sensor mismatch | WARNING (single), ERROR (critical) |

## 3. Diagnostic Code Structure

### 3.1 Diagnostic Code Format

```
DIAG-<CATEGORY>-<COMPONENT>-<NUMBER>
```

- **CATEGORY:** Two-letter category code: SN (SENSOR), CM (COMMUNICATION), ST (STORAGE), SC (SECURITY), SY (SYSTEM), OT (OTA), CL (CALIBRATION)
- **COMPONENT:** Component identifier (e.g., TEMP, HUM, CO2, NET, SD, OTA)
- **NUMBER:** Unique four-digit fault number (0001-9999)

### 3.2 Diagnostic Code Registry

| Code | Severity | Category | Component | Description |
|------|----------|----------|-----------|-------------|
| `DIAG-SN-TEMP-0001` | WARNING | SENSOR | Temperature | Temperature sensor disconnected |
| `DIAG-SN-TEMP-0002` | ERROR | SENSOR | Temperature | Temperature sensor out of range |
| `DIAG-SN-TEMP-0003` | FATAL | SENSOR | Temperature | All temperature sensors failed |
| `DIAG-CM-NET-0001` | WARNING | COMMUNICATION | Network | Main Hub link temporarily lost |
| `DIAG-CM-NET-0002` | ERROR | COMMUNICATION | Network | Main Hub link persistently lost |
| `DIAG-ST-SD-0001` | WARNING | STORAGE | SD Card | SD card write failure (retry successful) |
| `DIAG-ST-SD-0002` | ERROR | STORAGE | SD Card | SD card persistent write failure |
| `DIAG-ST-SD-0003` | FATAL | STORAGE | SD Card | SD card corruption detected |
| `DIAG-SC-BOOT-0001` | FATAL | SECURITY | Secure Boot | Secure boot verification failed |
| `DIAG-SY-MEM-0001` | ERROR | SYSTEM | Memory | Memory allocation failure |
| `DIAG-OT-FW-0001` | ERROR | OTA | Firmware | Firmware integrity validation failed |
| `DIAG-CL-MC-0001` | WARNING | CALIBRATION | Machine Constants | Invalid sensor slot configuration |

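
Because every code follows the fixed pattern in Section 3.1, registry entries and incoming reports can be validated mechanically. The following is a minimal C sketch of such a validator, not a normative implementation: `diag_code_parse` and `diag_code_parts_t` are hypothetical names, and the 8-character bound on COMPONENT is an assumption, since the specification does not limit its length.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two-letter CATEGORY prefixes from Section 3.1, in table order (SN = SENSOR ... CL = CALIBRATION). */
static const char *const k_category_codes[] = {"SN", "CM", "ST", "SC", "SY", "OT", "CL"};
#define DIAG_CATEGORY_COUNT (sizeof k_category_codes / sizeof k_category_codes[0])
#define DIAG_COMPONENT_MAX 8u /* Assumed bound; the specification does not limit COMPONENT length */

typedef struct {
    size_t category_index;                  /* Index into k_category_codes */
    char component[DIAG_COMPONENT_MAX + 1]; /* e.g. "TEMP", "NET", "SD" */
    uint16_t number;                        /* 1..9999 */
} diag_code_parts_t;

/* Hypothetical helper: parses "DIAG-<CATEGORY>-<COMPONENT>-<NUMBER>".
 * Returns false on any format violation; *out is written only on success. */
bool diag_code_parse(const char *code, diag_code_parts_t *out)
{
    if (code == NULL || out == NULL || strncmp(code, "DIAG-", 5) != 0) {
        return false;
    }
    const char *p = code + 5;

    /* CATEGORY: one of the registered two-letter codes, followed by '-'. */
    size_t cat = 0;
    while (cat < DIAG_CATEGORY_COUNT &&
           !(strncmp(p, k_category_codes[cat], 2) == 0 && p[2] == '-')) {
        ++cat;
    }
    if (cat == DIAG_CATEGORY_COUNT) {
        return false;
    }
    p += 3;

    /* COMPONENT: 1..DIAG_COMPONENT_MAX uppercase letters or digits, followed by '-'. */
    const char *component = p;
    size_t len = 0;
    while ((p[len] >= 'A' && p[len] <= 'Z') || (p[len] >= '0' && p[len] <= '9')) {
        ++len;
    }
    if (len == 0 || len > DIAG_COMPONENT_MAX || p[len] != '-') {
        return false;
    }
    p += len + 1;

    /* NUMBER: exactly four decimal digits, 0001..9999, then end of string. */
    unsigned value = 0;
    for (int i = 0; i < 4; ++i) {
        if (p[i] < '0' || p[i] > '9') {
            return false;
        }
        value = value * 10u + (unsigned)(p[i] - '0');
    }
    if (p[4] != '\0' || value == 0) {
        return false;
    }

    out->category_index = cat;
    memcpy(out->component, component, len);
    out->component[len] = '\0';
    out->number = (uint16_t)value;
    return true;
}
```

A check of this shape could run over every row of the registry table at build time, so that a malformed code is caught before it reaches firmware.
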
## 4. Fault Detection Rules

### 4.1 Sensor Fault Detection

| Condition | Detection Method | Severity Assignment |
|-----------|------------------|---------------------|
| Sensor disconnected | Hardware presence signal | WARNING (if other sensors available) |
| Sensor non-responsive | Communication timeout (3 retries) | ERROR (if critical sensor) |
| Sensor out of range | Value validation against limits | WARNING (if single occurrence), ERROR (if persistent) |
| All sensors failed | Count of failed sensors = total | FATAL |

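
The severity column above can be read as one decision function per sensor event. The C sketch below is one interpretation under stated assumptions, not the normative rule: `sensor_fault_severity` and its enums are hypothetical, "persistent" out-of-range is assumed to mean three consecutive out-of-range samples, and a non-responsive non-critical sensor is assumed to raise WARNING because the table only covers critical sensors.

```c
#include <stdbool.h>

typedef enum { SEV_NONE, SEV_WARNING, SEV_ERROR, SEV_FATAL } sev_t;
typedef enum { SENSOR_DISCONNECTED, SENSOR_NON_RESPONSIVE, SENSOR_OUT_OF_RANGE } sensor_condition_t;

/* Assumption: "persistent" = this many consecutive out-of-range samples. */
#define OUT_OF_RANGE_PERSISTENT_COUNT 3u

/* Hypothetical helper applying the Section 4.1 table to one sensor event.
 * failed_sensors must already include the sensor reporting this condition. */
sev_t sensor_fault_severity(sensor_condition_t cond, bool is_critical,
                            unsigned consecutive_out_of_range,
                            unsigned failed_sensors, unsigned total_sensors)
{
    if (total_sensors > 0 && failed_sensors >= total_sensors) {
        return SEV_FATAL; /* All sensors failed */
    }
    switch (cond) {
    case SENSOR_DISCONNECTED:
        return SEV_WARNING; /* Other sensors are still available at this point */
    case SENSOR_NON_RESPONSIVE:
        /* Reached after 3 communication retries; WARNING for non-critical sensors is an assumption. */
        return is_critical ? SEV_ERROR : SEV_WARNING;
    case SENSOR_OUT_OF_RANGE:
        return consecutive_out_of_range >= OUT_OF_RANGE_PERSISTENT_COUNT ? SEV_ERROR : SEV_WARNING;
    }
    return SEV_NONE;
}
```

Checking the all-failed condition first keeps the FATAL rule from being masked by the per-sensor rules.
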
### 4.2 Communication Fault Detection

| Condition | Detection Method | Severity Assignment |
|-----------|------------------|---------------------|
| Link temporarily lost | Heartbeat timeout (< 30s) | WARNING |
| Link persistently lost | Heartbeat timeout (> 5 minutes) | ERROR |
| Authentication failure | Security layer rejection | FATAL |
| Protocol error | Message parsing failure (3 consecutive) | ERROR |

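
A minimal C sketch of the link-loss and protocol-error rows, assuming the monitor is fed the time since the last Main Hub heartbeat and the outcome of each message parse. `classify_link`, `com_on_parse` and the expected-interval parameter are hypothetical. The table defines WARNING below 30 s and ERROR above 5 minutes but is silent on the band in between; this sketch keeps WARNING there, which is consistent with the 5-minute WARNING → ERROR escalation in Section 5.1. Isolated parse failures (fewer than three in a row) raise nothing here, which is also an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { COM_SEV_NONE, COM_SEV_WARNING, COM_SEV_ERROR } com_severity_t;

#define LINK_PERSISTENT_LOSS_MS (5u * 60u * 1000u) /* "> 5 minutes" -> DIAG-CM-NET-0002 */
#define PROTOCOL_ERROR_LIMIT 3u                    /* 3 consecutive parse failures -> ERROR */

/* Hypothetical: classify link health from the time since the last heartbeat. */
com_severity_t classify_link(uint32_t ms_since_heartbeat, uint32_t expected_interval_ms)
{
    if (ms_since_heartbeat <= expected_interval_ms) {
        return COM_SEV_NONE;    /* Heartbeat still on schedule */
    }
    if (ms_since_heartbeat > LINK_PERSISTENT_LOSS_MS) {
        return COM_SEV_ERROR;   /* Link persistently lost */
    }
    return COM_SEV_WARNING;     /* Link temporarily lost (DIAG-CM-NET-0001) */
}

/* Hypothetical: count consecutive parse failures; any successful parse resets the count. */
com_severity_t com_on_parse(uint8_t *consecutive_failures, bool parse_ok)
{
    if (parse_ok) {
        *consecutive_failures = 0;
        return COM_SEV_NONE;
    }
    if (*consecutive_failures < UINT8_MAX) {
        ++*consecutive_failures;
    }
    return *consecutive_failures >= PROTOCOL_ERROR_LIMIT ? COM_SEV_ERROR : COM_SEV_NONE;
}
```
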
### 4.3 Storage Fault Detection

| Condition | Detection Method | Severity Assignment |
|-----------|------------------|---------------------|
| Write failure (retry successful) | Write operation with retry | WARNING |
| Write failure (persistent) | Write operation failure (3 retries) | ERROR |
| SD card corruption | File system check failure | FATAL |
| Storage full | Available space < threshold | WARNING |

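
The WARNING/ERROR split for write failures depends only on whether some retry eventually succeeded. A C sketch, assuming "3 retries" means one initial attempt plus three retries; `storage_write_with_retry`, `storage_write_fn` and the simulated driver `fake_write_fails_n` are hypothetical stand-ins for the real SD card driver. Delays between attempts (Section 6.1) are omitted for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

#define SD_WRITE_MAX_RETRIES 3 /* Section 4.3: "Write operation failure (3 retries)" */

typedef enum {
    WRITE_OK,             /* First attempt succeeded: no fault */
    WRITE_OK_AFTER_RETRY, /* DIAG-ST-SD-0001, WARNING */
    WRITE_FAILED          /* DIAG-ST-SD-0002, ERROR */
} write_outcome_t;

/* Hypothetical driver hook: returns true when the write succeeded. */
typedef bool (*storage_write_fn)(const void *data, size_t len, void *ctx);

/* One initial attempt plus up to SD_WRITE_MAX_RETRIES retries (assumed reading of "3 retries"). */
write_outcome_t storage_write_with_retry(storage_write_fn write_fn, const void *data,
                                         size_t len, void *ctx)
{
    for (int attempt = 0; attempt <= SD_WRITE_MAX_RETRIES; ++attempt) {
        if (write_fn(data, len, ctx)) {
            return attempt == 0 ? WRITE_OK : WRITE_OK_AFTER_RETRY;
        }
    }
    return WRITE_FAILED;
}

/* Simulated driver for illustration: fails while the int countdown at *ctx is positive. */
bool fake_write_fails_n(const void *data, size_t len, void *ctx)
{
    (void)data;
    (void)len;
    int *remaining = ctx;
    if (*remaining > 0) {
        --*remaining;
        return false;
    }
    return true;
}
```
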
### 4.4 Security Fault Detection

| Condition | Detection Method | Severity Assignment |
|-----------|------------------|---------------------|
| Secure boot failure | Boot verification failure | FATAL (always) |
| Key corruption | Cryptographic key validation failure | FATAL |
| Unauthorized access | Authentication failure (3 attempts) | FATAL |
| Message tampering | Integrity check failure | ERROR (if persistent → FATAL) |

## 5. Escalation Rules

### 5.1 Severity Escalation

| Current Severity | Escalation Trigger | New Severity | State Transition |
|------------------|--------------------|--------------|------------------|
| INFO | N/A | N/A | None |
| WARNING | Same fault persists > 5 minutes | ERROR | `WARNING` → `WARNING` (feature degraded) |
| WARNING | Multiple warnings (≥3) | ERROR | `WARNING` → `WARNING` (feature degraded) |
| WARNING | Critical feature affected | FATAL | `WARNING` → `FAULT` |
| ERROR | Same fault persists > 10 minutes | FATAL | `RUNNING` → `FAULT` |
| ERROR | Cascading failures (≥2 features) | FATAL | `RUNNING` → `FAULT` |
| FATAL | N/A | N/A | `RUNNING` → `FAULT` |

### 5.2 Cascading Failure Detection

A cascading failure is detected when any of the following occurs:

- Multiple independent features fail simultaneously
- A failure in one feature causes a failure in another
- A system resource (memory, CPU, storage) is exhausted

**Response:** Immediate escalation to FATAL and transition to the `FAULT` state.

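
Sections 5.1 and 5.2 together can be sketched as one pure function that the Error Handler re-evaluates whenever its fault picture changes. In this C sketch `escalate` and `fault_snapshot_t` are hypothetical; "multiple warnings" is interpreted as latched WARNING faults system-wide, and the Section 5.2 cascade check is applied to WARNING as well as ERROR faults because Section 5.2 states it without restriction.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { DIAG_SEV_INFO, DIAG_SEV_WARNING, DIAG_SEV_ERROR, DIAG_SEV_FATAL } diag_severity_t;

#define WARNING_PERSIST_LIMIT_S (5u * 60u)  /* WARNING persisting > 5 minutes -> ERROR */
#define ERROR_PERSIST_LIMIT_S   (10u * 60u) /* ERROR persisting > 10 minutes -> FATAL */
#define WARNING_COUNT_LIMIT     3u          /* >= 3 warnings -> ERROR */
#define CASCADE_FEATURE_LIMIT   2u          /* >= 2 failed features -> FATAL */

/* Hypothetical snapshot of what the Error Handler knows about one active fault. */
typedef struct {
    diag_severity_t severity;       /* Current severity of this fault */
    uint32_t persisted_s;           /* How long the same fault has been active */
    unsigned active_warnings;       /* Latched WARNING faults system-wide */
    bool critical_feature_affected; /* Fault touches a feature the system cannot run without */
    unsigned failed_features;       /* Features currently isolated by ERROR faults */
    bool resource_exhausted;        /* Memory, CPU or storage exhausted (Section 5.2) */
} fault_snapshot_t;

/* Returns the severity the fault should hold after applying Sections 5.1 and 5.2. */
diag_severity_t escalate(const fault_snapshot_t *f)
{
    if (f->severity == DIAG_SEV_INFO || f->severity == DIAG_SEV_FATAL) {
        return f->severity; /* INFO never escalates; FATAL is already the top level */
    }
    if (f->resource_exhausted || f->failed_features >= CASCADE_FEATURE_LIMIT) {
        return DIAG_SEV_FATAL; /* Cascading failure: immediate FATAL */
    }
    if (f->severity == DIAG_SEV_WARNING) {
        if (f->critical_feature_affected) {
            return DIAG_SEV_FATAL;
        }
        if (f->persisted_s > WARNING_PERSIST_LIMIT_S || f->active_warnings >= WARNING_COUNT_LIMIT) {
            return DIAG_SEV_ERROR;
        }
        return DIAG_SEV_WARNING;
    }
    /* Remaining case: DIAG_SEV_ERROR */
    return f->persisted_s > ERROR_PERSIST_LIMIT_S ? DIAG_SEV_FATAL : DIAG_SEV_ERROR;
}
```

Keeping escalation side-effect free leaves the resulting state transition (Section 8.1) to a separate step.
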
## 6. Recovery Behaviors

### 6.1 Recovery Strategies by Severity

| Severity | Recovery Strategy | Retry Logic | State Impact |
|----------|-------------------|-------------|--------------|
| **INFO** | None | N/A | None |
| **WARNING** | Automatic retry, degraded operation | 3 retries with exponential backoff | Continue in `WARNING` state |
| **ERROR** | Feature isolation, automatic retry | 3 retries, then manual intervention | Feature disabled, system continues |
| **FATAL** | Controlled teardown, recovery attempt | Single recovery attempt, then manual | `FAULT` → `TEARDOWN` → `INIT` |

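
The table fixes the number of WARNING retries but not the delays between them. The C sketch below shows one conventional exponential backoff schedule; `backoff_delay_ms`, `backoff_total_ms`, the base delay and the cap are assumptions to be tuned per fault type.

```c
#include <stdint.h>

#define RECOVERY_MAX_RETRIES 3u /* Section 6.1: WARNING faults get 3 retries */

/* Delay before retry number `retry` (1-based): base_ms * 2^(retry - 1), clamped to cap_ms.
 * Returns 0 when no retry is scheduled. */
uint32_t backoff_delay_ms(uint32_t base_ms, uint32_t retry, uint32_t cap_ms)
{
    if (retry == 0 || retry > RECOVERY_MAX_RETRIES) {
        return 0;
    }
    uint64_t delay = (uint64_t)base_ms << (retry - 1u);
    return delay > cap_ms ? cap_ms : (uint32_t)delay;
}

/* Total waiting time across all retries, for checking a schedule against Section 6.2. */
uint32_t backoff_total_ms(uint32_t base_ms, uint32_t cap_ms)
{
    uint32_t total = 0;
    for (uint32_t r = 1; r <= RECOVERY_MAX_RETRIES; ++r) {
        total += backoff_delay_ms(base_ms, r, cap_ms);
    }
    return total;
}
```

For example, a 1000 ms base gives waits of 1000, 2000 and 4000 ms (7000 ms in total), which fits the 10-second storage limit in Section 6.2 before counting the write attempts themselves; a 5000 ms base would total 35000 ms and exceed the 30-second communication limit.
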
### 6.2 Recovery Time Limits

| Fault Type | Maximum Recovery Time | Recovery Action |
|------------|-----------------------|-----------------|
| Sensor (WARNING) | 5 minutes | Automatic retry, sensor exclusion |
| Communication (WARNING) | 30 seconds | Automatic reconnection |
| Storage (WARNING) | 10 seconds | Retry write operation |
| Sensor (ERROR) | Manual intervention | Sensor marked as failed |
| Communication (ERROR) | Manual intervention | Communication feature disabled |
| Storage (ERROR) | Manual intervention | Persistence disabled, system continues |
| FATAL (any) | 60 seconds | Controlled teardown and recovery attempt |

### 6.3 Latching Behavior

| Severity | Latching Rule | Clear Condition |
|----------|---------------|-----------------|
| **INFO** | Not latched | Overwritten by new event |
| **WARNING** | Latched until cleared | Fault condition cleared + manual clear OR automatic clear after 1 hour |
| **ERROR** | Latched until cleared | Manual clear via diagnostic session OR system reset |
| **FATAL** | Latched until cleared | Manual clear via diagnostic session OR system reset |

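
The latching rules reduce to one predicate per severity. In this C sketch `latch_may_clear` and `latch_inputs_t` are hypothetical. The WARNING rule is read as "the fault condition is gone, and then either a manual clear arrives or one hour passes"; the table does not say when the hour starts, so timing it from the moment the condition cleared is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { DIAG_SEV_INFO, DIAG_SEV_WARNING, DIAG_SEV_ERROR, DIAG_SEV_FATAL } diag_severity_t;

#define WARNING_AUTO_CLEAR_S 3600u /* Section 6.3: automatic clear after 1 hour */

/* Hypothetical inputs gathered by the Error Handler for one latched entry. */
typedef struct {
    bool condition_active;              /* Underlying fault still present? */
    uint32_t s_since_condition_cleared; /* Meaningful only when !condition_active */
    bool manual_clear_requested;        /* Clear command from a diagnostic session */
    bool system_reset;                  /* A reset has occurred since the fault latched */
} latch_inputs_t;

bool latch_may_clear(diag_severity_t sev, const latch_inputs_t *in)
{
    switch (sev) {
    case DIAG_SEV_INFO:
        return true; /* Not latched; the next event overwrites it */
    case DIAG_SEV_WARNING:
        /* Assumption: the 1-hour timer starts once the fault condition has cleared. */
        return !in->condition_active &&
               (in->manual_clear_requested || in->s_since_condition_cleared >= WARNING_AUTO_CLEAR_S);
    case DIAG_SEV_ERROR:
    case DIAG_SEV_FATAL:
        return in->manual_clear_requested || in->system_reset;
    }
    return false;
}
```
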
## 7. Fault Reporting

### 7.1 Reporting Channels

| Severity | Local HMI | Diagnostic Log | Main Hub | Diagnostic Session |
|----------|-----------|----------------|----------|--------------------|
| **INFO** | Optional | Yes | No | Yes |
| **WARNING** | Yes (status indicator) | Yes | Yes (periodic) | Yes |
| **ERROR** | Yes (status indicator) | Yes | Yes (immediate) | Yes |
| **FATAL** | Yes (status indicator) | Yes | Yes (immediate) | Yes |

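
The channel matrix above maps naturally onto a lookup table indexed by severity, which keeps routing data-driven rather than scattered across `if` chains. A C sketch with hypothetical channel flag names; the dispatch to each channel is not shown.

```c
#include <stdint.h>

typedef enum { DIAG_SEV_INFO, DIAG_SEV_WARNING, DIAG_SEV_ERROR, DIAG_SEV_FATAL } diag_severity_t;

/* Hypothetical channel flags; names are illustrative. */
enum {
    CH_HMI_OPTIONAL       = 1u << 0,
    CH_HMI_STATUS         = 1u << 1,
    CH_DIAG_LOG           = 1u << 2,
    CH_MAIN_HUB_PERIODIC  = 1u << 3,
    CH_MAIN_HUB_IMMEDIATE = 1u << 4,
    CH_DIAG_SESSION       = 1u << 5,
};

/* Section 7.1 as data: one channel mask per severity. */
static const uint8_t k_report_channels[] = {
    [DIAG_SEV_INFO]    = CH_HMI_OPTIONAL | CH_DIAG_LOG | CH_DIAG_SESSION,
    [DIAG_SEV_WARNING] = CH_HMI_STATUS | CH_DIAG_LOG | CH_MAIN_HUB_PERIODIC | CH_DIAG_SESSION,
    [DIAG_SEV_ERROR]   = CH_HMI_STATUS | CH_DIAG_LOG | CH_MAIN_HUB_IMMEDIATE | CH_DIAG_SESSION,
    [DIAG_SEV_FATAL]   = CH_HMI_STATUS | CH_DIAG_LOG | CH_MAIN_HUB_IMMEDIATE | CH_DIAG_SESSION,
};

/* Returns the channels a report of severity `sev` must be sent to (0 for unknown values). */
uint8_t report_channels(diag_severity_t sev)
{
    return (unsigned)sev < sizeof k_report_channels ? k_report_channels[sev] : 0;
}
```
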
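### 7.2 Diagnostic Event Structure

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { DIAG_SEV_INFO, DIAG_SEV_WARNING, DIAG_SEV_ERROR, DIAG_SEV_FATAL } diag_severity_t;
typedef enum {  // Enumerator names illustrative; categories per Section 2.2
    FAULT_CAT_SENSOR, FAULT_CAT_COMMUNICATION, FAULT_CAT_STORAGE, FAULT_CAT_SECURITY,
    FAULT_CAT_SYSTEM, FAULT_CAT_OTA, FAULT_CAT_CALIBRATION
} fault_category_t;

typedef struct {
    uint32_t diagnostic_code;      // Unique diagnostic code
    diag_severity_t severity;      // INFO, WARNING, ERROR, FATAL
    uint64_t timestamp;            // System timestamp (microseconds)
    const char* source_component;  // Component identifier
    uint32_t occurrence_count;     // Number of occurrences
    bool is_latched;               // Latching status
    fault_category_t category;     // SENSOR, COMMUNICATION, etc.
} diagnostic_event_t;
```
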
## 8. Integration with State Machine

### 8.1 Fault-to-State Mapping

| Fault Severity | Current State | Target State | Transition Trigger |
|----------------|---------------|--------------|--------------------|
| INFO | Any | Same | None (no state change) |
| WARNING | `RUNNING` | `WARNING` | First WARNING fault |
| WARNING | `WARNING` | `WARNING` | Additional WARNING (latched) |
| ERROR | `RUNNING` | `RUNNING` | Feature isolation, continue |
| ERROR | `WARNING` | `WARNING` | Feature isolation, continue |
| FATAL | `RUNNING` | `FAULT` | First FATAL fault |
| FATAL | `WARNING` | `FAULT` | Escalation to FATAL |
| FATAL | `FAULT` | `FAULT` | Additional FATAL (latched) |

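
The mapping above is a pure function of current state and fault severity. This C sketch covers only the three states named in the table (`SYS_RUNNING`, `SYS_WARNING` and `SYS_FAULT` are hypothetical enumerator names); a WARNING or ERROR reported while already in `FAULT` is assumed to leave the state unchanged, since the table has no such row. Handling in the remaining states follows Section 8.2 and is not modelled here.

```c
typedef enum { DIAG_SEV_INFO, DIAG_SEV_WARNING, DIAG_SEV_ERROR, DIAG_SEV_FATAL } diag_severity_t;

/* Only the states that appear in the Section 8.1 table; names are hypothetical. */
typedef enum { SYS_RUNNING, SYS_WARNING, SYS_FAULT } sys_state_t;

/* Returns the target state when a fault of severity `sev` is reported in state `current`. */
sys_state_t next_state_on_fault(sys_state_t current, diag_severity_t sev)
{
    switch (sev) {
    case DIAG_SEV_WARNING:
        /* First WARNING leaves RUNNING; further warnings are latched without a transition. */
        return current == SYS_RUNNING ? SYS_WARNING : current;
    case DIAG_SEV_FATAL:
        return SYS_FAULT; /* From RUNNING or WARNING; additional FATAL faults stay in FAULT */
    case DIAG_SEV_ERROR:  /* Feature isolation only; system state unchanged */
    case DIAG_SEV_INFO:   /* Logged only */
    default:
        return current;
    }
}
```
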
### 8.2 State-Dependent Fault Handling

| State | Fault Handling Behavior |
|-------|-------------------------|
| `INIT` | Boot-time faults → `BOOT_FAILURE` if security-related |
| `RUNNING` | Full fault detection and handling |
| `WARNING` | Fault escalation monitoring, recovery attempts |
| `FAULT` | Fault logging only, recovery attempt preparation |
| `OTA_PREP` | OTA-related faults only, others deferred |
| `OTA_UPDATE` | OTA progress faults only |
| `TEARDOWN` | Fault logging only, no new fault detection |
| `SERVICE` | Fault inspection only, no new fault detection |

## 9. Error Handler Responsibilities

The Error Handler component SHALL:

1. Receive fault reports from all components
2. Classify faults according to the fault taxonomy (Section 2)
3. Determine severity and escalation
4. Trigger state transitions when required
5. Manage fault latching and clearing
6. Coordinate recovery attempts
7. Report faults to diagnostics and the Main Hub

## 10. Traceability

- **SR-DIAG-001:** Implemented via diagnostic code framework
- **SR-DIAG-002:** Implemented via unique diagnostic code assignment
- **SR-DIAG-003:** Implemented via severity classification
- **SR-DIAG-004:** Implemented via timestamp and source association
- **SR-SYS-002:** Implemented via fault-to-state mapping
- **SR-SYS-004:** Implemented via FATAL fault → TEARDOWN transition

## 11. Mermaid Fault Escalation Diagram

```mermaid
flowchart TD
    FaultDetected[Fault Detected] --> ClassifySeverity{Classify Severity}
    ClassifySeverity -->|INFO| LogOnly[Log Only]
    ClassifySeverity -->|WARNING| CheckState1{Current State?}
    ClassifySeverity -->|ERROR| IsolateFeature[Isolate Feature]
    ClassifySeverity -->|FATAL| TriggerFaultState[Trigger FAULT State]

    CheckState1 -->|RUNNING| TransitionWarning[Transition to WARNING]
    CheckState1 -->|WARNING| LatchWarning[Latch Warning]

    IsolateFeature --> RetryLogic{Retry Logic}
    RetryLogic -->|Success| ClearError[Clear Error]
    RetryLogic -->|Failure| EscalateToFatal{Escalate?}
    EscalateToFatal -->|Yes| TriggerFaultState
    EscalateToFatal -->|No| ManualIntervention[Manual Intervention]

    TriggerFaultState --> TeardownSequence[Initiate Teardown]
    TeardownSequence --> RecoveryAttempt{Recovery Attempt}
    RecoveryAttempt -->|Success| ResetToInit[Reset to INIT]
    RecoveryAttempt -->|Failure| ManualIntervention
```

system_design/specifications/System Review Checklist.md

# System Review Checklist

**Project:** Sensor Hub – Poultry Farm Automation

**Scope:** Sensor Hub (Sub-Hub Only)

**Purpose:** Verify system readiness before FRD/SAD generation and AI-assisted implementation

## 1. Requirements Completeness Review

### 1.1 Feature Coverage

✔ All major functional domains defined:

* ☐ Data Acquisition (DAQ)
* ☐ Data Quality & Calibration (DQC)
* ☐ Communication (COM)
* ☐ Diagnostics & Health (DIAG)
* ☐ Persistence & Data Management (DATA)
* ☐ OTA Update (OTA)
* ☐ Security & Safety (SEC)
* ☐ System Management & HMI (SYS)

**Acceptance Criteria:**
No functional behavior is undocumented or implicit.

### 1.2 Requirement Quality

For **each system requirement**, verify:

* ☐ Uses “SHALL”
* ☐ Is testable
* ☐ Is unambiguous
* ☐ Has a unique ID
* ☐ Is traceable to a feature

**Red Flags:**

* Vague timing (“fast”, “real-time”)
* Mixed requirements (“shall… and …”)
* Implementation leakage (“using mutex”)

## 2. Architectural Soundness Review

### 2.1 Layering & Separation of Concerns

* ☐ Hardware access isolated
* ☐ No feature bypasses System Manager
* ☐ Persistence accessed only via DP
* ☐ HMI does not modify safety-critical configuration

**Fail Condition:**
Any feature directly accesses hardware or storage without abstraction.

### 2.2 State Machine Validity

* ☐ All system states defined
* ☐ Valid transitions documented
* ☐ Illegal transitions blocked
* ☐ Feature behavior defined per state

**States to Verify:**

* INIT
* IDLE
* RUNNING
* DEGRADED
* OTA_UPDATE
* TEARDOWN
* ERROR

## 3. Cross-Feature Constraint Compliance

### 3.1 Constraint Awareness

* ☐ Each feature respects cross-feature constraints
* ☐ No constraint contradicts a requirement
* ☐ Constraints are globally enforceable

### 3.2 Conflict Resolution

Check for conflicts such as:

* ☐ OTA vs DAQ timing
* ☐ Persistence vs Power Loss
* ☐ Diagnostics vs Real-Time Tasks
* ☐ Debug vs Secure Boot

**Acceptance:**
Conflicts resolved via priority rules or system state restrictions.

## 4. Timing & Performance Review

### 4.1 Real-Time Constraints

* ☐ High-frequency sampling bounded
* ☐ Worst-case execution time considered
* ☐ Non-blocking design enforced

### 4.2 Resource Usage

* ☐ CPU usage bounded
* ☐ RAM usage predictable
* ☐ Stack sizes justified
* ☐ Heap usage minimized in runtime

## 5. Reliability & Fault Handling Review

### 5.1 Fault Detection

* ☐ Sensor failure detection defined
* ☐ Communication failure detection defined
* ☐ Storage failure detection defined

### 5.2 Fault Response

* ☐ Graceful degradation defined
* ☐ Diagnostics logged
* ☐ System state updated appropriately

## 6. Security Review

### 6.1 Boot & Firmware

* ☐ Secure boot enforced
* ☐ Firmware integrity verified
* ☐ Rollback prevention defined

### 6.2 Communication

* ☐ Encryption mandatory
* ☐ Authentication required
* ☐ Key management strategy defined

### 6.3 Debug Access

* ☐ Debug sessions authenticated
* ☐ Debug disabled in production unless authorized
* ☐ Debug cannot bypass security or safety

## 7. Data Management Review

### 7.1 Data Ownership

* ☐ Single source of truth enforced
* ☐ Clear ownership per data type
* ☐ No duplicated persistent data

### 7.2 Persistence Safety

* ☐ Safe writes during state transitions
* ☐ Power-loss tolerance defined
* ☐ Data recovery defined

## 8. HMI & Usability Review (OLED + Buttons)

### 8.1 Display Content

* ☐ Connectivity status
* ☐ System status
* ☐ Connected sensors
* ☐ Time & date

### 8.2 Navigation Logic

* ☐ Menu hierarchy defined
* ☐ Button behavior consistent
* ☐ No destructive action via UI

## 9. Standards & Compliance Readiness

### 9.1 IEC 61499 Mapping Readiness

* ☐ Functional blocks identifiable
* ☐ Event/data separation respected
* ☐ Distributed execution possible

### 9.2 ISA-95 Alignment Readiness

* ☐ Sensor Hub maps to Level 1/2
* ☐ Clear boundary to Level 3/4
* ☐ No business logic leakage

## 10. AI Readiness Review

### 10.1 Prompt Compatibility

* ☐ Features modular
* ☐ Requirements structured
* ☐ Architecture explicit

### 10.2 Tool Handoff Readiness

* ☐ Claude can generate FRD/SAD
* ☐ Mermaid diagrams derivable
* ☐ DeepSeek can critique logic
* ☐ Cursor rules enforce constraints

## Final Gate Decision

### GO / NO-GO Criteria

**GO** if:

* All sections ≥ 90% checked
* No critical architectural violation
* Security constraints enforced

**NO-GO** if:

* Missing system states
* Undefined failure behavior
* Security gaps
* Persistence inconsistency

# System State Machine Specification

**Document Type:** Normative System Specification

**Scope:** Sensor Hub (Sub-Hub) Operational States

**Traceability:** SR-SYS-001, SR-SYS-002, SR-SYS-003

## 1. Purpose

This document defines the complete finite state machine (FSM) governing the Sensor Hub's operational lifecycle. All system components SHALL respect state-based operation restrictions as defined herein.

## 2. State Definitions

### 2.1 State Enumeration

| State ID | State Name | Description | Entry Condition |
|----------|------------|-------------|-----------------|
| `INIT` | Initialization | Hardware and software initialization phase | Power-on, reset, or post-teardown |
| `BOOT_FAILURE` | Boot Failure | Secure boot verification failed | Secure boot check failure during INIT |
| `RUNNING` | Normal Operation | Active sensor acquisition and communication | Successful initialization |
| `WARNING` | Degraded Operation | Non-fatal fault detected, degraded functionality | Non-critical fault detected during RUNNING |
| `FAULT` | Fatal Error | Critical fault, core functionality disabled | Fatal error or cascading failures |
| `OTA_PREP` | OTA Preparation | Preparing for firmware update | OTA request accepted, validation pending |
| `OTA_UPDATE` | OTA Update Active | Firmware update in progress | Firmware transfer and flashing |
| `MC_UPDATE` | Machine Constants Update | Machine constants update in progress | MC update request accepted |
| `TEARDOWN` | Controlled Shutdown | Safe shutdown sequence execution | Update, fault recovery, or manual command |
| `SERVICE` | Service Mode | Engineering/diagnostic interaction | Debug session active |
| `SD_DEGRADED` | SD Card Degraded | SD card failure detected, fallback mode | SD card access failure |

### 2.2 State Characteristics

#### INIT

- **Duration:** Bounded (max 5 seconds)
- **Allowed Operations:** Hardware initialization, secure boot verification, MC loading
- **Forbidden Operations:** Sensor acquisition, communication, persistence writes
- **Exit Conditions:** Success → RUNNING, Secure boot failure → BOOT_FAILURE

#### BOOT_FAILURE

- **Duration:** Indefinite (requires manual intervention)
- **Allowed Operations:** Diagnostic reporting, secure boot retry (limited)
- **Forbidden Operations:** All application features
- **Exit Conditions:** Manual reset, secure boot success → INIT

#### RUNNING

- **Duration:** Indefinite (normal operation)
- **Allowed Operations:** All features (DAQ, DQC, COM, DIAG, DATA, HMI)
- **Forbidden Operations:** OTA, MC update (must transition via TEARDOWN)
- **Exit Conditions:** Fault → WARNING/FAULT, OTA request → OTA_PREP, MC update → MC_UPDATE, Debug session → SERVICE

#### WARNING

- **Duration:** Until fault cleared or escalated
- **Allowed Operations:** Degraded DAQ, COM, DIAG (limited), DATA (read-only)
- **Forbidden Operations:** OTA, MC update
- **Exit Conditions:** Fault cleared → RUNNING, Fault escalated → FAULT

#### FAULT

- **Duration:** Until recovery attempt or manual intervention
- **Allowed Operations:** Diagnostic reporting, error logging, controlled teardown
- **Forbidden Operations:** Sensor acquisition, communication (except diagnostics)
- **Exit Conditions:** Recovery attempt → TEARDOWN, Manual reset → INIT

#### OTA_PREP

- **Duration:** Bounded (max 2 seconds)
- **Allowed Operations:** OTA readiness validation, teardown initiation
- **Forbidden Operations:** Sensor acquisition, new communication sessions
- **Exit Conditions:** Ready → TEARDOWN, Rejected → RUNNING

#### OTA_UPDATE

- **Duration:** Bounded (max 10 minutes)
- **Allowed Operations:** Firmware reception, validation, flashing
- **Forbidden Operations:** Sensor acquisition, normal communication, persistence (except OTA data)
- **Exit Conditions:** Success → RUNNING (after reboot), Failure → FAULT

#### MC_UPDATE

- **Duration:** Bounded (max 30 seconds)
- **Allowed Operations:** MC reception, validation, teardown
- **Forbidden Operations:** Sensor acquisition, normal communication
- **Exit Conditions:** Success → TEARDOWN, Failure → RUNNING

#### TEARDOWN

- **Duration:** Bounded (max 500 ms)
- **Allowed Operations:** Data flush, resource release, state persistence
- **Forbidden Operations:** New sensor acquisition, new communication sessions
- **Exit Conditions:** Complete → INIT (reset), OTA → OTA_UPDATE, MC → MC_UPDATE

#### SERVICE

- **Duration:** Until session closed
- **Allowed Operations:** Diagnostic access, read-only inspection, controlled commands
- **Forbidden Operations:** Sensor acquisition (may be paused), OTA, MC update
- **Exit Conditions:** Session closed → RUNNING

#### SD_DEGRADED

- **Duration:** Until SD recovery or manual intervention
- **Allowed Operations:** Sensor acquisition (no persistence), communication, diagnostics
- **Forbidden Operations:** Persistence writes (except critical diagnostics)
- **Exit Conditions:** SD recovery → RUNNING, Manual intervention → SERVICE

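
The bounded durations above lend themselves to a single supervisor check. The C sketch below assumes a hypothetical `state_deadline_exceeded` called periodically with the time spent in the current state; the enumerator names are illustrative. The specification does not say what happens on overrun, so the reaction (for example, reporting a SYSTEM fault to the Error Handler) is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    ST_INIT, ST_BOOT_FAILURE, ST_RUNNING, ST_WARNING, ST_FAULT, ST_OTA_PREP,
    ST_OTA_UPDATE, ST_MC_UPDATE, ST_TEARDOWN, ST_SERVICE, ST_SD_DEGRADED, ST_COUNT
} sys_state_t;

#define STATE_UNBOUNDED 0u /* Marker for states without a maximum duration */

/* Maximum residence time per state in milliseconds, from Section 2.2. */
static const uint32_t k_state_max_ms[ST_COUNT] = {
    [ST_INIT]       = 5000u,
    [ST_OTA_PREP]   = 2000u,
    [ST_OTA_UPDATE] = 10u * 60u * 1000u,
    [ST_MC_UPDATE]  = 30000u,
    [ST_TEARDOWN]   = 500u,
    /* All other states are indefinite or event-terminated: zero-initialized = unbounded. */
};

/* Returns true when a bounded state has overstayed its limit. */
bool state_deadline_exceeded(sys_state_t state, uint32_t ms_in_state)
{
    if ((unsigned)state >= ST_COUNT) {
        return false;
    }
    uint32_t limit = k_state_max_ms[state];
    return limit != STATE_UNBOUNDED && ms_in_state > limit;
}
```
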
## 3. State Transition Table

| From State | To State | Trigger | Guard Condition | Action | Authorized Caller |
|------------|----------|---------|-----------------|--------|-------------------|
| `[*]` | `INIT` | Power-on, Reset | None | Initialize hardware, secure boot check | System |
| `INIT` | `RUNNING` | Init success | Secure boot OK, MC loaded, sensors detected | Start DAQ, COM, DIAG tasks | System Manager |
| `INIT` | `BOOT_FAILURE` | Secure boot fail | Secure boot verification failed | Log security fault, disable application | Secure Boot |
| `BOOT_FAILURE` | `INIT` | Manual reset | None | Reset system | User/Engineer |
| `RUNNING` | `WARNING` | Non-fatal fault | Diagnostic severity = WARNING | Degrade functionality, notify | Error Handler |
| `RUNNING` | `FAULT` | Fatal fault | Diagnostic severity = FATAL | Stop critical features | Error Handler |
| `RUNNING` | `OTA_PREP` | OTA request | OTA request received, system ready | Validate readiness | OTA Manager |
| `RUNNING` | `MC_UPDATE` | MC update request | MC update received, authenticated | Validate MC | MC Manager |
| `RUNNING` | `SERVICE` | Debug session | Debug session authenticated | Pause non-critical tasks | Debug Manager |
| `RUNNING` | `SD_DEGRADED` | SD failure | SD card access failure detected | Disable persistence writes | Persistence |
| `WARNING` | `RUNNING` | Fault cleared | Diagnostic cleared, system healthy | Restore full functionality | Error Handler |
| `WARNING` | `FAULT` | Fault escalated | Multiple warnings or critical fault | Stop degraded features | Error Handler |
| `FAULT` | `TEARDOWN` | Recovery attempt | Recovery command received | Initiate controlled shutdown | System Manager |
| `OTA_PREP` | `TEARDOWN` | OTA ready | Readiness validated | Begin teardown | OTA Manager |
| `OTA_PREP` | `RUNNING` | OTA rejected | Readiness check failed | Resume normal operation | OTA Manager |
| `TEARDOWN` | `OTA_UPDATE` | Teardown complete (OTA) | OTA pending, data flushed | Enter OTA state | System Manager |
| `TEARDOWN` | `MC_UPDATE` | Teardown complete (MC) | MC update pending, data flushed | Enter MC update | System Manager |
| `TEARDOWN` | `INIT` | Teardown complete (reset) | Reset requested, data flushed | Reset system | System Manager |
| `OTA_UPDATE` | `RUNNING` | OTA success | Firmware flashed, validated | Reboot into new firmware | OTA Manager |
| `OTA_UPDATE` | `FAULT` | OTA failure | Firmware validation failed | Log error, enter fault | OTA Manager |
| `MC_UPDATE` | `TEARDOWN` | MC update complete | MC validated, applied | Reinitialize system | MC Manager |
| `SERVICE` | `RUNNING` | Session closed | Debug session terminated | Resume normal operation | Debug Manager |
| `SD_DEGRADED` | `RUNNING` | SD recovered | SD card access restored | Re-enable persistence | Persistence |
| `SD_DEGRADED` | `SERVICE` | Manual intervention | User intervention required | Enter service mode | User/Engineer |

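
Because each legal transition names exactly one authorized caller, the table above can double as an allow-list that rejects every unlisted pair. A C sketch with hypothetical enumerator names and a hypothetical `transition_allowed`; guard conditions are assumed to be checked by the requesting component before the call, and the power-on entry `[*]` → `INIT` is omitted because it has no source state.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    ST_INIT, ST_BOOT_FAILURE, ST_RUNNING, ST_WARNING, ST_FAULT, ST_OTA_PREP,
    ST_OTA_UPDATE, ST_MC_UPDATE, ST_TEARDOWN, ST_SERVICE, ST_SD_DEGRADED
} sys_state_t;

/* "Authorized Caller" column; enumerator names are illustrative. */
typedef enum {
    CALLER_SYSTEM_MANAGER, CALLER_SECURE_BOOT, CALLER_USER_ENGINEER, CALLER_ERROR_HANDLER,
    CALLER_OTA_MANAGER, CALLER_MC_MANAGER, CALLER_DEBUG_MANAGER, CALLER_PERSISTENCE
} caller_t;

typedef struct { sys_state_t from, to; caller_t caller; } transition_t;

/* Rows of the Section 3 table, excluding the power-on entry into INIT. */
static const transition_t k_transitions[] = {
    {ST_INIT, ST_RUNNING, CALLER_SYSTEM_MANAGER},
    {ST_INIT, ST_BOOT_FAILURE, CALLER_SECURE_BOOT},
    {ST_BOOT_FAILURE, ST_INIT, CALLER_USER_ENGINEER},
    {ST_RUNNING, ST_WARNING, CALLER_ERROR_HANDLER},
    {ST_RUNNING, ST_FAULT, CALLER_ERROR_HANDLER},
    {ST_RUNNING, ST_OTA_PREP, CALLER_OTA_MANAGER},
    {ST_RUNNING, ST_MC_UPDATE, CALLER_MC_MANAGER},
    {ST_RUNNING, ST_SERVICE, CALLER_DEBUG_MANAGER},
    {ST_RUNNING, ST_SD_DEGRADED, CALLER_PERSISTENCE},
    {ST_WARNING, ST_RUNNING, CALLER_ERROR_HANDLER},
    {ST_WARNING, ST_FAULT, CALLER_ERROR_HANDLER},
    {ST_FAULT, ST_TEARDOWN, CALLER_SYSTEM_MANAGER},
    {ST_OTA_PREP, ST_TEARDOWN, CALLER_OTA_MANAGER},
    {ST_OTA_PREP, ST_RUNNING, CALLER_OTA_MANAGER},
    {ST_TEARDOWN, ST_OTA_UPDATE, CALLER_SYSTEM_MANAGER},
    {ST_TEARDOWN, ST_MC_UPDATE, CALLER_SYSTEM_MANAGER},
    {ST_TEARDOWN, ST_INIT, CALLER_SYSTEM_MANAGER},
    {ST_OTA_UPDATE, ST_RUNNING, CALLER_OTA_MANAGER},
    {ST_OTA_UPDATE, ST_FAULT, CALLER_OTA_MANAGER},
    {ST_MC_UPDATE, ST_TEARDOWN, CALLER_MC_MANAGER},
    {ST_SERVICE, ST_RUNNING, CALLER_DEBUG_MANAGER},
    {ST_SD_DEGRADED, ST_RUNNING, CALLER_PERSISTENCE},
    {ST_SD_DEGRADED, ST_SERVICE, CALLER_USER_ENGINEER},
};

/* Returns true only for a documented transition requested by its authorized caller. */
bool transition_allowed(sys_state_t from, sys_state_t to, caller_t caller)
{
    for (size_t i = 0; i < sizeof k_transitions / sizeof k_transitions[0]; ++i) {
        const transition_t *t = &k_transitions[i];
        if (t->from == from && t->to == to && t->caller == caller) {
            return true;
        }
    }
    return false;
}
```

A linear scan over 23 rows is cheap enough for a transition request path; a 2-D lookup table indexed by state pair would trade flash for constant-time checks.
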
## 4. Per-State Feature Execution Rules

### 4.1 DAQ (Data Acquisition) Feature

| State | Allowed Operations | Restrictions |
|-------|--------------------|--------------|
| `INIT` | None | Sensor initialization only |
| `RUNNING` | Full acquisition cycle | None |
| `WARNING` | Degraded acquisition (reduced frequency) | Failed sensors excluded |
| `FAULT` | None | Acquisition stopped |
| `OTA_PREP` | None | Acquisition stopped |
| `OTA_UPDATE` | None | Acquisition stopped |
| `MC_UPDATE` | None | Acquisition stopped |
| `TEARDOWN` | None | Acquisition stopped |
| `SERVICE` | Paused (optional read-only) | No new samples |
| `SD_DEGRADED` | Full acquisition | Data not persisted |
| `BOOT_FAILURE` | None | Not applicable |

### 4.2 DQC (Data Quality & Calibration) Feature

| State | Allowed Operations | Restrictions |
|-------|--------------------|--------------|
| `INIT` | Sensor detection, MC loading | No calibration |
| `RUNNING` | Full quality checks, calibration | None |
| `WARNING` | Degraded quality checks | Reduced validation |
| `FAULT` | Error reporting only | No quality checks |
| `OTA_PREP` | None | Quality checks stopped |
| `OTA_UPDATE` | None | Quality checks stopped |
| `MC_UPDATE` | MC validation only | No sensor calibration |
| `TEARDOWN` | None | Quality checks stopped |
| `SERVICE` | Read-only inspection | No calibration |
| `SD_DEGRADED` | Full quality checks | Results not persisted |
| `BOOT_FAILURE` | None | Not applicable |

### 4.3 COM (Communication) Feature

| State | Allowed Operations | Restrictions |
|-------|-------------------|--------------|
| `INIT` | None | No communication |
| `RUNNING` | Full bidirectional communication | None |
| `WARNING` | Limited communication (diagnostics only) | Reduced bandwidth |
| `FAULT` | Diagnostic reporting only | No data transmission |
| `OTA_PREP` | OTA negotiation only | No other communication |
| `OTA_UPDATE` | OTA data transfer only | No other communication |
| `MC_UPDATE` | MC transfer only | No other communication |
| `TEARDOWN` | Session closure only | No new sessions |
| `SERVICE` | Debug session communication | No Main Hub communication |
| `SD_DEGRADED` | Full communication | Data not persisted |
| `BOOT_FAILURE` | Diagnostic reporting only | Limited communication |

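The COM rules amount to a whitelist of traffic classes per state, which maps naturally onto a bitmask. The C sketch below is a minimal illustration under stated assumptions:

- The class names (`COM_SENSOR_DATA`, `COM_DEBUG`, and so on) and `com_allowed_in_state` are hypothetical.
- `RUNNING` and `SD_DEGRADED` are read as permitting every Main Hub class.
- The finer OTA phase rules are still enforced by the OTA feature (section 4.6).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical identifiers for the system states named in this specification. */
typedef enum {
    SYS_INIT, SYS_RUNNING, SYS_WARNING, SYS_FAULT,
    SYS_OTA_PREP, SYS_OTA_UPDATE, SYS_MC_UPDATE, SYS_TEARDOWN,
    SYS_SERVICE, SYS_SD_DEGRADED, SYS_BOOT_FAILURE,
    SYS_STATE_COUNT
} system_state_t;

/* Hypothetical traffic classes, one bit each so a state can permit several. */
enum {
    COM_SENSOR_DATA   = 1u << 0,
    COM_DIAGNOSTICS   = 1u << 1,
    COM_OTA           = 1u << 2,
    COM_MC_TRANSFER   = 1u << 3,
    COM_SESSION_CLOSE = 1u << 4,
    COM_DEBUG         = 1u << 5
};

/* "Full communication": every Main Hub class; the debug link is SERVICE-only. */
#define COM_MAIN_HUB_ALL (COM_SENSOR_DATA | COM_DIAGNOSTICS | COM_OTA | \
                          COM_MC_TRANSFER | COM_SESSION_CLOSE)

/* Permitted classes per state, transcribed from the COM table. */
static const uint32_t com_allowed[] = {
    [SYS_INIT]         = 0u,
    [SYS_RUNNING]      = COM_MAIN_HUB_ALL,
    [SYS_WARNING]      = COM_DIAGNOSTICS,
    [SYS_FAULT]        = COM_DIAGNOSTICS,
    [SYS_OTA_PREP]     = COM_OTA,
    [SYS_OTA_UPDATE]   = COM_OTA,
    [SYS_MC_UPDATE]    = COM_MC_TRANSFER,
    [SYS_TEARDOWN]     = COM_SESSION_CLOSE,
    [SYS_SERVICE]      = COM_DEBUG,
    [SYS_SD_DEGRADED]  = COM_MAIN_HUB_ALL,
    [SYS_BOOT_FAILURE] = COM_DIAGNOSTICS
};

static_assert(sizeof com_allowed / sizeof com_allowed[0] == SYS_STATE_COUNT,
              "com_allowed must have one entry per system state");

/* True only if every class bit in `classes` is permitted in `state`. */
static bool com_allowed_in_state(system_state_t state, uint32_t classes) {
    if ((unsigned)state >= SYS_STATE_COUNT || classes == 0u) {
        return false;
    }
    return (com_allowed[state] & classes) == classes;
}
```

Requiring all requested bits (rather than any) means a combined request such as sensor data plus debug traffic is rejected whenever one of the two classes is not allowed.
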
### 4.4 DIAG (Diagnostics) Feature

| State | Allowed Operations | Restrictions |
|-------|-------------------|--------------|
| `INIT` | Boot diagnostics | Limited logging |
| `RUNNING` | Full diagnostics | None |
| `WARNING` | Full diagnostics | None |
| `FAULT` | Full diagnostics | None |
| `OTA_PREP` | OTA diagnostics | Limited scope |
| `OTA_UPDATE` | OTA progress diagnostics | Limited scope |
| `MC_UPDATE` | MC update diagnostics | Limited scope |
| `TEARDOWN` | Teardown diagnostics | Limited scope |
| `SERVICE` | Full diagnostics (read access) | No new diagnostics |
| `SD_DEGRADED` | Full diagnostics | Persistence limited |
| `BOOT_FAILURE` | Security diagnostics | Limited scope |

### 4.5 DATA (Persistence) Feature

| State | Allowed Operations | Restrictions |
|-------|-------------------|--------------|
| `INIT` | MC loading only | No writes |
| `RUNNING` | Full persistence | None |
| `WARNING` | Read-only, critical writes | Limited writes |
| `FAULT` | Critical diagnostics only | No sensor data writes |
| `OTA_PREP` | Read-only | No writes |
| `OTA_UPDATE` | OTA data only | No sensor data writes |
| `MC_UPDATE` | MC writes only | No sensor data writes |
| `TEARDOWN` | Critical data flush only | Authorized writes only |
| `SERVICE` | Read-only | No writes |
| `SD_DEGRADED` | Read-only (if possible) | No writes |
| `BOOT_FAILURE` | None | Not applicable |

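Most DATA rules permit only one kind of write. A `switch` on the state with a default of "no writes" therefore keeps deny-by-default behavior explicit. This C sketch uses hypothetical names (`data_write_class_t`, `data_write_allowed`) and rests on two assumptions. First, the "critical writes" permitted in `WARNING` are critical diagnostic records. Second, "Authorized writes" in `TEARDOWN` means the critical-data flush. Read access is outside its scope.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers for the system states named in this specification. */
typedef enum {
    SYS_INIT, SYS_RUNNING, SYS_WARNING, SYS_FAULT,
    SYS_OTA_PREP, SYS_OTA_UPDATE, SYS_MC_UPDATE, SYS_TEARDOWN,
    SYS_SERVICE, SYS_SD_DEGRADED, SYS_BOOT_FAILURE,
    SYS_STATE_COUNT
} system_state_t;

/* Hypothetical categories of persistent writes. */
typedef enum {
    DATA_WR_SENSOR,        /* sensor samples */
    DATA_WR_CRITICAL_DIAG, /* critical diagnostic records */
    DATA_WR_MC,            /* MC data */
    DATA_WR_OTA_IMAGE,     /* OTA data */
    DATA_WR_CRITICAL_FLUSH /* final flush of critical data during teardown */
} data_write_class_t;

/* True if a write of class `wr` may reach storage in `state`. Reads are not gated here. */
static bool data_write_allowed(system_state_t state, data_write_class_t wr) {
    switch (state) {
    case SYS_RUNNING:
        return true; /* full persistence */
    case SYS_WARNING: /* critical writes only; assumed to mean critical diagnostics */
    case SYS_FAULT:
        return wr == DATA_WR_CRITICAL_DIAG;
    case SYS_OTA_UPDATE:
        return wr == DATA_WR_OTA_IMAGE;
    case SYS_MC_UPDATE:
        return wr == DATA_WR_MC;
    case SYS_TEARDOWN:
        return wr == DATA_WR_CRITICAL_FLUSH;
    default:
        /* INIT, OTA_PREP, SERVICE, SD_DEGRADED, BOOT_FAILURE, unknown: no writes */
        return false;
    }
}
```
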
### 4.6 OTA Feature

| State | Allowed Operations | Restrictions |
|-------|-------------------|--------------|
| `INIT` | None | OTA not active |
| `RUNNING` | OTA negotiation only | No transfer |
| `WARNING` | None | OTA blocked |
| `FAULT` | None | OTA blocked |
| `OTA_PREP` | Readiness validation | No transfer |
| `OTA_UPDATE` | Full OTA operations | None |
| `MC_UPDATE` | None | OTA blocked |
| `TEARDOWN` | None | OTA blocked |
| `SERVICE` | None | OTA blocked |
| `SD_DEGRADED` | None | OTA blocked |
| `BOOT_FAILURE` | None | OTA blocked |

### 4.7 SEC (Security) Feature

| State | Allowed Operations | Restrictions |
|-------|-------------------|--------------|
| `INIT` | Secure boot verification | Must complete before app start |
| `RUNNING` | Full security (encryption, authentication) | None |
| `WARNING` | Full security | None |
| `FAULT` | Security diagnostics | Limited operations |
| `OTA_PREP` | OTA authentication | None |
| `OTA_UPDATE` | Firmware verification | None |
| `MC_UPDATE` | MC authentication | None |
| `TEARDOWN` | Key protection | None |
| `SERVICE` | Debug authentication | None |
| `SD_DEGRADED` | Full security | None |
| `BOOT_FAILURE` | Security fault handling | Limited operations |

### 4.8 SYS (System Management) Feature

| State | Allowed Operations | Restrictions |
|-------|-------------------|--------------|
| `INIT` | State management, initialization | Limited operations |
| `RUNNING` | Full system management | None |
| `WARNING` | Degraded management | Limited operations |
| `FAULT` | Fault recovery management | Limited operations |
| `OTA_PREP` | OTA state management | Limited operations |
| `OTA_UPDATE` | OTA state management | Limited operations |
| `MC_UPDATE` | MC state management | Limited operations |
| `TEARDOWN` | Teardown execution | Limited operations |
| `SERVICE` | Service mode management | Limited operations |
| `SD_DEGRADED` | Degraded management | Limited operations |
| `BOOT_FAILURE` | Boot failure management | Limited operations |

## 5. State Transition Timing Requirements

| Transition | Maximum Duration | Justification |
|------------|------------------|---------------|
| `[*]` → `INIT` | 100ms | Power-on initialization |
| `INIT` → `RUNNING` | 5s | Hardware init, secure boot, MC load |
| `INIT` → `BOOT_FAILURE` | 2s | Secure boot verification |
| `RUNNING` → `WARNING` | 50ms | Fault detection and state change |
| `RUNNING` → `FAULT` | 50ms | Critical fault detection |
| `RUNNING` → `OTA_PREP` | 100ms | OTA request processing |
| `OTA_PREP` → `TEARDOWN` | 2s | Readiness validation |
| `TEARDOWN` → `OTA_UPDATE` | 500ms | Data flush and resource release |
| `TEARDOWN` → `INIT` | 500ms | Data flush and reset |
| `OTA_UPDATE` → `RUNNING` | 10 minutes | Firmware transfer and flashing |
| `RUNNING` → `SERVICE` | 100ms | Debug session establishment |
| `SERVICE` → `RUNNING` | 50ms | Debug session closure |

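A supervisor can compare each measured transition against these budgets. The C sketch below encodes the timing table as a list of `(from, to, max_ms)` entries. All names are hypothetical, and how elapsed time is measured is left to the caller. The power-on `[*]` → `INIT` row is omitted because it has no source state in the enum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identifiers for the system states named in this specification. */
typedef enum {
    SYS_INIT, SYS_RUNNING, SYS_WARNING, SYS_FAULT,
    SYS_OTA_PREP, SYS_OTA_UPDATE, SYS_MC_UPDATE, SYS_TEARDOWN,
    SYS_SERVICE, SYS_SD_DEGRADED, SYS_BOOT_FAILURE,
    SYS_STATE_COUNT
} system_state_t;

typedef struct {
    system_state_t from;
    system_state_t to;
    uint32_t max_ms; /* maximum duration from the timing table */
} transition_budget_t;

static const transition_budget_t budgets[] = {
    { SYS_INIT,       SYS_RUNNING,        5000u },
    { SYS_INIT,       SYS_BOOT_FAILURE,   2000u },
    { SYS_RUNNING,    SYS_WARNING,          50u },
    { SYS_RUNNING,    SYS_FAULT,            50u },
    { SYS_RUNNING,    SYS_OTA_PREP,        100u },
    { SYS_OTA_PREP,   SYS_TEARDOWN,       2000u },
    { SYS_TEARDOWN,   SYS_OTA_UPDATE,      500u },
    { SYS_TEARDOWN,   SYS_INIT,            500u },
    { SYS_OTA_UPDATE, SYS_RUNNING,      600000u }, /* 10 minutes */
    { SYS_RUNNING,    SYS_SERVICE,         100u },
    { SYS_SERVICE,    SYS_RUNNING,          50u }
};

/* Budget in milliseconds for from -> to, or 0 if the table specifies none. */
static uint32_t transition_budget_ms(system_state_t from, system_state_t to) {
    for (size_t i = 0; i < sizeof budgets / sizeof budgets[0]; ++i) {
        if (budgets[i].from == from && budgets[i].to == to) {
            return budgets[i].max_ms;
        }
    }
    return 0u;
}

/* True if a measured transition exceeded its budget; unbudgeted transitions never overrun. */
static bool transition_overran(system_state_t from, system_state_t to, uint32_t elapsed_ms) {
    const uint32_t budget = transition_budget_ms(from, to);
    return budget != 0u && elapsed_ms > budget;
}
```

Returning 0 for transitions the table does not cover lets callers tell "no budget" apart from a real limit. A stricter design could treat unlisted transitions as errors instead.
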
## 6. State Notification Mechanism

All state transitions SHALL notify registered components via the Event System:

- **Event Type:** `SYSTEM_STATE_CHANGED`
- **Payload:** Previous state, new state, transition reason
- **Subscribers:** All feature managers (DAQ, DQC, COM, DIAG, DATA, OTA, SEC, SYS)

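The notification contract can be sketched as a fixed-size subscriber table with synchronous fan-out. This pattern is common on small embedded targets that avoid dynamic allocation. Everything in this C sketch is hypothetical: the specification names the `SYSTEM_STATE_CHANGED` event and its payload fields, but not the Event System's API, its reason codes, or whether delivery is synchronous. The sketch is also not thread-safe. The `record_last_event` listener stands in for a feature manager.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifiers for the system states named in this specification. */
typedef enum {
    SYS_INIT, SYS_RUNNING, SYS_WARNING, SYS_FAULT,
    SYS_OTA_PREP, SYS_OTA_UPDATE, SYS_MC_UPDATE, SYS_TEARDOWN,
    SYS_SERVICE, SYS_SD_DEGRADED, SYS_BOOT_FAILURE,
    SYS_STATE_COUNT
} system_state_t;

/* Hypothetical reason codes; the specification only requires a "transition reason". */
typedef enum {
    REASON_INIT_SUCCESS, REASON_FATAL_FAULT, REASON_OTA_REQUEST, REASON_OTHER
} transition_reason_t;

/* Payload of a SYSTEM_STATE_CHANGED event. */
typedef struct {
    system_state_t previous;
    system_state_t current;
    transition_reason_t reason;
} state_changed_event_t;

typedef void (*state_listener_t)(const state_changed_event_t *evt, void *ctx);

/* One slot per feature manager: DAQ, DQC, COM, DIAG, DATA, OTA, SEC, SYS. */
#define MAX_STATE_LISTENERS 8

static struct {
    state_listener_t fn;
    void *ctx;
} listeners[MAX_STATE_LISTENERS];
static size_t listener_count;

/* Registers a listener; returns 0 on success, -1 if fn is NULL or the table is full. */
static int state_subscribe(state_listener_t fn, void *ctx) {
    if (fn == NULL || listener_count >= MAX_STATE_LISTENERS) {
        return -1;
    }
    listeners[listener_count].fn = fn;
    listeners[listener_count].ctx = ctx;
    listener_count++;
    return 0;
}

/* Delivers SYSTEM_STATE_CHANGED synchronously to every listener, in registration order. */
static void state_publish(system_state_t previous, system_state_t current,
                          transition_reason_t reason) {
    const state_changed_event_t evt = { previous, current, reason };
    for (size_t i = 0; i < listener_count; ++i) {
        listeners[i].fn(&evt, listeners[i].ctx);
    }
}

/* Example listener standing in for a feature manager: stores the last event seen. */
static void record_last_event(const state_changed_event_t *evt, void *ctx) {
    *(state_changed_event_t *)ctx = *evt;
}
```

Synchronous delivery keeps ordering trivial, but a slow subscriber then delays the transition itself. If that conflicts with the budgets in section 5, a queued Event System would be the alternative.
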
## 7. Traceability

- **SR-SYS-001:** Implemented via complete FSM definition
- **SR-SYS-002:** Implemented via per-state feature execution rules
- **SR-SYS-003:** Implemented via state notification mechanism

## 8. Mermaid State Diagram

```mermaid
stateDiagram-v2
[*] --> INIT
INIT --> RUNNING: initSuccess
INIT --> BOOT_FAILURE: secureBootFail
BOOT_FAILURE --> INIT: manualReset
RUNNING --> WARNING: nonFatalFault
RUNNING --> FAULT: fatalFault
RUNNING --> OTA_PREP: otaRequest
RUNNING --> MC_UPDATE: mcUpdateRequest
RUNNING --> SERVICE: debugSession
RUNNING --> SD_DEGRADED: sdFailure
WARNING --> RUNNING: faultCleared
WARNING --> FAULT: faultEscalated
FAULT --> TEARDOWN: recoveryAttempt
OTA_PREP --> TEARDOWN: otaReady
OTA_PREP --> RUNNING: otaRejected
TEARDOWN --> OTA_UPDATE: otaPending
TEARDOWN --> MC_UPDATE: mcPending
TEARDOWN --> INIT: resetRequested
OTA_UPDATE --> RUNNING: otaSuccess
OTA_UPDATE --> FAULT: otaFailure
MC_UPDATE --> TEARDOWN: mcComplete
SERVICE --> RUNNING: sessionClosed
SD_DEGRADED --> RUNNING: sdRecovered
SD_DEGRADED --> SERVICE: manualIntervention
```

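The diagram can be driven from a table of `(state, event, next state)` edges. Any event without an edge from the current state is then rejected rather than silently handled. The C sketch below transcribes every edge of the state diagram. The event identifiers are hypothetical renderings of the diagram's trigger labels, and `sys_fsm_next` is not an API defined by this specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifiers for the system states named in this specification. */
typedef enum {
    SYS_INIT, SYS_RUNNING, SYS_WARNING, SYS_FAULT,
    SYS_OTA_PREP, SYS_OTA_UPDATE, SYS_MC_UPDATE, SYS_TEARDOWN,
    SYS_SERVICE, SYS_SD_DEGRADED, SYS_BOOT_FAILURE,
    SYS_STATE_COUNT
} system_state_t;

/* Hypothetical event identifiers, one per trigger label in the diagram. */
typedef enum {
    EV_INIT_SUCCESS, EV_SECURE_BOOT_FAIL, EV_MANUAL_RESET,
    EV_NON_FATAL_FAULT, EV_FATAL_FAULT, EV_OTA_REQUEST,
    EV_MC_UPDATE_REQUEST, EV_DEBUG_SESSION, EV_SD_FAILURE,
    EV_FAULT_CLEARED, EV_FAULT_ESCALATED, EV_RECOVERY_ATTEMPT,
    EV_OTA_READY, EV_OTA_REJECTED, EV_OTA_PENDING, EV_MC_PENDING,
    EV_RESET_REQUESTED, EV_OTA_SUCCESS, EV_OTA_FAILURE,
    EV_MC_COMPLETE, EV_SESSION_CLOSED, EV_SD_RECOVERED,
    EV_MANUAL_INTERVENTION
} sys_event_t;

typedef struct {
    system_state_t from;
    sys_event_t event;
    system_state_t to;
} sys_transition_t;

/* Every edge of the state diagram, in diagram order. */
static const sys_transition_t transitions[] = {
    { SYS_INIT,         EV_INIT_SUCCESS,        SYS_RUNNING },
    { SYS_INIT,         EV_SECURE_BOOT_FAIL,    SYS_BOOT_FAILURE },
    { SYS_BOOT_FAILURE, EV_MANUAL_RESET,        SYS_INIT },
    { SYS_RUNNING,      EV_NON_FATAL_FAULT,     SYS_WARNING },
    { SYS_RUNNING,      EV_FATAL_FAULT,         SYS_FAULT },
    { SYS_RUNNING,      EV_OTA_REQUEST,         SYS_OTA_PREP },
    { SYS_RUNNING,      EV_MC_UPDATE_REQUEST,   SYS_MC_UPDATE },
    { SYS_RUNNING,      EV_DEBUG_SESSION,       SYS_SERVICE },
    { SYS_RUNNING,      EV_SD_FAILURE,          SYS_SD_DEGRADED },
    { SYS_WARNING,      EV_FAULT_CLEARED,       SYS_RUNNING },
    { SYS_WARNING,      EV_FAULT_ESCALATED,     SYS_FAULT },
    { SYS_FAULT,        EV_RECOVERY_ATTEMPT,    SYS_TEARDOWN },
    { SYS_OTA_PREP,     EV_OTA_READY,           SYS_TEARDOWN },
    { SYS_OTA_PREP,     EV_OTA_REJECTED,        SYS_RUNNING },
    { SYS_TEARDOWN,     EV_OTA_PENDING,         SYS_OTA_UPDATE },
    { SYS_TEARDOWN,     EV_MC_PENDING,          SYS_MC_UPDATE },
    { SYS_TEARDOWN,     EV_RESET_REQUESTED,     SYS_INIT },
    { SYS_OTA_UPDATE,   EV_OTA_SUCCESS,         SYS_RUNNING },
    { SYS_OTA_UPDATE,   EV_OTA_FAILURE,         SYS_FAULT },
    { SYS_MC_UPDATE,    EV_MC_COMPLETE,         SYS_TEARDOWN },
    { SYS_SERVICE,      EV_SESSION_CLOSED,      SYS_RUNNING },
    { SYS_SD_DEGRADED,  EV_SD_RECOVERED,        SYS_RUNNING },
    { SYS_SD_DEGRADED,  EV_MANUAL_INTERVENTION, SYS_SERVICE }
};

/* If the diagram has an edge (current, event), writes its target to *next and
   returns true; otherwise returns false and leaves *next unchanged. */
static bool sys_fsm_next(system_state_t current, sys_event_t event, system_state_t *next) {
    for (size_t i = 0; i < sizeof transitions / sizeof transitions[0]; ++i) {
        if (transitions[i].from == current && transitions[i].event == event) {
            *next = transitions[i].to;
            return true;
        }
    }
    return false;
}
```

On success, a caller would typically perform the transition and then publish `SYSTEM_STATE_CHANGED` as required by section 6. A rejected event leaves the system in its current state.
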