2026-01-26 12:43:14 +01:00
parent c631110349
commit bedcd373f5
133 changed files with 37347 additions and 29 deletions


@@ -0,0 +1,252 @@
# Failure Handling Model Specification
**Document Type:** Normative System Specification
**Scope:** Sensor Hub (Sub-Hub) Fault Detection, Classification, and Recovery
**Traceability:** SR-DIAG-001 through SR-DIAG-011, SR-SYS-002, SR-SYS-004
## 1. Purpose
This document defines the fault taxonomy, escalation rules, recovery behaviors, and integration with the system state machine. All components SHALL adhere to this failure handling model.
## 2. Fault Taxonomy
### 2.1 Severity Levels
| Severity | Code | Description | State Impact | Recovery Behavior |
|----------|------|-------------|--------------|-------------------|
| **INFO** | `DIAG_SEV_INFO` | Informational event, no action required | None | Log only |
| **WARNING** | `DIAG_SEV_WARNING` | Non-critical fault, degraded operation | `RUNNING` → `WARNING` | Continue with reduced functionality |
| **ERROR** | `DIAG_SEV_ERROR` | Critical fault, feature disabled | Feature-specific | Feature isolation, retry logic |
| **FATAL** | `DIAG_SEV_FATAL` | System-critical fault, core functionality disabled | `RUNNING` → `FAULT` | Controlled teardown, recovery attempt |
### 2.2 Fault Categories
| Category | Description | Examples | Typical Severity |
|----------|-------------|----------|------------------|
| **SENSOR** | Sensor hardware or communication failure | Disconnection, out-of-range, non-responsive | WARNING (single), ERROR (multiple), FATAL (all) |
| **COMMUNICATION** | Network or protocol failure | Link loss, timeout, authentication failure | WARNING (temporary), ERROR (persistent), FATAL (critical) |
| **STORAGE** | Persistence or storage medium failure | SD card failure, NVM corruption, write failure | WARNING (degraded), ERROR (persistent), FATAL (critical) |
| **SECURITY** | Security violation or authentication failure | Secure boot failure, key corruption, unauthorized access | FATAL (always) |
| **SYSTEM** | System resource or configuration failure | Memory exhaustion, task failure, configuration error | ERROR (recoverable), FATAL (unrecoverable) |
| **OTA** | Firmware update failure | Validation failure, transfer error, flash error | ERROR (retry), FATAL (rollback) |
| **CALIBRATION** | Calibration or machine constants failure | Invalid MC, calibration error, sensor mismatch | WARNING (single), ERROR (critical) |
## 3. Diagnostic Code Structure
### 3.1 Diagnostic Code Format
```
DIAG-<CATEGORY>-<COMPONENT>-<NUMBER>
```
- **CATEGORY:** Two-letter code (SN, CM, ST, SC, SY, OT, CL)
- **COMPONENT:** Component identifier (e.g., TEMP, HUM, CO2, NET, SD, OTA)
- **NUMBER:** Unique fault number (0001-9999)
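
The format above can be checked mechanically. The following is a minimal sketch, not the project's implementation: `diag_code_t`, `diag_code_parse`, and `diag_code_format` are hypothetical names, and the 8-character cap on the component identifier is an assumption, since the specification does not state a maximum length.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parsed form of a code such as "DIAG-SN-TEMP-0001". */
typedef struct {
    char     category[3];   /* two-letter category code, e.g. "SN" */
    char     component[9];  /* component identifier; 8-character cap is an assumption */
    unsigned number;        /* fault number, 1..9999 */
} diag_code_t;

/* Returns 0 on success, -1 if `text` is not DIAG-<CATEGORY>-<COMPONENT>-<NUMBER>. */
int diag_code_parse(const char *text, diag_code_t *out)
{
    static const char *const categories[] = { "SN", "CM", "ST", "SC", "SY", "OT", "CL" };
    char digits[5];
    int consumed = 0;

    if (sscanf(text, "DIAG-%2[A-Z]-%8[A-Z0-9]-%4[0-9]%n",
               out->category, out->component, digits, &consumed) != 3) {
        return -1;
    }
    /* Reject trailing characters and numbers that are not exactly four digits. */
    if (text[consumed] != '\0' || strlen(digits) != 4) {
        return -1;
    }
    out->number = (unsigned)atoi(digits);
    if (out->number == 0) {
        return -1;                      /* 0000 is outside 0001-9999 */
    }
    for (size_t i = 0; i < sizeof categories / sizeof categories[0]; i++) {
        if (strcmp(out->category, categories[i]) == 0) {
            return 0;
        }
    }
    return -1;                          /* unknown category code */
}

/* Writes the textual form into `buf`; returns the snprintf result. */
int diag_code_format(const diag_code_t *code, char *buf, size_t len)
{
    return snprintf(buf, len, "DIAG-%s-%s-%04u",
                    code->category, code->component, code->number);
}
```

Parsing and formatting are kept as a pair so that a code read from a log can be validated and written back unchanged.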
### 3.2 Diagnostic Code Registry
| Code | Severity | Category | Component | Description |
|------|----------|----------|-----------|-------------|
| `DIAG-SN-TEMP-0001` | WARNING | SENSOR | Temperature | Temperature sensor disconnected |
| `DIAG-SN-TEMP-0002` | ERROR | SENSOR | Temperature | Temperature sensor out of range |
| `DIAG-SN-TEMP-0003` | FATAL | SENSOR | Temperature | All temperature sensors failed |
| `DIAG-CM-NET-0001` | WARNING | COMMUNICATION | Network | Main Hub link temporarily lost |
| `DIAG-CM-NET-0002` | ERROR | COMMUNICATION | Network | Main Hub link persistently lost |
| `DIAG-ST-SD-0001` | WARNING | STORAGE | SD Card | SD card write failure (retry successful) |
| `DIAG-ST-SD-0002` | ERROR | STORAGE | SD Card | SD card persistent write failure |
| `DIAG-ST-SD-0003` | FATAL | STORAGE | SD Card | SD card corruption detected |
| `DIAG-SC-BOOT-0001` | FATAL | SECURITY | Secure Boot | Secure boot verification failed |
| `DIAG-SY-MEM-0001` | ERROR | SYSTEM | Memory | Memory allocation failure |
| `DIAG-OT-FW-0001` | ERROR | OTA | Firmware | Firmware integrity validation failed |
| `DIAG-CL-MC-0001` | WARNING | CALIBRATION | Machine Constants | Invalid sensor slot configuration |
## 4. Fault Detection Rules
### 4.1 Sensor Fault Detection
| Condition | Detection Method | Severity Assignment |
|-----------|------------------|-------------------|
| Sensor disconnected | Hardware presence signal | WARNING (if other sensors available) |
| Sensor non-responsive | Communication timeout (3 retries) | ERROR (if critical sensor) |
| Sensor out of range | Value validation against limits | WARNING (if single occurrence), ERROR (if persistent) |
| All sensors failed | Count of failed sensors = total | FATAL |
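
As an illustration of the count-based rule, the sketch below derives a severity from the number of failed sensors, following the SENSOR row of Section 2.2 (single → WARNING, multiple → ERROR, all → FATAL). The function name and the use of `DIAG_SEV_INFO` for "no sensor failed" are assumptions made for this example.

```c
typedef enum {
    DIAG_SEV_INFO,
    DIAG_SEV_WARNING,
    DIAG_SEV_ERROR,
    DIAG_SEV_FATAL
} diag_severity_t;

/* Maps the failed-sensor count to a severity. */
diag_severity_t sensor_fault_severity(unsigned failed, unsigned total)
{
    if (failed == 0) {
        return DIAG_SEV_INFO;       /* assumption: nothing to report */
    }
    if (failed >= total) {
        return DIAG_SEV_FATAL;      /* count of failed sensors = total */
    }
    if (failed == 1) {
        return DIAG_SEV_WARNING;    /* other sensors still available */
    }
    return DIAG_SEV_ERROR;          /* multiple, but not all, failed */
}
```

Note that with a single installed sensor the first failure is already FATAL, because the "all sensors failed" check runs before the "single failure" check.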
### 4.2 Communication Fault Detection
| Condition | Detection Method | Severity Assignment |
|-----------|------------------|-------------------|
| Link temporarily lost | Heartbeat timeout (< 30s) | WARNING |
| Link persistently lost | Heartbeat timeout (> 5 minutes) | ERROR |
| Authentication failure | Security layer rejection | FATAL |
| Protocol error | Message parsing failure (3 consecutive) | ERROR |
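
The table leaves link outages between 30 seconds and 5 minutes unclassified. The sketch below assumes such outages remain WARNING until the 5-minute limit is exceeded; that choice, the function names, and the treatment of a single parse failure as INFO are all assumptions of this example, not statements of the specification.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    DIAG_SEV_INFO,
    DIAG_SEV_WARNING,
    DIAG_SEV_ERROR,
    DIAG_SEV_FATAL
} diag_severity_t;

#define LINK_PERSISTENT_S        300u  /* "> 5 minutes" from the table */
#define PROTOCOL_MAX_PARSE_FAILS 3u    /* "3 consecutive" from the table */

/* Severity for a link that has been silent for `outage_s` seconds. */
diag_severity_t link_loss_severity(uint32_t outage_s)
{
    if (outage_s == 0) {
        return DIAG_SEV_INFO;          /* link is up */
    }
    if (outage_s > LINK_PERSISTENT_S) {
        return DIAG_SEV_ERROR;         /* persistently lost */
    }
    return DIAG_SEV_WARNING;           /* assumption: covers the 30 s..5 min gap */
}

typedef struct {
    unsigned consecutive_parse_failures;
} protocol_monitor_t;

/* Call once per received message; reports ERROR on the third failure in a row. */
diag_severity_t protocol_on_message(protocol_monitor_t *mon, bool parsed_ok)
{
    if (parsed_ok) {
        mon->consecutive_parse_failures = 0;
        return DIAG_SEV_INFO;
    }
    mon->consecutive_parse_failures++;
    return mon->consecutive_parse_failures >= PROTOCOL_MAX_PARSE_FAILS
               ? DIAG_SEV_ERROR
               : DIAG_SEV_INFO;        /* assumption: isolated failures are only logged */
}
```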
### 4.3 Storage Fault Detection
| Condition | Detection Method | Severity Assignment |
|-----------|------------------|-------------------|
| Write failure (retry successful) | Write operation with retry | WARNING |
| Write failure (persistent) | Write operation failure (3 retries) | ERROR |
| SD card corruption | File system check failure | FATAL |
| Storage full | Available space < threshold | WARNING |
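
The two write-failure rows can be sketched as a single retry loop. This is only an illustration: `write_fn_t`, `storage_write_with_retry`, and the `flaky_write` stand-in are hypothetical names, and returning `DIAG_SEV_INFO` when the first attempt succeeds is an assumption meaning "no fault to report".

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    DIAG_SEV_INFO,
    DIAG_SEV_WARNING,
    DIAG_SEV_ERROR,
    DIAG_SEV_FATAL
} diag_severity_t;

/* Hypothetical driver hook: returns true when the write succeeded. */
typedef bool (*write_fn_t)(const void *data, size_t len, void *ctx);

#define STORAGE_MAX_RETRIES 3

/* One initial attempt plus up to three retries, classified per the table. */
diag_severity_t storage_write_with_retry(write_fn_t write, const void *data,
                                         size_t len, void *ctx)
{
    if (write(data, len, ctx)) {
        return DIAG_SEV_INFO;            /* no fault */
    }
    for (int retry = 0; retry < STORAGE_MAX_RETRIES; retry++) {
        if (write(data, len, ctx)) {
            return DIAG_SEV_WARNING;     /* write failure, retry successful */
        }
    }
    return DIAG_SEV_ERROR;               /* persistent write failure */
}

/* Illustrative stand-in for a driver: fails until the counter behind ctx is zero. */
bool flaky_write(const void *data, size_t len, void *ctx)
{
    int *failures_left = ctx;
    (void)data;
    (void)len;
    if (*failures_left > 0) {
        (*failures_left)--;
        return false;
    }
    return true;
}
```

Passing the write routine as a function pointer keeps the retry policy testable without an SD card attached.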
### 4.4 Security Fault Detection
| Condition | Detection Method | Severity Assignment |
|-----------|------------------|-------------------|
| Secure boot failure | Boot verification failure | FATAL (always) |
| Key corruption | Cryptographic key validation failure | FATAL |
| Unauthorized access | Authentication failure (3 attempts) | FATAL |
| Message tampering | Integrity check failure | ERROR (FATAL if persistent) |
## 5. Escalation Rules
### 5.1 Severity Escalation
| Current Severity | Escalation Trigger | New Severity | State Transition |
|------------------|-------------------|--------------|-----------------|
| INFO | N/A | N/A | None |
| WARNING | Same fault persists > 5 minutes | ERROR | `WARNING` → `WARNING` (feature degraded) |
| WARNING | Multiple warnings (≥3) | ERROR | `WARNING` → `WARNING` (feature degraded) |
| WARNING | Critical feature affected | FATAL | `WARNING` → `FAULT` |
| ERROR | Same fault persists > 10 minutes | FATAL | `RUNNING` → `FAULT` |
| ERROR | Cascading failures (≥2 features) | FATAL | `RUNNING` → `FAULT` |
| FATAL | N/A | N/A | `RUNNING` → `FAULT` |
### 5.2 Cascading Failure Detection
A cascading failure is detected when any of the following occurs:
- Multiple independent features fail simultaneously
- A failure in one feature causes a failure in another
- A system resource (memory, CPU, storage) is exhausted
**Response:** Immediate escalation to FATAL, transition to `FAULT` state.
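
The escalation table in Section 5.1 can be read as a pure function from the current fault status to a new severity. The sketch below is one possible encoding; the `fault_status_t` fields and the `escalate` name are hypothetical, and the function applies a single escalation step per call rather than chaining WARNING to FATAL in one pass.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    DIAG_SEV_INFO,
    DIAG_SEV_WARNING,
    DIAG_SEV_ERROR,
    DIAG_SEV_FATAL
} diag_severity_t;

/* Inputs to the escalation decision (field names are assumptions). */
typedef struct {
    diag_severity_t severity;          /* current severity of the fault */
    uint32_t        persisted_s;       /* how long the same fault has persisted */
    unsigned        active_warnings;   /* number of concurrently active warnings */
    unsigned        failed_features;   /* number of features currently failed */
    bool            critical_feature;  /* fault affects a critical feature */
} fault_status_t;

/* Returns the severity after applying one row of the escalation table. */
diag_severity_t escalate(const fault_status_t *f)
{
    switch (f->severity) {
    case DIAG_SEV_WARNING:
        if (f->critical_feature) {
            return DIAG_SEV_FATAL;
        }
        if (f->persisted_s > 5u * 60u || f->active_warnings >= 3u) {
            return DIAG_SEV_ERROR;
        }
        return DIAG_SEV_WARNING;
    case DIAG_SEV_ERROR:
        if (f->persisted_s > 10u * 60u || f->failed_features >= 2u) {
            return DIAG_SEV_FATAL;
        }
        return DIAG_SEV_ERROR;
    default:
        return f->severity;            /* INFO and FATAL never escalate */
    }
}
```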
## 6. Recovery Behaviors
### 6.1 Recovery Strategies by Severity
| Severity | Recovery Strategy | Retry Logic | State Impact |
|----------|------------------|-------------|--------------|
| **INFO** | None | N/A | None |
| **WARNING** | Automatic retry, degraded operation | 3 retries with exponential backoff | Continue in `WARNING` state |
| **ERROR** | Feature isolation, automatic retry | 3 retries, then manual intervention | Feature disabled, system continues |
| **FATAL** | Controlled teardown, recovery attempt | Single recovery attempt, then manual | `FAULT` → `TEARDOWN` → `INIT` |
### 6.2 Recovery Time Limits
| Fault Type | Maximum Recovery Time | Recovery Action |
|------------|----------------------|----------------|
| Sensor (WARNING) | 5 minutes | Automatic retry, sensor exclusion |
| Communication (WARNING) | 30 seconds | Automatic reconnection |
| Storage (WARNING) | 10 seconds | Retry write operation |
| Sensor (ERROR) | Manual intervention | Sensor marked as failed |
| Communication (ERROR) | Manual intervention | Communication feature disabled |
| Storage (ERROR) | Manual intervention | Persistence disabled, system continues |
| FATAL (any) | 60 seconds | Controlled teardown and recovery attempt |
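
The WARNING row in Section 6.1 calls for three retries with exponential backoff but does not give a base delay. The sketch below assumes the delay doubles on each retry and takes the base delay as a parameter; both function names are hypothetical. It can be used to check a candidate base delay against the time limits above.

```c
#include <stdint.h>

#define WARNING_MAX_RETRIES 3u

/* Delay before retry number `retry` (0-based): base, 2*base, 4*base. */
uint32_t backoff_delay_ms(unsigned retry, uint32_t base_ms)
{
    return base_ms << retry;
}

/* Total waiting time if all retries are used. */
uint32_t backoff_total_ms(uint32_t base_ms)
{
    uint32_t total = 0;
    for (unsigned retry = 0; retry < WARNING_MAX_RETRIES; retry++) {
        total += backoff_delay_ms(retry, base_ms);
    }
    return total;
}
```

With a 1-second base delay the three retries wait 7 seconds in total, which fits inside the 10-second limit for storage warnings; a 2-second base (14 seconds) would not.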
### 6.3 Latching Behavior
| Severity | Latching Rule | Clear Condition |
|----------|--------------|----------------|
| **INFO** | Not latched | Overwritten by new event |
| **WARNING** | Latched until cleared | Fault condition cleared + manual clear OR automatic clear after 1 hour |
| **ERROR** | Latched until cleared | Manual clear via diagnostic session OR system reset |
| **FATAL** | Latched until cleared | Manual clear via diagnostic session OR system reset |
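
The clear conditions can be expressed as a predicate. This sketch reads the WARNING rule literally as "(condition cleared AND manual clear) OR one hour elapsed"; whether the automatic clear should also require the fault condition to be gone is not stated, so that reading is an assumption, as are the `clear_request_t` type and the function name.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    DIAG_SEV_INFO,
    DIAG_SEV_WARNING,
    DIAG_SEV_ERROR,
    DIAG_SEV_FATAL
} diag_severity_t;

typedef enum {
    CLEAR_NONE,          /* no clear requested */
    CLEAR_MANUAL,        /* manual clear via diagnostic session */
    CLEAR_SYSTEM_RESET   /* system reset */
} clear_request_t;

#define WARNING_AUTO_CLEAR_S 3600u   /* automatic clear after 1 hour */

/* Returns true when a latched fault of this severity may be cleared. */
bool latch_may_clear(diag_severity_t severity, bool condition_cleared,
                     clear_request_t request, uint32_t latched_s)
{
    switch (severity) {
    case DIAG_SEV_INFO:
        return true;                 /* INFO is never latched */
    case DIAG_SEV_WARNING:
        return (condition_cleared && request == CLEAR_MANUAL)
            || latched_s >= WARNING_AUTO_CLEAR_S;
    case DIAG_SEV_ERROR:
    case DIAG_SEV_FATAL:
        return request == CLEAR_MANUAL || request == CLEAR_SYSTEM_RESET;
    }
    return false;
}
```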
## 7. Fault Reporting
### 7.1 Reporting Channels
| Severity | Local HMI | Diagnostic Log | Main Hub | Diagnostic Session |
|----------|-----------|----------------|----------|-------------------|
| **INFO** | Optional | Yes | No | Yes |
| **WARNING** | Yes (status indicator) | Yes | Yes (periodic) | Yes |
| **ERROR** | Yes (status indicator) | Yes | Yes (immediate) | Yes |
| **FATAL** | Yes (status indicator) | Yes | Yes (immediate) | Yes |
### 7.2 Diagnostic Event Structure
```c
#include <stdbool.h>
#include <stdint.h>

// diag_severity_t and fault_category_t must be declared before this point.
typedef struct {
    uint32_t         diagnostic_code;   // Unique diagnostic code
    diag_severity_t  severity;          // INFO, WARNING, ERROR, FATAL
    uint64_t         timestamp;         // System timestamp (microseconds)
    const char*      source_component;  // Component identifier
    uint32_t         occurrence_count;  // Number of occurrences
    bool             is_latched;        // Latching status
    fault_category_t category;          // SENSOR, COMMUNICATION, etc.
} diagnostic_event_t;
```
## 8. Integration with State Machine
### 8.1 Fault-to-State Mapping
| Fault Severity | Current State | Target State | Transition Trigger |
|----------------|---------------|--------------|-------------------|
| INFO | Any | Same | None (no state change) |
| WARNING | `RUNNING` | `WARNING` | First WARNING fault |
| WARNING | `WARNING` | `WARNING` | Additional WARNING (latched) |
| ERROR | `RUNNING` | `RUNNING` | Feature isolation, continue |
| ERROR | `WARNING` | `WARNING` | Feature isolation, continue |
| FATAL | `RUNNING` | `FAULT` | First FATAL fault |
| FATAL | `WARNING` | `FAULT` | Escalation to FATAL |
| FATAL | `FAULT` | `FAULT` | Additional FATAL (latched) |
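
The mapping table reduces to a small function. The sketch below covers only the three states that appear in the table; the `sys_state_t` enumerator names and the function name are hypothetical, and leaving the state unchanged for a WARNING raised while already in `FAULT` is an assumption, since the table has no such row.

```c
typedef enum {
    DIAG_SEV_INFO,
    DIAG_SEV_WARNING,
    DIAG_SEV_ERROR,
    DIAG_SEV_FATAL
} diag_severity_t;

typedef enum {
    STATE_RUNNING,
    STATE_WARNING,
    STATE_FAULT
} sys_state_t;

/* Target state after a fault of the given severity is reported. */
sys_state_t state_after_fault(sys_state_t current, diag_severity_t severity)
{
    switch (severity) {
    case DIAG_SEV_FATAL:
        return STATE_FAULT;                    /* from RUNNING, WARNING, or FAULT */
    case DIAG_SEV_WARNING:
        /* Only the first WARNING changes state; later ones are latched. */
        return current == STATE_RUNNING ? STATE_WARNING : current;
    default:
        return current;                        /* INFO: none; ERROR: feature isolation */
    }
}
```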
### 8.2 State-Dependent Fault Handling
| State | Fault Handling Behavior |
|-------|------------------------|
| `INIT` | Boot-time faults → `BOOT_FAILURE` if security-related |
| `RUNNING` | Full fault detection and handling |
| `WARNING` | Fault escalation monitoring, recovery attempts |
| `FAULT` | Fault logging only, recovery attempt preparation |
| `OTA_PREP` | OTA-related faults only, others deferred |
| `OTA_UPDATE` | OTA progress faults only |
| `TEARDOWN` | Fault logging only, no new fault detection |
| `SERVICE` | Fault inspection only, no new fault detection |
## 9. Error Handler Responsibilities
The Error Handler component SHALL:
1. Receive fault reports from all components
2. Classify faults according to taxonomy
3. Determine severity and escalation
4. Trigger state transitions when required
5. Manage fault latching and clearing
6. Coordinate recovery attempts
7. Report faults to diagnostics and Main Hub
## 10. Traceability
- **SR-DIAG-001:** Implemented via diagnostic code framework
- **SR-DIAG-002:** Implemented via unique diagnostic code assignment
- **SR-DIAG-003:** Implemented via severity classification
- **SR-DIAG-004:** Implemented via timestamp and source association
- **SR-SYS-002:** Implemented via fault-to-state mapping
- **SR-SYS-004:** Implemented via FATAL fault → TEARDOWN transition
## 11. Mermaid Fault Escalation Diagram
```mermaid
flowchart TD
FaultDetected[Fault Detected] --> ClassifySeverity{Classify Severity}
ClassifySeverity -->|INFO| LogOnly[Log Only]
ClassifySeverity -->|WARNING| CheckState1{Current State?}
ClassifySeverity -->|ERROR| IsolateFeature[Isolate Feature]
ClassifySeverity -->|FATAL| TriggerFaultState[Trigger FAULT State]
CheckState1 -->|RUNNING| TransitionWarning[Transition to WARNING]
CheckState1 -->|WARNING| LatchWarning[Latch Warning]
IsolateFeature --> RetryLogic{Retry Logic}
RetryLogic -->|Success| ClearError[Clear Error]
RetryLogic -->|Failure| EscalateToFatal{Escalate?}
EscalateToFatal -->|Yes| TriggerFaultState
EscalateToFatal -->|No| ManualIntervention[Manual Intervention]
TriggerFaultState --> TeardownSequence[Initiate Teardown]
TeardownSequence --> RecoveryAttempt{Recovery Attempt}
RecoveryAttempt -->|Success| ResetToInit[Reset to INIT]
RecoveryAttempt -->|Failure| ManualIntervention
```


@@ -0,0 +1,304 @@
# System Review Checklist
**Project:** Sensor Hub Poultry Farm Automation
**Scope:** Sensor Hub (Sub-Hub Only)
**Purpose:** Verify system readiness before FRD/SAD generation and AI-assisted implementation
## 1. Requirements Completeness Review
### 1.1 Feature Coverage
✔ All major functional domains defined:
* ☐ Data Acquisition (DAQ)
* ☐ Data Quality & Calibration (DQC)
* ☐ Communication (COM)
* ☐ Diagnostics & Health (DIAG)
* ☐ Persistence & Data Management (DATA)
* ☐ OTA Update (OTA)
* ☐ Security & Safety (SEC)
* ☐ System Management & HMI (SYS)
**Acceptance Criteria:**
No functional behavior is undocumented or implicit.
### 1.2 Requirement Quality
For **each system requirement**, verify:
* ☐ Uses “SHALL”
* ☐ Is testable
* ☐ Is unambiguous
* ☐ Has a unique ID
* ☐ Is traceable to a feature
**Red Flags:**
* Vague timing (“fast”, “real-time”)
* Mixed requirements (“shall… and …”)
* Implementation leakage (“using mutex”)
## 2. Architectural Soundness Review
### 2.1 Layering & Separation of Concerns
* ☐ Hardware access isolated
* ☐ No feature bypasses System Manager
* ☐ Persistence accessed only via DP
* ☐ HMI does not modify safety-critical configuration
**Fail Condition:**
Any feature directly accesses hardware or storage without abstraction.
### 2.2 State Machine Validity
* ☐ All system states defined
* ☐ Valid transitions documented
* ☐ Illegal transitions blocked
* ☐ Feature behavior defined per state
**States to Verify:**
* INIT
* IDLE
* RUNNING
* DEGRADED
* OTA_UPDATE
* TEARDOWN
* ERROR
## 3. Cross-Feature Constraint Compliance
### 3.1 Constraint Awareness
* ☐ Each feature respects cross-feature constraints
* ☐ No constraint contradicts a requirement
* ☐ Constraints are globally enforceable
### 3.2 Conflict Resolution
Check for conflicts such as:
* ☐ OTA vs DAQ timing
* ☐ Persistence vs Power Loss
* ☐ Diagnostics vs Real-Time Tasks
* ☐ Debug vs Secure Boot
**Acceptance:**
Conflicts resolved via priority rules or system state restrictions.
## 4. Timing & Performance Review
### 4.1 Real-Time Constraints
* ☐ High-frequency sampling bounded
* ☐ Worst-case execution time considered
* ☐ Non-blocking design enforced
### 4.2 Resource Usage
* ☐ CPU usage bounded
* ☐ RAM usage predictable
* ☐ Stack sizes justified
* ☐ Heap usage minimized in runtime
## 5. Reliability & Fault Handling Review
### 5.1 Fault Detection
* ☐ Sensor failure detection defined
* ☐ Communication failure detection defined
* ☐ Storage failure detection defined
### 5.2 Fault Response
* ☐ Graceful degradation defined
* ☐ Diagnostics logged
* ☐ System state updated appropriately
## 6. Security Review
### 6.1 Boot & Firmware
* ☐ Secure boot enforced
* ☐ Firmware integrity verified
* ☐ Rollback prevention defined
### 6.2 Communication
* ☐ Encryption mandatory
* ☐ Authentication required
* ☐ Key management strategy defined
### 6.3 Debug Access
* ☐ Debug sessions authenticated
* ☐ Debug disabled in production unless authorized
* ☐ Debug cannot bypass security or safety
## 7. Data Management Review
### 7.1 Data Ownership
* ☐ Single source of truth enforced
* ☐ Clear ownership per data type
* ☐ No duplicated persistent data
### 7.2 Persistence Safety
* ☐ Safe writes during state transitions
* ☐ Power-loss tolerance defined
* ☐ Data recovery defined
## 8. HMI & Usability Review (OLED + Buttons)
### 8.1 Display Content
* ☐ Connectivity status
* ☐ System status
* ☐ Connected sensors
* ☐ Time & date
### 8.2 Navigation Logic
* ☐ Menu hierarchy defined
* ☐ Button behavior consistent
* ☐ No destructive action via UI
## 9. Standards & Compliance Readiness
### 9.1 IEC 61499 Mapping Readiness
* ☐ Functional blocks identifiable
* ☐ Event/data separation respected
* ☐ Distributed execution possible
### 9.2 ISA-95 Alignment Readiness
* ☐ Sensor Hub maps to Level 1/2
* ☐ Clear boundary to Level 3/4
* ☐ No business logic leakage
## 10. AI Readiness Review
### 10.1 Prompt Compatibility
* ☐ Features modular
* ☐ Requirements structured
* ☐ Architecture explicit
### 10.2 Tool Handoff Readiness
* ☐ Claude can generate FRD/SAD
* ☐ Mermaid diagrams derivable
* ☐ DeepSeek can critique logic
* ☐ Cursor rules enforce constraints
## Final Gate Decision
### GO / NO-GO Criteria
**GO** if:
* All sections ≥ 90% checked
* No critical architectural violation
* Security constraints enforced
**NO-GO** if:
* Missing system states
* Undefined failure behavior
* Security gaps
* Persistence inconsistency


@@ -0,0 +1,314 @@
# System State Machine Specification
**Document Type:** Normative System Specification
**Scope:** Sensor Hub (Sub-Hub) Operational States
**Traceability:** SR-SYS-001, SR-SYS-002, SR-SYS-003
## 1. Purpose
This document defines the complete finite state machine (FSM) governing the Sensor Hub's operational lifecycle. All system components SHALL respect state-based operation restrictions as defined herein.
## 2. State Definitions
### 2.1 State Enumeration
| State ID | State Name | Description | Entry Condition |
|----------|------------|-------------|-----------------|
| `INIT` | Initialization | Hardware and software initialization phase | Power-on, reset, or post-teardown |
| `BOOT_FAILURE` | Boot Failure | Secure boot verification failed | Secure boot check failure during INIT |
| `RUNNING` | Normal Operation | Active sensor acquisition and communication | Successful initialization |
| `WARNING` | Degraded Operation | Non-fatal fault detected, degraded functionality | Non-critical fault detected during RUNNING |
| `FAULT` | Fatal Error | Critical fault, core functionality disabled | Fatal error or cascading failures |
| `OTA_PREP` | OTA Preparation | Preparing for firmware update | OTA request accepted, validation pending |
| `OTA_UPDATE` | OTA Update Active | Firmware update in progress | Firmware transfer and flashing |
| `MC_UPDATE` | Machine Constants Update | Machine constants update in progress | MC update request accepted |
| `TEARDOWN` | Controlled Shutdown | Safe shutdown sequence execution | Update, fault recovery, or manual command |
| `SERVICE` | Service Mode | Engineering/diagnostic interaction | Debug session active |
| `SD_DEGRADED` | SD Card Degraded | SD card failure detected, fallback mode | SD card access failure |
### 2.2 State Characteristics
#### INIT
- **Duration:** Bounded (max 5 seconds)
- **Allowed Operations:** Hardware initialization, secure boot verification, MC loading
- **Forbidden Operations:** Sensor acquisition, communication, persistence writes
- **Exit Conditions:** Success → RUNNING, Secure boot failure → BOOT_FAILURE
#### BOOT_FAILURE
- **Duration:** Indefinite (requires manual intervention)
- **Allowed Operations:** Diagnostic reporting, secure boot retry (limited)
- **Forbidden Operations:** All application features
- **Exit Conditions:** Manual reset, secure boot success → INIT
#### RUNNING
- **Duration:** Indefinite (normal operation)
- **Allowed Operations:** All features (DAQ, DQC, COM, DIAG, DATA, HMI)
- **Forbidden Operations:** OTA, MC update (must transition via TEARDOWN)
- **Exit Conditions:** Fault → WARNING/FAULT, OTA request → OTA_PREP, MC update → MC_UPDATE, Debug session → SERVICE
#### WARNING
- **Duration:** Until fault cleared or escalated
- **Allowed Operations:** Degraded DAQ, COM, DIAG (limited), DATA (read-only)
- **Forbidden Operations:** OTA, MC update
- **Exit Conditions:** Fault cleared → RUNNING, Fault escalated → FAULT
#### FAULT
- **Duration:** Until recovery attempt or manual intervention
- **Allowed Operations:** Diagnostic reporting, error logging, controlled teardown
- **Forbidden Operations:** Sensor acquisition, communication (except diagnostics)
- **Exit Conditions:** Recovery attempt → TEARDOWN, Manual reset → INIT
#### OTA_PREP
- **Duration:** Bounded (max 2 seconds)
- **Allowed Operations:** OTA readiness validation, teardown initiation
- **Forbidden Operations:** Sensor acquisition, new communication sessions
- **Exit Conditions:** Ready → TEARDOWN, Rejected → RUNNING
#### OTA_UPDATE
- **Duration:** Bounded (max 10 minutes)
- **Allowed Operations:** Firmware reception, validation, flashing
- **Forbidden Operations:** Sensor acquisition, normal communication, persistence (except OTA data)
- **Exit Conditions:** Success → RUNNING (after reboot), Failure → FAULT
#### MC_UPDATE
- **Duration:** Bounded (max 30 seconds)
- **Allowed Operations:** MC reception, validation, teardown
- **Forbidden Operations:** Sensor acquisition, normal communication
- **Exit Conditions:** Success → TEARDOWN, Failure → RUNNING
#### TEARDOWN
- **Duration:** Bounded (max 500ms)
- **Allowed Operations:** Data flush, resource release, state persistence
- **Forbidden Operations:** New sensor acquisition, new communication sessions
- **Exit Conditions:** Complete → INIT (reset), OTA → OTA_UPDATE, MC → MC_UPDATE
#### SERVICE
- **Duration:** Until session closed
- **Allowed Operations:** Diagnostic access, read-only inspection, controlled commands
- **Forbidden Operations:** Sensor acquisition (may be paused), OTA, MC update
- **Exit Conditions:** Session closed → RUNNING
#### SD_DEGRADED
- **Duration:** Until SD recovery or manual intervention
- **Allowed Operations:** Sensor acquisition (no persistence), communication, diagnostics
- **Forbidden Operations:** Persistence writes (except critical diagnostics)
- **Exit Conditions:** SD recovery → RUNNING, Manual intervention → SERVICE
## 3. State Transition Table
| From State | To State | Trigger | Guard Condition | Action | Authorized Caller |
|------------|----------|---------|-----------------|--------|------------------|
| `[*]` | `INIT` | Power-on, Reset | None | Initialize hardware, secure boot check | System |
| `INIT` | `RUNNING` | Init success | Secure boot OK, MC loaded, sensors detected | Start DAQ, COM, DIAG tasks | System Manager |
| `INIT` | `BOOT_FAILURE` | Secure boot fail | Secure boot verification failed | Log security fault, disable application | Secure Boot |
| `BOOT_FAILURE` | `INIT` | Manual reset | None | Reset system | User/Engineer |
| `RUNNING` | `WARNING` | Non-fatal fault | Diagnostic severity = WARNING | Degrade functionality, notify | Error Handler |
| `RUNNING` | `FAULT` | Fatal fault | Diagnostic severity = FATAL | Stop critical features | Error Handler |
| `RUNNING` | `OTA_PREP` | OTA request | OTA request received, system ready | Validate readiness | OTA Manager |
| `RUNNING` | `MC_UPDATE` | MC update request | MC update received, authenticated | Validate MC | MC Manager |
| `RUNNING` | `SERVICE` | Debug session | Debug session authenticated | Pause non-critical tasks | Debug Manager |
| `RUNNING` | `SD_DEGRADED` | SD failure | SD card access failure detected | Disable persistence writes | Persistence |
| `WARNING` | `RUNNING` | Fault cleared | Diagnostic cleared, system healthy | Restore full functionality | Error Handler |
| `WARNING` | `FAULT` | Fault escalated | Multiple warnings or critical fault | Stop degraded features | Error Handler |
| `FAULT` | `TEARDOWN` | Recovery attempt | Recovery command received | Initiate controlled shutdown | System Manager |
| `OTA_PREP` | `TEARDOWN` | OTA ready | Readiness validated | Begin teardown | OTA Manager |
| `OTA_PREP` | `RUNNING` | OTA rejected | Readiness check failed | Resume normal operation | OTA Manager |
| `TEARDOWN` | `OTA_UPDATE` | Teardown complete (OTA) | OTA pending, data flushed | Enter OTA state | System Manager |
| `TEARDOWN` | `MC_UPDATE` | Teardown complete (MC) | MC update pending, data flushed | Enter MC update | System Manager |
| `TEARDOWN` | `INIT` | Teardown complete (reset) | Reset requested, data flushed | Reset system | System Manager |
| `OTA_UPDATE` | `RUNNING` | OTA success | Firmware flashed, validated | Reboot into new firmware | OTA Manager |
| `OTA_UPDATE` | `FAULT` | OTA failure | Firmware validation failed | Log error, enter fault | OTA Manager |
| `MC_UPDATE` | `TEARDOWN` | MC update complete | MC validated, applied | Reinitialize system | MC Manager |
| `SERVICE` | `RUNNING` | Session closed | Debug session terminated | Resume normal operation | Debug Manager |
| `SD_DEGRADED` | `RUNNING` | SD recovered | SD card access restored | Re-enable persistence | Persistence |
| `SD_DEGRADED` | `SERVICE` | Manual intervention | User intervention required | Enter service mode | User/Engineer |
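
One way to block illegal transitions is to encode the table as data and look up each requested change. The sketch below lists the 23 From/To pairs from the table (the `[*]` → `INIT` power-on entry is omitted because it has no source state). The enumerator names and `transition_allowed` are hypothetical, and guard conditions and authorized callers are not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    STATE_INIT, STATE_BOOT_FAILURE, STATE_RUNNING, STATE_WARNING,
    STATE_FAULT, STATE_OTA_PREP, STATE_OTA_UPDATE, STATE_MC_UPDATE,
    STATE_TEARDOWN, STATE_SERVICE, STATE_SD_DEGRADED
} sys_state_t;

typedef struct {
    sys_state_t from;
    sys_state_t to;
} transition_t;

/* From/To pairs copied from the state transition table. */
static const transition_t allowed_transitions[] = {
    { STATE_INIT,         STATE_RUNNING      },
    { STATE_INIT,         STATE_BOOT_FAILURE },
    { STATE_BOOT_FAILURE, STATE_INIT         },
    { STATE_RUNNING,      STATE_WARNING      },
    { STATE_RUNNING,      STATE_FAULT        },
    { STATE_RUNNING,      STATE_OTA_PREP     },
    { STATE_RUNNING,      STATE_MC_UPDATE    },
    { STATE_RUNNING,      STATE_SERVICE      },
    { STATE_RUNNING,      STATE_SD_DEGRADED  },
    { STATE_WARNING,      STATE_RUNNING      },
    { STATE_WARNING,      STATE_FAULT        },
    { STATE_FAULT,        STATE_TEARDOWN     },
    { STATE_OTA_PREP,     STATE_TEARDOWN     },
    { STATE_OTA_PREP,     STATE_RUNNING      },
    { STATE_TEARDOWN,     STATE_OTA_UPDATE   },
    { STATE_TEARDOWN,     STATE_MC_UPDATE    },
    { STATE_TEARDOWN,     STATE_INIT         },
    { STATE_OTA_UPDATE,   STATE_RUNNING      },
    { STATE_OTA_UPDATE,   STATE_FAULT        },
    { STATE_MC_UPDATE,    STATE_TEARDOWN     },
    { STATE_SERVICE,      STATE_RUNNING      },
    { STATE_SD_DEGRADED,  STATE_RUNNING      },
    { STATE_SD_DEGRADED,  STATE_SERVICE      },
};

/* Returns true only for transitions listed in the table. */
bool transition_allowed(sys_state_t from, sys_state_t to)
{
    const size_t count = sizeof allowed_transitions / sizeof allowed_transitions[0];
    for (size_t i = 0; i < count; i++) {
        if (allowed_transitions[i].from == from && allowed_transitions[i].to == to) {
            return true;
        }
    }
    return false;
}
```

Keeping the pairs in a table rather than in nested `switch` statements makes it easier to compare the code line by line with the specification.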
## 4. Per-State Feature Execution Rules
### 4.1 DAQ (Data Acquisition) Feature
| State | Allowed Operations | Restrictions |
|-------|-------------------|--------------|
| `INIT` | None | Sensor initialization only |
| `RUNNING` | Full acquisition cycle | None |
| `WARNING` | Degraded acquisition (reduced frequency) | Failed sensors excluded |
| `FAULT` | None | Acquisition stopped |
| `OTA_PREP` | None | Acquisition stopped |
| `OTA_UPDATE` | None | Acquisition stopped |
| `MC_UPDATE` | None | Acquisition stopped |
| `TEARDOWN` | None | Acquisition stopped |
| `SERVICE` | Paused (optional read-only) | No new samples |
| `SD_DEGRADED` | Full acquisition | Data not persisted |
| `BOOT_FAILURE` | None | Not applicable |
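
A feature manager could derive its operating mode from the system state with a lookup like the one below, shown here for DAQ. The `daq_mode_t` type, the enumerator names, and the function name are hypothetical; the mapping itself follows the DAQ table.

```c
typedef enum {
    STATE_INIT, STATE_BOOT_FAILURE, STATE_RUNNING, STATE_WARNING,
    STATE_FAULT, STATE_OTA_PREP, STATE_OTA_UPDATE, STATE_MC_UPDATE,
    STATE_TEARDOWN, STATE_SERVICE, STATE_SD_DEGRADED
} sys_state_t;

typedef enum {
    DAQ_STOPPED,    /* no acquisition */
    DAQ_FULL,       /* full acquisition cycle */
    DAQ_DEGRADED,   /* reduced frequency, failed sensors excluded */
    DAQ_PAUSED      /* no new samples, optional read-only access */
} daq_mode_t;

/* DAQ operating mode permitted in the given system state. */
daq_mode_t daq_mode_for_state(sys_state_t state)
{
    switch (state) {
    case STATE_RUNNING:
    case STATE_SD_DEGRADED:     /* full acquisition, data not persisted */
        return DAQ_FULL;
    case STATE_WARNING:
        return DAQ_DEGRADED;
    case STATE_SERVICE:
        return DAQ_PAUSED;
    default:                    /* INIT, FAULT, OTA_*, MC_UPDATE, TEARDOWN, BOOT_FAILURE */
        return DAQ_STOPPED;
    }
}
```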
### 4.2 DQC (Data Quality & Calibration) Feature
| State | Allowed Operations | Restrictions |
|-------|-------------------|--------------|
| `INIT` | Sensor detection, MC loading | No calibration |
| `RUNNING` | Full quality checks, calibration | None |
| `WARNING` | Degraded quality checks | Reduced validation |
| `FAULT` | Error reporting only | No quality checks |
| `OTA_PREP` | None | Quality checks stopped |
| `OTA_UPDATE` | None | Quality checks stopped |
| `MC_UPDATE` | MC validation only | No sensor calibration |
| `TEARDOWN` | None | Quality checks stopped |
| `SERVICE` | Read-only inspection | No calibration |
| `SD_DEGRADED` | Full quality checks | Results not persisted |
| `BOOT_FAILURE` | None | Not applicable |
### 4.3 COM (Communication) Feature
| State | Allowed Operations | Restrictions |
|-------|-------------------|--------------|
| `INIT` | None | No communication |
| `RUNNING` | Full bidirectional communication | None |
| `WARNING` | Limited communication (diagnostics only) | Reduced bandwidth |
| `FAULT` | Diagnostic reporting only | No data transmission |
| `OTA_PREP` | OTA negotiation only | No other communication |
| `OTA_UPDATE` | OTA data transfer only | No other communication |
| `MC_UPDATE` | MC transfer only | No other communication |
| `TEARDOWN` | Session closure only | No new sessions |
| `SERVICE` | Debug session communication | No Main Hub communication |
| `SD_DEGRADED` | Full communication | Data not persisted |
| `BOOT_FAILURE` | Diagnostic reporting only | Limited communication |
### 4.4 DIAG (Diagnostics) Feature
| State | Allowed Operations | Restrictions |
|-------|-------------------|--------------|
| `INIT` | Boot diagnostics | Limited logging |
| `RUNNING` | Full diagnostics | None |
| `WARNING` | Full diagnostics | None |
| `FAULT` | Full diagnostics | None |
| `OTA_PREP` | OTA diagnostics | Limited scope |
| `OTA_UPDATE` | OTA progress diagnostics | Limited scope |
| `MC_UPDATE` | MC update diagnostics | Limited scope |
| `TEARDOWN` | Teardown diagnostics | Limited scope |
| `SERVICE` | Full diagnostics (read access) | No new diagnostics |
| `SD_DEGRADED` | Full diagnostics | Persistence limited |
| `BOOT_FAILURE` | Security diagnostics | Limited scope |
### 4.5 DATA (Persistence) Feature
| State | Allowed Operations | Restrictions |
|-------|-------------------|--------------|
| `INIT` | MC loading only | No writes |
| `RUNNING` | Full persistence | None |
| `WARNING` | Read-only, critical writes | Limited writes |
| `FAULT` | Critical diagnostics only | No sensor data writes |
| `OTA_PREP` | Read-only | No writes |
| `OTA_UPDATE` | OTA data only | No sensor data writes |
| `MC_UPDATE` | MC writes only | No sensor data writes |
| `TEARDOWN` | Critical data flush only | Authorized writes only |
| `SERVICE` | Read-only | No writes |
| `SD_DEGRADED` | Read-only (if possible) | No writes |
| `BOOT_FAILURE` | None | Not applicable |
### 4.6 OTA Feature
| State | Allowed Operations | Restrictions |
|-------|-------------------|--------------|
| `INIT` | None | OTA not active |
| `RUNNING` | OTA negotiation only | No transfer |
| `WARNING` | None | OTA blocked |
| `FAULT` | None | OTA blocked |
| `OTA_PREP` | Readiness validation | No transfer |
| `OTA_UPDATE` | Full OTA operations | None |
| `MC_UPDATE` | None | OTA blocked |
| `TEARDOWN` | None | OTA blocked |
| `SERVICE` | None | OTA blocked |
| `SD_DEGRADED` | None | OTA blocked |
| `BOOT_FAILURE` | None | OTA blocked |
### 4.7 SEC (Security) Feature
| State | Allowed Operations | Restrictions |
|-------|-------------------|--------------|
| `INIT` | Secure boot verification | Must complete before app start |
| `RUNNING` | Full security (encryption, authentication) | None |
| `WARNING` | Full security | None |
| `FAULT` | Security diagnostics | Limited operations |
| `OTA_PREP` | OTA authentication | None |
| `OTA_UPDATE` | Firmware verification | None |
| `MC_UPDATE` | MC authentication | None |
| `TEARDOWN` | Key protection | None |
| `SERVICE` | Debug authentication | None |
| `SD_DEGRADED` | Full security | None |
| `BOOT_FAILURE` | Security fault handling | Limited operations |
### 4.8 SYS (System Management) Feature
| State | Allowed Operations | Restrictions |
|-------|-------------------|--------------|
| `INIT` | State management, initialization | Limited operations |
| `RUNNING` | Full system management | None |
| `WARNING` | Degraded management | Limited operations |
| `FAULT` | Fault recovery management | Limited operations |
| `OTA_PREP` | OTA state management | Limited operations |
| `OTA_UPDATE` | OTA state management | Limited operations |
| `MC_UPDATE` | MC state management | Limited operations |
| `TEARDOWN` | Teardown execution | Limited operations |
| `SERVICE` | Service mode management | Limited operations |
| `SD_DEGRADED` | Degraded management | Limited operations |
| `BOOT_FAILURE` | Boot failure management | Limited operations |
## 5. State Transition Timing Requirements
| Transition | Maximum Duration | Justification |
|------------|------------------|---------------|
| `[*]` → `INIT` | 100ms | Power-on initialization |
| `INIT` → `RUNNING` | 5s | Hardware init, secure boot, MC load |
| `INIT` → `BOOT_FAILURE` | 2s | Secure boot verification |
| `RUNNING` → `WARNING` | 50ms | Fault detection and state change |
| `RUNNING` → `FAULT` | 50ms | Critical fault detection |
| `RUNNING` → `OTA_PREP` | 100ms | OTA request processing |
| `OTA_PREP` → `TEARDOWN` | 2s | Readiness validation |
| `TEARDOWN` → `OTA_UPDATE` | 500ms | Data flush and resource release |
| `TEARDOWN` → `INIT` | 500ms | Data flush and reset |
| `OTA_UPDATE` → `RUNNING` | 10 minutes | Firmware transfer and flashing |
| `RUNNING` → `SERVICE` | 100ms | Debug session establishment |
| `SERVICE` → `RUNNING` | 50ms | Debug session closure |
## 6. State Notification Mechanism
All state transitions SHALL notify registered components via the Event System:
- **Event Type:** `SYSTEM_STATE_CHANGED`
- **Payload:** Previous state, new state, transition reason
- **Subscribers:** All feature managers (DAQ, DQC, COM, DIAG, DATA, OTA, SEC, SYS)
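
The notification mechanism resembles a publish/subscribe registry. The sketch below is not the project's Event System: every type and function name is hypothetical, the fixed capacity of eight matches the eight feature managers listed above, and the code is single-threaded (a real implementation would need locking or a message queue).

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    STATE_INIT, STATE_BOOT_FAILURE, STATE_RUNNING, STATE_WARNING,
    STATE_FAULT, STATE_OTA_PREP, STATE_OTA_UPDATE, STATE_MC_UPDATE,
    STATE_TEARDOWN, STATE_SERVICE, STATE_SD_DEGRADED
} sys_state_t;

/* Payload of SYSTEM_STATE_CHANGED: previous state, new state, reason. */
typedef struct {
    sys_state_t previous;
    sys_state_t current;
    const char *reason;
} state_change_event_t;

typedef void (*state_listener_t)(const state_change_event_t *event, void *ctx);

#define MAX_LISTENERS 8u

static struct {
    state_listener_t fn;
    void            *ctx;
} listeners[MAX_LISTENERS];
static size_t listener_count;

/* Registers a listener; returns false if the registry is full. */
bool state_subscribe(state_listener_t fn, void *ctx)
{
    if (fn == NULL || listener_count >= MAX_LISTENERS) {
        return false;
    }
    listeners[listener_count].fn = fn;
    listeners[listener_count].ctx = ctx;
    listener_count++;
    return true;
}

/* Delivers the event to every registered listener, in registration order. */
void state_publish(sys_state_t previous, sys_state_t current, const char *reason)
{
    const state_change_event_t event = { previous, current, reason };
    for (size_t i = 0; i < listener_count; i++) {
        listeners[i].fn(&event, listeners[i].ctx);
    }
}

/* Illustrative listener: stores the new state in the variable behind ctx. */
void record_state(const state_change_event_t *event, void *ctx)
{
    *(sys_state_t *)ctx = event->current;
}
```

A feature manager would register once during `INIT` and then adjust its allowed operations each time its listener is called.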
## 7. Traceability
- **SR-SYS-001:** Implemented via complete FSM definition
- **SR-SYS-002:** Implemented via per-state feature execution rules
- **SR-SYS-003:** Implemented via state notification mechanism
## 8. Mermaid State Diagram
```mermaid
stateDiagram-v2
[*] --> INIT
INIT --> RUNNING: initSuccess
INIT --> BOOT_FAILURE: secureBootFail
BOOT_FAILURE --> INIT: manualReset
RUNNING --> WARNING: nonFatalFault
RUNNING --> FAULT: fatalFault
RUNNING --> OTA_PREP: otaRequest
RUNNING --> MC_UPDATE: mcUpdateRequest
RUNNING --> SERVICE: debugSession
RUNNING --> SD_DEGRADED: sdFailure
WARNING --> RUNNING: faultCleared
WARNING --> FAULT: faultEscalated
FAULT --> TEARDOWN: recoveryAttempt
OTA_PREP --> TEARDOWN: otaReady
OTA_PREP --> RUNNING: otaRejected
TEARDOWN --> OTA_UPDATE: otaPending
TEARDOWN --> MC_UPDATE: mcPending
TEARDOWN --> INIT: resetRequested
OTA_UPDATE --> RUNNING: otaSuccess
OTA_UPDATE --> FAULT: otaFailure
MC_UPDATE --> TEARDOWN: mcComplete
SERVICE --> RUNNING: sessionClosed
SD_DEGRADED --> RUNNING: sdRecovered
SD_DEGRADED --> SERVICE: manualIntervention
```