components

2026-02-01 23:37:00 +01:00
parent 304371c6b8
commit 80236fa840
11 changed files with 5501 additions and 0 deletions

# Diagnostics Manager Component Specification
**Component ID:** C-DIAG-001
**Component Name:** Diagnostics Manager
**Version:** 1.0
**Date:** 2025-02-01
## 1. Component Overview
### 1.1 Purpose
The Diagnostics Manager monitors system health, detects faults, collects diagnostic data, and gives engineers access to that data. It provides centralized diagnostic code management, persistent storage of diagnostic data, and diagnostic sessions for maintenance and troubleshooting.
### 1.2 Scope
- Diagnostic code framework and management
- System health monitoring and fault detection
- Diagnostic data collection and storage
- Engineering diagnostic sessions
- Layered watchdog system management
- Performance and resource monitoring
### 1.3 Responsibilities
- Implement structured diagnostic code framework
- Collect and classify diagnostic events
- Persist diagnostic data across system resets
- Provide diagnostic session interface for engineers
- Monitor system health and performance metrics
- Manage watchdog systems for fault detection
- Generate diagnostic reports and summaries
## 2. Component Architecture
### 2.1 Static View
```mermaid
graph TB
subgraph "Diagnostics Manager"
DM[Diagnostic Controller]
DC[Diagnostic Collector]
DR[Diagnostic Reporter]
DS[Diagnostic Session]
HM[Health Monitor]
WM[Watchdog Manager]
end
subgraph "Storage Layer"
DL[Diagnostic Logger]
DP[Data Pool]
end
subgraph "Hardware Monitoring"
TWD[Task Watchdog]
IWD[Interrupt Watchdog]
RWD[RTC Watchdog]
TM[Temperature Monitor]
VM[Voltage Monitor]
end
DM --> DC
DM --> DR
DM --> DS
DM --> HM
DM --> WM
DC --> DL
DR --> DP
HM --> TM
HM --> VM
WM --> TWD
WM --> IWD
WM --> RWD
```
### 2.2 Internal Components
#### 2.2.1 Diagnostic Controller
- Central coordination of diagnostic activities
- Diagnostic event routing and processing
- Diagnostic policy enforcement
#### 2.2.2 Diagnostic Collector
- Diagnostic event collection and enrichment
- Timestamp and context information addition
- Diagnostic code validation and assignment
#### 2.2.3 Diagnostic Reporter
- Diagnostic event reporting to external systems
- Diagnostic summary generation
- Real-time diagnostic notifications
#### 2.2.4 Diagnostic Session
- Engineering access interface
- Diagnostic data retrieval and analysis
- Diagnostic record management
#### 2.2.5 Health Monitor
- System vital signs monitoring
- Performance metrics collection
- Resource usage tracking
#### 2.2.6 Watchdog Manager
- Multi-layer watchdog system management
- Watchdog feeding coordination
- Watchdog timeout handling
## 3. Interfaces
### 3.1 Provided Interfaces
#### 3.1.1 IDiagnosticsManager
```cpp
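#include <chrono>
#include <string>
#include <vector>

// Result, DiagnosticCode, DiagnosticEvent and the other project types used
// below are assumed to be declared elsewhere.
class IDiagnosticsManager {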
public:
virtual ~IDiagnosticsManager() = default;
// Diagnostic Event Management
virtual Result<void> reportDiagnostic(DiagnosticCode code,
DiagnosticSeverity severity,
const std::string& context) = 0;
virtual Result<void> reportDiagnostic(const DiagnosticEvent& event) = 0;
virtual Result<void> clearDiagnostic(DiagnosticCode code) = 0;
virtual Result<void> clearAllDiagnostics() = 0;
// Diagnostic Query
virtual std::vector<DiagnosticEvent> getActiveDiagnostics() const = 0;
virtual std::vector<DiagnosticEvent> getDiagnosticHistory(
std::chrono::system_clock::time_point since) const = 0;
virtual DiagnosticSummary getDiagnosticSummary() const = 0;
// Health Monitoring
virtual SystemHealth getSystemHealth() const = 0;
virtual PerformanceMetrics getPerformanceMetrics() const = 0;
virtual ResourceUsage getResourceUsage() const = 0;
// Session Management
virtual Result<DiagnosticSessionId> createDiagnosticSession(
const SessionCredentials& credentials) = 0;
virtual Result<void> closeDiagnosticSession(DiagnosticSessionId session) = 0;
};
```
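The interfaces in this section return `Result<void>` and `Result<DiagnosticSessionId>`, but the specification does not define `Result`. The following is a minimal sketch of one possible shape, assuming C++17. The `Error` struct, the `ok`/`fail` factories, and the `checkContext` helper are hypothetical names introduced only for illustration.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <variant>

// Hypothetical error payload; the specification does not define one.
struct Error {
    int code;
    std::string message;
};

// Minimal Result<T>: holds either a value of type T or an Error.
template <typename T>
class Result {
public:
    static Result ok(T value) { return Result(std::move(value)); }
    static Result fail(Error e) { return Result(std::move(e)); }

    bool isOk() const { return std::holds_alternative<T>(data_); }
    const T& value() const { return std::get<T>(data_); }        // throws if not ok
    const Error& error() const { return std::get<Error>(data_); } // throws if ok

private:
    explicit Result(T v) : data_(std::move(v)) {}
    explicit Result(Error e) : data_(std::move(e)) {}
    std::variant<T, Error> data_;
};

// Specialization for operations that return no value on success.
template <>
class Result<void> {
public:
    static Result ok() { return Result(std::nullopt); }
    static Result fail(Error e) { return Result(std::move(e)); }

    bool isOk() const { return !error_.has_value(); }
    const Error& error() const { return *error_; } // only valid when !isOk()

private:
    explicit Result(std::optional<Error> e) : error_(std::move(e)) {}
    std::optional<Error> error_;
};

// Hypothetical check a reportDiagnostic() implementation might perform.
Result<void> checkContext(const std::string& context) {
    if (context.empty()) return Result<void>::fail({1, "empty context"});
    return Result<void>::ok();
}
```

A caller would test `isOk()` before reading `value()` or `error()`. On a target without exceptions, the accessors would need a different failure policy.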
#### 3.1.2 IDiagnosticReporter
```cpp
class IDiagnosticReporter {
public:
virtual ~IDiagnosticReporter() = default;
virtual Result<void> reportEvent(const DiagnosticEvent& event) = 0;
virtual Result<void> reportHealthStatus(const SystemHealth& health) = 0;
virtual Result<void> reportPerformanceMetrics(const PerformanceMetrics& metrics) = 0;
};
```
#### 3.1.3 IHealthMonitor
```cpp
class IHealthMonitor {
public:
virtual ~IHealthMonitor() = default;
virtual SystemHealth getCurrentHealth() const = 0;
virtual void startHealthMonitoring() = 0;
virtual void stopHealthMonitoring() = 0;
virtual Result<void> registerHealthCallback(IHealthCallback* callback) = 0;
};
```
### 3.2 Required Interfaces
#### 3.2.1 IPersistenceManager
- Persistent storage of diagnostic events
- Diagnostic data retrieval and querying
- Storage space management
#### 3.2.2 IEventSystem
- Event publication for diagnostic notifications
- Event subscription for system events
- Asynchronous event handling
#### 3.2.3 ISecurityManager
- Diagnostic session authentication
- Access control for diagnostic operations
- Security violation reporting
#### 3.2.4 ISystemStateManager
- System state information for diagnostics
- State change notifications
- System health correlation
## 4. Dynamic View
### 4.1 Diagnostic Event Processing Sequence
```mermaid
sequenceDiagram
participant COMP as System Component
participant DM as Diagnostics Manager
participant DC as Diagnostic Collector
participant DL as Diagnostic Logger
participant DR as Diagnostic Reporter
participant ES as Event System
COMP->>DM: reportDiagnostic(code, severity, context)
DM->>DC: collectDiagnosticData(code, severity, context)
DC->>DC: enrichDiagnostic(timestamp, source, details)
DC->>DL: persistDiagnostic(diagnostic_event)
DL-->>DC: persistence_result
DC->>DR: reportDiagnosticEvent(diagnostic_event)
DR->>ES: publishDiagnosticEvent(event)
alt Critical Diagnostic
DM->>DM: triggerEmergencyAction()
DM->>ES: publishCriticalAlert(event)
end
```
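The sequence above can be sketched as a single function: enrich, persist, publish, then escalate. This is an illustration under stated assumptions, not the component's implementation. The `Event` and `Sinks` types are hypothetical stand-ins for `DiagnosticEvent`, the Diagnostic Logger, and the Event System. The sketch also assumes that "Critical Diagnostic" means severity `CRITICAL` or higher, and that an event is still published when persistence fails.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

enum class DiagnosticSeverity { INFO = 0, WARNING = 1, ERROR = 2, CRITICAL = 3, FATAL = 4 };

// Reduced stand-in for DiagnosticEvent.
struct Event {
    std::uint32_t code;
    DiagnosticSeverity severity;
    std::chrono::system_clock::time_point timestamp;
    std::string context;
};

// Hypothetical sinks for the Diagnostic Logger and the Event System.
// All three callables must be set before use.
struct Sinks {
    std::function<bool(const Event&)> persist;
    std::function<void(const Event&)> publish;
    std::function<void(const Event&)> publishCriticalAlert;
};

// Returns false when persistence failed. The event is published either way
// (an assumption) so that a storage fault does not hide the diagnostic.
bool processDiagnostic(std::uint32_t code, DiagnosticSeverity severity,
                       std::string context, const Sinks& sinks) {
    // Enrichment step: attach the timestamp at collection time.
    Event event{code, severity, std::chrono::system_clock::now(), std::move(context)};

    const bool persisted = sinks.persist(event);
    sinks.publish(event);

    // Escalation branch of the diagram (assumed threshold: CRITICAL and FATAL).
    if (severity >= DiagnosticSeverity::CRITICAL) {
        sinks.publishCriticalAlert(event);
    }
    return persisted;
}
```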
### 4.2 Health Monitoring Sequence
```mermaid
sequenceDiagram
participant HM as Health Monitor
participant TM as Temperature Monitor
participant VM as Voltage Monitor
participant WM as Watchdog Manager
participant DM as Diagnostics Manager
loop Health Check Cycle (1 second)
HM->>TM: getTemperature()
TM-->>HM: temperature_value
HM->>VM: getVoltage()
VM-->>HM: voltage_value
HM->>WM: getWatchdogStatus()
WM-->>HM: watchdog_status
HM->>HM: analyzeHealthMetrics()
alt Health Issue Detected
HM->>DM: reportHealthIssue(issue_type, severity)
end
HM->>HM: updateHealthStatus()
end
```
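The `analyzeHealthMetrics()` step is not detailed in this specification. One way it could work, sketched here under assumptions, is a fixed priority of threshold checks. The `HealthSample` struct and the fields of `HealthThresholds` are hypothetical, as is the mapping from each condition to a `HealthStatus` value.

```cpp
enum class HealthStatus { HEALTHY, WARNING, DEGRADED, CRITICAL, FAILED };

// Hypothetical fields; the spec references HealthThresholds without defining it.
struct HealthThresholds {
    float temperature_warning_celsius;
    float temperature_critical_celsius;
    float min_supply_voltage_v;
};

// One reading per health check cycle (values gathered from TM, VM, and WM).
struct HealthSample {
    float temperature_celsius;
    float supply_voltage_v;
    bool watchdogs_ok;
};

// Checks are ordered from most to least severe, so the worst condition wins.
HealthStatus analyzeHealthMetrics(const HealthSample& s, const HealthThresholds& t) {
    if (!s.watchdogs_ok || s.temperature_celsius >= t.temperature_critical_celsius) {
        return HealthStatus::CRITICAL;
    }
    if (s.supply_voltage_v < t.min_supply_voltage_v) {
        return HealthStatus::DEGRADED;
    }
    if (s.temperature_celsius >= t.temperature_warning_celsius) {
        return HealthStatus::WARNING;
    }
    return HealthStatus::HEALTHY;
}
```

Any result other than `HEALTHY` would correspond to the "Health Issue Detected" branch in the diagram.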
### 4.3 Diagnostic Session Sequence
```mermaid
sequenceDiagram
participant ENG as Engineer
participant DS as Diagnostic Session
participant DM as Diagnostics Manager
participant SM as Security Manager
participant DL as Diagnostic Logger
ENG->>DS: requestDiagnosticSession(credentials)
DS->>SM: authenticateUser(credentials)
SM-->>DS: authentication_result
alt Authentication Success
DS->>DM: createSession(user_id, permissions)
DM-->>DS: session_id
DS-->>ENG: session_established(session_id)
ENG->>DS: getDiagnosticSummary()
DS->>DM: getDiagnosticSummary()
DM-->>DS: diagnostic_summary
DS-->>ENG: summary_data
ENG->>DS: retrieveDiagnostics(filter)
DS->>DL: queryDiagnostics(filter)
DL-->>DS: diagnostic_records
DS-->>ENG: diagnostic_data
ENG->>DS: clearDiagnostics(codes)
DS->>DM: clearDiagnosticCodes(codes)
DM->>DL: removeDiagnostics(codes)
DL-->>DM: clear_result
DM-->>DS: operation_result
DS-->>ENG: operation_complete
else Authentication Failed
DS-->>ENG: access_denied
end
```
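The authentication branch of this sequence, combined with the `max_concurrent_sessions` limit from the configuration, could be sketched as follows. `SessionGate` and the fields of `SessionCredentials` are hypothetical, and the authenticator callback stands in for the Security Manager; session timeout handling is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <set>
#include <string>
#include <utility>

using DiagnosticSessionId = std::uint32_t;

// Hypothetical credential fields; the spec does not define them.
struct SessionCredentials {
    std::string user;
    std::string secret;
};

class SessionGate {
public:
    using Authenticator = std::function<bool(const SessionCredentials&)>;

    SessionGate(Authenticator auth, std::size_t max_sessions)
        : auth_(std::move(auth)), max_sessions_(max_sessions) {}

    // Returns a session id, or nullopt for "access_denied" (failed
    // authentication or too many concurrent sessions).
    std::optional<DiagnosticSessionId> open(const SessionCredentials& credentials) {
        if (!auth_(credentials) || active_.size() >= max_sessions_) {
            return std::nullopt;
        }
        const DiagnosticSessionId id = next_id_++;
        active_.insert(id);
        return id;
    }

    // Returns false if the session was not open.
    bool close(DiagnosticSessionId id) { return active_.erase(id) == 1; }

    bool isOpen(DiagnosticSessionId id) const { return active_.count(id) == 1; }

private:
    Authenticator auth_;
    std::size_t max_sessions_;
    DiagnosticSessionId next_id_ = 1;
    std::set<DiagnosticSessionId> active_;
};
```

Operations such as `clearDiagnostics` would first check `isOpen(id)` and the caller's permissions before touching stored records.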
## 5. Diagnostic Code System
### 5.1 Diagnostic Code Structure
```cpp
struct DiagnosticCode {
uint16_t category; // System category (e.g., SENSOR, COMM, STORAGE)
uint16_t component; // Component identifier
uint16_t error; // Specific error code
// Example: SEN-TEMP-001 is written as 0x0101001 (category 0x01, component 0x01, error 0x001)
static constexpr uint16_t SENSOR_CATEGORY = 0x01;
static constexpr uint16_t TEMPERATURE_COMPONENT = 0x01;
static constexpr uint16_t SENSOR_FAILURE = 0x001;
};
```
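The `0x0101001` example suggests that the three fields are combined into one number, although the specification does not state the bit layout. The sketch below assumes an 8-bit category, an 8-bit component, and a 12-bit error, which is the layout that reproduces the example. `packDiagnosticCode` and `formatDiagnosticCode` are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Assumed layout, inferred from the 0x0101001 example:
//   bits 27..20 category, bits 19..12 component, bits 11..0 error.
// Values wider than their field are truncated by the masks.
constexpr std::uint32_t packDiagnosticCode(std::uint16_t category,
                                           std::uint16_t component,
                                           std::uint16_t error) {
    return (static_cast<std::uint32_t>(category & 0xFF) << 20) |
           (static_cast<std::uint32_t>(component & 0xFF) << 12) |
           static_cast<std::uint32_t>(error & 0xFFF);
}

// SENSOR_CATEGORY, TEMPERATURE_COMPONENT, SENSOR_FAILURE from the struct above.
static_assert(packDiagnosticCode(0x01, 0x01, 0x001) == 0x0101001,
              "layout must reproduce the documented example");

// Formats a packed code as seven zero-padded hex digits, e.g. "0x0101001".
std::string formatDiagnosticCode(std::uint32_t packed) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "0x%07lX", static_cast<unsigned long>(packed));
    return buf;
}
```

If the real layout uses the full 16 bits of each field, a 64-bit packed value would be needed instead.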
### 5.2 Diagnostic Severity Levels
```cpp
enum class DiagnosticSeverity {
INFO = 0, // Informational messages
WARNING = 1, // Non-critical issues
ERROR = 2, // Recoverable errors
CRITICAL = 3, // System degradation
FATAL = 4 // System failure
};
```
### 5.3 Diagnostic Event Structure
```cpp
struct DiagnosticEvent {
DiagnosticCode code;
DiagnosticSeverity severity;
std::chrono::system_clock::time_point timestamp;
std::string source_component;
std::string context;
std::map<std::string, std::string> metadata;
uint32_t occurrence_count;
bool is_active;
};
```
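The `occurrence_count` and `is_active` fields imply that repeated reports of the same code update one record rather than creating new ones. A minimal sketch of that bookkeeping follows. `ActiveDiagnostics` and `ActiveRecord` are hypothetical, and a packed 32-bit code is assumed as the key.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Reduced stand-in for the per-code state kept in DiagnosticEvent.
struct ActiveRecord {
    std::string context;
    std::uint32_t occurrence_count = 0;
    bool is_active = false;
};

class ActiveDiagnostics {
public:
    // A repeated report of the same code bumps the counter and refreshes
    // the context instead of adding a second record.
    const ActiveRecord& report(std::uint32_t code, const std::string& context) {
        ActiveRecord& record = records_[code]; // created on first report
        record.context = context;
        ++record.occurrence_count;
        record.is_active = true;
        return record;
    }

    // Marks a code inactive but keeps its history; returns false if the
    // code is unknown or already cleared.
    bool clear(std::uint32_t code) {
        auto it = records_.find(code);
        if (it == records_.end() || !it->second.is_active) return false;
        it->second.is_active = false;
        return true;
    }

    std::size_t activeCount() const {
        std::size_t n = 0;
        for (const auto& entry : records_) {
            if (entry.second.is_active) ++n;
        }
        return n;
    }

private:
    std::map<std::uint32_t, ActiveRecord> records_;
};
```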
## 6. Health Monitoring
### 6.1 System Health Metrics
```cpp
// Declared before SystemHealth, which uses it as a member type.
enum class HealthStatus {
HEALTHY,
WARNING,
DEGRADED,
CRITICAL,
FAILED
};
struct SystemHealth {
// Temperature Monitoring
float cpu_temperature_celsius;
bool temperature_warning;
bool temperature_critical;
// Memory Monitoring
size_t free_heap_bytes;
size_t min_free_heap_bytes;
float heap_usage_percentage;
// Storage Monitoring
size_t sd_card_free_bytes;
size_t sd_card_total_bytes;
bool sd_card_healthy;
// Communication Monitoring
bool main_hub_connected;
int wifi_signal_strength_dbm;
uint32_t communication_errors;
// Power Monitoring
float supply_voltage_v;
bool brownout_detected;
// Overall Health Status
HealthStatus overall_status;
};
```
### 6.2 Performance Metrics
```cpp
struct PerformanceMetrics {
// CPU Utilization
float cpu_utilization_percentage;
float max_cpu_utilization_percentage;
// Task Performance
std::map<std::string, TaskMetrics> task_metrics;
// Communication Performance
uint32_t messages_sent_per_minute;
uint32_t messages_received_per_minute;
std::chrono::milliseconds average_response_time;
// Sensor Performance
uint32_t sensor_readings_per_minute;
std::chrono::milliseconds average_sensor_read_time;
// Storage Performance
uint32_t storage_writes_per_minute;
std::chrono::milliseconds average_write_time;
};
```
## 7. Watchdog System
### 7.1 Watchdog Configuration
```cpp
struct WatchdogConfig {
// Task Watchdog
bool task_watchdog_enabled;
std::chrono::seconds task_watchdog_timeout;
std::vector<std::string> monitored_tasks;
// Interrupt Watchdog
bool interrupt_watchdog_enabled;
std::chrono::seconds interrupt_watchdog_timeout;
// RTC Watchdog
bool rtc_watchdog_enabled;
std::chrono::seconds rtc_watchdog_timeout;
};
```
### 7.2 Watchdog Management
- **Task Watchdog**: Monitors FreeRTOS tasks for deadlocks (10 s timeout)
- **Interrupt Watchdog**: Detects ISR hangs (3 s timeout)
- **RTC Watchdog**: Final safety net for total system freeze (30 s timeout)
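"Watchdog feeding coordination" is not spelled out in this specification. One common pattern, sketched here as an assumption, is to feed the hardware watchdog only while every monitored task has checked in recently. `TaskCheckins` is a hypothetical helper; the actual hardware feed call is left out, and time is passed in explicitly so the logic can be tested off-target.

```cpp
#include <chrono>
#include <map>
#include <string>

class TaskCheckins {
public:
    using Clock = std::chrono::steady_clock;

    explicit TaskCheckins(std::chrono::seconds timeout) : timeout_(timeout) {}

    // Starts monitoring a task; registration counts as its first check-in.
    void monitor(const std::string& task, Clock::time_point now) {
        last_seen_[task] = now;
    }

    // Called by a monitored task from its main loop. Unknown tasks are ignored.
    void checkIn(const std::string& task, Clock::time_point now) {
        auto it = last_seen_.find(task);
        if (it != last_seen_.end()) it->second = now;
    }

    // True only if every monitored task checked in within the timeout.
    // The caller feeds the hardware watchdog when this returns true.
    bool safeToFeed(Clock::time_point now) const {
        for (const auto& entry : last_seen_) {
            if (now - entry.second > timeout_) return false;
        }
        return true;
    }

private:
    std::chrono::seconds timeout_;
    std::map<std::string, Clock::time_point> last_seen_;
};
```

With this arrangement a single stuck task stops the feed, so the hardware timeout fires even though the coordinating task itself is still running.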
## 8. Configuration
### 8.1 Diagnostics Configuration
```cpp
struct DiagnosticsConfig {
// Storage Configuration
size_t max_diagnostic_records;
std::chrono::hours diagnostic_retention_period;
bool persistent_storage_enabled;
// Health Monitoring Configuration
std::chrono::seconds health_check_interval;
HealthThresholds health_thresholds;
bool continuous_monitoring_enabled;
// Reporting Configuration
bool real_time_reporting_enabled;
DiagnosticSeverity min_reporting_severity;
std::chrono::seconds reporting_interval;
// Session Configuration
std::chrono::minutes session_timeout;
uint32_t max_concurrent_sessions;
bool remote_sessions_enabled;
};
```
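Section 9.1 lists invalid configuration as an error category, so a validation pass at load time is a natural companion to this struct. The sketch below checks only fields with self-evident rules. `DiagnosticsLimits` is a hypothetical subset of `DiagnosticsConfig`, and the specific rules are assumptions rather than requirements stated in this document.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical subset of DiagnosticsConfig with obvious validity rules.
struct DiagnosticsLimits {
    std::size_t max_diagnostic_records;
    std::chrono::hours diagnostic_retention_period;
    std::chrono::seconds health_check_interval;
    std::chrono::seconds reporting_interval;
    std::chrono::minutes session_timeout;
    std::uint32_t max_concurrent_sessions;
};

// Returns one message per violated rule; an empty vector means the
// configuration passed every check.
std::vector<std::string> validate(const DiagnosticsLimits& c) {
    std::vector<std::string> problems;
    if (c.max_diagnostic_records == 0) {
        problems.push_back("max_diagnostic_records must be greater than 0");
    }
    if (c.diagnostic_retention_period.count() <= 0) {
        problems.push_back("diagnostic_retention_period must be positive");
    }
    if (c.health_check_interval.count() <= 0) {
        problems.push_back("health_check_interval must be positive");
    }
    if (c.reporting_interval.count() <= 0) {
        problems.push_back("reporting_interval must be positive");
    }
    if (c.session_timeout.count() <= 0) {
        problems.push_back("session_timeout must be positive");
    }
    if (c.max_concurrent_sessions == 0) {
        problems.push_back("max_concurrent_sessions must be greater than 0");
    }
    return problems;
}
```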
## 9. Error Handling
### 9.1 Error Categories
- **Storage Errors**: Diagnostic persistence failures
- **Memory Errors**: Insufficient memory for diagnostic operations
- **Configuration Errors**: Invalid diagnostic configuration
- **Session Errors**: Authentication or authorization failures
- **Hardware Errors**: Sensor or monitoring hardware failures
### 9.2 Error Recovery Strategies
- **Graceful Degradation**: Continue operation with reduced diagnostic capability
- **Memory Management**: Implement diagnostic record rotation and cleanup
- **Fallback Storage**: Use alternative storage when primary fails
- **Self-Diagnostics**: Monitor diagnostic system health
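The fallback storage and graceful degradation strategies can be combined in one small routine: try the primary store, then the alternative, and report which one accepted the record. This is a sketch under assumptions; `Store`, `PersistOutcome`, and `persistWithFallback` are hypothetical names, and the two stores are modeled as plain callables.

```cpp
#include <functional>
#include <string>

// A store accepts a serialized record and returns true on success.
using Store = std::function<bool(const std::string&)>;

enum class PersistOutcome {
    Primary,  // written to the primary store
    Fallback, // primary failed, alternative store accepted the record
    Dropped   // both failed; the caller continues with reduced capability
};

PersistOutcome persistWithFallback(const std::string& record,
                                   const Store& primary,
                                   const Store& fallback) {
    // An unset store is treated the same as a failed write.
    if (primary && primary(record)) return PersistOutcome::Primary;
    if (fallback && fallback(record)) return PersistOutcome::Fallback;
    return PersistOutcome::Dropped;
}
```

Returning the outcome instead of a plain boolean lets the caller raise a storage diagnostic of its own when the primary store is failing, which ties in with the self-diagnostics strategy.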
## 10. Performance Characteristics
### 10.1 Timing Requirements
- **Diagnostic Event Processing**: < 10 ms per event
- **Health Check Cycle**: 1 second interval
- **Diagnostic Query Response**: < 100 ms for typical queries
- **Session Operations**: < 500 ms for session establishment
### 10.2 Resource Usage
- **Memory**: < 16 KB for diagnostic buffers and metadata
- **Storage**: Configurable with rotation (default 1 MB)
- **CPU**: < 2% average utilization for monitoring
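A memory bound like the one above is easiest to keep when the in-memory records have a fixed size and live in a fixed-capacity buffer that overwrites the oldest entry. The following sketch shows that idea with a compile-time budget check. `CompactRecord`, `RotatingBuffer`, and the capacity of 200 are hypothetical; only the 16 KB figure comes from this document.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size record; no heap allocation per entry.
struct CompactRecord {
    std::uint32_t code;
    std::uint8_t severity;
    std::uint32_t timestamp_s;
    std::uint32_t occurrence_count;
    char context[48];
};

constexpr std::size_t kMaxRecords = 200; // illustrative capacity

// Fails the build if the record buffer alone would exceed the budget.
static_assert(sizeof(CompactRecord) * kMaxRecords < 16 * 1024,
              "diagnostic buffer exceeds the 16 KB memory budget");

// Fixed-capacity buffer: once full, each push overwrites the oldest item.
template <typename T, std::size_t Capacity>
class RotatingBuffer {
public:
    void push(const T& item) {
        // When full, (head_ + size_) % Capacity equals head_, the oldest slot.
        items_[(head_ + size_) % Capacity] = item;
        if (size_ < Capacity) {
            ++size_;
        } else {
            head_ = (head_ + 1) % Capacity;
        }
    }

    // Index 0 is the oldest retained item; requires index < size().
    const T& at(std::size_t index) const { return items_[(head_ + index) % Capacity]; }

    std::size_t size() const { return size_; }

private:
    std::array<T, Capacity> items_{};
    std::size_t head_ = 0;
    std::size_t size_ = 0;
};

// Storage for the diagnostic records themselves.
using DiagnosticRing = RotatingBuffer<CompactRecord, kMaxRecords>;
```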
## 11. Security Considerations
### 11.1 Access Control
- Diagnostic session authentication required
- Role-based access to diagnostic operations
- Audit logging of diagnostic access and modifications
### 11.2 Data Protection
- Sensitive diagnostic data encryption
- Secure diagnostic data transmission
- Diagnostic data integrity verification
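The specification does not name an integrity mechanism. As one illustration, a CRC-32 checksum stored next to each record detects accidental corruption, for example from a write interrupted by power loss. It does not protect against deliberate tampering, which would need a keyed MAC or a signature. `diagCrc32` and `verifyRecord` are hypothetical names; the algorithm is the standard reflected CRC-32 with polynomial `0xEDB88320`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Bitwise CRC-32 (reflected, polynomial 0xEDB88320, initial value and
// final XOR of 0xFFFFFFFF). Slow but table-free, which keeps RAM use low.
std::uint32_t diagCrc32(const std::uint8_t* data, std::size_t length) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < length; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Mask is all ones when the low bit is set, otherwise zero.
            const std::uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}

std::uint32_t diagCrc32(const std::string& payload) {
    return diagCrc32(reinterpret_cast<const std::uint8_t*>(payload.data()),
                     payload.size());
}

// Compares a freshly computed checksum with the one stored alongside the record.
bool verifyRecord(const std::string& payload, std::uint32_t stored_crc) {
    return diagCrc32(payload) == stored_crc;
}
```

The standard check value for this CRC variant is `0xCBF43926` for the ASCII input `123456789`, which makes a convenient unit test.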
## 12. Testing Strategy
### 12.1 Unit Tests
- Diagnostic event processing and storage
- Health monitoring algorithms
- Watchdog management functionality
- Session management and authentication
### 12.2 Integration Tests
- End-to-end diagnostic reporting
- Health monitoring integration
- Diagnostic session workflows
- Cross-component diagnostic correlation
### 12.3 Hardware Tests
- Watchdog timeout and recovery testing
- Hardware monitoring accuracy
- Performance under stress conditions
## 13. Dependencies
### 13.1 Internal Dependencies
- Persistence Manager for diagnostic storage
- Event System for diagnostic notifications
- Security Manager for session authentication
- System State Manager for system context
### 13.2 External Dependencies
- ESP-IDF watchdog APIs
- FreeRTOS task monitoring
- Hardware monitoring peripherals
- File system for diagnostic storage
## 14. Constraints and Assumptions
### 14.1 Constraints
- Diagnostic system must remain operational during system faults
- Memory usage must be bounded and predictable
- Diagnostic operations must not interfere with real-time requirements
- Storage space for diagnostics is limited and requires rotation
### 14.2 Assumptions
- System resources are sufficient for diagnostic operations
- The storage medium used for diagnostic persistence is reliable
- System time is set correctly, so diagnostic timestamps are meaningful
- Engineers who open diagnostic sessions hold valid security credentials