Construction sites are among the most hazardous work environments. As robots are increasingly deployed alongside human workers to assist with tasks like material transport, bricklaying, and structural inspection, ensuring the safety of human-robot collaboration becomes paramount. Unlike controlled factory floors, construction sites are dynamic, unstructured, and unpredictable — making traditional safety measures insufficient. Runtime monitoring, the process of continuously observing a system during execution to verify that it satisfies specified safety properties, offers a principled approach to this challenge.
Why Runtime Monitoring?
Formal verification methods like model checking and theorem proving can provide strong safety guarantees, but those guarantees apply to a model of the system rather than to the system itself. In the real world, the gap between model and reality — sensor noise, unexpected human behavior, environmental changes — means that pre-deployment verification alone cannot ensure safety during operation.
Runtime monitoring bridges this gap. It operates on the actual system at execution time, checking whether observed behavior conforms to formal safety specifications. If a violation is detected (or predicted), the monitor can trigger corrective actions — slowing the robot, rerouting it, or stopping it entirely.
For human-robot construction systems, safety specifications typically involve spatial constraints. For example:
- Minimum separation distance: "The robot shall maintain at least 1.25 meters of distance from any human worker at all times."
- Speed limits in proximity: "When a human is within 3 meters, the robot's speed shall not exceed 0.5 m/s."
- Restricted zones: "The robot shall not enter the designated human work area during active operations."
These properties can be expressed in formal languages such as Signal Temporal Logic (STL), which allows specifying requirements over continuous signals with time bounds — making it well-suited for robotic systems operating in continuous physical spaces.
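As a concrete illustration, the three requirements above could be written in a common textual STL syntax roughly as follows. The signal names (dist, speed, in_zone, work_active) are illustrative, and G denotes the "globally" (always) operator:

```
G (dist >= 1.25)                      -- minimum separation at all times
G ((dist <= 3.0) -> (speed <= 0.5))   -- speed limit in proximity
G (work_active -> (not in_zone))      -- restricted zone during operations
```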
Perception-Based Monitoring
A runtime monitor is only as good as its perception of the world. In a construction environment, the monitor must answer a fundamental question in real time: where are the humans, and where is the robot?
Our approach uses a vision-based perception pipeline built on the following components:
- Object Detection (YOLOv3): A real-time object detection neural network identifies human workers in RGB camera frames captured from the robot's onboard camera. YOLOv3 provides bounding boxes around detected persons at frame rates sufficient for real-time monitoring (30+ FPS on commodity GPU hardware). A minimal detection sketch follows this list.
- Depth Estimation: Using an RGB-D (depth) camera, the system obtains 3D spatial information. The depth data within the detected bounding box is processed to estimate the distance between the robot and each detected human.
- Point Cloud Clustering: Raw depth data is noisy. We apply clustering algorithms to the point cloud data within detected regions to obtain robust position estimates. This filtering step reduces the impact of sensor noise and partial occlusions.
- Kalman Filtering: To smooth position estimates over time and predict short-term human motion trajectories, we employ a Kalman filter. This provides not just the current estimated position but also velocity and predicted future positions, enabling predictive (rather than purely reactive) safety monitoring. A sketch of the distance-estimation and filtering steps also appears after this list.
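To make the detection step concrete, here is a minimal sketch using OpenCV's DNN module with the public Darknet YOLOv3 config and weights. The file names, input resolution, and thresholds are illustrative assumptions rather than our exact pipeline settings:

```python
# Minimal sketch: person detection with YOLOv3 via OpenCV's DNN module.
# File names, input size, and thresholds are illustrative assumptions.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
out_names = net.getUnconnectedOutLayersNames()

def detect_persons(frame, conf_thresh=0.5, nms_thresh=0.4):
    """Return pixel-space [x, y, w, h] boxes for detected persons."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    boxes, scores = [], []
    for out in net.forward(out_names):
        # Each row: [cx, cy, bw, bh, objectness, 80 COCO class scores]
        for det in out:
            class_scores = det[5:]
            if int(np.argmax(class_scores)) != 0:  # COCO class 0 = "person"
                continue
            score = float(class_scores[0])
            if score < conf_thresh:
                continue
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            scores.append(score)
    if not boxes:
        return []
    keep = cv2.dnn.NMSBoxes(boxes, scores, conf_thresh, nms_thresh)
    return [boxes[i] for i in np.array(keep).flatten()]
```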
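The distance-estimation and smoothing steps can be sketched as follows, assuming a depth image registered to the RGB frame and expressed in meters. For brevity, a median over the valid depth pixels in the detection box stands in here for the full point cloud clustering step, and a 1-D constant-velocity Kalman filter tracks the scalar human-robot distance; the noise parameters are illustrative:

```python
# Sketch: robust distance from a depth image plus constant-velocity Kalman
# smoothing. The median is a lightweight stand-in for point cloud clustering;
# noise parameters and frame rate are illustrative assumptions.
import numpy as np

def estimate_distance(depth_m, box):
    """Median depth (meters) inside a detection box; the median rejects
    outliers from sensor noise and partial occlusion."""
    x, y, w, h = box
    patch = depth_m[y:y + h, x:x + w]
    valid = patch[np.isfinite(patch) & (patch > 0)]   # drop invalid returns
    return float(np.median(valid)) if valid.size else None

class DistanceKF:
    """1-D constant-velocity Kalman filter over the human-robot distance,
    with state x = [distance, rate-of-change]."""
    def __init__(self, dt=1 / 30, q=0.5, r=0.05):
        self.x = None                                  # state estimate
        self.P = np.eye(2)                             # state covariance
        self.F = np.array([[1.0, dt], [0.0, 1.0]])     # constant-velocity model
        self.H = np.array([[1.0, 0.0]])                # we observe distance only
        self.Q = q * np.array([[dt**4 / 4, dt**3 / 2],
                               [dt**3 / 2, dt**2]])    # process noise
        self.R = np.array([[r]])                       # measurement noise

    def step(self, z):
        """One predict(+update) cycle; pass z=None on a missed detection."""
        if self.x is None:
            if z is None:
                return None
            self.x = np.array([z, 0.0])                # initialize from first fix
            return self.x
        self.x = self.F @ self.x                       # predict
        self.P = self.F @ self.P @ self.F.T + self.Q   # uncertainty grows
        if z is not None:                              # correct with measurement
            y = z - (self.H @ self.x)[0]               # innovation
            S = self.H @ self.P @ self.H.T + self.R    # innovation covariance
            K = (self.P @ self.H.T) / S[0, 0]          # Kalman gain
            self.x = self.x + K.flatten() * y
            self.P = (np.eye(2) - K @ self.H) @ self.P
        return self.x                                  # [distance, rate]
```

Passing z=None lets the filter coast through a missed detection while its covariance grows, which is what enables both the predictive warnings and the fail-safe dropout handling discussed later.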
From Perception to Verification
The perception pipeline produces a continuous stream of estimated human-robot distances and relative positions. The runtime monitor takes these signals and evaluates them against the formal safety specifications. This evaluation happens at every time step, producing one of three outcomes:
- Satisfied: The current state and recent trajectory satisfy all safety properties. Normal operation continues.
- Violated: A safety property has been breached. The monitor triggers an immediate response (e.g., emergency stop).
- Warning: The system is approaching a boundary condition: predicted trajectories indicate that a violation may occur within a specified time horizon. The monitor can trigger preemptive action (e.g., slowing down, adjusting the path).
The use of Signal Temporal Logic allows quantitative evaluation — not just "is the property satisfied?" but "by how much?" This notion of robustness provides a continuous measure of how far the system is from violating a specification, enabling more nuanced control responses.
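As a minimal sketch of how robustness drives the three outcomes above, consider the separation property "always (dist >= 1.25)" evaluated online over a sliding window: its robustness is the worst-case margin in the window, and thresholding that value yields the monitor's verdict. The window length and warning margin below are assumed values, not settings from our system:

```python
# Sketch: online robustness for "always (dist >= D_MIN)" over a sliding
# window, mapped to the three monitor outcomes. Thresholds are illustrative.
from collections import deque

D_MIN = 1.25          # required separation (m), from the example spec
WARN_MARGIN = 0.5     # robustness below this raises a warning (assumed)

class SeparationMonitor:
    def __init__(self, window=30):            # ~1 s of history at 30 Hz (assumed)
        self.margins = deque(maxlen=window)

    def step(self, dist):
        """Feed one distance sample; return (verdict, robustness)."""
        self.margins.append(dist - D_MIN)
        rho = min(self.margins)               # robustness of G(dist >= D_MIN)
        if rho < 0:
            return "VIOLATED", rho            # e.g., emergency stop
        if rho < WARN_MARGIN:
            return "WARNING", rho             # e.g., slow down or re-plan
        return "SATISFIED", rho
```

A negative robustness value not only flags the violation but also says by how much the specification was breached, which is exactly the quantitative signal a controller can act on.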
The PerM Tool
Building on this framework, we developed PerM (Perception-based Runtime Monitoring), a tool that integrates the perception pipeline with a formal runtime monitor for human-robot construction systems. PerM provides:
- Real-time visualization of detected humans and their estimated positions relative to the robot
- Continuous evaluation of user-defined STL safety specifications
- Robustness scores indicating the margin of safety at each time step
- Configurable alert thresholds and response actions
PerM was presented at the DAC 2024 Workshop and our broader work on perception-based runtime monitoring was published at ACM/IEEE MEMOCODE 2024, where it was recognized as a Best Paper Candidate.
Challenges and Lessons Learned
Developing and deploying runtime monitoring for real-world human-robot systems has revealed several important challenges:
- Perception uncertainty: Neural network-based detectors can miss humans (false negatives) or hallucinate detections (false positives). A safety monitor must account for this uncertainty — potentially treating missed detections as dangerous rather than safe (see the sketch after this list).
- Occlusion: On busy construction sites, humans may be partially or fully occluded by equipment or structures. The monitor must reason about occluded regions and maintain safety even when humans are not visible.
- Computational budget: Running object detection, depth processing, Kalman filtering, and formal specification checking in real time requires careful engineering. Each component must meet its latency budget to keep the overall pipeline responsive.
- Specification design: Writing correct and complete formal specifications is itself a challenge. Overly conservative specifications lead to a robot that is too cautious to be useful; overly permissive specifications compromise safety. Iterative refinement through simulation and testing is essential.
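One concrete pattern that addresses the first two challenges in this list is to treat a detection dropout conservatively: coast the Kalman prediction through short gaps, hand the monitor a lower confidence bound on the distance, and fail safe once the human has been unaccounted for longer than a fixed budget. The sketch below reuses the hypothetical DistanceKF from the perception section; the coasting budget and sigma multiplier are illustrative assumptions:

```python
# Sketch: fail-safe handling of detection dropouts, built on the DistanceKF
# sketch above. Budgets and margins are illustrative assumptions.
MAX_COAST_FRAMES = 15     # ~0.5 s at 30 Hz without a confirmed detection (assumed)
K_SIGMA = 2.0             # safety margin in standard deviations (assumed)

def monitored_distance(kf, z, frames_missed):
    """Conservative distance estimate, or None to force a fail-safe stop.

    kf: a DistanceKF instance; z: measured distance or None on a dropout;
    frames_missed: consecutive frames without a detection (tracked by caller).
    """
    state = kf.step(z)                  # z=None coasts the prediction
    if state is None or frames_missed > MAX_COAST_FRAMES:
        return None                     # human unaccounted for: fail safe
    dist, _ = state
    sigma = kf.P[0, 0] ** 0.5           # position uncertainty grows while coasting
    return dist - K_SIGMA * sigma       # lower confidence bound on separation
```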
Future Directions
Several exciting research directions extend this work. Incorporating multiple cameras or sensor modalities (LiDAR, UWB positioning) can improve perception coverage and reduce occlusion issues. Learning-based approaches could adapt safety margins based on the specific construction task and worker behavior patterns. And as construction robots become more autonomous, runtime monitoring will need to scale to multi-robot, multi-human scenarios with more complex spatial and temporal coordination requirements.
The ultimate goal is a framework where safety is not an afterthought but a continuously verified, formally grounded property of the system — enabling robots to be genuinely useful partners in hazardous environments while keeping human workers safe.