Generate Real-Time Alerts for Maintenance to Avoid Equipment Failures
Mikey Tabak, PhD • 16 July 2025
By estimating the probability of equipment failure in real time, before a failure occurs, we can save money, reduce downtime, reduce waste, and prevent dangerous workplace conditions.
In this demo we simulate simple data of the kind that IoT sensors (temperature, pressure, and vibration) would record on equipment in a manufacturing process. We also simulate equipment failures associated with these features. We then use the data to build a machine learning model that predicts equipment failures. This type of approach can be used in manufacturing to flag high-probability failures before they occur.
from demo.maintenance_utils import EquipmentSimulator
env = EquipmentSimulator(n_days=50, seed=25)
df = env.simulate()
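The internals of `EquipmentSimulator` are not shown in this demo. As a rough sketch of how such a simulation might work, the block below draws hourly sensor readings from normal distributions and makes failure more likely when temperature and vibration are both unusually high. The function name `simulate_sensors`, the distribution parameters, and the logistic failure rule are all assumptions for illustration, not the demo's actual implementation.

```python
import numpy as np
import pandas as pd

def simulate_sensors(n_days=50, seed=25):
    """Hypothetical stand-in for EquipmentSimulator.simulate(); not the demo's code."""
    rng = np.random.default_rng(seed)
    n = n_days * 24  # one reading per hour
    df = pd.DataFrame({
        "timestamp": pd.date_range("2025-01-01", periods=n, freq=pd.Timedelta(hours=1)),
        "temperature": rng.normal(70, 5, n),     # assumed mean and spread
        "pressure": rng.normal(30, 1, n),        # assumed mean and spread
        "vibration": rng.normal(0.2, 0.05, n),   # assumed mean and spread
    })
    # Assumed failure mechanism: risk rises with standardized temperature and vibration
    z = (df["temperature"] - 70) / 5 + (df["vibration"] - 0.2) / 0.05
    p_fail = 1 / (1 + np.exp(-(z - 6)))
    df["failure"] = (rng.random(n) < p_fail).astype(int)
    return df

sim_df = simulate_sensors()
```

Because failures here depend on a combination of sensors plus random noise, they are rare and hard to spot by eye, which is the situation the rest of the demo addresses.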
Look at some rows from the simulated data
df.head()
| | timestamp | temperature | pressure | vibration | failure |
|---|---|---|---|---|---|
| 0 | 2025-01-01 00:00:00 | 71.141365 | 30.987897 | 0.271128 | 0 |
| 1 | 2025-01-01 01:00:00 | 75.134452 | 30.152744 | 0.223104 | 0 |
| 2 | 2025-01-01 02:00:00 | 65.802076 | 28.975828 | 0.140646 | 0 |
| 3 | 2025-01-01 03:00:00 | 67.044092 | 30.344024 | 0.212248 | 0 |
| 4 | 2025-01-01 04:00:00 | 65.215559 | 29.541515 | 0.151307 | 0 |
num_failures = (df['failure'] == 1).sum()
print(f"We observed {num_failures} failures in our simulation")
We observed 10 failures in our simulation
This plot shows how each sensor — temperature, pressure, and vibration — changes over time.
Black X marks indicate when a failure occurred. We can use this to visually inspect whether failures are preceded by abnormal sensor behavior.
env.plot_time_series()
This pair plot shows the relationships between temperature, pressure, and vibration.
Each point represents a moment in time, colored by whether a failure occurred. Clusters of failure points can help identify regions in sensor space where the system is more likely to fail.
env.plot_pairwise_failures()
The pair plot does not reveal a clear set of conditions that leads to system failure. In cases like this, a machine learning model may be able to detect patterns that precede equipment failure when a human inspecting the plots could not.
Here we zoom into a 48-hour window centered on the first failure.
This allows us to examine how the sensor readings change in the hours leading up to and after a failure event, which can inform feature engineering and model design.
env.plot_failure_window()
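For readers who want to pull out the same kind of window themselves, here is a minimal sketch using pandas. It assumes a data frame with the `timestamp` and `failure` columns shown earlier; the helper name `failure_window` is hypothetical and is not part of `maintenance_utils`.

```python
import pandas as pd

def failure_window(df, hours=48):
    """Return the rows within hours/2 on either side of the first failure."""
    first_failure = df.loc[df["failure"] == 1, "timestamp"].iloc[0]
    half = pd.Timedelta(hours=hours / 2)
    in_window = (df["timestamp"] >= first_failure - half) & (df["timestamp"] <= first_failure + half)
    return df[in_window]

# Toy data: 100 hourly readings with a single failure at row 50
toy = pd.DataFrame({
    "timestamp": pd.date_range("2025-01-01", periods=100, freq=pd.Timedelta(hours=1)),
    "failure": [1 if i == 50 else 0 for i in range(100)],
})
window = failure_window(toy)
```

With hourly data, a 48-hour window centered on the failure contains the failure row plus 24 rows on each side.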
We begin by engineering features from the simulated sensor readings.
This includes lag features for temperature, pressure, and vibration, which allow the model to detect changes over time that may precede a failure. Notice that instead of one column for each feature, there is a column for each time lag for each feature.
# Prepare the features and labels
X, y = env.prepare_features()
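The exact behavior of `prepare_features` is not shown here. One common way to build lag features is with `pandas.Series.shift`, sketched below under the assumption that each lag column holds the reading from that many time steps earlier. The helper name `make_lag_features` and the column naming scheme are illustrative, not the demo's.

```python
import pandas as pd

def make_lag_features(df, columns, lag=3):
    """Build one column per sensor per time lag (hypothetical helper)."""
    lagged = {}
    for col in columns:
        for k in range(1, lag + 1):
            # shift(k) moves values down k rows, so row t holds the reading from t - k
            lagged[f"{col}_lag{k}"] = df[col].shift(k)
    X = pd.DataFrame(lagged, index=df.index)
    # The first `lag` rows lack a complete history, so drop them
    X = X.dropna()
    y = df.loc[X.index, "failure"]
    return X, y

toy = pd.DataFrame({
    "temperature": [71.1, 75.1, 65.8, 67.0, 65.2],
    "pressure": [31.0, 30.2, 29.0, 30.3, 29.5],
    "vibration": [0.27, 0.22, 0.14, 0.21, 0.15],
    "failure": [0, 0, 0, 0, 1],
})
X_toy, y_toy = make_lag_features(toy, ["temperature", "pressure", "vibration"], lag=3)
```

With three sensors and `lag=3`, the feature matrix has nine columns, and the first three rows are dropped because they have no full set of earlier readings.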
We use a Random Forest classifier to predict equipment failures.
The model is trained on 80% of the data and evaluated on the remaining 20%. This gives us a sense of how well it performs on unseen data.
# Train the model
trained_model = env.train_model(lag=3)
Model trained successfully.
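What `train_model` does internally is not shown. A minimal sketch of an 80/20 split with a Random Forest in scikit-learn might look like the following. The synthetic arrays, the hyperparameters, and the choice of `shuffle=False` (so the test set is the most recent 20% of the time series) are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the lag-feature matrix and failure labels
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 9))
y_demo = (X_demo[:, 0] + X_demo[:, 3] > 2.0).astype(int)

# Hold out the last 20% of rows; shuffle=False keeps the time ordering (an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, shuffle=False
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Column 1 of predict_proba is the estimated probability of the failure class
proba = clf.predict_proba(X_test)[:, 1]
```

Splitting chronologically avoids training on readings that come after the ones being tested, which matters when features are built from lagged values.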
We now use the trained model to predict failures on the test set.
This plot shows the predicted failures alongside the actual failures so we can visually assess the model’s performance.
# Plot true vs predicted failures
env.predict_and_plot()
In this plot, the predicted equipment failures overlap with the observed failures, which suggests that the model captures the failure patterns in this simulated test set.
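A visual comparison can be complemented with numbers. Because failures are rare, precision (how many alerts were real failures) and recall (how many failures were caught) are usually more informative than accuracy. The sketch below computes both from scratch; the label lists are made-up values for illustration, not results from this demo.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for the positive (failure) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Illustrative labels only: 1 = failure, 0 = normal operation
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_pred = [0, 0, 1, 0, 0, 1, 0, 1]
precision, recall = precision_recall(y_true, y_pred)
```

In this toy case two of three predicted failures are real and two of three real failures are caught, so both values are 2/3.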
Now we simulate how the model could behave in a live environment.
For each new data point, the model makes a prediction and prints the result. This mimics how predictions would work in an operational setting. We can use real-time alerts: if the probability of failure is over 50%, we send an alert to perform maintenance and prevent a failure.
# Simulate real-time predictions with optional delay
env.stream_predictions(lag=3, threshold=0.5, delay=0.01)
Streaming predictions...
-- Alert: High Probability of Failure --
Timestep: 0022 | Failure Probability: 0.68
-- Alert: High Probability of Failure --
Timestep: 0212 | Failure Probability: 0.83
-- Alert: High Probability of Failure --
Timestep: 0304 | Failure Probability: 0.68
-- Alert: High Probability of Failure --
Timestep: 0346 | Failure Probability: 0.70
-- Alert: High Probability of Failure --
Timestep: 0674 | Failure Probability: 0.82
-- Alert: High Probability of Failure --
Timestep: 0812 | Failure Probability: 0.64
-- Alert: High Probability of Failure --
Timestep: 0967 | Failure Probability: 0.74
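The alerting logic itself is simple enough to sketch in a few lines. The generator below walks through a sequence of failure probabilities one at a time and yields an alert whenever a value exceeds the threshold. The name `stream_alerts` and the hard-coded probabilities are hypothetical; in practice the probabilities would come from the trained model as each new reading arrives.

```python
import time

def stream_alerts(probabilities, threshold=0.5, delay=0.0):
    """Yield (timestep, probability) for each reading above the alert threshold."""
    for t, p in enumerate(probabilities):
        if p > threshold:
            yield t, p
        time.sleep(delay)  # optional pause to mimic readings arriving over time

# Illustrative probabilities only
probs = [0.05, 0.10, 0.68, 0.20, 0.83]
alerts = list(stream_alerts(probs, threshold=0.5))

for t, p in alerts:
    print("-- Alert: High Probability of Failure --")
    print(f"Timestep: {t:04d} | Failure Probability: {p:.2f}")
```

Lowering the threshold produces more alerts and catches more failures at the cost of more false alarms, so the right value depends on the relative cost of unnecessary maintenance versus an unplanned failure.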
By forecasting equipment failures before they occur, we can reduce unplanned downtime, cut maintenance costs, and improve both safety and operational efficiency in manufacturing environments. While this notebook used simple synthetic data, the same approach can be applied to complex real-world systems.
In production settings, IoT sensor data can be streamed directly into the model to trigger real-time alerts, prioritize inspections, or generate maintenance schedules. This is a simple example of how we can move from reactive to predictive maintenance in a scalable way.
From prototyping with synthetic data to deploying real-time monitoring systems, QSC helps businesses use AI to reduce downtime, extend equipment life, and optimize performance. Let’s work together to turn your sensor data into actionable insights.