Equipment Predictive Maintenance Solution

Industry: Discrete Manufacturing / Process Industry
Scenario: Real-time factory equipment sensor data collection → Anomaly detection → AI maintenance recommendations
Tech Stack: Singdata Lakehouse · Dynamic Table · AI_COMPLETE · DeepSeek V3

An unplanned factory equipment shutdown costs an average of $260,000 per hour. This solution uses Singdata Lakehouse to implement a complete closed-loop pipeline in pure SQL — from sensor data ingestion to 1-hour rolling anomaly detection to AI-generated maintenance recommendations — with no standalone stream processing cluster and no AI service deployment required. Implementation time shrinks from months to days.


1. Business Background

The Industry 4.0 and smart manufacturing wave is driving factories to accelerate the shift from "reactive repair" to "predictive maintenance." Modern factory equipment is widely equipped with IoT sensors for vibration, temperature, pressure, and rotation speed. Each machine generates multi-dimensional data streams every second, laying the groundwork for data-driven failure prediction.

Key industry scenarios covered:

IndustryTypical EquipmentCore Monitoring Metrics
Automotive ManufacturingCNC machining centers, stamping presses, welding robotsSpindle temperature, vibration, power load
Electronics / SemiconductorSMT placement machines, lithography machines, vacuum pumpsTemperature uniformity, rotation speed, pressure
Injection Molding / RubberInjection molding machines, extrudersHydraulic pressure, barrel temperature, screw speed
Process ChemicalsReactors, compressors, heat exchangersTemperature, pressure, flow rate
Logistics / WarehousingConveyor belts, AGVs, palletizing robotsMotor temperature, vibration frequency, load current

2. Industry Pain Points

2.1 Unplanned Downtime Is Extremely Costly

  • Average cost of unplanned downtime in manufacturing: ~$260,000/hour (Aberdeen Research)
  • Automotive industry single downtime event can exceed $2.3 million/hour
  • Equipment failure is the leading cause of downtime events, accounting for ~44%, with an average duration of 238 minutes
  • Global manufacturers lose up to $852 million per week due to unplanned downtime

2.2 Three Key Flaws of Traditional Maintenance Models

Reactive Maintenance

  • Problems are addressed only after failure — losses have already occurred
  • Emergency spare parts procurement costs 3–5x more than planned procurement
  • High risk of cascading failures (spindle failure → entire production line shutdown)

Preventive Maintenance

  • Parts are replaced on fixed schedules; many components are discarded while still within their useful life
  • Approximately 30% of effective maintenance globally is unnecessary over-maintenance
  • Cannot handle early failure caused by abnormal operating conditions

Manual Inspection

  • Relies on individual experience with point-in-time sampling; high miss rate
  • Unable to perform multi-dimensional integrated analysis
  • Cannot provide 7×24-hour coverage of hundreds of machines

2.3 Data Silos and High Build Costs

  • Sensor data lands in SCADA/PLC systems and is disconnected from MES and ERP data — joint analysis is impossible
  • Historical failure data and maintenance logs are scattered across multiple systems with no traceability
  • Traditional predictive maintenance solutions require stacking multiple components (time-series database + stream processing + ML platform + AI inference service), resulting in long build cycles and complex operations
  • Modeling depends on dedicated data science teams; delivery takes months with poor ROI

This solution addresses all of the above pain points: Singdata Lakehouse unifies real-time computation, historical storage, and AI inference — the entire solution requires only SQL and depends on no external systems.


3. Solution

3.1 Overall Architecture

Predictive Maintenance Architecture

After IoT device sensor data enters the equipment_sensors raw table, two Dynamic Tables automatically perform continuous computation: the first computes 1-hour rolling averages (to suppress single-point spike false alarms), and the second applies multi-dimensional threshold filtering and risk classification. Finally, AI_COMPLETE is called for medium/high-risk equipment to generate structured maintenance recommendations, which are written to equipment_alerts for downstream consumption. The entire pipeline requires no external scheduler — the platform automatically detects data changes and performs incremental refresh.

3.2 Data Model

Raw data table equipment_sensors

-- The partition column must be the first field in the composite primary key (Singdata convention) CREATE TABLE equipment_sensors ( sensor_id STRING NOT NULL, equipment_id STRING NOT NULL COMMENT 'Equipment ID', equipment_type STRING COMMENT 'Equipment type: CNC lathe / injection molding machine / conveyor belt', temperature DOUBLE COMMENT 'Temperature (°C)', vibration DOUBLE COMMENT 'Vibration (mm/s)', pressure DOUBLE COMMENT 'Pressure (Bar)', rotation_speed DOUBLE COMMENT 'Rotation speed (RPM)', power_load DOUBLE COMMENT 'Power load (%)', record_time TIMESTAMP_NTZ NOT NULL, PRIMARY KEY (record_time, sensor_id) -- Partition column record_time must be first ) PARTITIONED BY (DAYS(record_time)); -- Enable change tracking to drive Dynamic Table to process only newly added data ALTER TABLE equipment_sensors SET TBLPROPERTIES ('change_tracking' = 'true'); -- BloomFilter index: accelerates equality queries on high-cardinality columns (verified syntax) CREATE BLOOMFILTER INDEX IF NOT EXISTS idx_bf_equipment_id ON TABLE equipment_sensors(equipment_id); CREATE BLOOMFILTER INDEX IF NOT EXISTS idx_bf_sensor_id ON TABLE equipment_sensors(sensor_id);

Alert output table equipment_alerts

-- Partition column alert_time must be the first field in the composite primary key CREATE TABLE equipment_alerts ( alert_id STRING NOT NULL, equipment_id STRING NOT NULL, equipment_type STRING, alert_time TIMESTAMP_NTZ NOT NULL, anomaly_metrics STRING COMMENT 'Anomaly metric description, e.g. "High temp:95°C, High vibration:9.2mm/s"', risk_level STRING COMMENT 'High / Medium', maintenance_advice STRING COMMENT 'AI-generated maintenance recommendation (≤50 characters)', predicted_failure_hours INT COMMENT 'AI-predicted hours until failure', PRIMARY KEY (alert_time, alert_id) -- Partition column alert_time must be first ) PARTITIONED BY (DAYS(alert_time)); -- BloomFilter index CREATE BLOOMFILTER INDEX IF NOT EXISTS idx_bf_alert_equipment_id ON TABLE equipment_alerts(equipment_id);

3.3 Three-Layer Processing Pipeline

Layer 1: 1-Hour Rolling Window Aggregation (equipment_metrics_1h)

Using averages instead of raw sample values as the basis for anomaly detection eliminates false alarms caused by sensor spikes and brief fluctuations. The Dynamic Table automatically refreshes incrementally every 10 minutes with no manual scheduling required.

CREATE DYNAMIC TABLE equipment_metrics_1h REFRESH INTERVAL 10 MINUTE AS SELECT equipment_id, equipment_type, MAX(record_time) AS last_record_time, ROUND(AVG(temperature), 2) AS avg_temp, ROUND(MAX(temperature), 2) AS max_temp, ROUND(AVG(vibration), 2) AS avg_vib, ROUND(MAX(vibration), 2) AS max_vib, ROUND(AVG(pressure), 2) AS avg_pres, ROUND(AVG(power_load), 2) AS avg_load, COUNT(*) AS reading_cnt FROM equipment_sensors WHERE record_time >= CURRENT_TIMESTAMP() - INTERVAL 1 HOUR GROUP BY equipment_id, equipment_type;

Layer 2: Multi-Dimensional Threshold Filtering and Risk Classification (equipment_anomaly_candidates)

Two-stage filtering: first a loose pre-filter (WHERE clause), then precise classification with strict thresholds (CASE WHEN), significantly reducing downstream AI call volume.

CREATE DYNAMIC TABLE equipment_anomaly_candidates REFRESH INTERVAL 10 MINUTE AS SELECT equipment_id, equipment_type, last_record_time, avg_temp, max_temp, avg_vib, max_vib, avg_pres, avg_load, -- Concatenate description strings for all metrics exceeding thresholds CONCAT_WS(',', CASE WHEN max_temp > 90 THEN CONCAT('High temp:', CAST(max_temp AS STRING), '°C') END, CASE WHEN max_vib > 8.0 THEN CONCAT('High vibration:', CAST(max_vib AS STRING), 'mm/s') END, CASE WHEN avg_pres > 12 THEN CONCAT('High pressure:', CAST(avg_pres AS STRING), 'Bar') END, CASE WHEN avg_load > 95 THEN CONCAT('High load:', CAST(avg_load AS STRING), '%') END ) AS anomaly_metrics, -- Risk classification: any metric exceeding high-risk threshold → High; exceeding medium-risk → Medium CASE WHEN max_temp > 100 OR max_vib > 12 OR avg_load > 98 THEN 'High' WHEN max_temp > 90 OR max_vib > 8 OR avg_pres > 12 THEN 'Medium' ELSE 'Low' END AS risk_level FROM equipment_metrics_1h WHERE max_temp > 85 OR max_vib > 6 OR avg_pres > 10 OR avg_load > 90; -- Loose pre-filter

Detection threshold reference:

MetricLoose Pre-filterMedium RiskHigh Risk
Temperature> 85°C> 90°C> 100°C
Vibration> 6 mm/s> 8 mm/s> 12 mm/s
Power Load> 90%> 95%> 98%
Pressure> 10 Bar> 12 Bar

Layer 3: AI-Generated Maintenance Recommendations Written to Alert Table

Only equipment with risk_level IN ('High', 'Medium') triggers AI_COMPLETE. Equipment type, anomaly metrics, and current averages are passed as context to the model. The advice and predicted_failure_hours fields are parsed from the JSON response and written directly to the alert table.

INSERT INTO equipment_alerts SELECT CONCAT(equipment_id, '_', DATE_FORMAT(CURRENT_TIMESTAMP(), 'yyyyMMddHHmm')) AS alert_id, equipment_id, equipment_type, CURRENT_TIMESTAMP() AS alert_time, anomaly_metrics, risk_level, -- AI call: extract maintenance recommendation text -- REGEXP_EXTRACT handles possible markdown code fence wrapping from LLM, extracting pure JSON object GET_JSON_OBJECT( REGEXP_EXTRACT( AI_COMPLETE( 'conn_dashscope:deepseek-v3', 'You are a factory equipment maintenance engineer. Based on the following sensor anomaly data, provide a concise maintenance recommendation. ' || 'Return JSON: {"advice":"specific maintenance action, no more than 50 characters","predicted_failure_hours":number}' || 'Equipment type: ' || COALESCE(equipment_type, '') || ', Anomaly metrics: ' || anomaly_metrics || ', Current avg temperature: ' || CAST(avg_temp AS STRING) || '°C' || ', Max vibration: ' || CAST(max_vib AS STRING) || 'mm/s' ), '(?s)\{.*\}', 0 ), '$.advice' ) AS maintenance_advice, -- AI call: extract predicted hours until failure CAST(GET_JSON_OBJECT( REGEXP_EXTRACT( AI_COMPLETE( 'conn_dashscope:deepseek-v3', 'You are a factory equipment maintenance engineer. Based on the following sensor anomaly data, provide a concise maintenance recommendation. ' || 'Return JSON: {"advice":"specific maintenance action, no more than 50 characters","predicted_failure_hours":number}' || 'Equipment type: ' || COALESCE(equipment_type, '') || ', Anomaly metrics: ' || anomaly_metrics || ', Current avg temperature: ' || CAST(avg_temp AS STRING) || '°C' || ', Max vibration: ' || CAST(max_vib AS STRING) || 'mm/s' ), '(?s)\{.*\}', 0 ), '$.predicted_failure_hours' ) AS INT) AS predicted_failure_hours FROM equipment_anomaly_candidates WHERE risk_level IN ('High', 'Medium');


4. Singdata Technical Advantages

4.1 Dynamic Table: Declarative Continuous Computation, Zero Operations

Traditional solutions require independently deploying Flink or Spark Streaming clusters and manually managing JobManagers, TaskManagers, and checkpoints. Singdata Dynamic Table declares computation logic with a single CREATE DYNAMIC TABLE SQL statement. The platform automatically detects upstream data changes (relying on Change Tracking) and performs incremental computation at the configured refresh interval — no external scheduler required.

Comparison DimensionTraditional Streaming (Flink)Singdata Dynamic Table
Development LanguageJava / Python + operator APIStandard SQL
Operational BurdenStandalone cluster + monitoring + restart policyFully managed by platform
Refresh LatencyMillisecond-level (overkill for this scenario)Minute-level (configurable: 1min–1h)
Dependency ManagementManual DAG maintenanceAutomatically tracks upstream table dependencies
Applicable ScenarioMillisecond-level real-time responseMinute-level monitoring and alerting (this scenario)

This anomaly detection solution does not require millisecond-level response — equipment typically has a window of several hours between developing anomalous averages and actual failure. A 10-minute refresh is fully sufficient, making Dynamic Table the most cost-effective choice.

4.2 AI_COMPLETE: Large Model Inference Embedded in SQL, No Extra Service

AI_COMPLETE is a built-in Singdata function. Large model inference is performed directly within SQL statements — no standalone AI service deployment, no API integration code, no model version management. API keys are centrally managed through Connection objects, supporting mainstream providers including DashScope (DeepSeek, Qwen), OpenAI, and others.

Two key design decisions ensure cost control:

  • Two-level filtering controls call volume: Loose pre-filter plus medium/high risk gate — token consumption is reduced by 90%+ compared to calling for all records
  • Structured output written directly to the table: The LLM response is stripped of any markdown code fence wrapping via REGEXP_EXTRACT('(?s)\{.*\}', 0), then GET_JSON_OBJECT extracts $.advice and $.predicted_failure_hours for direct insertion into the alert table, requiring no secondary parsing by downstream systems

4.3 Lakehouse: Unified Storage for Real-Time and Historical Data

Raw sensor data is stored in the Lakehouse partitioned by day, serving both as the upstream data source for Dynamic Tables (real-time computation) and as a data store supporting historical analysis queries across time ranges — without needing to sync data to a separate data warehouse or time-series database. This means:

  • The same equipment_sensors table simultaneously supports "past 1-hour anomaly detection" and "past 6-month failure trend analysis"
  • Can be joined directly with MES order tables and ERP spare parts inventory tables — no ETL pipeline required

4.4 Change Tracking: Incremental Processing, No Full-Table Scans

After enabling change_tracking = 'true' on equipment_sensors, Dynamic Table refresh only processes rows added since the last refresh — not a full table scan. As historical data accumulates, refresh performance remains stable and does not degrade with data growth.

4.5 BloomFilter Index: High-Cardinality Column Equality Query Acceleration

Background

equipment_sensors is a core wide table. Even after day-based partitioning, each partition may store millions of records. When operations personnel or downstream systems need to query specific equipment:

-- Scenario 1: Retrieve all raw sensor data for a specific equipment over the past 30 days (failure investigation) SELECT * FROM equipment_sensors WHERE equipment_id = 'EQ_CNC_01' AND record_time >= CURRENT_TIMESTAMP() - INTERVAL 30 DAY; -- Scenario 2: Bulk fetch latest data for multiple equipment items (Dashboard refresh) SELECT * FROM equipment_sensors WHERE equipment_id IN ('EQ_CNC_01', 'EQ_INJ_02', 'EQ_CONV_01') AND record_time >= CURRENT_TIMESTAMP() - INTERVAL 1 HOUR; -- Scenario 3: Look up historical alert details by alert ID SELECT * FROM equipment_alerts WHERE equipment_id = 'EQ_CNC_01' AND alert_time >= '2026-01-01';

equipment_id is a high-cardinality column (unique identifier per equipment; the larger the factory, the higher the cardinality). It is not part of the partition key or the sort prefix of the primary key. Without an additional index, the engine must scan all data blocks within the partition to find matching rows — cost grows linearly with data volume.

How BloomFilter Works

A BloomFilter index is a probability-based skip index. At build time, each data block (page/granule) maintains a BloomFilter bitmap recording which values appeared in that block. At query time:

Query: WHERE equipment_id = 'EQ_CNC_01' ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ Block 1 │ │ Block 2 │ │ Block 3 │ │ Block 4 │ │ BF: absent │ │ BF: maybe │ │ BF: absent │ │ BF: maybe │ └─────┬──────┘ └─────┬──────┘ └─────┬──────┘ └─────┬──────┘ │ │ │ │ Skip Read and filter Skip Read and filter

  • Definitively absent: skipped entirely — zero I/O
  • Possibly present: block is read and precisely filtered (a very small false-positive probability, controlled by the fpp parameter)
  • In typical scenarios, 90%+ of data blocks can be skipped, drastically reducing scan volume

Creating the Index

-- Singdata actual syntax (verified): CREATE BLOOMFILTER INDEX -- Note: PROPERTIES('fpp'=...) parameter is not supported; platform default false-positive rate is used CREATE BLOOMFILTER INDEX IF NOT EXISTS idx_bf_equipment_id ON TABLE equipment_sensors(equipment_id); CREATE BLOOMFILTER INDEX IF NOT EXISTS idx_bf_sensor_id ON TABLE equipment_sensors(sensor_id); CREATE BLOOMFILTER INDEX IF NOT EXISTS idx_bf_alert_equipment_id ON TABLE equipment_alerts(equipment_id);

Use Cases and Boundaries

Query PatternSuitable for BloomFilterNotes
WHERE equipment_id = 'EQ_CNC_01'YesEquality query — optimal scenario
WHERE equipment_id IN (...)YesMulti-value equality — probed one by one
WHERE equipment_id LIKE 'EQ_%'NoFuzzy queries cannot utilize BloomFilter
WHERE temperature > 90NoRange queries use Min/Max statistics, not BloomFilter
WHERE risk_level = 'High'NoLow-cardinality column (only High/Medium/Low) — BloomFilter benefit is negligible

Synergy with Other Indexes

In this solution, three types of indexes play complementary roles:

Index TypeTarget ColumnsQuery Pattern Accelerated
Partition pruningrecord_timeWHERE record_time >= ... time range filtering
BloomFilterequipment_id, sensor_idHigh-cardinality column equality queries — skips irrelevant data blocks
Inverted IndexSTRING text columnsLIKE, full-text search (e.g., alert content search)

When all three are applied together, the engine first prunes partitions (dramatically narrowing the scan scope), then uses BloomFilter to skip irrelevant data blocks within the partition, and finally applies precise filtering only on the remaining rows.


5. Customer Value

5.1 Direct Economic Benefits

Value DimensionIndustry BenchmarkRealization Path
Reduce unplanned downtime35–50% reductionIdentify anomalies 4–48 hours in advance, preserving maintenance windows
Lower maintenance costs25–30% reductionReduce over-maintenance; replace parts on demand
Extend equipment lifespan20–40% improvementAvoid accelerated wear from running equipment while degraded
ROI payback period12–18 monthsTypical industry implementation timeframe

The loss from a single downtime event (average $260K/hour × average 4 hours = $1M+) far exceeds the build and operating costs of this entire solution — the ROI logic is clear.

5.2 Operational Efficiency Gains

  • Schedule optimization: The predicted_failure_hours field provides remaining useful time, allowing maintenance teams to arrange personnel and spare parts in advance, completing planned repairs during non-production windows and avoiding emergency line shutdowns
  • Reduced false alarms: The 1-hour rolling average baseline combined with two-level threshold filtering reduces false alarm rate by 60%+, letting maintenance teams focus on genuine threats rather than noise
  • Full equipment coverage: Automated monitoring replaces manual inspection — 7×24-hour coverage of hundreds of machines simultaneously; personnel shift from inspection to repair work

5.3 Decision Support Upgrade

  • Evolves from "fix when broken" to "know in advance which equipment will have what problem and how many hours away"
  • The anomaly_metrics field precisely lists metrics exceeding thresholds, so maintenance engineers can prepare the right tools and spare parts before arriving on-site
  • Historical alert data and raw sensor data are retained on the same platform, enabling equipment health profiles and identification of recurring failure root causes

5.4 Solution Build Cost Comparison

SolutionRequired ComponentsBuild TimeOperational Complexity
TraditionalIoT platform + time-series DB + Flink cluster + ML platform + AI inference service + BI tool3–6 monthsHigh (6+ systems)
This SolutionSingdata Lakehouse (one platform)DaysLow (pure SQL)

6. Quick Start

Prerequisites: Create an API Connection named conn_dashscope in Singdata Studio:

-- Using Singdata CREATE API CONNECTION syntax (validated) CREATE API CONNECTION IF NOT EXISTS conn_dashscope TYPE ai_function PROPERTIES ( 'BASE_URL' = 'https://dashscope.aliyuncs.com/compatible-mode/v1', 'API_KEY' = '<your-dashscope-api-key>' );

Execution order:

1. setup.sql -- Create equipment_sensors and equipment_alerts tables, enable Change Tracking 2. test_data.sql -- Import sample sensor data for 3 equipment items (includes normal, medium-risk, high-risk samples) 3. pipeline.sql -- Create two Dynamic Tables, trigger first refresh, trigger AI alert writes

Verify results:

-- View latest alerts, confirm AI recommendations and predicted failure times have been generated SELECT equipment_id, equipment_type, risk_level, anomaly_metrics, maintenance_advice, predicted_failure_hours FROM equipment_alerts ORDER BY alert_time DESC LIMIT 10;

Clean up environment (optional):

-- teardown.sql: drop all tables in dependency order DROP TABLE IF EXISTS equipment_anomaly_candidates; DROP TABLE IF EXISTS equipment_metrics_1h; DROP TABLE IF EXISTS equipment_alerts; DROP TABLE IF EXISTS equipment_sensors;


7. Extension Directions

Listed in order of implementation priority:

Near-term ready

DirectionDescriptionImplementation
Integrate equipment operation logsInject maintenance records and parts replacement history into AI context to prevent recently-serviced equipment from being incorrectly flagged as high riskAdd equipment_maintenance_log table, JOIN and include in AI prompt
Supply chain spare parts linkageAfter failure prediction, automatically query spare parts inventory and trigger replenishment orders when insufficientJoin equipment_alerts with spare parts inventory table, write to purchase request table

Medium-term planning

DirectionDescriptionImplementation
Cross-factory horizontal comparisonAggregate equipment health metrics across multiple production lines/factories to identify systemic issues (e.g., parts from the same batch failing early across the board)Add factory_id dimension to equipment_sensors, add cross-factory aggregation Dynamic Table
Adaptive thresholdsCalculate personalized normal ranges based on each equipment's historical data to replace current fixed thresholdsUse historical percentiles (P95/P99) to dynamically compute thresholds, write to threshold config table for Step 2 to query

Long-term evolution

DirectionDescription
Digital twin integrationPush alert coordinates (equipment ID + anomaly type) to factory 3D visualization system for spatial positioning
Multimodal fault diagnosisIncorporate equipment vibration spectrum and thermal imaging data, combined with vision large models to improve diagnostic accuracy

AI Functions

DocumentDescription
AI Functions OverviewOverall introduction to AI functions, model selection, calling methods, billing
AI_COMPLETEGeneral-purpose LLM completion function; used in this solution to generate structured maintenance recommendations and predict failure times
CREATE API CONNECTIONCreate API Connections to manage access credentials for large model services (DashScope/OpenAI, etc.)

Dynamic Table

DocumentDescription
Dynamic Table OverviewDynamic Table core concepts, incremental refresh mechanism, and comparison with materialized views and stream processing
Dynamic Table Development GuideEnd-to-end examples for creating tables, refreshing, and viewing history
CREATE DYNAMIC TABLETable creation syntax reference, including change_tracking, REFRESH INTERVAL, and other parameter descriptions
Dynamic Table Refresh ModesIncremental vs. full refresh mode explanation, and how to determine the current refresh strategy
Dynamic Table Refresh SchedulingScheduled refresh configuration to control anomaly detection pipeline refresh frequency
Monitoring Dynamic Tables with StudioVisually monitor Dynamic Table refresh status and dependency chain health through Studio

Indexes

DocumentDescription
Index OverviewUse cases and selection comparison for various index types (BloomFilter / Inverted / Vector)
BloomFilter IndexBloomFilter working principle and applicable boundaries; high-cardinality column equality query acceleration
CREATE BLOOMFILTER INDEXIndex creation syntax reference; used in this solution to accelerate equipment_id, sensor_id equality queries
Inverted IndexSuitable for fuzzy queries and full-text search on STRING columns; complementary to BloomFilter
Lakehouse Index Best Practices GuideCombined index usage strategies, including partition pruning + BloomFilter + inverted index stacking pattern

Partitioned Tables

DocumentDescription
Partitioning and BucketingPartitioned table design concepts, including PARTITIONED BY (DAYS(...)) time partitioning usage
Partitioned Table Usage GuidePartitioned table notes, including the rule that composite primary keys must include the partition key (the key constraint for this solution)

JSON and Regex Functions

DocumentDescription
get_json_objectExtract values at a specified path from a JSON string; used in this solution to parse $.advice and $.predicted_failure_hours from AI responses
regexp_extractRegex extraction function; used in this solution to strip potential markdown code fence wrapping from LLM responses and extract the pure JSON object