Healthcare FHIR Clinical Data Warehouse Best Practices
Ingest HL7 FHIR-formatted clinical data (patients, encounters, observations, medications) into a multi-layer data warehouse that supports clinical quality metric calculation and healthcare cost management. This guide uses a simulated FHIR JSON dataset of 5 patients, 5 encounters, 7 observations, and 6 medication requests to demonstrate the end-to-end build process: ODS (raw FHIR JSON) → DWD (parsing layer) → DWS (aggregation layer) → ADS (clinical quality metrics). It covers four key platform capabilities: nested JSON field extraction, Dynamic Table incremental computation, Column Masking for PHI de-identification, and Time Travel historical snapshots.
Overview
The core challenges of a healthcare FHIR clinical data warehouse are: FHIR messages are stored as nested JSON, each of the four resource types (Patient, Encounter, Observation, MedicationRequest) has a different structure requiring field extraction before joining, and PHI (Protected Health Information) fields such as patient names and birth dates must be masked for non-privileged users.
get_json_object + JSONPath syntax to extract any level of nesting on demand
Four FHIR resource types need to be joined for analysis
Dynamic Table with declarative JOIN SQL; system automatically refreshes incrementally
ODS → DWD → DWS → ADS data pipeline
Dynamic Table chained dependencies; downstream refreshes automatically when upstream is updated
PHI fields such as patient name and birth date must be masked
Column Masking bound to columns, transparently applied to non-privileged users
Insurance reconciliation requires historical monthly data snapshots
Time Travel with TIMESTAMP AS OF syntax to query data at any point in time
Clinical pathway compliance rates and other quality metrics updated daily
ADS layer Dynamic Table + Lakehouse Studio Task with daily scheduled refresh
SQL Commands Used
Command / Function
Purpose
Notes
CREATE TABLE
Create ODS layer FHIR raw tables
One table per resource type, storing complete JSON
get_json_object
Extract fields from FHIR JSON using JSONPath
Supports $ paths; arrays use [0] index
CREATE DYNAMIC TABLE
Build DWD / DWS / ADS layers
Declarative SQL; system handles incremental computation automatically
REFRESH DYNAMIC TABLE
Manually trigger one refresh
Used during initial build or debugging
ALTER TABLE ... CHANGE COLUMN ... SET MASK
Bind masking function to PHI column
Transparently applied to all queries, including Dynamic Tables
TIMESTAMP AS OF
Time Travel historical snapshot query
Monthly lookback for insurance reconciliation
DATEDIFF
Calculate length of stay (LOS) and patient age
Time difference calculation
FLOOR / CAST
Numeric type conversion
Age rounding, JSON string to DOUBLE
Prerequisites
All examples in this guide run under the best_practice_fhir_clinical schema.
CREATE SCHEMA IF NOT EXISTS best_practice_fhir_clinical;
ODS Layer: Raw FHIR JSON Tables
The ODS layer stores complete FHIR resource JSON strings directly, preserving original messages without parsing, making it easy to extend fields and replay later.
Create Tables
-- FHIR Patient resource
CREATE TABLE IF NOT EXISTS best_practice_fhir_clinical.doc_fhir_patient (
patient_id STRING,
resource_json STRING
);
-- FHIR Encounter resource (visit events)
CREATE TABLE IF NOT EXISTS best_practice_fhir_clinical.doc_fhir_encounter (
encounter_id STRING,
patient_id STRING,
resource_json STRING
);
-- FHIR Observation resource (lab results / vital signs)
CREATE TABLE IF NOT EXISTS best_practice_fhir_clinical.doc_fhir_observation (
obs_id STRING,
patient_id STRING,
resource_json STRING
);
-- FHIR MedicationRequest resource (medication orders)
CREATE TABLE IF NOT EXISTS best_practice_fhir_clinical.doc_fhir_medication_request (
req_id STRING,
patient_id STRING,
resource_json STRING
);
Load Sample Data
In production, use COPY INTO + Volume to bulk-import FHIR JSON files.
Load data from a local CSV file (recommended):
-- Step 1: Upload the local CSV file to User Volume via SQL PUT
PUT '/path/to/your/data.csv' TO USER VOLUME FILE 'data.csv';
-- Step 2: COPY INTO the table from User Volume
COPY INTO best_practice_fhir_clinical.doc_fhir_patient
FROM USER VOLUME
USING csv
OPTIONS('header'='true', 'sep'=',', 'nullValue'='')
FILES ('data.csv');
You can also insert small batches of test data inline (no CSV file needed):
After inserting 7 observation records (Observation resource, including LOINC codes) and 6 medication request records (MedicationRequest, including RxNorm codes), verify the ODS layer row counts:
SELECT COUNT(*) AS patient_count FROM best_practice_fhir_clinical.doc_fhir_patient;
SELECT COUNT(*) AS encounter_count FROM best_practice_fhir_clinical.doc_fhir_encounter;
SELECT COUNT(*) AS obs_count FROM best_practice_fhir_clinical.doc_fhir_observation;
SELECT COUNT(*) AS med_count FROM best_practice_fhir_clinical.doc_fhir_medication_request;
DWD Layer: FHIR JSON Parsing and Structuring
The DWD layer uses Dynamic Tables to extract nested JSON fields from the ODS layer into relational columns, serving as the foundation for downstream aggregation and quality control analysis.
JSON Field Extraction Patterns
FHIR JSON structures are deeply nested. Follow these JSONPath patterns when extracting:
FHIR Field Type
JSONPath Example
Notes
Top-level scalar
$.gender
Direct access
Property of first array element
$.name[0].family
Array index [0]
Multi-level nested array
$.reasonCode[0].coding[0].code
Multi-level nesting
Scalar in nested object
$.period.start
Dot path
Numeric value (requires type cast)
CAST(get_json_object(...) AS DOUBLE)
Returns string by default
Validate Patient resource extraction:
SELECT
patient_id,
get_json_object(resource_json, '$.name[0].family') AS family_name,
get_json_object(resource_json, '$.name[0].given[0]') AS given_name,
get_json_object(resource_json, '$.gender') AS gender,
get_json_object(resource_json, '$.birthDate') AS birth_date,
get_json_object(resource_json, '$.address[0].city') AS city
FROM best_practice_fhir_clinical.doc_fhir_patient;
patient_id | family_name | given_name | gender | birth_date | city
-----------+-------------+------------+--------+------------+-----------
P001 | Zhang | Wei | male | 1980-05-15 | Shanghai
P002 | Li | Fang | female | 1972-11-23 | Beijing
P003 | Wang | Jun | male | 1955-03-08 | Guangzhou
P004 | Chen | Mei | female | 1990-07-30 | Shenzhen
P005 | Liu | Yang | male | 1968-01-12 | Chengdu
Validate ICD-10 code and encounter duration extraction from Encounter resource:
SELECT
encounter_id,
patient_id,
get_json_object(resource_json, '$.status') AS enc_status,
get_json_object(resource_json, '$.class.code') AS enc_class,
get_json_object(resource_json, '$.period.start') AS period_start,
get_json_object(resource_json, '$.period.end') AS period_end,
get_json_object(resource_json, '$.reasonCode[0].coding[0].code') AS icd_code,
get_json_object(resource_json, '$.reasonCode[0].coding[0].display') AS diagnosis,
get_json_object(resource_json, '$.serviceProvider.display') AS department
FROM best_practice_fhir_clinical.doc_fhir_encounter;
💡 Tip: E004's period.end is NULL, corresponding to the in-progress inpatient. When computing LOS (length of stay) downstream, use COALESCE(period.end, CURRENT_TIMESTAMP()) to handle current inpatients and avoid NULL causing abnormal aggregation results.
CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_fhir_clinical.dwd_patient_dim
AS
SELECT
p.patient_id,
get_json_object(p.resource_json, '$.name[0].family') AS family_name,
get_json_object(p.resource_json, '$.name[0].given[0]') AS given_name,
get_json_object(p.resource_json, '$.gender') AS gender,
get_json_object(p.resource_json, '$.birthDate') AS birth_date,
get_json_object(p.resource_json, '$.address[0].city') AS city,
DATEDIFF(CURRENT_DATE(),
CAST(get_json_object(p.resource_json, '$.birthDate') AS DATE)) / 365 AS age_years
FROM best_practice_fhir_clinical.doc_fhir_patient p;
dwd_encounter_fact: Encounter fact table, extracts ICD-10 codes, length of stay, and department, and classifies by ICD chapter:
CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_fhir_clinical.dwd_encounter_fact
AS
SELECT
e.encounter_id,
e.patient_id,
get_json_object(e.resource_json, '$.status') AS enc_status,
get_json_object(e.resource_json, '$.class.code') AS enc_class,
CAST(get_json_object(e.resource_json, '$.period.start') AS TIMESTAMP) AS admit_time,
CAST(get_json_object(e.resource_json, '$.period.end') AS TIMESTAMP) AS discharge_time,
DATEDIFF(
CAST(get_json_object(e.resource_json, '$.period.end') AS TIMESTAMP),
CAST(get_json_object(e.resource_json, '$.period.start') AS TIMESTAMP)
) AS los_days,
get_json_object(e.resource_json, '$.reasonCode[0].coding[0].code') AS icd_code,
get_json_object(e.resource_json, '$.reasonCode[0].coding[0].display') AS primary_diagnosis,
get_json_object(e.resource_json, '$.serviceProvider.display') AS department,
CASE
WHEN UPPER(get_json_object(e.resource_json, '$.reasonCode[0].coding[0].code')) LIKE 'I%' THEN 'Cardiology'
WHEN UPPER(get_json_object(e.resource_json, '$.reasonCode[0].coding[0].code')) LIKE 'E%' THEN 'Endocrinology'
WHEN UPPER(get_json_object(e.resource_json, '$.reasonCode[0].coding[0].code')) LIKE 'J%' THEN 'Pulmonology'
WHEN UPPER(get_json_object(e.resource_json, '$.reasonCode[0].coding[0].code')) LIKE 'N%' THEN 'Nephrology'
ELSE 'Other'
END AS icd_chapter
FROM best_practice_fhir_clinical.doc_fhir_encounter e;
CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_fhir_clinical.dwd_observation_fact
AS
SELECT
o.obs_id,
o.patient_id,
get_json_object(o.resource_json, '$.category[0].coding[0].code') AS obs_category,
get_json_object(o.resource_json, '$.code.coding[0].code') AS loinc_code,
get_json_object(o.resource_json, '$.code.coding[0].display') AS obs_name,
CAST(get_json_object(o.resource_json, '$.valueQuantity.value') AS DOUBLE) AS obs_value,
get_json_object(o.resource_json, '$.valueQuantity.unit') AS obs_unit,
CAST(get_json_object(o.resource_json, '$.effectiveDateTime') AS TIMESTAMP) AS obs_time
FROM best_practice_fhir_clinical.doc_fhir_observation o;
dwd_medication_fact: Medication order fact table, extracts RxNorm codes, drug names, dosages, and routes:
CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_fhir_clinical.dwd_medication_fact
AS
SELECT
m.req_id,
m.patient_id,
get_json_object(m.resource_json, '$.status') AS req_status,
get_json_object(m.resource_json, '$.medicationCodeableConcept.coding[0].code') AS rxnorm_code,
get_json_object(m.resource_json, '$.medicationCodeableConcept.coding[0].display') AS medication_name,
CAST(get_json_object(m.resource_json, '$.dosageInstruction[0].doseAndRate[0].doseQuantity.value') AS DOUBLE) AS dose_value,
get_json_object(m.resource_json, '$.dosageInstruction[0].doseAndRate[0].doseQuantity.unit') AS dose_unit,
get_json_object(m.resource_json, '$.dosageInstruction[0].route.coding[0].code') AS route,
CAST(get_json_object(m.resource_json, '$.authoredOn') AS TIMESTAMP) AS authored_on,
REPLACE(get_json_object(m.resource_json, '$.encounter.reference'), 'Encounter/', '') AS encounter_id
FROM best_practice_fhir_clinical.doc_fhir_medication_request m;
⚠️ Note: Do not write REFRESH INTERVAL in the Dynamic Table DDL. Periodic refresh is managed via Lakehouse Studio Tasks (see the "Scheduling Configuration" section), which allows attaching alerts and data quality rules to the same task.
Manually trigger the first refresh and verify data:
REFRESH DYNAMIC TABLE best_practice_fhir_clinical.dwd_patient_dim;
REFRESH DYNAMIC TABLE best_practice_fhir_clinical.dwd_encounter_fact;
REFRESH DYNAMIC TABLE best_practice_fhir_clinical.dwd_observation_fact;
REFRESH DYNAMIC TABLE best_practice_fhir_clinical.dwd_medication_fact;
SELECT patient_id, family_name, gender, birth_date, city, ROUND(age_years, 1) AS age
FROM best_practice_fhir_clinical.dwd_patient_dim
ORDER BY patient_id;
patient_id | family_name | gender | birth_date | city | age
-----------+-------------+--------+------------+-----------+-----
P001 | Zhang | male | 1980-05-15 | Shanghai | 46.1
P002 | Li | female | 1972-11-23 | Beijing | 53.6
P003 | Wang | male | 1955-03-08 | Guangzhou | 71.3
P004 | Chen | female | 1990-07-30 | Shenzhen | 35.9
P005 | Liu | male | 1968-01-12 | Chengdu | 58.4
Validate the encounter fact table (focus on los_days calculation and icd_chapter classification):
SELECT encounter_id, patient_id, enc_class, icd_code, primary_diagnosis,
department, los_days, icd_chapter
FROM best_practice_fhir_clinical.dwd_encounter_fact
ORDER BY encounter_id;
💡 Tip: icd_chapter classifies by the first letter of ICD-10 codes. E005 (cerebral infarction I63.9) falls under the Cardiology chapter (I series) per ICD-10 standard classification. If you need to track neurological conditions separately, add special handling for I6% in the CASE expression.
Observation fact table (including LOINC codes and quantitative values):
SELECT obs_id, patient_id, obs_category, loinc_code, obs_name, obs_value, obs_unit
FROM best_practice_fhir_clinical.dwd_observation_fact
ORDER BY obs_id;
FHIR Patient data contains PHI fields such as patient names (family_name) that must be masked for non-privileged users. The following example binds a masking policy to the family_name column of dwd_patient_dim. Admin accounts see the original value; other users see the masked value.
-- Create masking function: admin sees original name, other users see masked version
CREATE OR REPLACE FUNCTION best_practice_fhir_clinical.mask_phi_name(name STRING)
RETURNS STRING
AS CASE
WHEN current_user() IN ('privileged_user') THEN name -- replace with actual authorized username
ELSE CONCAT(LEFT(name, 1), REPEAT('*', LENGTH(name) - 1))
END;
💡 Tip: Replace 'privileged_user' with the actual username(s) that need access to plaintext data. Column Masking uses the current_user() function to match the current connection's username. All authorized usernames must be explicitly listed in the IN() list.
⚠️ Note: Column Masking is transparently applied to all queries, including upstream JOINs in Dynamic Tables.
DWS Layer: Department Cost Aggregation and Patient Observation Summary
Department Cost Aggregation
dws_department_cost aggregates encounter counts, average length of stay, and prescription counts by department and ICD chapter, serving as the foundation for DRG cost management:
CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_fhir_clinical.dws_department_cost
AS
SELECT
e.department,
e.icd_chapter,
COUNT(DISTINCT e.encounter_id) AS total_encounters,
COUNT(DISTINCT e.patient_id) AS total_patients,
SUM(CASE WHEN e.enc_class = 'IMP' THEN 1 ELSE 0 END) AS inpatient_count,
SUM(CASE WHEN e.enc_class = 'AMB' THEN 1 ELSE 0 END) AS outpatient_count,
SUM(CASE WHEN e.enc_class = 'EMER' THEN 1 ELSE 0 END) AS emergency_count,
ROUND(AVG(CAST(e.los_days AS DOUBLE)), 2) AS avg_los_days,
MAX(e.los_days) AS max_los_days,
COUNT(DISTINCT m.req_id) AS total_prescriptions,
COUNT(DISTINCT m.rxnorm_code) AS distinct_medications
FROM best_practice_fhir_clinical.dwd_encounter_fact e
LEFT JOIN best_practice_fhir_clinical.dwd_medication_fact m ON e.encounter_id = m.encounter_id
GROUP BY e.department, e.icd_chapter;
REFRESH DYNAMIC TABLE best_practice_fhir_clinical.dws_department_cost;
SELECT department, icd_chapter, total_encounters, avg_los_days,
total_prescriptions, distinct_medications
FROM best_practice_fhir_clinical.dws_department_cost
ORDER BY total_encounters DESC;
Result interpretation: Pulmonology (J44.1 COPD acute exacerbation) has the highest average length of stay at 9 days, consistent with the clinical pattern of prolonged hospitalization during COPD exacerbations. The heart failure patient in Cardiology (I50.0) was hospitalized for 5 days with 2 medications prescribed (Furosemide + Carvedilol), aligning with the standard clinical pathway.
Patient Observation Summary
dws_patient_obs_summary aggregates observation results by patient and LOINC code to facilitate tracking of indicator trends:
CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_fhir_clinical.dws_patient_obs_summary
AS
SELECT
o.patient_id,
o.loinc_code,
o.obs_name,
o.obs_unit,
COUNT(*) AS obs_count,
ROUND(MIN(o.obs_value), 2) AS min_value,
ROUND(MAX(o.obs_value), 2) AS max_value,
ROUND(AVG(o.obs_value), 2) AS avg_value,
MIN(o.obs_time) AS first_obs_time,
MAX(o.obs_time) AS last_obs_time
FROM best_practice_fhir_clinical.dwd_observation_fact o
GROUP BY o.patient_id, o.loinc_code, o.obs_name, o.obs_unit;
REFRESH DYNAMIC TABLE best_practice_fhir_clinical.dws_patient_obs_summary;
SELECT patient_id, loinc_code, obs_name, obs_unit, obs_count, min_value, max_value, avg_value
FROM best_practice_fhir_clinical.dws_patient_obs_summary
ORDER BY patient_id, loinc_code;
Result interpretation: P002 (diabetic patient) has an HbA1c of 8.5%, exceeding the clinical control target (< 7%), and blood glucose of 12.4 mmol/L is also significantly elevated, indicating poor glycemic control requiring intensified treatment. P004 (CKD stage 3) has creatinine at 2.8 mg/dL, in the moderate-to-severe elevation range (normal upper limit ~1.2), indicating significant renal impairment. P003 (COPD) has SpO2 at 88%, below the normal lower limit (95%), consistent with hypoxemia during acute exacerbation.
ADS Layer: Clinical Quality Metrics
ads_clinical_quality_metrics integrates patient dimensions, encounter facts, and observation facts to compute three types of clinical pathway compliance indicators:
hba1c_tested: Whether diabetic patients (ICD E series) completed HbA1c testing
creatinine_tested: Whether nephrology patients (ICD N series) completed creatinine testing
spo2_monitored: Whether COPD patients (ICD J series) had SpO2 monitored
CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_fhir_clinical.ads_clinical_quality_metrics
AS
SELECT
p.patient_id,
p.family_name,
p.gender,
FLOOR(p.age_years) AS age,
e.encounter_id,
e.department,
e.icd_chapter,
e.icd_code,
e.primary_diagnosis,
e.enc_class,
e.los_days,
e.enc_status,
COUNT(DISTINCT m.req_id) AS prescription_count,
COUNT(DISTINCT o.obs_id) AS lab_count,
MAX(CASE WHEN o.loinc_code = '4548-4' THEN 1 ELSE 0 END) AS hba1c_tested,
MAX(CASE WHEN o.loinc_code = '2160-0' THEN 1 ELSE 0 END) AS creatinine_tested,
MAX(CASE WHEN o.loinc_code = '59408-5' THEN 1 ELSE 0 END) AS spo2_monitored,
MAX(CASE WHEN o.loinc_code = '4548-4' THEN o.obs_value END) AS hba1c_value,
MAX(CASE WHEN o.loinc_code = '2160-0' THEN o.obs_value END) AS creatinine_value,
MAX(CASE WHEN o.loinc_code = '59408-5' THEN o.obs_value END) AS spo2_value
FROM best_practice_fhir_clinical.dwd_patient_dim p
JOIN best_practice_fhir_clinical.dwd_encounter_fact e ON p.patient_id = e.patient_id
LEFT JOIN best_practice_fhir_clinical.dwd_medication_fact m ON e.encounter_id = m.encounter_id
LEFT JOIN best_practice_fhir_clinical.dwd_observation_fact o ON p.patient_id = o.patient_id
GROUP BY
p.patient_id, p.family_name, p.gender, FLOOR(p.age_years),
e.encounter_id, e.department, e.icd_chapter, e.icd_code,
e.primary_diagnosis, e.enc_class, e.los_days, e.enc_status;
REFRESH DYNAMIC TABLE best_practice_fhir_clinical.ads_clinical_quality_metrics;
SELECT patient_id, family_name, age, department, primary_diagnosis,
enc_class, los_days, prescription_count, lab_count,
hba1c_tested, creatinine_tested, spo2_monitored,
hba1c_value, creatinine_value, spo2_value
FROM best_practice_fhir_clinical.ads_clinical_quality_metrics
ORDER BY patient_id;
SELECT
e.icd_chapter AS disease_group,
COUNT(DISTINCT a.encounter_id) AS encounter_count,
ROUND(AVG(CAST(a.los_days AS DOUBLE)), 1) AS avg_los,
SUM(a.prescription_count) AS total_prescriptions,
ROUND(SUM(a.hba1c_tested) * 100.0
/ NULLIF(SUM(CASE WHEN a.icd_code LIKE 'E%' THEN 1 ELSE 0 END), 0), 1) AS diabetes_hba1c_rate_pct,
ROUND(SUM(a.creatinine_tested) * 100.0
/ NULLIF(SUM(CASE WHEN a.icd_code LIKE 'N%' THEN 1 ELSE 0 END), 0), 1) AS ckd_creatinine_rate_pct,
ROUND(SUM(a.spo2_monitored) * 100.0
/ NULLIF(SUM(CASE WHEN a.icd_code LIKE 'J%' THEN 1 ELSE 0 END), 0), 1) AS copd_spo2_rate_pct
FROM best_practice_fhir_clinical.ads_clinical_quality_metrics a
JOIN best_practice_fhir_clinical.dwd_encounter_fact e ON a.encounter_id = e.encounter_id
GROUP BY e.icd_chapter
ORDER BY encounter_count DESC;
Result interpretation: The key lab compliance rates for diabetes (Endocrinology), CKD (Nephrology), and COPD (Pulmonology) are all 100%, meaning every patient completed their required tests. The 2 encounters in the Cardiology chapter (heart failure + cerebral infarction) have no specific quality metrics assigned, but the heart failure patient (P001) did have creatinine tested (to assess renal function after diuretic use).
Time Travel: Insurance Reconciliation Historical Snapshots
In insurance monthly reconciliation scenarios, it is necessary to retrieve encounter data snapshots at a historical point in time. Time Travel achieves this with the TIMESTAMP AS OF syntax:
-- Query the encounter data snapshot at 2026-06-06 23:38 (for insurance reconciliation)
SELECT COUNT(*) AS encounter_count
FROM best_practice_fhir_clinical.doc_fhir_encounter
TIMESTAMP AS OF '2026-06-06 23:38:00';
encounter_count
---------------
5
-- Compare with current data (verify the delta)
SELECT COUNT(*) AS current_count FROM best_practice_fhir_clinical.doc_fhir_encounter;
⚠️ Note: TIMESTAMP AS OF requires a literal constant and does not support expressions like NOW() - INTERVAL '1' MONTH. Timestamps use UTC+8 timezone; DESC HISTORY returns UTC times, so be mindful of timezone differences when querying.
The Time Travel data retention period defaults to 7 days; historical versions beyond the retention period cannot be queried. For monthly insurance reconciliation scenarios, it is recommended to periodically INSERT INTO end-of-month snapshots into a dedicated archive table, rather than relying on Time Travel's retention limits.
Scheduling Configuration: Lakehouse Studio Task
💡 Tip: The examples below use cz-cli (the Singdata Lakehouse command-line tool). If cz-cli is not installed, refer to the cz-cli setup guide. Alternatively, you can execute SQL in Lakehouse Studio → Development → SQL Editor and configure scheduled tasks on the Studio → Tasks page.
Periodic refresh of Dynamic Tables is managed via Lakehouse Studio Tasks rather than writing REFRESH INTERVAL in the DDL. Create refresh tasks under the Studio path best_practices/fhir_clinical/ to attach monitoring alerts and data quality rules to the same task.
get_json_object returns strings: All extracted results are STRING type by default. Numeric fields (observation values, dosages) must be explicitly CAST(... AS DOUBLE), otherwise aggregations (AVG/MIN/MAX) may produce unexpected results.
FHIR array fields: Many FHIR standard fields are arrays (such as name[], reasonCode[]). This guide uses [0] to take the first element. If you need to expand all array elements, use LATERAL VIEW EXPLODE(SPLIT(json_array_str, ...)).
Dynamic Table incremental refresh dependencies: DWS/ADS layer Dynamic Tables depend on the DWD layer, which depends on the ODS layer. Refresh order must follow ODS → DWD → DWS → ADS. Arrange REFRESH statements in order in Studio Tasks, or configure task dependency chains.
DATEDIFF handling of NULL: For current inpatients (enc_status = 'in-progress'), discharge_time is NULL, and DATEDIFF(NULL, admit_time) returns NULL. avg_los_days in the DWS layer uses AVG(CAST(los_days AS DOUBLE)), and AVG automatically ignores NULL, so the result is the average for discharged patients, which is the correct business semantics.
Time Travel retention: Default retention is 7 days. For scenarios requiring long-term historical version retention such as insurance reconciliation, periodically INSERT INTO end-of-month data snapshots into a dedicated archive table. Do not rely entirely on Time Travel.
Column Masking: PHI masking is transparently applied to Dynamic Tables — when the DWD layer reads the ODS layer, if a column has a masking function bound to it, the value stored in DWD is also the masked value.
⚠️ Pending manual verification: Column Masking currently uses current_user() to match usernames for authorization. All usernames with access to plaintext data must be individually added to the masking function's whitelist. If your Lakehouse version supports role-based dynamic evaluation (e.g., HAS_ROLE('role_name')), you can use roles instead of username lists for more flexible maintenance. Please contact Singdata technical support to confirm whether this function is supported in your current version.