Integrate data from Hospital Information Systems (HIS), Electronic Medical Records (EMR), and laboratory systems to build a healthcare operations data warehouse that outputs department performance metrics and hospital-level KPI dashboards. Using Q1 2025 California hospital admission data as the dataset, this document provides an end-to-end demonstration of the complete build process from MySQL offline sync → ODS → DWD → DWS → ADS, and covers the practical implementation of three key capabilities: Column Masking, RBAC tiered permissions, and Time Travel compliance auditing.
Overview
Core challenges for healthcare data warehouses: patient privacy protection, multi-source system integration, automated computation of operational metrics, and historical data traceability for medical insurance reconciliation.
Challenge
Singdata Solution
HIS/EMR data synced daily in full or incrementally
MySQL multi-table offline sync (full database mirror mode), automatic scheduling
Importing HL7/JSON format lab report files
COPY INTO + Volume, supporting JSON format batch loading
ODS→DWD→DWS→ADS automatic incremental computation
Dynamic Table, declarative SQL, system automatically maintains refresh dependency chain
Masking of PII fields such as patient ID and diagnostic information
Column Masking, bound to columns, dynamically controls display by role
Tiered permissions for clinicians, management, and BI analysts
Declarative SQL, system automatically refreshes incrementally
REFRESH DYNAMIC TABLE
Manually trigger a single refresh
Used during initial build or debugging
CREATE FUNCTION
Create Column Masking policy function
SQL UDF, dynamically masks patient PII fields by role
ALTER TABLE ... CHANGE COLUMN ... SET MASK
Bind the masking function to a column
Transparent to the bound column; dynamically executed at read time
CREATE ROLE
Create custom RBAC roles
Differentiate permissions for clinical, management, and BI user types
GRANT
Grant roles data access permissions
Schema-level or table-level authorization
DESC HISTORY
View table historical version list
Returns timestamp, operation type, and row change count for each version
SELECT ... TIMESTAMP AS OF
Query historical data at a specified point in time
Medical insurance reconciliation, auditing, and data recovery scenarios
Prerequisites
All examples in this document run under the best_practice_healthcare_dw schema.
CREATE SCHEMA IF NOT EXISTS best_practice_healthcare_dw;
ODS Layer: Raw Business Data Ingestion
The ODS layer stores raw data synchronized from HIS, EMR, and laboratory systems without business transformation, while configuring Column Masking on patient privacy fields.
Create Tables
Hospital dimension table (static reference data):
CREATE TABLE IF NOT EXISTS best_practice_healthcare_dw.doc_hospital_dim (
hospital_id STRING,
hospital_name STRING,
county STRING,
city STRING,
hospital_type STRING,
bed_count INT,
is_teaching BOOLEAN
);
Admission record main table (HIS offline sync target table):
HIS systems typically run on MySQL and contain multiple business tables including admissions, billing, and procedures. In Lakehouse Studio, use a multi-table offline sync task to implement daily full + incremental sync:
Sync mode: Full database mirror (full database sync), automatically mapping all business tables in the HIS database to the best_practice_healthcare_dw schema
Scheduling policy: run incremental sync at 02:00 daily; run full sync on first run
Studio task path: best_practices/healthcare_dw/
💡 Tip: Multi-table offline sync tasks support field mapping and type conversion. You can map DATETIME fields in HIS to the TIMESTAMP type in the Lakehouse in the task configuration.
COPY INTO to Import HL7/JSON Format Lab Reports
Laboratory systems (LIS) typically output lab report files in JSON or HL7 format stored in object storage (OSS/S3), loaded periodically via COPY INTO + Volume. First create the object storage access credentials and External Volume, then execute COPY INTO:
-- Step 1: Create a Storage Connection pointing to object storage
-- Alibaba Cloud OSS example; for AWS S3 / Tencent Cloud COS, replace TYPE, ENDPOINT, and credential fields accordingly.
CREATE STORAGE CONNECTION IF NOT EXISTS lab_reports_oss_conn
TYPE oss
ENDPOINT = 'oss-cn-hangzhou.aliyuncs.com'
ACCESS_ID = '<your_access_key_id>'
ACCESS_KEY = '<your_access_key_secret>';
-- Step 2: Mount the directory containing LIS lab reports as an External Volume
CREATE EXTERNAL VOLUME IF NOT EXISTS best_practice_healthcare_dw.lab_reports_volume
LOCATION 'oss://<your-bucket>/lis/lab-reports/'
USING CONNECTION lab_reports_oss_conn
DIRECTORY = (enable = true, auto_refresh = true)
RECURSIVE = true;
After mounting, batch import JSON lab reports from lab_reports_volume:
COPY INTO best_practice_healthcare_dw.doc_ods_lab_results
FROM VOLUME best_practice_healthcare_dw.lab_reports_volume
USING json
ON_ERROR = 'CONTINUE';
⚠️ Note: ON_ERROR = 'CONTINUE' skips files with invalid formats and continues loading. It is recommended to query COPY_HISTORY after loading to confirm whether any files were skipped.
ODS Layer: Column Masking for Patient PII
Patient IDs and diagnostic information in medical data are Protected Health Information (PHI) that needs to be dynamically masked based on the accessor's role. Singdata Lakehouse binds masking policies at the column level through Column Masking, transparently shielding sensitive fields from non-privileged users.
Create Masking Policy Functions
Patient ID masking: non-workspace_admin roles only see the first and last characters, with the middle replaced by *.
CREATE OR REPLACE FUNCTION best_practice_healthcare_dw.mask_patient_id(pid STRING)
RETURNS STRING
AS
CASE
WHEN array_contains(current_roles(), 'workspace_admin') THEN pid
ELSE CONCAT(SUBSTR(pid, 1, 1), REPEAT('*', LENGTH(pid) - 2), SUBSTR(pid, LENGTH(pid), 1))
END;
Diagnosis masking: only workspace_admin and workspace_dev roles can see the full diagnosis; other roles see RESTRICTED.
CREATE OR REPLACE FUNCTION best_practice_healthcare_dw.mask_diagnosis(diag STRING)
RETURNS STRING
AS
CASE
WHEN array_contains(current_roles(), 'workspace_admin')
OR array_contains(current_roles(), 'workspace_dev') THEN diag
ELSE 'RESTRICTED'
END;
Bind Masking Policies to Columns
-- Bind patient ID masking
ALTER TABLE best_practice_healthcare_dw.doc_ods_admissions
CHANGE COLUMN patient_id
SET MASK best_practice_healthcare_dw.mask_patient_id;
-- Bind diagnosis masking
ALTER TABLE best_practice_healthcare_dw.doc_ods_admissions
CHANGE COLUMN principal_diagnosis_desc
SET MASK best_practice_healthcare_dw.mask_diagnosis;
Query result for workspace_admin user after binding (full data):
⚠️ Note: Column Masking is a preview feature. To enable it in a production environment, contact technical support to activate it. After binding the masking function, it takes effect on all query paths, including cases where data is read indirectly through views and Dynamic Tables.
DWD Layer: Patient Visit Event Wide Table
The DWD layer uses a Dynamic Table to join ODS layer admission records with the hospital dimension table, while deriving fields such as age group and discharge category, forming an analysis-friendly patient visit event wide table.
Create Tables
CREATE DYNAMIC TABLE best_practice_healthcare_dw.doc_dwd_patient_visits
REFRESH INTERVAL 60 MINUTE VCLUSTER DEFAULT
AS
SELECT
a.admission_id,
a.hospital_id,
h.hospital_name,
h.county AS hospital_county,
h.hospital_type,
h.is_teaching,
a.patient_id,
a.age,
CASE
WHEN a.age < 18 THEN 'Pediatric'
WHEN a.age < 45 THEN 'Adult'
WHEN a.age < 65 THEN 'Middle-Aged'
ELSE 'Senior'
END AS age_group,
a.sex,
a.race,
a.county_of_residence,
a.admission_date,
a.discharge_date,
a.los_days,
a.admission_type,
a.admission_source,
a.discharge_disposition,
a.principal_diagnosis_code,
a.principal_diagnosis_desc,
a.major_diagnostic_category,
a.department,
a.payer_type,
a.total_charges,
a.drg_code,
a.drg_description,
CASE
WHEN a.discharge_disposition IN ('Home', 'Home Health') THEN 'Routine'
WHEN a.discharge_disposition = 'SNF' THEN 'Extended Care'
ELSE 'Other'
END AS discharge_category,
CASE WHEN a.los_days > 7 THEN 1 ELSE 0 END AS is_long_stay,
a.load_time
FROM best_practice_healthcare_dw.doc_ods_admissions a
LEFT JOIN best_practice_healthcare_dw.doc_hospital_dim h
ON a.hospital_id = h.hospital_id;
Lab report event wide table, calculating lab turnaround time (TAT):
CREATE DYNAMIC TABLE best_practice_healthcare_dw.doc_dwd_lab_events
REFRESH INTERVAL 60 MINUTE VCLUSTER DEFAULT
AS
SELECT
l.lab_id,
l.admission_id,
l.patient_id,
l.hospital_id,
h.hospital_name,
l.test_name,
l.test_code,
l.result_value,
l.reference_range,
l.abnormal_flag,
l.collection_time,
l.result_time,
TIMESTAMPDIFF(MINUTE, l.collection_time, l.result_time) AS tat_minutes,
l.ordering_physician,
l.department
FROM best_practice_healthcare_dw.doc_ods_lab_results l
LEFT JOIN best_practice_healthcare_dw.doc_hospital_dim h
ON l.hospital_id = h.hospital_id;
Cardiology has the most abnormal results (5), with an average turnaround time of 111 minutes; ICU is the fastest (30 minutes).
DWS Layer: Department Monthly Performance Aggregation
The DWS layer aggregates by month × hospital × department dimensions to generate department operational performance metrics.
Create Table
CREATE DYNAMIC TABLE best_practice_healthcare_dw.doc_dws_dept_monthly
REFRESH INTERVAL 60 MINUTE VCLUSTER DEFAULT
AS
SELECT
DATE_TRUNC('month', admission_date) AS admission_month,
hospital_id,
hospital_name,
department,
major_diagnostic_category,
COUNT(*) AS total_admissions,
COUNT(DISTINCT patient_id) AS unique_patients,
ROUND(AVG(los_days), 2) AS avg_los_days,
MAX(los_days) AS max_los_days,
SUM(CASE WHEN is_long_stay = 1 THEN 1 ELSE 0 END) AS long_stay_count,
ROUND(SUM(CASE WHEN is_long_stay = 1 THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 1) AS long_stay_pct,
ROUND(SUM(total_charges), 2) AS total_charges,
ROUND(AVG(total_charges), 2) AS avg_charges,
SUM(CASE WHEN admission_type = 'Emergency' THEN 1 ELSE 0 END) AS emergency_count,
SUM(CASE WHEN payer_type = 'Medicare' THEN 1 ELSE 0 END) AS medicare_count,
SUM(CASE WHEN payer_type = 'Medicaid' THEN 1 ELSE 0 END) AS medicaid_count,
SUM(CASE WHEN payer_type = 'Commercial' THEN 1 ELSE 0 END) AS commercial_count
FROM best_practice_healthcare_dw.doc_dwd_patient_visits
GROUP BY 1, 2, 3, 4, 5;
Query January department performance (sorted by total charges descending):
SELECT
department, total_admissions, avg_los_days, long_stay_pct,
total_charges, emergency_count, medicare_count
FROM best_practice_healthcare_dw.doc_dws_dept_monthly
WHERE admission_month = '2025-01-01'
ORDER BY total_charges DESC
LIMIT 6;
ICU and Neurology have the highest average charges per case; both departments also have 100% emergency admission rates, reflecting that high-complexity cases are concentrated in these departments.
Distribution by major diagnostic category (full quarter):
SELECT
major_diagnostic_category,
COUNT(*) AS admission_count,
ROUND(AVG(los_days), 2) AS avg_los,
ROUND(AVG(total_charges), 0) AS avg_charges
FROM best_practice_healthcare_dw.doc_dwd_patient_visits
GROUP BY major_diagnostic_category
ORDER BY admission_count DESC
LIMIT 8;
Circulatory and Nervous System diseases tied for the highest admission volume. Nervous System diseases have the highest average hospital charges ($62,638); Respiratory diseases have the longest average hospital stay (6 days).
Analysis by payer mix:
SELECT
payer_type,
COUNT(*) AS admission_count,
ROUND(AVG(los_days), 2) AS avg_los,
ROUND(SUM(total_charges), 0) AS total_revenue,
ROUND(AVG(total_charges), 0) AS avg_charges
FROM best_practice_healthcare_dw.doc_dwd_patient_visits
GROUP BY payer_type
ORDER BY admission_count DESC;
Medicare patients account for 50% of total admissions, but their average length of stay (6.2 days) and average charges ($63,544) are far higher than other payers, consistent with the clinical characteristics of this population's older age and more complex conditions.
ADS Layer: Hospital-Level KPI Executive Dashboard
The ADS layer further aggregates to the hospital dimension to generate the core KPIs needed for the hospital executive management dashboard.
Create Table
CREATE DYNAMIC TABLE best_practice_healthcare_dw.doc_ads_hospital_kpi
REFRESH INTERVAL 60 MINUTE VCLUSTER DEFAULT
AS
SELECT
hospital_id,
hospital_name,
hospital_county,
hospital_type,
is_teaching,
COUNT(*) AS total_admissions,
COUNT(DISTINCT patient_id) AS unique_patients,
ROUND(AVG(los_days), 2) AS avg_los_days,
SUM(CASE WHEN is_long_stay = 1 THEN 1 ELSE 0 END) AS long_stay_count,
ROUND(SUM(CASE WHEN is_long_stay = 1 THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 1) AS long_stay_rate_pct,
ROUND(SUM(total_charges), 2) AS total_revenue,
ROUND(AVG(total_charges), 2) AS avg_revenue_per_admission,
SUM(CASE WHEN admission_type = 'Emergency' THEN 1 ELSE 0 END) AS emergency_admissions,
ROUND(SUM(CASE WHEN admission_type = 'Emergency' THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 1) AS emergency_rate_pct,
SUM(CASE WHEN discharge_category = 'Extended Care' THEN 1 ELSE 0 END) AS snf_discharges,
SUM(CASE WHEN payer_type = 'Medicare' THEN 1 ELSE 0 END) AS medicare_cases,
SUM(CASE WHEN payer_type = 'Medicaid' THEN 1 ELSE 0 END) AS medicaid_cases,
SUM(CASE WHEN payer_type = 'Commercial' THEN 1 ELSE 0 END) AS commercial_cases,
COUNT(DISTINCT department) AS active_departments
FROM best_practice_healthcare_dw.doc_dwd_patient_visits
GROUP BY hospital_id, hospital_name, hospital_county, hospital_type, is_teaching;
Cedars-Sinai has the highest total revenue ($447,100) and also the highest emergency admission rate (75%); Stanford Health Care has a 16.7% long-stay rate (los > 7 days), reflecting its higher proportion of critically ill cases.
Quarterly trend analysis by month:
SELECT
CASE EXTRACT(month FROM admission_date)
WHEN 1 THEN 'January' WHEN 2 THEN 'February' WHEN 3 THEN 'March'
END AS month_name,
COUNT(*) AS total_admissions,
ROUND(AVG(los_days), 2) AS avg_los,
ROUND(SUM(total_charges), 0) AS total_revenue,
SUM(CASE WHEN admission_type = 'Emergency' THEN 1 ELSE 0 END) AS emergency_count
FROM best_practice_healthcare_dw.doc_dwd_patient_visits
GROUP BY EXTRACT(month FROM admission_date),
CASE EXTRACT(month FROM admission_date)
WHEN 1 THEN 'January' WHEN 2 THEN 'February' WHEN 3 THEN 'March'
END
ORDER BY EXTRACT(month FROM admission_date);
January had the highest admission volume (25 cases), consistent with California's winter peak of respiratory illnesses and flu season. February had a slightly higher average length of stay (4.33 days), suggesting the average case complexity in February was higher.
RBAC: Tiered Permission Configuration
Different users in the healthcare data warehouse have significantly varying data access needs and permission levels; custom RBAC roles provide layered access control.
Create Custom Roles
-- Clinical staff: can view patient visit events (DWD layer), but patient PII fields are controlled by Column Masking
CREATE ROLE IF NOT EXISTS healthcare_clinical_viewer
COMMENT 'Clinical staff - view patient clinical data with PII masking';
-- Operations management: can only access department-level aggregated data (DWS/ADS layers), no patient-level records
CREATE ROLE IF NOT EXISTS healthcare_mgmt_analyst
COMMENT 'Management - view aggregate KPIs only, no patient-level access';
-- BI analysts: can only access ADS layer KPI tables for reports and dashboards
CREATE ROLE IF NOT EXISTS healthcare_bi_analyst
COMMENT 'BI team - ADS layer only, read-only for reporting';
Grant Schema Access Permissions
-- Clinical staff: grant schema metadata read permission (actual table-level permissions are controlled separately on each table)
GRANT READ METADATA ON SCHEMA best_practice_healthcare_dw
TO ROLE healthcare_clinical_viewer;
-- Management and BI analysts also only granted metadata read
GRANT READ METADATA ON SCHEMA best_practice_healthcare_dw
TO ROLE healthcare_mgmt_analyst;
GRANT READ METADATA ON SCHEMA best_practice_healthcare_dw
TO ROLE healthcare_bi_analyst;
Permission Tier Summary
Role
Accessible Layers
Patient PII Visible
Typical Users
workspace_admin
ODS / DWD / DWS / ADS
Full data
DBA, data platform administrators
healthcare_clinical_viewer
DWD visit events
patient_id masked, diagnosis RESTRICTED
Clinicians, nurses
healthcare_mgmt_analyst
DWS / ADS layers
No patient-level fields
Department heads, operations management
healthcare_bi_analyst
ADS layer KPIs only
No patient-level fields
BI engineers, report developers
⚠️ Note: After RBAC role authorization, the roles must be assigned to specific users (GRANT ROLE <role_name> TO USER <username>); roles themselves cannot log into the system directly.
Time Travel: Compliance Auditing and Medical Insurance Reconciliation
Healthcare industry compliance requirements make data change traceability mandatory. Singdata Lakehouse Time Travel supports querying any historical version through DESC HISTORY and TIMESTAMP AS OF, applicable to medical insurance reconciliation, regulatory audits, and recovery from erroneous data operations.
View Historical Versions
DESC HISTORY best_practice_healthcare_dw.doc_ods_admissions;
Version 2 had only 25 records (the first batch of admission data); versions 4 and 5 are ALTER operations (binding Column Masking).
Timestamp-Based Historical Data Query
Medical insurance reconciliation scenario: query the state of the raw data at the time of submitting a reconciliation report, confirming the number of admission records at that time.
-- Query the data snapshot at version 2 (2026-06-06 13:28:29)
SELECT COUNT(*) AS row_count
FROM best_practice_healthcare_dw.doc_ods_admissions
TIMESTAMP AS OF '2026-06-06T13:28:29.718';
row_count
---------
25
The current table has 50 records, but through TIMESTAMP AS OF you can precisely return to the state at version 2, confirming that only 25 records existed at that time. This is applicable to scenarios where medical insurance auditors require "a data snapshot at the time of a specific claim submission."
Data Recovery Scenario
Recover data after an erroneous operation using Time Travel:
-- Restore accidentally deleted data to a specified point in time
RESTORE TABLE best_practice_healthcare_dw.doc_ods_admissions
TO TIMESTAMP AS OF '2026-06-06T13:28:29.718';
⚠️ Note: RESTORE TABLE is an irreversible write operation that rolls the table back to the specified historical version and overwrites the current data. It is recommended to first verify the data is correct at the target version using TIMESTAMP AS OF before executing the restore. Time Travel retains historical versions for 7 days by default; versions beyond the retention period cannot be recovered.
Notes
Column Masking is a preview feature: Contact technical support to activate it; confirm activation before using in production. Masking policies take effect on all query paths (including indirect reads through Dynamic Tables) after binding.
Dynamic Table and upstream ODS are decoupled: The Column Masking behavior of DWD/DWS/ADS layer Dynamic Tables is determined by the ODS table's policy; masking does not need to be re-bound on DWD tables.
RBAC role authorization follows the least privilege principle: Assign roles by department or function; avoid granting high-privilege roles like workspace_admin or workspace_dev to ordinary business users.
Time Travel retention period: Defaults to 7 days; after that, historical versions are cleaned up. For scenarios with long-term audit requirements such as medical insurance reconciliation, it is recommended to periodically archive key versions to separate tables via snapshots (CTAS).
HIS offline sync full/incremental strategy: For the first sync, it is recommended to run full sync during off-peak hours (e.g., 02:00 AM), then switch to incremental mode to avoid excessive compute resource consumption from full sync as data volumes grow.
Related Documentation
Dynamic Table — Dynamic Table creation syntax and refresh mechanism