Insurance Core Business Data Warehouse (Compliance Reporting) Best Practices
Integrate policy, claims, and customer data from insurance core systems into a multi-layer data warehouse that satisfies CBIRC (China Banking and Insurance Regulatory Commission) compliance reporting requirements. Using a Kaggle Insurance Claims dataset (100 auto insurance policies) as the foundation, this guide demonstrates the complete end-to-end build process: ODS → DWD → DWS → ADS, covering five key platform capabilities: Oracle/PostgreSQL batch sync, Dynamic Table periodic refresh, Time Travel historical reconciliation, Column Masking data de-identification, and RBAC fine-grained authorization.
Overview
Insurance core business data warehouses face the following typical challenges:
Problem
Solution
Core systems (Oracle/PG) require daily T+1 full sync to the data warehouse
Batch offline sync task; ODS layer preserves the original field structure
DWS/ADS layers require daily refresh to compute product profitability and regional risk
Dynamic Table without REFRESH INTERVAL; scheduled via Lakehouse Studio Tasks
Month-end reconciliation and regulatory historical lookback require precise restoration of data at any point in time
Time Travel with TIMESTAMP AS OF to query historical snapshots
Sensitive fields such as policyholder ID numbers, bank card numbers, and medical information need masking
Column Masking bound to columns, transparently applied to non-privileged users
Actuarial, claims, compliance, and IT ops departments require tiered permission management
RBAC with fine-grained role authorization, following the principle of least privilege
SQL Commands Used
Command / Function
Purpose
Notes
CREATE TABLE
Create ODS layer raw policy table
Regular table, upstream to Dynamic Tables
CREATE BLOOMFILTER INDEX
Create BloomFilter index on policy_id column
Accelerates point lookups on high-cardinality policy numbers
All examples in this guide run under the best_practice_insurance_dw schema.
CREATE SCHEMA IF NOT EXISTS best_practice_insurance_dw;
ODS Layer: Raw Policy Table
Create Table
The ODS layer fully preserves the source system fields without any transformation, making data traceability and regulatory auditing straightforward.
CREATE TABLE IF NOT EXISTS best_practice_insurance_dw.doc_ods_insurance_policy (
policy_id STRING, -- Policy number (primary key, high cardinality)
subscription_length DOUBLE, -- Policy term (years)
vehicle_age DOUBLE, -- Vehicle age (years)
customer_age INT, -- Customer age (sensitive field, requires masking)
region_code STRING, -- Region code
region_density INT, -- Regional population density
segment STRING, -- Insurance segment (A/B1/B2/C1/C2/Utility)
model STRING, -- Vehicle model
fuel_type STRING, -- Fuel type (Petrol/CNG/Diesel)
max_torque STRING,
max_power STRING,
engine_type STRING,
airbags INT,
is_esc STRING,
is_adjustable_steering STRING,
is_tpms STRING,
is_parking_sensors STRING,
is_parking_camera STRING,
rear_brakes_type STRING,
displacement INT,
cylinder INT,
transmission_type STRING,
steering_type STRING,
turning_radius DOUBLE,
length INT,
width INT,
gross_weight INT,
is_front_fog_lights STRING,
is_rear_window_wiper STRING,
is_rear_window_washer STRING,
is_rear_window_defogger STRING,
is_brake_assist STRING,
is_power_door_locks STRING,
is_central_locking STRING,
is_power_steering STRING,
is_driver_seat_height_adjustable STRING,
is_day_night_rear_view_mirror STRING,
is_ecw STRING,
is_speed_alert STRING,
ncap_rating INT, -- NCAP safety rating (0-5)
claim_status INT -- Claim status (0=no claim, 1=claimed)
);
Load data from a local CSV file (recommended):
-- Step 1: Upload the local CSV file to User Volume via SQL PUT
PUT '/path/to/your/data.csv' TO USER VOLUME FILE 'data.csv';
-- Step 2: COPY INTO the table from User Volume
COPY INTO best_practice_insurance_dw.doc_ods_insurance_policy
FROM USER VOLUME
USING csv
OPTIONS('header'='true', 'sep'=',', 'nullValue'='')
FILES ('data.csv');
You can also insert small batches of test data inline (no CSV file needed):
Verify the row count after loading:
SELECT COUNT(*) AS ods_row_count
FROM best_practice_insurance_dw.doc_ods_insurance_policy;
ods_row_count
-------------
100
Create BloomFilter Index
policy_id is the core business primary key — a high-cardinality column. Subsequent DWD/DWS/ADS layers will join by policy number. A BloomFilter Index is appropriate here.
CREATE BLOOMFILTER INDEX IF NOT EXISTS idx_bf_policy_id
ON TABLE doc_ods_insurance_policy (policy_id);
⚠️ Note: CREATE BLOOMFILTER INDEX requires the same schema context as the target table. Run USE SCHEMA best_practice_insurance_dw first or write the table name without schema prefix in SQL, otherwise you will get an "index and table must in the same schema" error.
Column Masking: Customer Age De-identification
Compliance requirement: customer_age is personal sensitive information. Actuarial and compliance administrators can view the exact age; other users can only view "age ranges" (precision reduced to multiples of 10).
-- Create masking function
CREATE OR REPLACE FUNCTION best_practice_insurance_dw.mask_customer_age(age INT)
RETURNS INT
AS CASE
WHEN current_user() IN ('privileged_user') THEN age -- replace with actual authorized username
ELSE CAST(FLOOR(CAST(age AS DOUBLE) / 10.0) * 10 AS INT)
END;
-- Bind to the customer_age column
ALTER TABLE best_practice_insurance_dw.doc_ods_insurance_policy
CHANGE COLUMN customer_age
SET MASK best_practice_insurance_dw.mask_customer_age;
💡 Tip: Replace 'privileged_user' with the actual username(s) that need access to plaintext data. Column Masking uses the current_user() function to match the current connection's username. All authorized usernames must be explicitly listed in the IN() list.
Verify binding (admin account sees original exact age):
SELECT policy_id, customer_age
FROM best_practice_insurance_dw.doc_ods_insurance_policy
WHERE policy_id IN ('POL007194', 'POL016745', 'POL045360')
ORDER BY policy_id;
⚠️ Note: Column Masking is transparently applied to all downstream queries (including Dynamic Tables) — when the DWD layer queries the ODS layer directly, non-privileged users see customer_age as the masked age range integer.
DWD Layer: Policy Lifecycle Standardization
The DWD layer performs three tasks on top of ODS raw policies:
Adds customer age segments (age_group), vehicle age segments (vehicle_age_group), and policy tenure segments (policy_tenure_group)
Marks high-risk records (is_high_risk): vehicle age ≥ 5 years or NCAP safety rating = 0
Retains all original fields for flexible aggregation by any dimension in DWS/ADS layers
Create Table
CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_insurance_dw.doc_dwd_policy_lifecycle
AS
SELECT
policy_id,
subscription_length,
vehicle_age,
customer_age,
region_code,
region_density,
segment,
model,
fuel_type,
engine_type,
airbags,
displacement,
cylinder,
transmission_type,
ncap_rating,
claim_status,
-- Customer age segments
CASE
WHEN customer_age < 30 THEN 'Young (<30)'
WHEN customer_age < 45 THEN 'Middle (30-44)'
WHEN customer_age < 60 THEN 'Senior (45-59)'
ELSE 'Elderly (60+)'
END AS age_group,
-- Vehicle age segments
CASE
WHEN vehicle_age < 1 THEN 'New (<1yr)'
WHEN vehicle_age < 3 THEN 'Recent (1-3yr)'
WHEN vehicle_age < 5 THEN 'Mid (3-5yr)'
ELSE 'Old (5+yr)'
END AS vehicle_age_group,
-- Policy tenure segments
CASE
WHEN subscription_length < 3 THEN 'Short (<3yr)'
WHEN subscription_length < 7 THEN 'Medium (3-7yr)'
ELSE 'Long (7+yr)'
END AS policy_tenure_group,
-- High-risk flag: vehicle age over 5 years or NCAP rating = 0
CASE WHEN vehicle_age >= 5 OR ncap_rating = 0 THEN 1 ELSE 0 END AS is_high_risk
FROM best_practice_insurance_dw.doc_ods_insurance_policy;
⚠️ Note: Do not write REFRESH INTERVAL in the DDL. Refresh scheduling is managed via Lakehouse Studio Tasks (see the "Schedule Refresh Tasks" section below).
Manually trigger the first refresh:
REFRESH DYNAMIC TABLE best_practice_insurance_dw.doc_dwd_policy_lifecycle;
SELECT COUNT(*) AS dwd_count
FROM best_practice_insurance_dw.doc_dwd_policy_lifecycle;
dwd_count
---------
100
Analysis Example: Cross-tabulation of Age Range and Vehicle Age Claim Rates
SELECT
age_group,
vehicle_age_group,
COUNT(*) AS policy_count,
SUM(claim_status) AS claim_count
FROM best_practice_insurance_dw.doc_dwd_policy_lifecycle
GROUP BY age_group, vehicle_age_group
ORDER BY claim_count DESC;
Result interpretation: The middle-aged customer segment (30–44) is the largest group by policy count (60 policies) and also contributes the most claims (3). Notably, the "1–3 year" vehicle age range has more claims than the "5+ years" range, suggesting recently purchased vehicle owners have higher risk exposure, possibly related to driving habits of newer drivers.
DWS Layer: Product Type / Regional Aggregation
The DWS layer aggregates DWD data at segment × fuel_type × region_code × age_group granularity, outputting claim rates, high-risk rates, and other metrics for direct use by ADS layer compliance reports.
Create Table
CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_insurance_dw.doc_dws_product_region_daily
AS
SELECT
segment,
fuel_type,
region_code,
age_group,
COUNT(*) AS policy_count,
SUM(claim_status) AS claim_count,
ROUND(SUM(claim_status) * 100.0 / COUNT(*), 4) AS claim_rate_pct,
ROUND(AVG(vehicle_age), 2) AS avg_vehicle_age,
ROUND(AVG(customer_age), 2) AS avg_customer_age,
ROUND(AVG(subscription_length), 2) AS avg_subscription_length,
SUM(is_high_risk) AS high_risk_count,
ROUND(SUM(is_high_risk) * 100.0 / COUNT(*), 4) AS high_risk_rate_pct
FROM best_practice_insurance_dw.doc_dwd_policy_lifecycle
GROUP BY segment, fuel_type, region_code, age_group;
Manually trigger the first refresh and verify:
REFRESH DYNAMIC TABLE best_practice_insurance_dw.doc_dws_product_region_daily;
SELECT COUNT(*) AS dws_count
FROM best_practice_insurance_dw.doc_dws_product_region_daily;
dws_count
---------
59
View segment-region combinations with claims:
SELECT segment, fuel_type, region_code,
policy_count, claim_count, claim_rate_pct, high_risk_count
FROM best_practice_insurance_dw.doc_dws_product_region_daily
WHERE claim_count > 0
ORDER BY claim_rate_pct DESC;
Result interpretation: Region C10 segment B2 (Petrol) has a 100% claim rate but only 1 policy — small sample noise. Region C2 segments C1 and C2 (Diesel) have high claim rates (50% and 25%) across multiple policies and deserve close attention. Region C3 segment A (CNG) has all 5 policies flagged as high-risk (ncap_rating = 0), indicating insufficient safety features in entry-level vehicles in that region.
ADS Layer: CBIRC Compliance Reports
Product Type Profitability Analysis Report
Aggregates by insurance segment and fuel type to output claim rates, regional coverage, and high-risk policy counts, meeting CBIRC product operations analysis report requirements.
CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_insurance_dw.doc_ads_regulatory_report
AS
SELECT
segment AS product_type,
fuel_type,
COUNT(*) AS total_policies,
SUM(claim_status) AS total_claims,
ROUND(SUM(claim_status) * 100.0 / COUNT(*), 4) AS claim_rate_pct,
COUNT(DISTINCT region_code) AS covered_regions,
SUM(is_high_risk) AS high_risk_policies,
ROUND(SUM(is_high_risk) * 100.0 / COUNT(*), 4) AS high_risk_rate_pct,
ROUND(AVG(customer_age), 2) AS avg_customer_age,
ROUND(AVG(vehicle_age), 2) AS avg_vehicle_age,
ROUND(AVG(subscription_length), 2) AS avg_policy_tenure,
ROUND(AVG(ncap_rating), 2) AS avg_safety_rating
FROM best_practice_insurance_dw.doc_dwd_policy_lifecycle
GROUP BY segment, fuel_type;
REFRESH DYNAMIC TABLE best_practice_insurance_dw.doc_ads_regulatory_report;
SELECT product_type, fuel_type, total_policies, total_claims,
claim_rate_pct, covered_regions, high_risk_policies, avg_safety_rating
FROM best_practice_insurance_dw.doc_ads_regulatory_report
ORDER BY total_policies DESC;
A (Basic) CNG is the largest segment (27 policies), but 100% are flagged as high-risk (ncap_rating = 0), indicating this product type covers a large number of entry-level vehicles with insufficient safety features. Compliance departments need to take note.
C1 Diesel has the highest claim rate (20%), with an average NCAP score of 4, showing that high-end safety features do not significantly reduce the accident probability for C1-class vehicles.
B2 Petrol has the largest policy count, a claim rate of only 3.2%, and coverage of 11 regions — making it the most profitable product line.
Regional Risk Concentration Report
CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_insurance_dw.doc_ads_region_claim_analysis
AS
SELECT
region_code,
COUNT(*) AS total_policies,
SUM(claim_status) AS total_claims,
ROUND(SUM(claim_status) * 100.0 / COUNT(*), 4) AS claim_rate_pct,
SUM(is_high_risk) AS high_risk_count,
ROUND(AVG(subscription_length), 2) AS avg_policy_tenure,
ROUND(AVG(vehicle_age), 2) AS avg_vehicle_age,
SUM(CASE WHEN ncap_rating >= 3 THEN 1 ELSE 0 END) AS high_safety_policies,
ROUND(AVG(ncap_rating), 2) AS avg_ncap_rating
FROM best_practice_insurance_dw.doc_dwd_policy_lifecycle
GROUP BY region_code;
REFRESH DYNAMIC TABLE best_practice_insurance_dw.doc_ads_region_claim_analysis;
SELECT region_code, total_policies, total_claims, claim_rate_pct,
high_risk_count, avg_ncap_rating
FROM best_practice_insurance_dw.doc_ads_region_claim_analysis
ORDER BY total_policies DESC
LIMIT 10;
Result interpretation: Region C2 has the most policies (21), a claim rate of 9.5%, and 6 high-risk policies — the highest overall risk exposure. Region C3 has 10 high-risk policies (62.5% of policies) with an average NCAP score of only 0.94, reflecting low safety features in vehicles in that region. Premium adjustments are recommended to manage risk.
Policy Tenure and Age Cross-Risk Analysis
This is one of the most common cross-dimensional analyses in compliance reports, used to identify high-risk customer segments:
SELECT
age_group,
policy_tenure_group,
COUNT(*) AS policy_count,
SUM(claim_status) AS claim_count,
ROUND(SUM(claim_status) * 100.0 / COUNT(*), 4) AS claim_rate_pct
FROM best_practice_insurance_dw.doc_dwd_policy_lifecycle
GROUP BY age_group, policy_tenure_group
ORDER BY claim_rate_pct DESC;
Result interpretation: Middle-aged (30–44) + long-term policies (7+ years) has the highest claim rate (11.5%), with 3 claims from 26 policies. The long policy tenure for this group suggests accumulated historical risk; it is recommended to re-assess vehicle condition at renewal.
Schedule Refresh Tasks (Lakehouse Studio Tasks)
Dynamic Table DDL does not include REFRESH INTERVAL. Instead, create SQL-type scheduled tasks in Lakehouse Studio. The benefit is that monitoring alerts and data quality rules can be added to the same task.
First create a Studio folder for the tasks and record the returned folder ID. Folder IDs differ across customer environments; the --folder parameter must be replaced with the ID returned in this run:
💡 Tip: The examples below use cz-cli (the Singdata Lakehouse command-line tool). If cz-cli is not installed, refer to the cz-cli setup guide. Alternatively, you can execute SQL in Lakehouse Studio → Development → SQL Editor and configure scheduled tasks on the Studio → Tasks page.
💡 Tip: After task creation, tasks are in draft state by default. Once configuration is complete, run cz-cli task deploy <task-name> -p skill_test to publish. In the Studio UI, you can add data quality checks (e.g., alert on abnormal claim rate fluctuations in the ADS layer) to the same task without creating a separate monitoring task.
Time Travel: Month-End Reconciliation and Regulatory Historical Lookback
Regulatory audit scenarios often require precise restoration of data state at a specific point in time. Singdata Lakehouse Time Travel supports querying at any historical point without additional backups.
View the ODS table version history:
DESC HISTORY best_practice_insurance_dw.doc_ods_insurance_policy;
Restore data state at a specific point in time (e.g., the historical snapshot after month-end T+1 sync):
-- Query the historical snapshot when 40 records had been written
SELECT COUNT(*) AS historical_count
FROM best_practice_insurance_dw.doc_ods_insurance_policy
TIMESTAMP AS OF '2026-06-06T22:48:26.136';
historical_count
----------------
40
Typical month-end reconciliation use case:
-- Compare claim rate differences between this month-end and last month-end
WITH current_month AS (
SELECT
segment,
SUM(claim_status) AS claims,
COUNT(*) AS policies
FROM best_practice_insurance_dw.doc_ods_insurance_policy
GROUP BY segment
),
last_month AS (
SELECT
segment,
SUM(claim_status) AS claims,
COUNT(*) AS policies
FROM best_practice_insurance_dw.doc_ods_insurance_policy
TIMESTAMP AS OF '2026-06-06T22:48:26.136' -- replace with last month-end timestamp
GROUP BY segment
)
SELECT
c.segment,
c.policies AS curr_policies,
c.claims AS curr_claims,
l.policies AS prev_policies,
l.claims AS prev_claims,
ROUND((c.claims * 1.0 / c.policies - l.claims * 1.0 / l.policies) * 100, 4) AS claim_rate_delta_pct
FROM current_month c
LEFT JOIN last_month l ON c.segment = l.segment
ORDER BY claim_rate_delta_pct DESC;
💡 Tip: TIMESTAMP AS OF only accepts literal constants and does not support dynamic expressions like NOW() - INTERVAL 1 MONTH. In production, the timestamp can be parameterized and passed in via Studio Task scheduling variables such as ${bizdate}.
RBAC: Fine-Grained Tiered Permissions
Different departments in insurance core business have significantly different data access requirements:
Role
Access Scope
Typical Requirements
Actuarial (role_actuarial)
ODS + DWD + DWS, can see customer_age (original value)
Rate actuarial, risk modeling
Claims (role_claims)
ODS + DWD, full policy fields visible, cannot modify
Claims processing, fraud investigation
Compliance (role_compliance)
ADS read-only, cannot see original customer_age
Regulatory report generation, compliance review
IT Ops (role_ops)
Schema-level DDL permissions, cannot query data
Table structure management, index maintenance
Example of creating roles and assigning permissions:
-- Create roles
CREATE ROLE IF NOT EXISTS role_compliance;
CREATE ROLE IF NOT EXISTS role_actuarial;
CREATE ROLE IF NOT EXISTS role_claims;
-- Compliance: only grant SELECT on the two ADS layer Dynamic Table reports
GRANT SELECT ON DYNAMIC TABLE best_practice_insurance_dw.doc_ads_regulatory_report TO ROLE role_compliance;
GRANT SELECT ON DYNAMIC TABLE best_practice_insurance_dw.doc_ads_region_claim_analysis TO ROLE role_compliance;
-- Actuarial: grant SELECT on ODS + DWD + DWS, also add to the whitelist for original customer_age visibility
GRANT SELECT ON TABLE best_practice_insurance_dw.doc_ods_insurance_policy TO ROLE role_actuarial;
GRANT SELECT ON TABLE best_practice_insurance_dw.doc_dwd_policy_lifecycle TO ROLE role_actuarial;
GRANT SELECT ON TABLE best_practice_insurance_dw.doc_dws_product_region_daily TO ROLE role_actuarial;
-- Claims: grant SELECT on ODS + DWD
GRANT SELECT ON TABLE best_practice_insurance_dw.doc_ods_insurance_policy TO ROLE role_claims;
GRANT SELECT ON TABLE best_practice_insurance_dw.doc_dwd_policy_lifecycle TO ROLE role_claims;
💡 Tip: The Column Masking whitelist is controlled via current_user() IN (...) in the masking function. To add a user with permission to see plaintext, only the function definition needs to be updated — no need to rebuild the masking policy.
Data Warehouse Object Summary
After completing the full build, the objects in the best_practice_insurance_dw schema:
Oracle / PostgreSQL (core system)
|
v T+1 batch full sync
doc_ods_insurance_policy (ODS)
BloomFilter Index (policy_id)
Column Masking (customer_age)
|
v Studio Task: refresh_dwd_policy_lifecycle (daily 02:00)
doc_dwd_policy_lifecycle (DWD · Dynamic Table)
age_group / vehicle_age_group / policy_tenure_group / is_high_risk
|
v Studio Task: refresh_dws_product_region_daily (daily 03:00)
doc_dws_product_region_daily (DWS · Dynamic Table)
segment x fuel_type x region_code · claim_rate_pct / high_risk_rate_pct
|
+--> doc_ads_regulatory_report (ADS · Dynamic Table)
| Product profitability report · CBIRC product operations analysis
|
+--> doc_ads_region_claim_analysis (ADS · Dynamic Table)
Regional risk concentration report · CBIRC regional risk monitoring
Studio Task: refresh_ads_regulatory_report (daily 04:00)
Notes
Dynamic Tables do not write REFRESH INTERVAL: All Dynamic Table DDLs in this guide omit REFRESH INTERVAL. Refresh scheduling is managed via Lakehouse Studio Tasks, allowing monitoring alerts and data quality rules to be added to the same task.
Partitioned Dynamic Tables must use static partitions: If ADS layer tables need monthly partitioning, explicitly declare PARTITION BY with the static partition option. Dynamic partition inference cannot be used.
Time Travel timestamp restrictions: TIMESTAMP AS OF only accepts literal constants and does not support dynamic expressions like NOW() - INTERVAL 1 MONTH. DESC HISTORY returns UTC times; be aware of local timezone offset (default UTC+8) when querying.
Column Masking is transparently applied to Dynamic Tables: When the DWD layer queries the ODS layer directly, non-privileged users see customer_age as the masked age range integer. The value stored in DWD is also the masked value. When actuarial staff need original precision, they should query the ODS layer directly with an authorized account.
BloomFilter Index does not auto-apply to existing data: CREATE BLOOMFILTER INDEX only takes effect for data written after creation. For existing data, the BloomFilter filter acceleration effect is limited. The BLOOMFILTER type does not support BUILD INDEX; rebuilding the table is required to cover existing data.
RBAC permission changes synchronization: Column Masking whitelist modifications take effect immediately, but GRANT/REVOKE permission changes may require users to re-login in some scenarios before taking effect.
⚠️ Pending manual verification: Column Masking currently uses current_user() to match usernames for authorization. All usernames with access to plaintext data must be individually added to the masking function's whitelist. If your Lakehouse version supports role-based dynamic evaluation (e.g., HAS_ROLE('role_name')), you can use roles instead of username lists for more flexible maintenance. Please contact Singdata technical support to confirm whether this function is supported in your current version.