HR Employee Lifecycle Data Warehouse Best Practices

Integrate HRIS system data, performance evaluations, and attendance records to build an analytics data warehouse covering the full employee lifecycle from onboarding to offboarding. Outputs include attrition risk predictions, compensation equity analysis, and organizational effectiveness diagnostic reports. Using the Kaggle HR Analytics dataset (1,470 employee records) as the foundation, this guide demonstrates the end-to-end build process: ODS → DWD → DWS → ADS four-layer architecture, covering Column Masking (compensation field masking), Dynamic Tables (automatic monthly metric aggregation), and window functions (LEAD/LAG for promotion analysis).


Overview

The core challenges of an HR data warehouse are that the data is highly sensitive (compensation, performance, attrition reasons) while also needing to support different roles' analytical needs (HR BP sees all data, employees see their own, analysts see masked data). Singdata Lakehouse addresses these challenges through the following combination:

ProblemSolution
Compensation and performance fields are highly sensitive; different roles see different dataColumn Masking bound to columns; non-privileged users automatically receive masked values
ODS → DWD → ADS multi-layer aggregation requires automatic monthly refreshDynamic Table with declarative SQL; system automatically maintains upstream/downstream dependency chains
Need to identify employees with stalled promotions and compensation below peer benchmarksWindow functions RANK / AVG OVER / LAG, computed directly in the DWD layer
Refresh scheduling needs data quality rules and alerts attachedLakehouse Studio Tasks manage scheduling; alerts and quality checks can be attached to the same task
High-cardinality employee ID with frequent point lookupsAdd BloomFilter Index on demand to accelerate department filtering

SQL Commands Used

Command / FunctionPurposeNotes
CREATE TABLECreate ODS raw employee tableRegular static table, upstream to Dynamic Tables
CREATE FUNCTIONCreate compensation masking UDF mask_salaryUsed for Column Masking policy binding
ALTER TABLE ... CHANGE COLUMN ... SET MASKBind Column Masking to monthly_incomeNon-privileged users' queries automatically return -1
CREATE DYNAMIC TABLECreate DWD / DWS / ADS layer incremental computation tablesSystem automatically identifies upstream changes and refreshes incrementally
REFRESH DYNAMIC TABLEManually trigger one refreshUsed during initial build or debugging
AVG() OVER (PARTITION BY ...)Calculate average compensation for peers in the same department and levelUsed for compensation equity analysis
RANK() OVER (ORDER BY ...)Calculate employee's compensation rank within departmentUsed for compensation competitiveness report
LAG()Calculate relative change in promotion intervalsIdentifies employees with stalled promotions

Prerequisites

All examples in this guide run under the best_practice_hr_analytics schema.

CREATE SCHEMA IF NOT EXISTS best_practice_hr_analytics;

The dataset is from Kaggle HR Analytics Case Study (Apache 2.0), with 1,470 records and 35 columns covering employee age, department, level, annual salary, performance rating, attrition label, and other core fields. This guide uses the first 40 rows as the demonstration dataset.


ODS Layer: Raw Employee Data

Create Table

CREATE TABLE IF NOT EXISTS best_practice_hr_analytics.doc_ods_employees ( emp_id INT, age INT, attrition STRING, -- 'Yes'/'No', whether the employee has left business_travel STRING, daily_rate INT, department STRING, distance_from_home INT, education INT, -- 1=Below College ... 5=Doctor education_field STRING, employee_number INT, env_satisfaction INT, -- 1-4 gender STRING, hourly_rate INT, job_involvement INT, -- 1-4 job_level INT, -- 1=Entry ... 5=C-Level job_role STRING, job_satisfaction INT, -- 1=Low 4=Very High marital_status STRING, monthly_income INT, -- Sensitive field, will be bound to Column Masking monthly_rate INT, num_companies_worked INT, overtime STRING, -- 'Yes'/'No' pct_salary_hike INT, -- Last salary increase percentage (%) performance_rating INT, -- 3=Excellent 4=Outstanding relationship_satisfaction INT, -- 1-4 stock_option_level INT, total_working_years INT, training_times_last_year INT, work_life_balance INT, -- 1-4 years_at_company INT, years_in_current_role INT, years_since_last_promo INT, years_with_curr_mgr INT );

Load Data

This guide loads the first 40 rows from a local CSV file (downloaded from Kaggle) via INSERT. In production, use MySQL batch sync or COPY INTO from Volume to import the full dataset:

kaggle datasets download -d bhanupratapbiswas/hr-analytics-case-study \ --unzip -p /tmp/hr_analytics/

Load data from a local CSV file (recommended):

-- Step 1: Upload the local CSV file to User Volume via SQL PUT PUT '/path/to/your/data.csv' TO USER VOLUME FILE 'data.csv';

-- Step 2: COPY INTO the table from User Volume COPY INTO best_practice_hr_analytics.doc_ods_employees FROM USER VOLUME USING csv OPTIONS('header'='true', 'sep'=',', 'nullValue'='') FILES ('data.csv');

You can also insert small batches of test data inline (no CSV file needed):

-- Load the first 40 rows of demo data via INSERT INTO (actual execution, 40 rows total) INSERT INTO best_practice_hr_analytics.doc_ods_employees VALUES (1,41,'Yes','Travel_Rarely',1102,'Sales',1,2,'Life Sciences',1,2, 'Female',94,3,2,'Sales Executive',4,'Single',5993,19479,8,'Yes',11,3,1,0,8,0,1,6,4,0,5), (2,49,'No','Travel_Frequently',279,'Research & Development',8,1,'Life Sciences',2,3, 'Male',61,2,2,'Research Scientist',2,'Married',5130,24907,1,'No',23,4,4,1,10,3,3,10,7,1,7), (3,37,'Yes','Travel_Rarely',1373,'Research & Development',2,2,'Other',4,4, 'Male',92,2,1,'Laboratory Technician',3,'Single',2090,2396,6,'Yes',15,3,2,0,7,3,3,0,0,0,0), (4,33,'No','Travel_Frequently',1392,'Research & Development',3,4,'Life Sciences',5,4, 'Female',56,3,1,'Research Scientist',3,'Married',2909,23159,1,'Yes',11,3,3,0,8,3,3,8,7,3,0), (5,27,'No','Travel_Rarely',591,'Research & Development',2,1,'Medical',7,1, 'Male',40,3,1,'Laboratory Technician',2,'Married',3468,16632,9,'No',12,3,4,1,6,3,3,2,2,2,2), (6,32,'No','Travel_Frequently',1005,'Research & Development',2,2,'Life Sciences',8,4, 'Male',79,3,1,'Laboratory Technician',4,'Single',3068,11864,0,'No',13,3,3,0,8,2,2,7,7,3,6), (7,59,'No','Travel_Rarely',1324,'Research & Development',3,3,'Medical',10,3, 'Female',81,4,1,'Laboratory Technician',1,'Married',2670,9964,4,'Yes',20,4,1,3,12,3,2,1,0,0,0), (8,30,'No','Travel_Rarely',1358,'Research & Development',24,1,'Life Sciences',11,4, 'Male',67,3,1,'Laboratory Technician',3,'Divorced',2693,13335,1,'No',22,4,2,1,1,2,3,1,0,0,0), (9,38,'No','Travel_Frequently',216,'Research & Development',23,3,'Life Sciences',12,4, 'Male',44,2,3,'Manufacturing Director',3,'Single',9526,8787,0,'No',21,4,2,0,10,2,3,9,7,1,8), (10,36,'No','Travel_Rarely',1299,'Research & Development',27,3,'Medical',13,3, 'Male',94,3,2,'Healthcare Representative',3,'Married',5237,16577,6,'No',13,3,2,2,17,3,2,7,7,7,7), (11,35,'No','Travel_Rarely',809,'Research & Development',16,3,'Medical',14,1, 'Male',84,4,1,'Laboratory Technician',2,'Married',2426,16479,0,'No',13,3,3,1,6,5,3,5,4,0,3), (12,29,'No','Travel_Rarely',153,'Research & Development',15,2,'Life Sciences',15,4, 'Female',49,2,2,'Laboratory Technician',3,'Single',4193,12682,0,'Yes',12,3,4,0,10,3,3,9,5,0,8), (13,31,'No','Travel_Rarely',670,'Research & Development',26,1,'Life Sciences',16,1, 'Male',31,3,1,'Research Scientist',3,'Divorced',2911,15170,1,'No',17,3,4,1,5,1,2,5,2,4,3), (14,34,'No','Travel_Rarely',1346,'Research & Development',19,2,'Medical',18,2, 'Male',93,3,1,'Laboratory Technician',4,'Divorced',2661,8758,0,'No',11,3,3,1,3,2,3,2,2,1,2), (15,28,'Yes','Travel_Rarely',103,'Research & Development',24,3,'Life Sciences',19,3, 'Male',50,2,1,'Laboratory Technician',3,'Single',2028,12947,5,'Yes',14,3,2,0,6,4,3,4,2,0,3), (16,29,'No','Travel_Rarely',1389,'Research & Development',21,4,'Life Sciences',20,2, 'Female',51,4,3,'Manufacturing Director',1,'Divorced',9980,10195,1,'No',11,3,3,1,10,1,3,10,9,8,8), (17,32,'No','Travel_Rarely',334,'Research & Development',5,2,'Life Sciences',21,1, 'Male',80,4,1,'Research Scientist',2,'Divorced',3298,15053,0,'Yes',12,3,4,2,7,5,2,6,2,0,5), (18,22,'No','Non-Travel',1123,'Research & Development',16,2,'Medical',22,4, 'Male',96,4,1,'Laboratory Technician',4,'Divorced',2935,7324,1,'Yes',13,3,2,2,1,2,2,1,0,0,0), (19,53,'No','Travel_Rarely',1219,'Sales',2,4,'Life Sciences',23,1, 'Female',78,2,4,'Manager',4,'Married',15427,22021,2,'No',16,3,3,0,31,3,3,25,8,3,7), (20,38,'No','Travel_Rarely',371,'Research & Development',2,3,'Life Sciences',24,4, 'Male',45,3,1,'Research Scientist',4,'Single',3944,4306,5,'Yes',11,3,3,0,6,3,3,3,2,1,2), (21,24,'No','Non-Travel',673,'Research & Development',11,2,'Other',26,1, 'Female',96,4,2,'Manufacturing Director',3,'Divorced',4011,8232,0,'No',18,3,4,1,5,5,2,4,2,1,3), (22,36,'Yes','Travel_Rarely',1218,'Sales',9,4,'Life Sciences',27,3, 'Male',82,2,1,'Sales Representative',1,'Single',3407,6986,7,'No',23,4,2,0,10,4,3,5,3,0,3), (23,34,'No','Travel_Rarely',419,'Research & Development',7,4,'Life Sciences',28,1, 'Female',53,3,3,'Research Director',2,'Single',11994,21293,0,'No',11,3,3,0,13,4,3,12,6,2,11), (24,21,'No','Travel_Rarely',391,'Research & Development',15,2,'Life Sciences',30,3, 'Male',96,3,1,'Research Scientist',4,'Single',1232,19281,1,'No',14,3,4,0,0,6,3,0,0,0,0), (25,34,'Yes','Travel_Rarely',699,'Research & Development',6,1,'Medical',31,2, 'Male',83,3,1,'Research Scientist',1,'Single',2960,17102,2,'No',11,3,3,0,8,2,3,4,2,1,3), (26,53,'No','Travel_Rarely',1282,'Research & Development',5,3,'Other',32,3, 'Female',58,3,5,'Manager',3,'Divorced',19094,10735,4,'No',11,3,4,1,26,3,2,14,13,4,8), (27,32,'Yes','Travel_Frequently',1125,'Research & Development',16,1,'Life Sciences',33,2, 'Female',72,1,1,'Research Scientist',1,'Single',3919,4681,1,'Yes',22,4,2,0,10,5,3,10,2,6,7), (28,42,'No','Travel_Rarely',691,'Sales',8,4,'Marketing',35,3, 'Male',48,3,2,'Sales Executive',2,'Married',6825,21173,0,'No',11,3,4,1,10,2,3,9,7,4,2), (29,44,'No','Travel_Rarely',477,'Research & Development',7,4,'Medical',36,1, 'Female',42,2,3,'Healthcare Representative',4,'Married',10248,2094,3,'No',14,3,4,1,24,4,3,22,6,5,17), (30,46,'No','Travel_Rarely',705,'Sales',2,4,'Marketing',38,2, 'Female',83,3,5,'Manager',1,'Single',18947,22822,3,'No',12,3,4,0,22,2,2,2,2,2,1), (31,33,'No','Travel_Rarely',924,'Research & Development',2,3,'Medical',39,3, 'Male',78,3,1,'Laboratory Technician',4,'Single',2496,6670,4,'No',11,3,4,0,7,3,3,1,1,0,0), (32,44,'No','Travel_Rarely',1459,'Research & Development',10,4,'Other',40,4, 'Male',41,3,2,'Healthcare Representative',4,'Married',6465,19121,2,'Yes',13,3,4,0,9,5,4,4,2,1,3), (33,30,'No','Travel_Rarely',125,'Research & Development',9,2,'Medical',41,4, 'Male',83,2,1,'Laboratory Technician',3,'Single',2206,16117,1,'No',13,3,4,0,10,5,3,10,0,1,8), (34,39,'Yes','Travel_Rarely',895,'Sales',5,3,'Technical Degree',42,4, 'Male',56,3,2,'Sales Representative',4,'Married',2086,3335,3,'No',14,3,3,1,19,6,4,1,0,0,0), (35,24,'Yes','Travel_Rarely',813,'Research & Development',1,3,'Medical',45,2, 'Male',61,3,1,'Research Scientist',4,'Married',2293,3020,2,'Yes',16,3,1,1,6,2,2,2,0,2,0), (36,43,'No','Travel_Rarely',1273,'Research & Development',2,2,'Medical',46,4, 'Female',72,4,1,'Research Scientist',3,'Divorced',2645,21923,1,'No',12,3,4,2,6,3,2,5,3,1,4), (37,50,'Yes','Travel_Rarely',869,'Sales',3,2,'Marketing',47,1, 'Male',86,2,1,'Sales Representative',3,'Married',2683,3810,1,'Yes',14,3,3,0,3,2,3,3,2,0,2), (38,35,'No','Travel_Rarely',890,'Sales',2,3,'Marketing',49,4, 'Female',97,3,1,'Sales Representative',4,'Married',2014,9687,1,'No',13,3,1,0,2,3,3,2,2,2,2), (39,36,'No','Travel_Rarely',852,'Research & Development',5,4,'Life Sciences',51,2, 'Female',82,2,1,'Research Scientist',1,'Married',3419,13072,9,'Yes',14,3,4,1,6,3,4,1,1,0,0), (40,33,'No','Travel_Frequently',1141,'Sales',1,3,'Life Sciences',52,3, 'Female',42,4,2,'Sales Executive',1,'Married',5376,3193,2,'No',19,3,1,2,10,3,3,5,3,1,3) ;

Verify row count:

SELECT COUNT(*) AS ods_row_count FROM best_practice_hr_analytics.doc_ods_employees;

ods_row_count ------------- 40

Column Masking: Compensation Field De-identification

monthly_income is a highly sensitive field. Policy: HR administrators see the original value; other users' queries return -1.

-- Create masking function CREATE OR REPLACE FUNCTION best_practice_hr_analytics.mask_salary(salary INT) RETURNS INT AS CASE WHEN current_user() IN ('privileged_user') THEN salary -- replace with actual authorized username ELSE -1 END; -- Bind to monthly_income column ALTER TABLE best_practice_hr_analytics.doc_ods_employees CHANGE COLUMN monthly_income SET MASK best_practice_hr_analytics.mask_salary;

Verify masking (admin account can see original values):

SELECT emp_id, department, job_role, monthly_income FROM best_practice_hr_analytics.doc_ods_employees LIMIT 5;

emp_id | department | job_role | monthly_income -------+--------------------------+------------------------+--------------- 31 | Research & Development | Laboratory Technician | 2496 32 | Research & Development | Healthcare Representative | 6465 33 | Research & Development | Laboratory Technician | 2206 34 | Sales | Sales Representative | 2086 35 | Research & Development | Research Scientist | 2293


DWD Layer: Employee Event Timeline

The DWD layer derives three categories of analytical fields from ODS raw data: tenure bands (tenure_band), promotion stall flags (promotion_stalled_flag), and retention risk scores (retention_risk_score).

Create Table

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_hr_analytics.doc_dwd_employee_timeline AS SELECT emp_id, department, job_role, job_level, age, gender, marital_status, education, education_field, years_at_company, years_in_current_role, years_since_last_promo, years_with_curr_mgr, total_working_years, monthly_income, pct_salary_hike, performance_rating, job_satisfaction, work_life_balance, env_satisfaction, overtime, business_travel, attrition, -- Tenure band for grouped analysis CASE WHEN years_at_company <= 1 THEN 'New' WHEN years_at_company <= 3 THEN 'Junior' WHEN years_at_company <= 7 THEN 'Mid' WHEN years_at_company <= 15 THEN 'Senior' ELSE 'Veteran' END AS tenure_band, -- Promotion stall flag: no promotion for 2+ years with performance rating >= 3 CASE WHEN years_since_last_promo >= 2 AND performance_rating >= 3 THEN 1 ELSE 0 END AS promotion_stalled_flag, -- Retention risk score (0.0 ~ 1.0, higher is more at risk) ROUND( CASE WHEN overtime = 'Yes' THEN 0.25 ELSE 0.0 END + CASE WHEN job_satisfaction <= 2 THEN 0.25 ELSE 0.0 END + CASE WHEN work_life_balance <= 2 THEN 0.20 ELSE 0.0 END + CASE WHEN years_since_last_promo >= 3 THEN 0.15 ELSE 0.0 END + CASE WHEN pct_salary_hike <= 12 THEN 0.15 ELSE 0.0 END , 2) AS retention_risk_score FROM best_practice_hr_analytics.doc_ods_employees;

Configure Studio Refresh Task

# Create task under Studio best_practices/hr_analytics/ path cz-cli task create-folder hr_analytics --parent 186117 -p skill_test # Returns folder_id: 186127 cz-cli task create refresh_hr_dwd_timeline --type SQL --folder 186127 -p skill_test cz-cli task save-content refresh_hr_dwd_timeline \ --content "REFRESH DYNAMIC TABLE best_practice_hr_analytics.doc_dwd_employee_timeline;" \ -p skill_test cz-cli task save-cron refresh_hr_dwd_timeline --cron "0 1 * * *" -p skill_test

Task path: best_practices/hr_analytics/refresh_hr_dwd_timeline, triggered daily at 01:00. Additional configurations can be attached to this task: data row count alerts (notification when DWD row count drops sharply), data quality rules (retention_risk_score must not be all zeros), etc.

Manually Trigger First Refresh

REFRESH DYNAMIC TABLE best_practice_hr_analytics.doc_dwd_employee_timeline; SELECT COUNT(*) AS dwd_count FROM best_practice_hr_analytics.doc_dwd_employee_timeline;

dwd_count --------- 40

Attrition Risk Analysis by Tenure Band

Tenure bands help HR identify which stage has the highest employee attrition risk:

SELECT tenure_band, COUNT(*) AS emp_count, ROUND(AVG(retention_risk_score), 2) AS avg_risk, SUM(CASE WHEN attrition = 'Yes' THEN 1 ELSE 0 END) AS attrition_cnt FROM best_practice_hr_analytics.doc_dwd_employee_timeline GROUP BY tenure_band ORDER BY avg_risk DESC;

tenure_band | emp_count | avg_risk | attrition_cnt ------------+-----------+----------+-------------- Senior | 10 | 0.39 | 1 Mid | 13 | 0.36 | 4 Junior | 7 | 0.32 | 2 New | 8 | 0.26 | 2 Veteran | 2 | 0.15 | 0

Result interpretation: Senior (7–15 years) and Mid (3–7 years) employees have the highest average retention risk at 0.39 and 0.36 respectively, suggesting these groups bear greater overtime and promotion stall pressure. Veteran (15+ years) employees have the lowest risk, likely having reached a stable career level.


DWS Layer: Department Headcount Metric Aggregation

The DWS layer aggregates DWD layer data at department granularity, outputting department-level attrition rates, average tenure, compensation benchmarks, and risk indices.

Create Table

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_hr_analytics.doc_dws_dept_headcount_metrics AS SELECT department, COUNT(*) AS headcount, SUM(CASE WHEN attrition = 'Yes' THEN 1 ELSE 0 END) AS attrition_count, ROUND(SUM(CASE WHEN attrition = 'Yes' THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) AS attrition_rate_pct, ROUND(AVG(years_at_company), 2) AS avg_tenure_years, ROUND(AVG(monthly_income), 0) AS avg_monthly_income, ROUND(AVG(performance_rating), 2) AS avg_performance, ROUND(AVG(job_satisfaction), 2) AS avg_job_satisfaction, ROUND(AVG(retention_risk_score), 2) AS avg_retention_risk, SUM(CASE WHEN overtime = 'Yes' THEN 1 ELSE 0 END) AS overtime_headcount, SUM(promotion_stalled_flag) AS promotion_stalled_count FROM best_practice_hr_analytics.doc_dwd_employee_timeline GROUP BY department;

cz-cli task create refresh_hr_dws_dept_metrics --type SQL --folder 186127 -p skill_test cz-cli task save-content refresh_hr_dws_dept_metrics \ --content "REFRESH DYNAMIC TABLE best_practice_hr_analytics.doc_dws_dept_headcount_metrics;" \ -p skill_test cz-cli task save-cron refresh_hr_dws_dept_metrics --cron "30 1 * * *" -p skill_test

Manually refresh and query:

REFRESH DYNAMIC TABLE best_practice_hr_analytics.doc_dws_dept_headcount_metrics; SELECT department, headcount, attrition_count, attrition_rate_pct, avg_tenure_years, avg_monthly_income, avg_performance, avg_retention_risk FROM best_practice_hr_analytics.doc_dws_dept_headcount_metrics ORDER BY attrition_rate_pct DESC;

department | headcount | attrition_count | attrition_rate_pct | avg_tenure_years | avg_monthly_income | avg_performance | avg_retention_risk ---------------------------+-----------+-----------------+--------------------+------------------+--------------------+-----------------+------------------- Sales | 9 | 4 | 44.44 | 6.44 | 6973 | 3.11 | 0.29 Research & Development | 31 | 5 | 16.13 | 5.77 | 4650 | 3.16 | 0.34

Result interpretation:

  • Sales department attrition rate (44.44%) is far higher than R&D (16.13%), but Sales average compensation (6973) is higher than R&D (4650), indicating compensation is not the main driver of Sales attrition. Looking at avg_retention_risk, the R&D team's overall risk average (0.34) is actually higher than Sales (0.29), possibly because overtime and promotion stalls are more pronounced in R&D.
  • Performance averages are similar across both departments (3.11 vs 3.16), indicating that those leaving are not low performers. HR should prioritize organizational culture and career development pathways.

ADS Layer: Attrition Risk Report and Compensation Analysis

Attrition Risk Report

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_hr_analytics.doc_ads_attrition_risk_report AS SELECT emp_id, department, job_role, job_level, tenure_band, years_at_company, years_since_last_promo, overtime, job_satisfaction, work_life_balance, performance_rating, monthly_income, pct_salary_hike, promotion_stalled_flag, retention_risk_score, attrition, -- Risk level CASE WHEN retention_risk_score >= 0.6 THEN 'HIGH' WHEN retention_risk_score >= 0.3 THEN 'MEDIUM' ELSE 'LOW' END AS risk_level, -- Primary attrition signal CASE WHEN overtime = 'Yes' AND job_satisfaction <= 2 THEN 'Overwork+LowSatisfaction' WHEN overtime = 'Yes' THEN 'Overwork' WHEN job_satisfaction <= 2 THEN 'LowSatisfaction' WHEN years_since_last_promo >= 3 THEN 'PromotionStalled' WHEN pct_salary_hike <= 12 THEN 'LowPayRaise' ELSE 'Normal' END AS primary_risk_signal FROM best_practice_hr_analytics.doc_dwd_employee_timeline;

cz-cli task create refresh_hr_ads_risk_report --type SQL --folder 186127 -p skill_test cz-cli task save-content refresh_hr_ads_risk_report \ --content "REFRESH DYNAMIC TABLE best_practice_hr_analytics.doc_ads_attrition_risk_report;" \ -p skill_test cz-cli task save-cron refresh_hr_ads_risk_report --cron "0 2 * * *" -p skill_test

View risk distribution after refresh:

REFRESH DYNAMIC TABLE best_practice_hr_analytics.doc_ads_attrition_risk_report; SELECT risk_level, COUNT(*) AS emp_count, ROUND(AVG(retention_risk_score), 2) AS avg_risk, SUM(CASE WHEN attrition = 'Yes' THEN 1 ELSE 0 END) AS actual_attrition FROM best_practice_hr_analytics.doc_ads_attrition_risk_report GROUP BY risk_level ORDER BY avg_risk DESC;

risk_level | emp_count | avg_risk | actual_attrition -----------+-----------+----------+----------------- HIGH | 5 | 0.68 | 2 MEDIUM | 16 | 0.43 | 2 LOW | 19 | 0.15 | 5

View HIGH risk employee details:

SELECT emp_id, department, job_role, years_at_company, years_since_last_promo, monthly_income, pct_salary_hike, retention_risk_score, risk_level, primary_risk_signal FROM best_practice_hr_analytics.doc_ads_attrition_risk_report WHERE risk_level = 'HIGH' ORDER BY retention_risk_score DESC;

emp_id | department | job_role | years_at_company | years_since_last_promo | monthly_income | pct_salary_hike | retention_risk_score | risk_level | primary_risk_signal -------+--------------------------+----------------------+------------------+------------------------+----------------+-----------------+----------------------+------------+-------------------------- 17 | Research & Development | Research Scientist | 6 | 0 | 3298 | 12 | 0.85 | HIGH | Overwork+LowSatisfaction 7 | Research & Development | Laboratory Technician| 1 | 0 | 2670 | 20 | 0.70 | HIGH | Overwork+LowSatisfaction 27 | Research & Development | Research Scientist | 10 | 6 | 3919 | 22 | 0.65 | HIGH | Overwork+LowSatisfaction 30 | Sales | Manager | 2 | 2 | 18947 | 12 | 0.60 | HIGH | LowSatisfaction 1 | Sales | Sales Executive | 6 | 0 | 5993 | 11 | 0.60 | HIGH | Overwork

Result interpretation:

  • emp_id=17 (R&D Research Scientist) has the highest risk (0.85), triggered by both overtime and low job satisfaction, with a salary hike of only 12% — right at the LowPayRaise threshold. Priority interview and compensation adjustment are recommended.
  • emp_id=27 (R&D Research Scientist, 10-year tenure) has not been promoted in 6 consecutive years (years_since_last_promo=6). Although the salary hike is relatively reasonable (22%), career stagnation is the primary risk signal.
  • emp_id=30 (Sales Manager, high compensation of 18947) has high compensation but low job satisfaction, indicating non-monetary factors need attention for management retention.

Overtime × Satisfaction Cross Analysis

SELECT overtime, job_satisfaction, COUNT(*) AS emp_count, ROUND(AVG(retention_risk_score), 2) AS avg_risk, SUM(CASE WHEN attrition = 'Yes' THEN 1 ELSE 0 END) AS attrition_cnt FROM best_practice_hr_analytics.doc_dwd_employee_timeline GROUP BY overtime, job_satisfaction ORDER BY avg_risk DESC LIMIT 8;

overtime | job_satisfaction | emp_count | avg_risk | attrition_cnt ---------+------------------+-----------+----------+-------------- Yes | 2 | 1 | 0.85 | 0 Yes | 1 | 3 | 0.62 | 1 Yes | 4 | 5 | 0.43 | 2 No | 1 | 5 | 0.41 | 2 No | 2 | 5 | 0.37 | 0 Yes | 3 | 5 | 0.34 | 3 No | 3 | 8 | 0.22 | 0 No | 4 | 8 | 0.12 | 1

Result interpretation: The combination of overtime=Yes + job_satisfaction <= 2 has a significantly higher average risk (0.62–0.85) than other combinations and is the highest-priority intervention dimension. Notably, the no-overtime but low-satisfaction group (overtime=No, satisfaction=1) has an average risk of 0.41, close to the overtime + medium-satisfaction group, indicating that the independent effect of job satisfaction cannot be ignored.


Compensation Equity Analysis (Window Functions)

The following queries do not build Dynamic Tables — they run directly on the DWD layer as ad-hoc analytical workloads.

Peer Compensation Equity Analysis

SELECT emp_id, department, job_role, job_level, monthly_income, years_at_company, pct_salary_hike, AVG(monthly_income) OVER (PARTITION BY department, job_level) AS dept_level_avg_income, ROUND( (monthly_income - AVG(monthly_income) OVER (PARTITION BY department, job_level)) / AVG(monthly_income) OVER (PARTITION BY department, job_level) * 100 , 1) AS income_vs_peer_pct, RANK() OVER (PARTITION BY department ORDER BY monthly_income DESC) AS income_rank_in_dept FROM best_practice_hr_analytics.doc_dwd_employee_timeline WHERE department = 'Sales' ORDER BY job_level DESC, monthly_income DESC;

emp_id | department | job_role | job_level | monthly_income | years_at_company | pct_salary_hike | dept_level_avg_income | income_vs_peer_pct | income_rank_in_dept -------+------------+-----------------------+-----------+----------------+------------------+-----------------+-----------------------+--------------------+-------------------- 30 | Sales | Manager | 5 | 18947 | 2 | 12 | 18947 | 0 | 1 19 | Sales | Manager | 4 | 15427 | 25 | 16 | 15427 | 0 | 2 28 | Sales | Sales Executive | 2 | 6825 | 9 | 11 | 5070 | 34.6 | 3 1 | Sales | Sales Executive | 2 | 5993 | 6 | 11 | 5070 | 18.2 | 4 40 | Sales | Sales Executive | 2 | 5376 | 5 | 19 | 5070 | 6.0 | 5 34 | Sales | Sales Representative | 2 | 2086 | 1 | 14 | 5070 | -58.9 | 8 22 | Sales | Sales Representative | 1 | 3407 | 5 | 23 | 2701 | 26.1 | 6 37 | Sales | Sales Representative | 1 | 2683 | 3 | 14 | 2701 | -0.7 | 7 38 | Sales | Sales Representative | 1 | 2014 | 2 | 13 | 2701 | -25.4 | 9

Result interpretation:

  • emp_id=34 (Sales Representative, job_level=2) has compensation 58.9% below the peer average with only 1 year of tenure — a low-compensation high-risk employee. Check whether the compensation setting at hiring was appropriate.
  • emp_id=28 (Sales Executive) has compensation 34.6% above the peer average. Given their 9-year tenure, this is a normal senior premium. However, with a salary hike of only 11%, dissatisfaction may accumulate if this continues.

Promotion Lag Analysis (LAG)

SELECT emp_id, department, job_role, job_level, years_at_company, years_since_last_promo, performance_rating, LAG(performance_rating) OVER ( PARTITION BY department ORDER BY years_at_company ) AS prev_emp_perf, years_since_last_promo - LAG(years_since_last_promo) OVER ( PARTITION BY department ORDER BY years_at_company ) AS promo_lag_delta FROM best_practice_hr_analytics.doc_dwd_employee_timeline WHERE department = 'Research & Development' ORDER BY years_since_last_promo DESC LIMIT 10;

emp_id | department | job_role | job_level | years_at_company | years_since_last_promo | performance_rating | prev_emp_perf | promo_lag_delta -------+--------------------------+-----------------------------+-----------+------------------+------------------------+--------------------+---------------+---------------- 16 | Research & Development | Manufacturing Director | 3 | 10 | 8 | 3 | 4 | 7 10 | Research & Development | Healthcare Representative | 2 | 7 | 7 | 3 | 3 | 4 27 | Research & Development | Research Scientist | 1 | 10 | 6 | 4 | 3 | -2 29 | Research & Development | Healthcare Representative | 3 | 22 | 5 | 3 | 3 | 1 13 | Research & Development | Research Scientist | 1 | 5 | 4 | 3 | 3 | 4 26 | Research & Development | Manager | 5 | 14 | 4 | 3 | 3 | 2

Result interpretation:

  • emp_id=16 (Manufacturing Director) has not been promoted for 8 years with a sustained performance rating of 3 (Excellent). promo_lag_delta=7 means their promotion wait time is 7 years longer than the previous employee with similar tenure in the department — a clear promotion anomaly.
  • emp_id=27 (Research Scientist, 10-year tenure, 6 consecutive years without promotion, performance rating 4 = Outstanding) is the most typical case of a "high-performer blocked from promotion." Already on the HIGH risk list, HR BP intervention for career path evaluation is needed.

Data Warehouse Object Summary

After completing the full build, the objects in the best_practice_hr_analytics schema:

SHOW TABLES IN best_practice_hr_analytics;

schema_name | table_name | is_dynamic -------------------------------+----------------------------------+----------- best_practice_hr_analytics | doc_ads_attrition_risk_report | true best_practice_hr_analytics | doc_dwd_employee_timeline | true best_practice_hr_analytics | doc_dws_dept_headcount_metrics | true best_practice_hr_analytics | doc_ods_employees | false

Studio task paths (best_practices/hr_analytics/):

Task NameRefresh TargetSchedule
refresh_hr_dwd_timelinedoc_dwd_employee_timelineDaily 01:00
refresh_hr_dws_dept_metricsdoc_dws_dept_headcount_metricsDaily 01:30
refresh_hr_ads_risk_reportdoc_ads_attrition_risk_reportDaily 02:00

Notes

  • Column Masking is transparently applied to Dynamic Tables: After the DWD layer inherits monthly_income from the ODS layer, non-privileged users will see the masked value (-1) for compensation fields in DWD / DWS / ADS. For compensation equity analysis, you must use an account authorized in the Column Masking policy (i.e., explicitly listed in the masking function's IN() list) to see unmasked original values.

  • Dynamic Tables do not write REFRESH INTERVAL: All Dynamic Table DDLs omit the REFRESH INTERVAL parameter. Refresh scheduling is unified via Lakehouse Studio Tasks. This allows attaching data quality rules (e.g., check that retention_risk_score is not all zeros) and alert notifications to the same task node.

  • Retention risk score is a heuristic model: retention_risk_score is based on stacked rules for overtime, satisfaction, promotion stalls, and salary hike. It is suitable for quickly screening high-risk groups but cannot replace professional statistical prediction models. In production, consider running machine learning models in ZettaPark Python Tasks and writing results back to the ADS layer.

  • Dynamic Table incremental refresh depends on ODS change tracking: The first REFRESH performs a full snapshot; subsequent incremental refreshes only process rows in the ODS layer that were added or changed since the last refresh. If the ODS layer uses INSERT OVERWRITE or a full rewrite, the Dynamic Table will fall back to full refresh, significantly increasing computation overhead.

  • Window semantics of promotion lag LAG analysis: LAG with PARTITION BY department ORDER BY years_at_company orders by tenure within the same department, comparing promotion stall differences between employees with similar tenure — not a historical time-series comparison of the same employee. For comparing a single employee across different time points, an employee historical snapshot table (with a time dimension) is needed.