Aggregate open data from multiple government departments including transportation, urban management, civil affairs, and public health to build a four-layer data warehouse supporting city operations KPI dashboards and cross-department joint analysis. Using NYC Open Data's 311 citizen complaint tickets, traffic accidents, and health inspection datasets as samples, this document provides an end-to-end demonstration of the complete build process from COPY INTO + Volume โ ODS โ DWD โ DWS โ ADS, covering the practical implementation of five key capabilities: External Schema (connecting to existing Hive data lakes), RBAC (multi-department data isolation), Dynamic Table (cross-department data fusion), Table Stream (complaint ticket status change capture), and Column Masking (PII field masking).
Overview
The core challenges of government data platforms are: diverse data sources, heterogeneous formats, complex permission boundaries, while simultaneously supporting cross-department joint analysis under the premise of protecting personal privacy.
All examples in this document run under the best_practice_smart_city schema.
CREATE SCHEMA IF NOT EXISTS best_practice_smart_city;
ODS Layer: Raw Data Tables for Each Department
The ODS layer stores data by source in isolation. This document demonstrates three data types: NYC 311 complaint tickets (citizen service hotline), traffic accidents (DOT/NYPD), and health inspections (Dept of Health), all corresponding to the actual data structure from NYC Open Data.
Create Tables
CREATE TABLE IF NOT EXISTS best_practice_smart_city.doc_ods_311_complaints (
complaint_id STRING,
created_date TIMESTAMP,
closed_date TIMESTAMP,
agency STRING,
agency_name STRING,
complaint_type STRING,
descriptor STRING,
location_type STRING,
incident_zip STRING,
incident_address STRING,
city STRING,
borough STRING,
latitude DOUBLE,
longitude DOUBLE,
status STRING,
resolution_desc STRING,
community_board STRING,
bbl STRING,
open_data_channel STRING,
load_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
);
CREATE TABLE IF NOT EXISTS best_practice_smart_city.doc_dim_district (
district_code STRING,
district_name STRING,
borough STRING,
city STRING,
population INT,
area_sqkm DOUBLE,
district_type STRING
);
Batch Import CSV Files via COPY INTO + Volume
Departments periodically submit data as CSV files stored in their dedicated object storage directories. First create a Volume pointing to the department submission directory, then use COPY INTO to batch import.
Step 1: Create a Storage Connection and Volume (Alibaba Cloud OSS example):
-- Create Storage Connection
CREATE STORAGE CONNECTION IF NOT EXISTS conn_city_data
TYPE = OSS
ACCESS_ID = '<your-access-id>'
ACCESS_KEY = '<your-access-key>'
ENDPOINT = 'oss-cn-hangzhou.aliyuncs.com';
-- Create Volume for 311 complaint data
CREATE EXTERNAL VOLUME IF NOT EXISTS best_practice_smart_city.vol_311_complaints
TYPE = OSS
BUCKET = '<your-bucket>'
PATH = 'smart-city/311_complaints/'
CONNECTION = conn_city_data;
Step 2: Execute COPY INTO to load:
COPY INTO best_practice_smart_city.doc_ods_311_complaints
FROM VOLUME best_practice_smart_city.vol_311_complaints
USING csv
OPTIONS('header'='true', 'sep'=',', 'nullValue'='')
ON_ERROR = CONTINUE;
โ ๏ธ Note: ON_ERROR = CONTINUE skips rows with format errors and continues loading. In production, it is recommended to use ON_ERROR = ABORT instead and query SELECT * FROM load_history('doc_ods_311_complaints') after loading to view the list of skipped files, preventing silent data loss.
COPY INTO automatically records the metadata of loaded files; re-executing the same COPY statement will not re-import the same files. This is very useful for periodic department submission scenarios: just trigger COPY INTO periodically, and the system automatically processes only new files.
View loading history:
SELECT * FROM load_history('doc_ods_311_complaints') LIMIT 5;
External Schema: Connect to Existing Hive Data Lake
If a department already has a Hive data lake with historical data, External Schema can mount it directly without migration.
-- First create a Connection pointing to the Hive Metastore
CREATE CATALOG CONNECTION IF NOT EXISTS conn_hive_metastore
CATALOG_TYPE = HIVE
METASTORE_URI = 'thrift://<hive-metastore-host>:9083'
HDFS_DEFAULT_FS = 'hdfs://<namenode>:8020';
-- Mount the Hive database as an External Schema
CREATE EXTERNAL SCHEMA IF NOT EXISTS hive_civil_affairs
CONNECTION = conn_hive_metastore
DATABASE = 'civil_affairs_db';
After mounting, Hive tables can be queried directly:
-- Federated query: Hive historical data + Singdata Lakehouse new data
SELECT
ha.district_code,
ha.population_2020,
COUNT(c.complaint_id) AS recent_complaints
FROM hive_civil_affairs.district_population ha
LEFT JOIN best_practice_smart_city.doc_ods_311_complaints c
ON ha.zip_code = c.incident_zip
AND c.created_date >= CAST('2026-01-01' AS TIMESTAMP)
GROUP BY ha.district_code, ha.population_2020;
๐ก Tip: External Schema is read-only. Write operations still need to be performed in Singdata Lakehouse's own schema. This is suitable for a transitional architecture where Hive serves as historical archive and Singdata Lakehouse as the incremental processing layer.
Table Stream: Capturing Ticket Status Changes
311 complaint tickets undergo multiple status changes during processing (Open โ In Progress โ Closed), and each change needs to be recorded in an audit log for resolution timeliness analysis.
Create Table Stream
Create a STANDARD mode Stream on the complaint ticket table to capture all INSERT / UPDATE / DELETE operations:
CREATE TABLE STREAM IF NOT EXISTS best_practice_smart_city.doc_stream_complaint_changes
ON TABLE best_practice_smart_city.doc_ods_311_complaints
WITH PROPERTIES ('TABLE_STREAM_MODE' = 'STANDARD');
โ ๏ธ Note: Streams only record changes after creation; historical data that already existed before the stream was created will not appear in the stream. If historical snapshots are needed, take an archive snapshot via SELECT before creating the stream.
Simulate Ticket Status Changes
-- Ticket CMP002: Sewer Backup complaint resolved after second inspection
UPDATE best_practice_smart_city.doc_ods_311_complaints
SET status = 'Closed',
closed_date = CAST('2026-01-10 16:00:00' AS TIMESTAMP),
resolution_desc = 'Issue resolved after second inspection.'
WHERE complaint_id = 'CMP002';
-- Ticket CMP015: Elevator repair completed
UPDATE best_practice_smart_city.doc_ods_311_complaints
SET status = 'Closed',
closed_date = CAST('2026-01-11 10:00:00' AS TIMESTAMP),
resolution_desc = 'Elevator maintenance completed.'
WHERE complaint_id = 'CMP015';
In STANDARD mode, each UPDATE produces one UPDATE_BEFORE (before change) and one UPDATE_AFTER (after change) record. Consume changes and advance the offset using INSERT INTO ... SELECT FROM stream:
-- Keep only UPDATE_AFTER, write to audit log
INSERT INTO best_practice_smart_city.doc_dwd_complaint_audit_log
SELECT
complaint_id,
__change_type AS change_type,
__commit_version AS change_version,
__commit_timestamp AS change_time,
status AS new_status,
closed_date,
resolution_desc,
agency,
borough,
complaint_type
FROM best_practice_smart_city.doc_stream_complaint_changes
WHERE __change_type = 'UPDATE_AFTER';
Verify after consumption that the offset has advanced โ stream is now empty:
SELECT COUNT(*) AS remaining_changes
FROM best_practice_smart_city.doc_stream_complaint_changes;
remaining_changes
-----------------
0
View the audit log:
SELECT complaint_id, change_type, change_version, change_time,
new_status, borough, complaint_type
FROM best_practice_smart_city.doc_dwd_complaint_audit_log
ORDER BY change_time;
change_version increases monotonically, ensuring temporal completeness of the audit log. Creating this INSERT INTO ... SELECT FROM stream statement as a periodic task in Lakehouse Studio (recommended to schedule hourly) enables continuous tracking of ticket status changes.
๐ก Tip: The stream offset only advances after a DML statement (INSERT INTO ... SELECT FROM stream) executes. A pure SELECT preview does not consume the offset, so you can safely view change content multiple times without affecting production consumption.
DWD Layer: Cross-Department Event Standardization
The DWD layer unifies raw events from different departments into city-themed events, resolving inconsistent field naming and varying status code meanings across source tables.
Create Table
CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_smart_city.doc_dwd_city_events
AS
SELECT
complaint_id AS event_id,
'COMPLAINT' AS event_category,
complaint_type AS event_type,
agency AS dept_code,
agency_name AS dept_name,
borough,
incident_zip AS zip_code,
latitude,
longitude,
created_date AS event_time,
CASE
WHEN status = 'Closed' THEN 'RESOLVED'
WHEN status = 'Open' THEN 'OPEN'
ELSE 'IN_PROGRESS'
END AS event_status,
DATEDIFF(CAST(closed_date AS DATE), CAST(created_date AS DATE)) AS resolution_days,
CONCAT(borough, '-', incident_zip) AS geo_key
FROM best_practice_smart_city.doc_ods_311_complaints
WHERE created_date IS NOT NULL
UNION ALL
SELECT
accident_id AS event_id,
'TRAFFIC_ACCIDENT' AS event_category,
contributing_factor_1 AS event_type,
'DOT' AS dept_code,
'Department of Transportation' AS dept_name,
borough,
zip_code,
latitude,
longitude,
CAST(crash_date AS TIMESTAMP) AS event_time,
'RESOLVED' AS event_status,
0 AS resolution_days,
CONCAT(borough, '-', zip_code) AS geo_key
FROM best_practice_smart_city.doc_ods_traffic_accidents
WHERE crash_date IS NOT NULL;
Manually trigger the first refresh and verify:
REFRESH DYNAMIC TABLE best_practice_smart_city.doc_dwd_city_events;
SELECT event_category, COUNT(*) AS cnt
FROM best_practice_smart_city.doc_dwd_city_events
GROUP BY event_category
ORDER BY cnt DESC;
The DWD layer merges 311 complaints (20 records) and traffic accidents (15 records) into standard fields like event_id, event_category, and geo_key. Downstream DWS/ADS layers only need to query this single Dynamic Table without worrying about the raw field differences across departments.
DWS Layer: Street/Borough Daily Summary
The DWS layer aggregates at the granularity of borough ร date ร event category, for borough-level managers to view daily event volumes and resolution efficiency.
Create Table
CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_smart_city.doc_dws_borough_daily
AS
SELECT
borough,
DATE_TRUNC('day', event_time) AS stat_date,
event_category,
COUNT(*) AS event_count,
SUM(CASE WHEN event_status = 'RESOLVED' THEN 1 ELSE 0 END) AS resolved_count,
SUM(CASE WHEN event_status = 'OPEN' THEN 1 ELSE 0 END) AS open_count,
ROUND(AVG(CASE WHEN resolution_days IS NOT NULL
AND resolution_days >= 0 THEN resolution_days END), 2) AS avg_resolution_days
FROM best_practice_smart_city.doc_dwd_city_events
WHERE event_time IS NOT NULL
GROUP BY borough, DATE_TRUNC('day', event_time), event_category;
REFRESH DYNAMIC TABLE best_practice_smart_city.doc_dws_borough_daily;
SELECT borough, stat_date, event_category, event_count, resolved_count, avg_resolution_days
FROM best_practice_smart_city.doc_dws_borough_daily
ORDER BY stat_date, borough, event_category
LIMIT 10;
A null avg_resolution_days means the complaint for that day has not yet been closed (closed_date is empty). The CASE WHEN resolution_days >= 0 filters out negative numbers (data anomalies) and null values, preventing the average from being incorrectly pulled down.
Borough complaint resolution summary (aggregated across days):
SELECT
borough,
event_category,
SUM(event_count) AS total_events,
ROUND(100.0 * SUM(resolved_count)
/ NULLIF(SUM(event_count), 0), 1) AS resolution_rate_pct,
ROUND(AVG(avg_resolution_days), 2) AS avg_days_to_resolve
FROM best_practice_smart_city.doc_dws_borough_daily
GROUP BY borough, event_category
ORDER BY total_events DESC;
Result interpretation: BRONX has the highest complaint closure rate (100%), but its average resolution time of 2 days is higher than MANHATTAN's 1.5 days โ meaning BRONX resolves all complaints but is slower to respond. BROOKLYN and STATEN ISLAND have only 50% closure rates with backlogged tickets requiring focused attention. Traffic accident events have 100% closure rates across all boroughs since they are recorded as completed at the time of entry.
ADS Layer: City Operations Index and KPI Dashboard
The ADS layer aggregates across multiple event types to compute a comprehensive city operations score, directly consumed by BI tools and management dashboards.
Create Table
City operations scoring formula: 100 - (backlog ticket percentage ร penalty coefficient + average resolution time ร timeliness coefficient), range 0โ100, higher means better operational status.
CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_smart_city.doc_ads_city_ops_index
AS
SELECT
borough,
DATE_TRUNC('day', event_time) AS stat_date,
COUNT(*) AS total_events,
SUM(CASE WHEN event_category = 'COMPLAINT' THEN 1 ELSE 0 END) AS complaint_count,
SUM(CASE WHEN event_category = 'TRAFFIC_ACCIDENT' THEN 1 ELSE 0 END) AS accident_count,
SUM(CASE WHEN event_status = 'RESOLVED' THEN 1 ELSE 0 END) AS resolved_count,
ROUND(
100.0 * SUM(CASE WHEN event_status = 'RESOLVED' THEN 1 ELSE 0 END)
/ NULLIF(COUNT(*), 0), 1
) AS resolution_rate_pct,
ROUND(
AVG(CASE WHEN resolution_days IS NOT NULL
AND resolution_days >= 0 THEN resolution_days END), 2
) AS avg_resolution_days,
-- City operations score: penalties for both backlogged events and resolution time
ROUND(
100.0 - LEAST(100.0,
SUM(CASE WHEN event_status = 'OPEN' THEN 1 ELSE 0 END) * 10.0
+ COALESCE(AVG(CASE WHEN resolution_days IS NOT NULL
AND resolution_days >= 0 THEN resolution_days END), 0) * 5.0
), 1
) AS city_ops_score
FROM best_practice_smart_city.doc_dwd_city_events
WHERE event_time IS NOT NULL
GROUP BY borough, DATE_TRUNC('day', event_time);
REFRESH DYNAMIC TABLE best_practice_smart_city.doc_ads_city_ops_index;
SELECT borough,
ROUND(AVG(city_ops_score), 1) AS avg_ops_score,
SUM(total_events) AS total_events,
SUM(complaint_count) AS complaints,
SUM(accident_count) AS accidents,
ROUND(AVG(resolution_rate_pct), 1) AS resolution_rate_pct
FROM best_practice_smart_city.doc_ads_city_ops_index
GROUP BY borough
ORDER BY avg_ops_score DESC;
QUEENS (95 points) and BRONX (93.8 points) have the best city operations status; BRONX's 100% complaint closure rate is the main driver of its high score.
MANHATTAN (78.8 points) has the highest event volume (10 events), but has 2 backlogged complaints (unclosed), pulling down its operations score. MANHATTAN's absolute event volume is the highest, placing the greatest pressure on overall city operations management.
STATEN ISLAND (73.8 points) has the lowest event volume (4 events), but a 50% backlog rate still requires focused follow-up.
Complaint ticket department resolution efficiency comparison (cross-department analysis under different dimensions):
SELECT
agency_name,
COUNT(*) AS total_complaints,
SUM(CASE WHEN status = 'Closed' THEN 1 ELSE 0 END) AS resolved,
SUM(CASE WHEN status IN ('Open', 'In Progress') THEN 1 ELSE 0 END) AS pending,
ROUND(100.0 * SUM(CASE WHEN status = 'Closed' THEN 1 ELSE 0 END)
/ COUNT(*), 1) AS resolution_rate_pct
FROM best_practice_smart_city.doc_ods_311_complaints
GROUP BY agency_name
ORDER BY total_complaints DESC;
agency_name | total_complaints | resolved | pending | resolution_rate_pct
--------------------------------------------------+-----------------+----------+---------+--------------------
New York City Police Department | 4 | 3 | 1 | 75.0
Department of Transportation | 4 | 3 | 1 | 75.0
Department of Sanitation New York | 4 | 4 | 0 | 100.0
Department of Housing Preservation and Development| 4 | 4 | 0 | 100.0
Department of Environmental Protection | 4 | 2 | 2 | 50.0
The Department of Environmental Protection (DEP) has only a 50% closure rate, with 2 of its 4 complaints backlogged (water quality and noise complaints tend to have longer processing cycles) โ it is currently the department most in need of attention.
Studio Periodic Refresh Task Configuration
Dynamic Table scheduled refresh is managed by creating tasks in Lakehouse Studio, allowing monitoring alerts and data quality rules to be configured on the same task.
Create refresh tasks for the three Dynamic Tables by dependency layer:
In Studio Development โ Tasks, navigate to the path best_practices/smart_city/
Create a "Refresh Dynamic Table" task for each Dynamic Table:
Task Name
Target Table
Schedule
Dependency
refresh_dwd_city_events
doc_dwd_city_events
Daily at 02:00
None (reads ODS directly)
refresh_dws_borough_daily
doc_dws_borough_daily
Daily at 03:00
After refresh_dwd_city_events completes
refresh_ads_city_ops_index
doc_ads_city_ops_index
Daily at 04:00
After refresh_dws_borough_daily completes
Attach data quality rules to the refresh_ads_city_ops_index task: result row count > 0, avg_ops_score between 0 and 100
On the same task page, you can also configure scheduling for the Table Stream consumption task (INSERT INTO doc_dwd_complaint_audit_log SELECT FROM stream), recommended to run hourly to keep the audit log current.
โ ๏ธ Note: The CREATE DDL for Dynamic Tables does not include a REFRESH INTERVAL parameter. Scheduling is entirely managed by Studio Tasks, allowing centralized configuration of monitoring alerts, data quality checks, and task dependencies in one place โ without modifying the DDL.
RBAC: Multi-Department Data Isolation
Government data platforms typically require strict data isolation: department analysts can only view their own department's ODS data, city operations management can view DWS/ADS aggregated data, and platform administrators have full-layer access.
Create Roles
-- Read-only access to ADS layer public indexes (for all departments to view overall city operations)
CREATE ROLE IF NOT EXISTS smart_city_viewer;
-- Dedicated analyst for a specific department (NYPD in this example)
CREATE ROLE IF NOT EXISTS dept_nypd_analyst;
-- City operations platform administrator
CREATE ROLE IF NOT EXISTS city_ops_admin;
Grant Permissions by Layer
-- smart_city_viewer: read-only ADS and DWS aggregation layers (all tables)
GRANT SELECT ON ALL TABLES IN SCHEMA best_practice_smart_city TO ROLE smart_city_viewer;
-- city_ops_admin: full-layer access
GRANT SELECT ON ALL TABLES IN SCHEMA best_practice_smart_city TO ROLE city_ops_admin;
OBJECT_HIERARCHY type authorization automatically covers all existing and future tables created under the schema, eliminating the need to re-authorize after each new table is created.
Revoke Permissions
-- Execute when employees leave or permissions change
REVOKE SELECT ON ALL TABLES IN SCHEMA best_practice_smart_city FROM ROLE dept_nypd_analyst;
๐ก Tip: Row-level permissions (department analysts see only their own department's ODS data) are configured through the Studio Data Security โ Row-Level Permissions UI, with no corresponding SQL DDL. Rules support filtering by the agency field; after configuration, they automatically apply to all queries for that role.
Column Masking: Dynamic PII Field Masking
311 complaint tickets and health inspection data contain geographic coordinates (latitude and longitude); in some scenarios they also include complainant names, contact information, and other PII (personally identifiable information) fields. Column Masking can bind masking functions at the column level to automatically return masked values to non-privileged users.
โ ๏ธ Note: Column Masking is officially available. Use the following process to bind masking policies:
Step 1: Create the masking function (using coordinate precision reduction as an example):
CREATE OR REPLACE FUNCTION best_practice_smart_city.mask_geo_coord(coord DOUBLE)
RETURNS DOUBLE
AS CASE
WHEN CURRENT_USER() IN ('privileged_user') THEN coord -- Replace with actual authorized username
ELSE ROUND(coord, 1) -- Non-privileged users get reduced precision to 1 decimal place (approximately 11 km range)
END;
๐ก Tip: Replace 'privileged_user' with the actual username that needs to view plaintext data. Column Masking matches the current connection's username via the current_user() function; all authorized usernames must be explicitly listed in the IN() list.
Step 2: Bind the masking function to sensitive columns:
ALTER TABLE best_practice_smart_city.doc_ods_311_complaints
CHANGE COLUMN latitude
SET MASK best_practice_smart_city.mask_geo_coord;
ALTER TABLE best_practice_smart_city.doc_ods_311_complaints
CHANGE COLUMN longitude
SET MASK best_practice_smart_city.mask_geo_coord;
After binding, regular analysts can only see coordinates at reduced precision (e.g., 40.7 instead of 40.7484), while the admin account can see the original precision. The masking effect also applies transparently to Dynamic Tables (DWD/DWS/ADS layers) โ downstream tables store already-masked values.
๐ก Tip: The DDL syntax above (ALTER TABLE ... CHANGE COLUMN ... SET MASK) can be executed directly; the masking effect takes effect immediately for all queries.
Data Warehouse Object Overview
After build completion, the objects under the best_practice_smart_city schema:
Department CSV files (OSS / COS / S3)
โ
โผ COPY INTO + Volume (batch import)
doc_ods_311_complaints doc_ods_traffic_accidents doc_ods_health_inspections
(citizen hotline complaints) (traffic accidents) (health inspections)
โ โ โ
โโโ Table Stream โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ doc_stream_complaint_changes โ
โ (STANDARD mode: UPDATE_BEFORE / UPDATE_AFTER) โ
โ โ INSERT INTO doc_dwd_complaint_audit_log (Studio hourly) โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โผ REFRESH daily at 02:00 (Studio Task)
doc_dwd_city_events (Dynamic Table)
UNION ALL standardization ยท event_category ยท geo_key
โ
โผ REFRESH daily at 03:00 (Studio Task)
doc_dws_borough_daily (Dynamic Table)
borough ร day ร category ยท avg_resolution_days
โ
โผ REFRESH daily at 04:00 (Studio Task)
doc_ads_city_ops_index (Dynamic Table)
city_ops_score ยท resolution_rate_pct
External Schema (Hive data lake)
โโโ Direct query without migration; can federated JOIN with ODS tables
Three roles (smart_city_viewer, dept_nypd_analyst, city_ops_admin) are also created for multi-department data isolation access control.
Notes
COPY INTO idempotency: COPY INTO automatically records metadata of already-loaded files; the same file will not be re-imported. However, if file content changes while the filename remains unchanged (overwrite write), the system will not reload it. Use this in conjunction with file naming conventions in the Volume directory (e.g., date-based subdirectories).
Dynamic Tables should not write REFRESH INTERVAL: All Dynamic Table DDLs in this document do not contain REFRESH INTERVAL parameters; scheduling is entirely managed by Studio Tasks. This allows centralized configuration of monitoring alerts, data quality checks, and inter-task dependency orchestration (e.g., DWD refresh triggers DWS refresh) in one place.
Table Stream offset advancement mechanism: Stream offsets advance only after DML statements (INSERT INTO ... SELECT FROM stream) execute. Pure SELECT queries do not consume the offset and will not lose changes. Once advanced, offsets cannot be rolled back; ensure the idempotency of DML statements.
External Schema is read-only: After mounting the Hive data lake, only read operations are supported; write operations still need to be performed in Singdata Lakehouse's own schema. External Schema does not support Dynamic Tables as target tables; it can only be used as an upstream data source.
RBAC row-level permissions have no DDL: Row-level filtering rules for department analysts to see only their own department's data are configured through the Studio Data Security โ Row-Level Permissions UI, with no corresponding SQL DDL. Do not write statements like CREATE ROW ACCESS POLICY in code (this syntax does not exist).
Column Masking: Masking effects apply transparently to all downstream queries (including Dynamic Table SELECTs); DWD/DWS/ADS layers also store masked values. If high-precision coordinates are needed for spatial analysis, query the ODS raw tables directly using a privileged admin account.
Related Documentation
COPY INTO Data Import โ COPY INTO complete parameter description and file format support
โ ๏ธ Requires manual verification: Column Masking currently matches authorization by username via current_user(). All usernames authorized to view plaintext must be added one by one to the whitelist in the masking function. If your Lakehouse version supports role-based dynamic checking (e.g., HAS_ROLE('role_name')), roles can replace the username list for more flexible maintenance. Contact Singdata technical support to confirm whether the current version supports this function.