Pipe Continuous Ingestion
Pipe is Singdata Lakehouse's continuous data ingestion pipeline object, used to continuously import data from Kafka or object storage (OSS/S3/COS) into Lakehouse tables.
You can think of Pipe as an automated conveyor belt: after data files are uploaded to OSS/S3/COS or messages are written to Kafka, Pipe automatically detects and ingests them into the warehouse without manual triggering or configuring scheduled tasks.
Selection Guide: If you are accustomed to managing data pipelines with SQL, choose Pipe. If you need to connect to relational databases (MySQL/PostgreSQL, etc.), or prefer visual configuration, choose Studio sync tasks.
What Is a Pipe
A Pipe is a SQL object created via DDL statements. Once created, the Pipe runs continuously, automatically reading data from the data source and writing it to the target table.
Differences from Studio Sync Tasks:
| Dimension | Pipe | Studio Sync Task |
|---|---|---|
| Creation Method | SQL DDL | Studio visual interface |
| Management Method | SQL commands | Studio interface + cz-cli |
| Applicable Scenarios | Familiar with SQL, need code-based pipeline management | Prefer visual configuration, or need to connect to relational databases |
| Data Sources | Kafka, object storage | Relational databases, Kafka, object storage |
The two are functionally equivalent; choose based on your usage habits.
Pipe Types
Kafka Pipe
Continuously consumes data from a Kafka topic and writes it to a Lakehouse table.
Two Ingestion Paths:
- READ_KAFKA Pipe (Recommended): Uses the
READ_KAFKA()function directly in the Pipe - Kafka External Table + Table Stream: First create a Kafka external table, then consume via Table Stream
Object Storage Pipe
Continuously scans new files from OSS/S3/COS and imports them.
Comparison of Two Scan Modes:
| Dimension | LIST_PURGE | EVENT_NOTIFICATION |
|---|---|---|
| Trigger Method | Periodic polling scan of directory | Object storage event notification (near real-time trigger) |
| Supported Clouds | OSS, S3, COS | OSS, S3 only |
| Authorization Method | Secret key or Role ARN | Role ARN only |
| Source File Processing | Auto-deletes source files after successful import (requires PURGE = TRUE) | Preserves source files |
| Configuration Complexity | Simple, no extra configuration needed | Requires MNS queue configuration |
Pipe Lifecycle
Monitoring Pipes
Applicable Scenarios
- Kafka Real-time Ingestion: Log data, business events written in real time
- Object Storage Batch Import: Regularly uploaded CSV/JSON/Parquet files automatically ingested
- Replace Scheduled Tasks: No need to configure Cron; Pipe runs continuously
