Table

In Singdata Lakehouse, a Table is the basic unit for storing data and is the core structure for organizing data in a database. The tables in Lakehouse are designed for columnar storage, which is highly efficient for processing analytical queries because it allows queries to read only the required column data instead of entire rows. This storage structure is particularly suitable for data warehousing and big data analytics scenarios, significantly improving data processing speed and efficiency.

Table Constraints

Table constraints are used to ensure the integrity and accuracy of the data in the table. Lakehouse supports various types of constraints, including but not limited to:

NOT NULL

The NOT NULL constraint ensures that the values in a column cannot be NULL. This is set when the table is created, and once set, the constraint cannot be removed to ensure that there is always valid data in the column.

PRIMARY KEY

The PRIMARY KEY constraint is used to uniquely identify each record in the table. In Lakehouse, when a PRIMARY KEY is specified, real-time writes will perform real-time deduplication based on the specified key. This is very useful in real-time CDC (Change Data Capture) write scenarios. For example, real-time reading of MySQL binlog logs and writing them into Lakehouse to keep consistent with the original MySQL table.

ANSI SQL Constraint Attributes

Lakehouse supports ANSI SQL standard constraint attributes, making the use of constraints more flexible and powerful.

  • ENFORCED | NOT ENFORCED: This attribute specifies whether the constraint is enforced. ENFORCED means the constraint is enforced, while NOT ENFORCED means it is not. For example, in a development environment, a constraint can be temporarily set to NOT ENFORCED for development and testing purposes.
  • DEFERRABLE | NOT DEFERRABLE: This attribute specifies whether the constraint is deferrable or not. DEFERRABLE means the constraint check can be deferred until the end of the transaction, while NOT DEFERRABLE means it is checked immediately. For example, in a transaction where multiple records need to be inserted, setting the constraint to DEFERRABLE can improve performance.
  • INITIALLY { DEFERRED | IMMEDIATE }: This attribute specifies the timing of the constraint check. If the constraint is NOT DEFERRABLE, it can only be INITIALLY IMMEDIATE, meaning it is checked immediately. For example, in a scenario where immediate validation is required, the constraint can be set to INITIALLY IMMEDIATE.

Extended Constraint States

Lakehouse provides extended constraint states, making the use of constraints more flexible.

  • ENABLE | DISABLE: This attribute specifies whether the constraint is enabled or disabled. Enabled constraints are checked during data operations, while disabled constraints are not. For example, in a testing environment, a constraint can be temporarily disabled for testing purposes.
  • VALIDATE | NOVALIDATE: This attribute specifies whether to validate existing data in the table and create the corresponding index. VALIDATE means validation will be performed, while NOVALIDATE means it will not. For example, in a scenario where a table needs to be created quickly, the constraint can be set to NOVALIDATE to speed up the table creation process.

Lakehouse Constraint Attributes

In Lakehouse, there are some unique constraint attributes, such as:

  • ENABLE VALIDATE RELY: This is a composite attribute that enforces constraint validation and validates existing data in the table. This attribute only supports data write operations using the Java SDK and does not support SQL inserts. For example, in a scenario where data accuracy must be ensured, the ENABLE VALIDATE RELY attribute can be used to enforce constraint validation.

By using the above constraints, the integrity and accuracy of data in Lakehouse can be ensured while improving data processing efficiency. In practice, various constraints should be chosen and used reasonably based on specific scenarios and requirements.