May, 2024

Understanding the significance of NOT NULL in SQL
SHARE THIS

Introduction

In the world of SQL, constraints are crucial for ensuring data integrity and reliable database operations – the most fundamental constraints is NOT NULL. This article explores the meaning of the NOT NULL constraint, especially in the context of the NOT IN operator, and provides performance insights through CPU seconds and I/O comparisons.

The Basics: What is NOT NULL?

The NOT NULL constraint ensures that a column always contains a value, preventing any row in the table from having a NULL value in that column. This is crucial for areas where the presence of data is mandatory. For example, if every user must have an email address, applying a NOT NULL constraint to the email column ensures that no rows can be inserted without an email.

 

CREATE TABLE TenX.users (

    user_id INT PRIMARY KEY,

    email VARCHAR(255) NOT NULL

);

The Basics: What is NOT NULL?

  • Data Integrity: Ensures critical fields are never left empty, maintaining consistent and reliable data.
  • Query Optimization: Allows the database engine to optimize queries better by eliminating the need to account for NULL values in those columns.
  • Simplified Application Logic: Reduces the need for extensive validation logic in the application, making the codebase cleaner.

Interaction of NOT NULL with NOT IN

The NOT IN operator filters results by excluding values present in a subquery or a list. However, the presence of NULL values can lead to unexpected results.

The Problem with NULLs in NOT IN

Consider the query:

 

SELECT * FROM TenX.employees

WHERE department_id NOT IN

(SELECT department_id FROM TenX.departments WHERE active = ‘N’);

 

If the Department_id column in the Departments table contains NULL values, the query cannot return any results. This is because any comparison with NULL results in UNKNOWN, causing the NOT IN condition to fail.

 Example:

Subquery returns: (1, 2, NULL).

Outer query: WHERE department_id NOT IN (1, 2, NULL).

Condition evaluates to UNKNOWN, returning no rows.

Ensuring Accurate Results with NOT NULL

To avoid such issues, enforce the NOT NULL constraint on columns involved in NOT IN conditions.

ALTER TABLE TenX.departments

MODIFY department_id INT NOT NULL;

Practical Example: Applying NOT NULL and NOT IN

Create Tables:

CREATE TABLE TenX.employees (

    employee_id INT,

    department_id INT

);

CREATE TABLE TenX.departments (

    department_id INT,

    active CHAR(1)

);

Insert Sample Data:

INSERT INTO TenX.employees (employee_id, department_id)

VALUES (1, 10), (2, 20), (3, 30);

INSERT INTO TenX.departments (department_id, active)

VALUES (10, ‘Y’), (20, ‘N’), (NULL, ‘N’);

Problematic Query:

SELECT * FROM TenX.employees

WHERE department_id NOT IN (SELECT department_id FROM TenX.departments WHERE active = ‘N’);

This query returns no rows due to the NULL value.

Fix the Issue:

Now let us apply the NOT NULL constraint:

ALTER TABLE TenX.departments

MODIFY department_id INT NOT NULL;

And then, remove or update rows with NULL values:

DELETE FROM TenX.departments

WHERE department_id IS NULL;

Re-run the Query:

SELECT * FROM TenX.employees

WHERE department_id NOT IN (SELECT department_id FROM TenX.departments WHERE active = ‘N’);

This query now correctly returns employees not in inactive departments.

Performance Comparison: CPU Time and I/O

To quantify the performance impact of NOT NULL constraints, we compare CPU seconds and I/Os for queries with and without NULL values in subqueries.

Setup

Without NOT NULL Constraint:

CREATE TABLE TenX.departments_nullable (

    department_id INT,

    active CHAR(1)

);

INSERT INTO TenX.departments_nullable (department_id, active)

VALUES (10, ‘Y’), (20, ‘N’), (NULL, ‘N’);

SELECT * FROM TenX.employees

WHERE department_id NOT IN (SELECT department_id FROM departments_nullable WHERE active = ‘N’);

With NOT NULL Constraint:

 CREATE TABLE TenX.departments_notnull (

    department_id INT NOT NULL,

    active CHAR(1)

);

 INSERT INTO TenX.departments_notnull (department_id, active)

VALUES (10, ‘Y’), (20, ‘N’);

SELECT * FROM TenX.employees

WHERE department_id NOT IN (SELECT department_id FROM departments_notnull WHERE active = ‘N’);

Performance Metrics

Without NOT NULL Constraint:

  • CPU Time: Higher due to additional checks for NULL values.
  • I/O Operations: Elevated as the query engine evaluates UNKNOWN results.

With NOT NULL Constraint:

  • CPU Time: Reduced as the SQL engine skips NULL checks.
  • I/O Operations: Lower due to straightforward comparisons.

Results:

Screenshot 2024-05-24 163653


Conclusion

The NOT NULL constraint significantly improves query performance by reducing both CPU time and I/O operations. It is crucial for ensuring data integrity and optimizing SQL queries, particularly when used with the NOT IN operator. This prevents issues related to NULL value comparisons and enhances performance.

Understanding and properly applying NOT NULL constraints appropriately, results in more efficient and reliable database operations, ensuring accurate and faster data retrieval.

 

 

Picture of Mubasher Hassan

Mubasher Hassan

Mubasher, at TenX works as Principal Performance Analyst