The NOT NULL constraint ensures that a column always contains a value, preventing any row in the table from having a NULL value in that column. This is crucial for areas where the presence of data is mandatory. For example, if every user must have an email address, applying a NOT NULL constraint to the email column ensures that no rows can be inserted without an email.
CREATE TABLE TenX.users (
user_id INT PRIMARY KEY,
email VARCHAR(255) NOT NULL
);
The NOT IN operator filters results by excluding values present in a subquery or a list. However, the presence of NULL values can lead to unexpected results.
Consider the query:
SELECT * FROM TenX.employees
WHERE department_id NOT IN
(SELECT department_id FROM TenX.departments WHERE active = ‘N’);
If the Department_id column in the Departments table contains NULL values, the query cannot return any results. This is because any comparison with NULL results in UNKNOWN, causing the NOT IN condition to fail.
Example:
Subquery returns: (1, 2, NULL).
Outer query: WHERE department_id NOT IN (1, 2, NULL).
Condition evaluates to UNKNOWN, returning no rows.
To avoid such issues, enforce the NOT NULL constraint on columns involved in NOT IN conditions.
ALTER TABLE TenX.departments
MODIFY department_id INT NOT NULL;
Create Tables:
CREATE TABLE TenX.employees (
employee_id INT,
department_id INT
);
CREATE TABLE TenX.departments (
department_id INT,
active CHAR(1)
);
Insert Sample Data:
INSERT INTO TenX.employees (employee_id, department_id)
VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO TenX.departments (department_id, active)
VALUES (10, ‘Y’), (20, ‘N’), (NULL, ‘N’);
Problematic Query:
SELECT * FROM TenX.employees
WHERE department_id NOT IN (SELECT department_id FROM TenX.departments WHERE active = ‘N’);
This query returns no rows due to the NULL value.
Fix the Issue:
Now let us apply the NOT NULL constraint:
ALTER TABLE TenX.departments
MODIFY department_id INT NOT NULL;
And then, remove or update rows with NULL values:
DELETE FROM TenX.departments
WHERE department_id IS NULL;
Re-run the Query:
SELECT * FROM TenX.employees
WHERE department_id NOT IN (SELECT department_id FROM TenX.departments WHERE active = ‘N’);
This query now correctly returns employees not in inactive departments.
To quantify the performance impact of NOT NULL constraints, we compare CPU seconds and I/Os for queries with and without NULL values in subqueries.
Setup
Without NOT NULL Constraint:
CREATE TABLE TenX.departments_nullable (
department_id INT,
active CHAR(1)
);
INSERT INTO TenX.departments_nullable (department_id, active)
VALUES (10, ‘Y’), (20, ‘N’), (NULL, ‘N’);
SELECT * FROM TenX.employees
WHERE department_id NOT IN (SELECT department_id FROM departments_nullable WHERE active = ‘N’);
With NOT NULL Constraint:
CREATE TABLE TenX.departments_notnull (
department_id INT NOT NULL,
active CHAR(1)
);
INSERT INTO TenX.departments_notnull (department_id, active)
VALUES (10, ‘Y’), (20, ‘N’);
SELECT * FROM TenX.employees
WHERE department_id NOT IN (SELECT department_id FROM departments_notnull WHERE active = ‘N’);
Without NOT NULL Constraint:
With NOT NULL Constraint:
Results:
The NOT NULL constraint significantly improves query performance by reducing both CPU time and I/O operations. It is crucial for ensuring data integrity and optimizing SQL queries, particularly when used with the NOT IN operator. This prevents issues related to NULL value comparisons and enhances performance.
Understanding and properly applying NOT NULL constraints appropriately, results in more efficient and reliable database operations, ensuring accurate and faster data retrieval.
Mubasher, at TenX works as Principal Performance Analyst