15 minute read

What is SQL?

SQL stands for Structured Query Language. It is used to access and manipulate databases. To use SQL, we need a Relational Database Management System (RDBMS) like MySQL, PostgreSQL, etc.

Components of SQL

SQL has some key components including:

  • Queries: Retrieves data from a database. Generally, use the SELECT statement.
  • Data Definition Language (DDL): Create, update, or delete database and related objects like tables, views, etc. Includes commands like CREATE, ALTER, etc.
  • Data Manipulation Language (DML): Manage data within database objects. These commands include SELECT, INSERT, UPDATE, and DELETE.
  • Data Control Language (DCL): Deals with control-level management tasks like GRANT, REVOKE, etc.

Some key points

  • SQL keywords like SELECT are not case-sensitive. So, any of SELECT, select, or SeLeCT would do the work. But it is a convention to write the SQL keywords in uppercase letters.
  • Using a semicolon (;) at the end of an SQL statement is standard though many DBMS work without it.

SQL with database

Operations to modify a database using SQL:

Operation Syntax Example
Create CREATE DATABASE ; CREATE DATABASE myDb;
Drop/delete DROP DATABASE ;  

Backing up a database

-- syntax
BACKUP DATABASE <name> TO DISK=<BACKUP_PATH> WITH DIFFERENTIAL;

-- examples
-- BACKUP DATABASE myDb TO DISK='myDb.bak'; => backs up the whole DB
-- BACKUP DATABASE myDb TO DISK='myDb.bak' WITH DIFFERENTIAL; => backs up the 
--              changed parts of the DB and merges with already backed up one.

SQL with tables

Creating a table

-- syntax
CREATE TABLE <name> (
	<col1> <dtype1> <constrain1>,
	<col2> <dtype2> <constrain2>,
	...
);
-- dtype = data type

-- example
-- CREATE TABLE users (
--	  id int,
--	  role int,
--   name varchar(128),
--   city varchar(128),
--   age int,
--   income int
-- );

For details about data types, check data types in PostgreSQL and MySQL, two commonly used database servers. Some commonly used constraints are listed below:

Constrain Description
NOT NULL A column cannot have a NULL value.
UNIQUE Ensures all values are unique in a column.
PRIMARY KEY All values are unique and not NULL. A table can have only one primary key (PK). PK can contain one or multiple columns.
FOREIGN KEY Refers to the PK of another table. Prevent actions that destroy relations between tables.
CHECK Check a custom value.
DEFAULT Sets a default value if no other value is set.

Creating a table from an existing table: A new table can be created from an existing table using the same CREATE TABLE command. All columns or selected columns from the old table can be used during this operation. The new table will be filled with the data from the existing table. It has the following syntax:

-- syntax
CREATE TABLE <name> AS
	SELECT <col1>, <col2>, ...
	FROM <old_table> WHERE ...;

SELECT and WHERE clauses are discussed in the later sections.

Deleting a table

We can delete a table from two perspectives.

  1. We want to delete the whole table along with the data.
  2. We want to delete that data only while keeping the table.
-- syntax
-- 1. delete whole table
DROP TABLE <name>;

-- 2. delete data only
TRUNCATE TABLE <name>;

The truncation of a table is fast. It may bypass the integrity-enforcing mechanisms like triggers. A table can not be truncated which is:

  • Referenced by a FOREIGN KEY constraint to some other table.
  • Participates in an indexed view.

Altering/updating a table

The ALTER TABLE command can be used to update a table.

-- adding a new column
ALTER TABLE <name> ADD <col> <dtype> <constrain>;

-- deleting a column
ALTER TABLE <name> DROP COLUMN <col>;

-- deleting multiple columns
ALTER TABLE <name> DROP (<col1>, <col2>, ...);

-- renaming a column
ALTER TABLE <name> RENAME COLUMN <old_name> TO <new_name>;

-- changing the data type of a column
ALTER TABLE <name> MODIFY COLUMN <col> <dtype> <constrain>;  # MySQL server
ALTER TABLE <name> ALTER COLUMN <col> <dtype> <constrain>;  # SQL server

-- adding constraints
ALTER TABLE <name> ADD <constraint> (<col>);

-- removing constraints
ALTER TABLE <name> DROP INDEX <col>;  # MySQL server
ALTER TABLE <name> DROP CONSTRAINT <col>;  # SQL server

Renaming a table

-- syntax
RENAME TABLE <old_name> TO <new_name>;

SQL with data

Data insertion

-- syntax 1: insert values in the specified columns only
INSERT INTO <table> (<col1>, ..., <coln>) VALUES (<val1>, ... <valn>);

-- syntax 2: insert values in all columns (can use syntax 1 also)
INSERT INTO <table> VALUES (<val1>, ... <valn>);

-- syntax 3: insert values in specific columns using SET 
INSERT INTO <table>
SET
	<col1> = <val1>,
	<col2> = <val2>,
	...,
	<coln> = <valn>

-- syntax 4: inserting multiple rows
INSERT INTO <table> (<col1>, ..., <coln>)
VALUES 
	(<val11>, ... <val1n>),
	(<val21>, ... <val2n>),
	...
	(<valm1>, ... <valmn>);

-- syntax 5: insert data from another table
INSERT INTO <table1> (<col1>, ..., <coln>)
SELECT (<col1>, ..., <coln>)
FROM <table2>
WHERE <condition>;

Data extraction/selection

-- syntax
SELECT <col1>, <col2>, ... FROM <table>;

-- examples
-- SELECT * FROM myTable; => selects all data from myTable
-- SELECT name, age FROM myTable; => selects only name and age

-- syntax: select from multiple tables
SELECT <table_name>.<col_name> FROM <tables> WHERE ...;

-- examples
-- SELECT users.name, users.email, purchases.total FROM users, purchases WHERE users.id = purchases.user_id;

-- syntax: selection with aliases
SELECT <alias>.<col> FROM <table> AS <alias> WHERE ...;

-- example
-- SELECT 
--	  u.name, u.email, p.total 
-- FROM 
--   users AS u, purchases AS p 
-- WHERE 
--   users.id = purchases.user_id;

The selection query can be nested also. Like we can at first select some rows from a database and run a query on the selected rows. Here is an example:

-- nested query example. (Just a simple one though this nested query is not needed really)
SELECT
	city
FROM
	(
		SELECT
			country, city
		FROM
			users
	)

Selecting top N rows: The SELECT TOP clause can be used to select and return top N rows.

-- Syntax: SQL Server
SELECT TOP <N> <cols> FROM <table> WHERE <condition>;

-- Syntax: MySQL
SELECT <cols> FROM <table> WHERE <condition> LIMIT <N>;

Selecting the top N percent:

-- Syntax: SQL Server
SELECT TOP <N> PERCENT <cols> FROM <table> WHERE <condition>;

Ordering

ORDER BY clause is used to sort the result in ascending or descending order. By default, it sorts in results in ascending order. For numeric values, the data will be sorted based on values and for string values data will be sorted alphabetically.

-- syntax
SELECT <cols> FROM <table> ORDER BY <col1>, <col2>, ... <direction>;
-- direction is optional and is either ASC or DESC

-- examples
-- SELECT * FROM users ORDER BY age; => sorts users by age in ascending order
-- SELECT * FROM users ORDER BY age ASC; => sorts users by age in ascending order
-- SELECT * FROM users ORDER BY age DESC; => sorts users by age in descending order

Ordering by several columns: When ordering by several columns, the result is sorted according to the first column first. Then if some rows have the same value for the first column, it shorts that group of rows based on the second column, and do on…

Ordering different columns in different directions: It is also possible to order different columns in different directions. Such as the statement SELECT * FROM users ORDER BY age ASC, income DESC; will sort the results by age in ascending order and among the same age, by income in descending order.

Conditional selection/filtering

The WHERE clause can be used to conditionally select data points. We can also use ORDER BY with WHERE to sort the results.

-- syntax
SELECT <col1>, <col2>, ... FROM <table> WHERE <condition>;

-- example
-- SELECT email FROM users WHERE role_id=1;

Several logical operators can be used to represent the condition in the WHERE clause.

Operators

SQL generally supports the following logical operators:

Operator Description Example
+ Addition SELECT (age + 5) FROM users;
- Subtraction  
* Multiplication  
/ Division  
% Modulus  
= Checks equality  
> Greater than  
< Less than  
>= Greater than or equal  
<= Less than or equal  
<>, != Not equal  
BETWEEN Between a certain range SELECT * FROM users WHERE age BETWEEN 18 AND 30;
LIKE* Search patterns in text SELECT * FROM users WHERE name LIKE ‘a%’;
IN Match multiple possible value SELECT * FROM users WHERE city IN (‘Dhaka’, ‘Rajshahi’);
NOT** Negate the result of the condition SELECT * FROM users WHERE city NOT IN (‘Dhaka’, ‘Rajshahi’);
AND Use more than one condition. True when all conditions are true. SELECT * FROM users WHERE city=’Dhaka’ AND age>30
OR Use more than one condition. True when at least one condition is true. SELECT * FROM users WHERE age <= 18 OR age>=65
IS NULL Select the null-valued rows. SELECT * FROM users WHERE city is NULL
IS NOT NULL Select the not null/empty valued rows.  
  • Used wildcards with the LIKE operator:

  • The percent sign (%): It represents zero, one, or multiple characters. Behaves like * in regex.
  • The underscore sign (_): It represents one single character. Behaves like . in regex.

** Some Examples with NOT:

SELECT * FROM users WHERE name NOT LIKE '%A';
SELECT * FROM users WHERE age NOT BETWEEN 18 AND 65;
SELECT * FROM users WHERE NOT age<18;
SELECT * FROM users WHERE NOT city='Dhaka';

-- for symbolic operators (like =, <, >, etc.) NOT is before the condition
-- for other operators (like LIKE, IN, etc.) NOT is before the operator

Selecting unique points

We can select unique data points using the DISTINCT keyword. It returns all the distinct rows against a query.

-- syntax
SELECT DISTINCT <col1>, <col2>, ... FROM <table> WHERE ...;

-- examples
-- SELECT DISTINCT email FROM users; => returns all unique email address
-- SELECT DISTINCT email, phone FROM users; => returns all unique rows if email and phone.

Updating data

-- syntax
UPDATE <table>
SET
	<col1>=<val1>,
	<col2>=<val2>,
	...,
	<coln>=<valn>
WHERE
	<condition>;

All records that match the WHERE condition, will be updated. If no condition is given, all values in that column will be updated to the given value.

Deleting data

-- syntax
DELETE FROM <table> WHERE <condition>;

All records that match the WHERE condition, will be deleted. If no condition is given, all values in that column will be deleted.

Aggregate Queries

Counting data points

The COUNT function can be used to count the number of rows in a column. It takes only a single argument. So, if we want to count the number of distinct rows in a query, we have to use a nested query. It has the following syntax:

-- syntax
SELECT COUNT(<col>) FROM <table>;

-- example
-- SELECT COUNT(*) FROM myTable; => count the number of rows (including null) in the table
-- SELECT COUNT(*) FROM (SELECT id FROM users WHERE role=1); => count number of users with role 1
-- SELECT COUNT(DISTINCT city) FROM users;

COUNT(*): Counts all rows including the null rows. COUNT(<col>): Counts rows in a <col> excluding the null rows.

Min/Max data point

While using MIN and MAX, null values are not considered for comparison.

-- Syntax: min
SELECT MIN(<col>) FROM <table> WHERE ...;

-- Syntax: max
SELECT MAX(<col>) FROM <table> WHERE ...;

The sum of a numeric column

-- Syntax
SELECT SUM(<col>) FROM <table> WHERE ...;

-- Examples
-- SELECT SUM(age * 2) FROM users; can be used with expression also
-- SELECT SUM(age * 2) AS DoubleAge FROM users;

The average of a numeric column

-- Syntax
SELECT AVG(<col>) FROM <table> WHERE ...;

Grouping data points

GROUP BY is used to arrange identical data into groups by one or multiple columns. It is generally used with the aggregate functions.

-- syntax
SELECT <cols> FROM <table> WHERE <condition> GROUP BY <col1>, <col2>, ...;

-- example
-- SELECT city, COUNT(id) FROM users WHERE age > 18 GROUP BY city;

The HAVING clause is used to filter the results produced by a GROUP BY statement. It’s similar to a WHERE clause but operates on the results of a grouping.

-- syntax
SELECT <cols> 
FROM <tables> 
WHERE <condition> 
GROUP BY <col1>, <col2>, ...
HAVING <condition2>;

-- example
-- SELECT city, COUNT(id), SUM(income) AS cityIncome
-- FROM users WHERE age > 18
-- GROUP BY city
-- HAVING cityIncome > 10000;

SQL Joins

Join is used to combine rows from two or more tables based on a common or related column. Join can be used with SELECT, UPDATE and DELETE commands. There are several types of joins.

Inner join

The INNER JOIN command joins two tables based on a common column and returns the records from the selected columns with matching values in both tables.

This operation compares each row of the first table with each row of the second table to find all pairs of rows that satisfy the join predicate.

Few things to consider in case of INNER JOIN:

  • It is a default join in SQL. If you mention JOIN in your query without specifying the type, SQL considers it as an INNER JOIN.
  • It returns only the matching rows from both the tables.
  • If there is no match, the returned is an empty result.
-- syntax
SELECT <table_name>.<cols>
FROM <table_1>
INNER JOIN <table_2>
ON <table_1>.<col> = <table_2>.<col>

Inner join kind of creates a larger table taking the columns of both tables where the common column acts as the glue. The SELECT statement filters the desired columns.

Left (outer) join

Returns all records from the left table, and matched records from the right table.

-- syntax
SELECT <table_name>.<cols>
FROM <table_1>
LEFT JOIN <table_2>
ON <table_1>.<col> = <table_2>.<col>

Right (outer) join

Returns all records from the right table and the matched records from the left table.

-- syntax
SELECT <table_name>.<cols>
FROM <table_1>
RIGHT JOIN <table_2>
ON <table_1>.<col> = <table_2>.<col>

Full (outer) Join

Returns all records from the left and right table where a match is found in either the left or right table.

-- syntax
SELECT <table_name>.<cols>
FROM <table_1>
FULL JOIN <table_2>
ON <table_1>.<col> = <table_2>.<col>

Self Join

In self join, a table is joined to itself. It’s quite useful in scenarios where comparison operations need to be made within a table. Essentially, it is used to combine rows with other rows in the same table when there’s a match based on the condition provided.

-- syntax
SELECT a.<col>, b.<col>
FROM <table1> AS a, <table1> AS b
WHERE <condition>;

-- example
-- SELECT a.Name AS Employee, b.Name AS Manager
-- FROM EMPLOYEES a, EMPLOYEES b
-- WHERE a.ManagerID = b.EmployeeID;

Cross Join

The cross join in SQL is used to combine every row of the first table with every row of the second table. It’s also known as the Cartesian product of the two tables. It does not require any condition to cross join.

-- syntax 1
SELECT <cols>
FROM <table1>
CROSS JOIN <table2>

-- syntax 2
SELECT <cols>
FROM <table1>, <table2>

Subqueries (Nested Queries)

Subquery is a query within another query, also known as nested query or inner query. The nested queries are utilized to extract data that will be used in the main/outer query. Subqueries can be either correlated or uncorrelated.

Correlated subquery: Subquery that uses values from the outer query. Uncorrelated subquery: Subquery that can be run independently of the outer query.

Types of subqueries

Scalar subquery

Returns exactly one column with a single value. Can be used anywhere in a SQL query where expressions are allowed. Some common scalar types are: INT, FLOAT, CHAR, VARCHAR, DATE, BOOL, etc.