12 KiB
SQL Joins and Aggregation
Agenda
- SQL Joins
- Aggregation Functions (SUM, AVG, COUNT etc)
- Sub-queries
- Built-in Functions
Joins
Join is the widely-used clause in the SQL Server essentially to combine and retrieve data from two or more tables. In a real-world relational database, data is structured in many tables and which is why, there is a constant need to join these multiple tables based on logical relationships between them.
Inner join
Inner Join clause in SQL Server creates a new table (not physical) by combining rows that have matching values in two or more tables. This join is based on a logical relationship (or a common field) between the tables and is used to retrieve data that appears in both tables.
Assume, we have two tables, Table A and Table B, that we would like to join using SQL Inner Join. The result of this join will be a new result set that returns matching rows in both these tables. The intersection part in black below shows the data retrieved using Inner Join.
Keyword: INNER JOIN
or simply JOIN
Syntax: SELECT column1, column2, ... FROM table_name1 JOIN table_name2 ON condition
For example, we want to get the batch names of all the students along with their names.
id | first_name | last_name | batch_name |
---|---|---|---|
1 | John | Watson | Sherlock |
2 | Mycroft | Holmes | Sherlock |
This can be achieved by using the following SQL query:
SELECT s.first_name, s.last_name, b.batch_name FROM students s JOIN batches b ON s.batch_id = b.id;
Left Outer Join
Let us consider the students and batches tables below:
id | first_name | last_name | batch_id |
---|---|---|---|
1 | John | Watson | 1 |
2 | Mycroft | Holmes | 1 |
3 | Moriarty | Patel | NULL |
id | batch_name |
---|---|
1 | Sherlock Academy |
2 | Crime Academy |
Now if we decide to get the batch names of all the students along with their names with an inner join
we will get the following result:
id | first_name | last_name | batch_name |
---|---|---|---|
1 | John | Watson | Sherlock Academy |
2 | Mycroft | Holmes | Sherlock Academy |
Since both John and Mycroft had a valid batch_id, the respective rows were merged to produce the result. But Moriarty did not have a valid batch_id, so the result set will not contain him.
But if we want to run a query to get all students along with their batch names or NULL if they do not have a batch_id, we can use the Left Outer Join.
The LEFT JOIN keyword returns all records from the left table (table1), and the matching records from the right table (table2). The result is 0 records from the right side, if there is no match.
Keyword: LEFT OUTER JOIN
or simply LEFT JOIN
Syntax: SELECT column1, column2, ... FROM table_name1 LEFT JOIN table_name2 ON condition
The query for fetching students with their batch names now will be:
SELECT
s.first_name, s.last_name, b.name
FROM
students s
LEFT JOIN
batches b ON s.batch_id = b.id;
And the result will be:
id | first_name | last_name | batch_name |
---|---|---|---|
1 | John | Watson | Sherlock Academy |
2 | Mycroft | Holmes | Sherlock Academy |
3 | Moriarty | Patel | NULL |
Right Outer Join
Similarly, RIGHT OUTER JOIN
returns all records from the right table (table2), and the matching records from the left table (table1). The result is 0 records from the left side, if there is no match.
Now if we want to get all the batches along with their students if present else NULL
, we can use RIGHT OUTER JOIN
.
Keyword: RIGHT OUTER JOIN
or simply RIGHT JOIN
Syntax: SELECT column1, column2, ... FROM table_name1 RIGHT JOIN table_name2 ON condition
The query for fetching batches with their students now will be:
SELECT
s.first_name, s.last_name, b.name
FROM
students s
RIGHT JOIN
batches b ON s.batch_id = b.id;
This will return the following result:
id | first_name | last_name | batch_name |
---|---|---|---|
1 | John | Watson | Sherlock Academy |
2 | Mycroft | Holmes | Sherlock Academy |
NULL | NULL | NULL | Crime Academy |
Full Outer Join
The FULL OUTER JOIN clause returns a result set that includes rows from both left and right tables.
When no matching rows exist for the row in the left table, the columns of the right table will contain NULL. Likewise, when no matching rows exist for the row in the right table, the column of the left table will contain NULL.
Keyword: FULL OUTER JOIN
or simply FULL JOIN
Syntax: SELECT column1, column2, ... FROM table_name1 FULL JOIN table_name2 ON condition
The query for fetching batches with their students now will be:
SELECT
s.first_name, s.last_name, b.name
FROM
students s
FULL JOIN
batches b ON s.batch_id = b.id;
This will return the following result:
id | first_name | last_name | batch_name |
---|---|---|---|
1 | John | Watson | Sherlock Academy |
2 | Mycroft | Holmes | Sherlock Academy |
3 | Moriarty | Patel | NULL |
NULL | NULL | NULL | Crime Academy |
Warning
MySQL does not support the FULL OUTER JOIN clause.
Cross Join
The result set from a cross join
will include all rows from both tables, where each row is the combination of the row in the first table with the row in the second table. In general, if each table has n
and m
rows respectively, the result set will have n
x m
rows.
In other words, the CROSS JOIN clause returns a Cartesian product of rows from the joined tables.
Keyword: CROSS JOIN
Syntax: SELECT column1, column2, ... FROM table_name1 CROSS JOIN table_name2
Syntax 2: SELECT column1, column2, ... FROM table_name1, table_name2
An example query for fetching from the students and batches tables will be:
SELECT
s.first_name, s.last_name, b.name
FROM
students s
CROSS JOIN
batches b;
Note that different from the INNER JOIN, LEFT JOIN , and RIGHT JOIN clauses, the CROSS JOIN clause does not have a join predicate. In other words, it does not have the ON or USING clause.
If you add a WHERE clause, in case table t1 and t2 has a relationship, the CROSS JOIN works like the INNER JOIN.
Aggregate Functions
MySQL's aggregate function is used to perform calculations on multiple values and return the result in a single value like the average of all values, the sum of all values, and maximum & minimum value among certain groups of values.
We mainly use the aggregate functions in databases, spreadsheets and many other data manipulation software packages. In the context of business, different organization levels need different information such as top levels managers interested in knowing whole figures and not the individual details. These functions produce the summarised data from our database. Thus, they are extensively used in economics and finance to represent the economic health or stock and sector performance.
The ISO SQL standard defines the following aggregate functions:
Function | Description |
---|---|
MIN | Return the minimum value |
MAX | Return the maximum value |
SUM | Return the sum of all values |
AVG | Return the average value of the argument |
COUNT | Return a count of the number of rows returned |
Get the minimum value
Keyword: MIN
Syntax: SELECT MIN(column_name) FROM table_name
Getting the minimum value without using the aggregate function could be done by using the ORDER BY
clause.
SELECT
first_name, iq
FROM
students
ORDER BY iq
LIMIT 1;
This would give the correct answer but not in the case of NULL values. By default, MySQL will return the NULL value at the top of the sorted rows. Special handling will have to be done to get the correct result.
This is why we can use the aggregate function MIN
to get the minimum value.
SELECT
MIN(iq)
FROM
students;
The aggregate functions ignore the NULL values.
Get the maximum value
Keyword: MAX
Syntax: SELECT MAX(column_name) FROM table_name
To get the maximum IQ in students, we can use the aggregate function MAX
.
SELECT
MAX(iq)
FROM
students;
Get the sum of all values
Keyword: SUM
Syntax: SELECT SUM(column_name) FROM table_name
To get the sum of all the IQ in students, we can use the aggregate function SUM
.
SELECT
SUM(iq)
FROM
students;
Get the average value
Keyword: AVG
Syntax: SELECT AVG(column_name) FROM table_name
To get the average IQ in students, we can use the aggregate function AVG
.
SELECT
AVG(iq)
FROM
students;
Get the number of rows
Keyword: COUNT
Syntax: SELECT COUNT(column_name) FROM table_name
To get the number of rows in students, we can use the aggregate function COUNT
.
SELECT
COUNT(*)
FROM
students;
Using *
instead of a column name will return the number of rows in the table whereas using a column name will return the number of rows that have a non-null value in a column.
GROUP BY clause
The GROUP BY clause groups a set of rows into a set of summary rows by values of columns or expressions. The GROUP BY clause returns one row for each group. In other words, it reduces the number of rows in the result set
Keyword: GROUP BY
Syntax: SELECT column1, column2, ... FROM table_name GROUP BY column1, column2, ...
In this syntax, you place the GROUP BY clause after the FROM and WHERE clauses. After the GROUP BY keywords, you place is a list of comma-separated columns or expressions to group rows.
MySQL evaluates the GROUP BY clause after the FROM and WHERE clauses and before the HAVING, SELECT, DISTINCT, ORDER BY and LIMIT clauses
Fields that are not encapsulated within an aggregate function and must be included in the GROUP BY Clause at the end of the SQL statement.
For example, the following query will return the number of students in each batch:
SELECT
batch_id, AVG(iq)
FROM
students
GROUP BY batch_id;
You can also use the HAVING
clause to filter the result set.
For example, the following query will return the number of students in each batch where each batches average IQ is greater than or equal to 100:
SELECT
batch_id, AVG(iq)
FROM
students
GROUP BY batch_id
HAVING AVG(iq) >= 100;