fundamentals/database/notes/06-sql-joins-aggregation.md
2023-01-08 16:01:42 +00:00

12 KiB

SQL Joins and Aggregation

Agenda

  • SQL Joins
  • Aggregation Functions (SUM, AVG, COUNT etc)
  • Sub-queries
  • Built-in Functions

Joins

Join is the widely-used clause in the SQL Server essentially to combine and retrieve data from two or more tables. In a real-world relational database, data is structured in many tables and which is why, there is a constant need to join these multiple tables based on logical relationships between them.

Inner join

Inner Join clause in SQL Server creates a new table (not physical) by combining rows that have matching values in two or more tables. This join is based on a logical relationship (or a common field) between the tables and is used to retrieve data that appears in both tables.

Inner Join

Assume, we have two tables, Table A and Table B, that we would like to join using SQL Inner Join. The result of this join will be a new result set that returns matching rows in both these tables. The intersection part in black below shows the data retrieved using Inner Join.

Inner Join

Keyword: INNER JOIN or simply JOIN Syntax: SELECT column1, column2, ... FROM table_name1 JOIN table_name2 ON condition

For example, we want to get the batch names of all the students along with their names.

id first_name last_name batch_name
1 John Watson Sherlock
2 Mycroft Holmes Sherlock

This can be achieved by using the following SQL query:

SELECT s.first_name, s.last_name, b.batch_name FROM students s JOIN batches b ON s.batch_id = b.id;

Left Outer Join

Let us consider the students and batches tables below:

id first_name last_name batch_id
1 John Watson 1
2 Mycroft Holmes 1
3 Moriarty Patel NULL
id batch_name
1 Sherlock Academy
2 Crime Academy

Now if we decide to get the batch names of all the students along with their names with an inner join we will get the following result:

id first_name last_name batch_name
1 John Watson Sherlock Academy
2 Mycroft Holmes Sherlock Academy

Since both John and Mycroft had a valid batch_id, the respective rows were merged to produce the result. But Moriarty did not have a valid batch_id, so the result set will not contain him.

But if we want to run a query to get all students along with their batch names or NULL if they do not have a batch_id, we can use the Left Outer Join.

The LEFT JOIN keyword returns all records from the left table (table1), and the matching records from the right table (table2). The result is 0 records from the right side, if there is no match.

Left Outer Join

Keyword: LEFT OUTER JOIN or simply LEFT JOIN Syntax: SELECT column1, column2, ... FROM table_name1 LEFT JOIN table_name2 ON condition

The query for fetching students with their batch names now will be:

SELECT 
    s.first_name, s.last_name, b.name
FROM
    students s
        LEFT JOIN
    batches b ON s.batch_id = b.id;

And the result will be:

id first_name last_name batch_name
1 John Watson Sherlock Academy
2 Mycroft Holmes Sherlock Academy
3 Moriarty Patel NULL

Right Outer Join

Similarly, RIGHT OUTER JOIN returns all records from the right table (table2), and the matching records from the left table (table1). The result is 0 records from the left side, if there is no match.

Now if we want to get all the batches along with their students if present else NULL, we can use RIGHT OUTER JOIN.

Keyword: RIGHT OUTER JOIN or simply RIGHT JOIN Syntax: SELECT column1, column2, ... FROM table_name1 RIGHT JOIN table_name2 ON condition

Right Join

The query for fetching batches with their students now will be:

SELECT 
    s.first_name, s.last_name, b.name
FROM
    students s
        RIGHT JOIN
    batches b ON s.batch_id = b.id;

This will return the following result:

id first_name last_name batch_name
1 John Watson Sherlock Academy
2 Mycroft Holmes Sherlock Academy
NULL NULL NULL Crime Academy

Full Outer Join

The FULL OUTER JOIN clause returns a result set that includes rows from both left and right tables.

When no matching rows exist for the row in the left table, the columns of the right table will contain NULL. Likewise, when no matching rows exist for the row in the right table, the column of the left table will contain NULL.

Full Outer Join

Keyword: FULL OUTER JOIN or simply FULL JOIN Syntax: SELECT column1, column2, ... FROM table_name1 FULL JOIN table_name2 ON condition

The query for fetching batches with their students now will be:

SELECT 
    s.first_name, s.last_name, b.name
FROM
    students s
        FULL JOIN
    batches b ON s.batch_id = b.id;

This will return the following result:

id first_name last_name batch_name
1 John Watson Sherlock Academy
2 Mycroft Holmes Sherlock Academy
3 Moriarty Patel NULL
NULL NULL NULL Crime Academy

Warning

MySQL does not support the FULL OUTER JOIN clause.

Cross Join

The result set from a cross join will include all rows from both tables, where each row is the combination of the row in the first table with the row in the second table. In general, if each table has n and m rows respectively, the result set will have n x m rows.

In other words, the CROSS JOIN clause returns a Cartesian product of rows from the joined tables.

Keyword: CROSS JOIN Syntax: SELECT column1, column2, ... FROM table_name1 CROSS JOIN table_name2 Syntax 2: SELECT column1, column2, ... FROM table_name1, table_name2

An example query for fetching from the students and batches tables will be:

SELECT 
    s.first_name, s.last_name, b.name
FROM
    students s
        CROSS JOIN
    batches b;

Note that different from the INNER JOIN, LEFT JOIN , and RIGHT JOIN clauses, the CROSS JOIN clause does not have a join predicate. In other words, it does not have the ON or USING clause.

If you add a WHERE clause, in case table t1 and t2 has a relationship, the CROSS JOIN works like the INNER JOIN.


Aggregate Functions

MySQL's aggregate function is used to perform calculations on multiple values and return the result in a single value like the average of all values, the sum of all values, and maximum & minimum value among certain groups of values.

We mainly use the aggregate functions in databases, spreadsheets and many other data manipulation software packages. In the context of business, different organization levels need different information such as top levels managers interested in knowing whole figures and not the individual details. These functions produce the summarised data from our database. Thus, they are extensively used in economics and finance to represent the economic health or stock and sector performance.

The ISO SQL standard defines the following aggregate functions:

Function Description
MIN Return the minimum value
MAX Return the maximum value
SUM Return the sum of all values
AVG Return the average value of the argument
COUNT Return a count of the number of rows returned

Get the minimum value

Keyword: MIN Syntax: SELECT MIN(column_name) FROM table_name

Getting the minimum value without using the aggregate function could be done by using the ORDER BY clause.

SELECT 
    first_name, iq
FROM
    students
ORDER BY iq
LIMIT 1;

This would give the correct answer but not in the case of NULL values. By default, MySQL will return the NULL value at the top of the sorted rows. Special handling will have to be done to get the correct result.

This is why we can use the aggregate function MIN to get the minimum value.

SELECT 
    MIN(iq)
FROM
    students;

The aggregate functions ignore the NULL values.

Get the maximum value

Keyword: MAX Syntax: SELECT MAX(column_name) FROM table_name

To get the maximum IQ in students, we can use the aggregate function MAX.

SELECT 
    MAX(iq)
FROM
    students;

Get the sum of all values

Keyword: SUM Syntax: SELECT SUM(column_name) FROM table_name

To get the sum of all the IQ in students, we can use the aggregate function SUM.

SELECT 
    SUM(iq)
FROM
    students;

Get the average value

Keyword: AVG Syntax: SELECT AVG(column_name) FROM table_name

To get the average IQ in students, we can use the aggregate function AVG.

SELECT 
    AVG(iq)
FROM
    students;

Get the number of rows

Keyword: COUNT Syntax: SELECT COUNT(column_name) FROM table_name

To get the number of rows in students, we can use the aggregate function COUNT.

SELECT 
    COUNT(*)
FROM
    students;

Using * instead of a column name will return the number of rows in the table whereas using a column name will return the number of rows that have a non-null value in a column.

GROUP BY clause

The GROUP BY clause groups a set of rows into a set of summary rows by values of columns or expressions. The GROUP BY clause returns one row for each group. In other words, it reduces the number of rows in the result set

Keyword: GROUP BY Syntax: SELECT column1, column2, ... FROM table_name GROUP BY column1, column2, ...

In this syntax, you place the GROUP BY clause after the FROM and WHERE clauses. After the GROUP BY keywords, you place is a list of comma-separated columns or expressions to group rows.

MySQL evaluates the GROUP BY clause after the FROM and WHERE clauses and before the HAVING, SELECT, DISTINCT, ORDER BY and LIMIT clauses

Group By

Fields that are not encapsulated within an aggregate function and must be included in the GROUP BY Clause at the end of the SQL statement.

For example, the following query will return the number of students in each batch:

SELECT 
    batch_id, AVG(iq)
FROM
    students
GROUP BY batch_id;

You can also use the HAVING clause to filter the result set. For example, the following query will return the number of students in each batch where each batches average IQ is greater than or equal to 100:

SELECT 
    batch_id, AVG(iq)
FROM
    students
GROUP BY batch_id
HAVING AVG(iq) >= 100;

Problems

  1. Simple - I
  2. Simple - II
  3. Joins - I
  4. Joins - II
  5. Join - III
  6. Aggregation - I
  7. Aggregation - II
  8. Aggregation - III