14 KiB
Agenda
- Delete
- Delete vs truncate vs drop
- Limit
- Count
- Order By
- Join
Delete
DELETE FROM table_name WHERE conditions;
Example:
DELETE FROM film WHERE id = 1;
The above query will delete the row with id
1 from the film
table.
Beware, If you don't specify a where clause, then all the rows from the table will be deleted. Example:
DELETE FROM film;
Let's talk about how delete works as well in terms of code.
for each row in film:
if row.matches(conditions in where clause)
delete row
There is a minor advance thing about DELETE which we shall talk about along with Joins in the next class. So, don't worry about it for now.
Delete vs Truncate vs Drop
There are two more commands which are used to delete rows from a table. They are TRUNCATE
and DROP
. Let's discuss them one by one.
Truncate
The command looks as follows:
TRUNCATE film;
The above query will delete all the rows from the film
table. TRUNCATE command internally works by removing the complete table and then recreating it. So, it is much faster than DELETE. But it has a disadvantage. It cannot be rolled back. We will learn more about rollbacks in the class on Transactions (In short, rollbacks can only happen for incomplete transactions - not committed yet - to be discussed in future classes). But at a high level, this is because as the complete table is deleted as an intermediate step, no log is maintained as to what all rows were deleted, and thus is not easy to revert. So, if you run a TRUNCATE query, then you cannot undo it.
Note: It also resets the primary key ID. For example, if the highest ID in the table before truncating was 10, then the next row inserted after truncating will have an ID of 1.
Drop
The command looks as follows:
Example:
DROP TABLE film;
The above query will delete the film
table. The difference between DELETE
and DROP
is that DELETE
is used to delete rows from a table and DROP
is used to delete the entire table. So, if you run a DROP
query, then the entire table will be deleted. All the rows and the table structure will be deleted. So, be careful while running a DROP
query. Nothing will be left of the table after running a DROP
query. You will have to recreate the table from scratch.
Note that, DELETE:
- Removes specified rows one-by-one from table (may delete all rows if no condition is present in query but keeps table structure intact).
- It is slower than TRUNCATE.
- Doesn't reset the key.
- It can be rolled back.
TRUNCATE:
- Removes the complete table and then recreats it.
- Faster than DELETE.
- Resets the key.
- It can not be rolled back because the complete table is deleted as an intermediate step.
DROP:
- Removes complete table and the table structre as well.
- It can not be rolled back.
Rollback to be discussed in transactions class. Note that there is no undo in SQL queries once the query is completely committed.
LIKE Operator
LIKE operator is one of the most important and frequently used operator in SQL. Whenever there is a column storing strings, there comes a requirement to do some kind of pattern matching. Example, assume Scaler's database where we have a batches
table with a column called name
. Let's say we want to get the list of Academy
batches and the rule is that an Academy batch shall have Academy
somewhere within the name. How do we find those? We can use the LIKE
operator for this purpose. Example:
SELECT * FROM batches WHERE name LIKE '%Academy%';
Similarly, let's say in our Sakila database, we want to get all the films which have LOVE
in their title. We can use the LIKE
operator. Example:
SELECT * FROM film WHERE title LIKE '%LOVE%';
Let's talk about how the LIKE
operator works. The LIKE
operator works with the help of 2 wildcards in our queries, %
and _
. The %
wildcard matches any number of characters (>= 0 occurrences of any set of characters). The _
wildcard matches exactly one character (any character). Example:
- LIKE 'cat%' will match "cat", "caterpillar", "category", etc. but not "wildcat" or "dog".
- LIKE '%cat' will match "cat", "wildcat", "domesticcat", etc. but not "cattle" or "dog".
- LIKE '%cat%' will match "cat", "wildcat", "cattle", "domesticcat", "caterpillar", "category", etc. but not "dog" or "bat".
- LIKE '_at' will match "cat", "bat", "hat", etc. but not "wildcat" or "domesticcat".
- LIKE 'c_t' will match "cat", "cot", "cut", etc. but not "chat" or "domesticcat".
- LIKE 'c%t' will match "cat", "chart", "connect", "cult", etc. but not "wildcat", "domesticcat", "caterpillar", "category".
COUNT
Count function takes the values from a particular column and returns the number of values in that set. Umm, but don't you think it will be exactly same as the number of rows in the table? Nope. Not true. Aggregate functions only take not null values into account. So, if there are any null values in the column, they will not be counted.
Example: Let's take a students table with data like follows:
id | name | age | batch_id |
---|---|---|---|
1 | A | 20 | 1 |
2 | B | 21 | 1 |
3 | C | 22 | null |
4 | D | 23 | 2 |
If you will try to run COUNT and give it the values in batch_id column, it will return 3. Because there are 3 not null values in the column. This is different from number of rows in the students table.
Let's see how do you use this operation in SQL.
SELECT COUNT(batch_id) FROM students;
To understand how aggregate functions work via a pseudocode, let's see how SQL query optimizer may execute them.
table = []
count = 0
for row in table:
if row[batch_id] is not null:
count += 1
print(count)
Few things to note here: While printing, do we have access to the values of row? Nope. We only have access to the count variable. So, we can only print the count. Extrapolating this point, when you use aggregate functions, you can only print the result of the aggregate function. You cannot print the values of the rows.
Eg:
SELECT COUNT(batch_id), batch_id FROM students;
This will be an invalid query. Because, you are trying to print the values of batch_id
column as well as the count of batch_id
column. But, you can only print the count of batch_id
column.
LIMIT Clause
And now let's discuss the last clause for the day. LIMIT clause allows us to limit the number of rows returned by a query. Example:
SELECT * FROM film LIMIT 10;
The above query will return only 10 rows from the film
table. If you want to return 10 rows starting from the 11th row, you can use the OFFSET
keyword. Example:
SELECT * FROM film LIMIT 10 OFFSET 10;
The above query will return 10 rows starting from the 11th row from the film
table.
Note that in MySQL, you cannot use the OFFSET
keyword without the LIMIT
keyword. Example:
SELECT * FROM film OFFSET 10;
throws an error.
LIMIT clause is applied at the end. Just before printing the results. Taking the example of pseudocode, it works as follows:
answer = []
for each row in film:
if row.matches(conditions in where clause) # new line from above
answer.append(row)
answer.sort(column_names in order by clause)
filtered_answer = []
for each row in answer:
filtered_answer.append(row['rating'], row['release_year'])
return filtered_answer[start_of_limit: end_of_limit]
Thus, if your query contains ORDER BY clause, then LIMIT clause will be applied after the ORDER BY clause. Example:
SELECT * FROM film ORDER BY title LIMIT 10;
The above query will return 10 rows from the film
table in ascending order of the title
column.
ORDER BY Clause
Now let's discuss another important clause. ORDER BY clause allows to return values in a sorted order. Example:
SELECT * FROM film ORDER BY title;
The above query will return all the rows from the film
table in ascending order of the title
column. If you want to return the rows in descending order, you can use the DESC
keyword. Example:
SELECT * FROM film ORDER BY title DESC;
You can also sort by multiple columns. Example:
SELECT * FROM film ORDER BY title, release_year;
The above query will return all the rows from the film
table in ascending order of the title
column and then in ascending order of the release_year
column. Consider the second column as tie breaker. If 2 rows have same value of title, release year will be used to break tie between them. Example:
SELECT * FROM film ORDER BY title DESC, release_year DESC;
Above query will return all the rows from the film
table in descending order of the title
column and if tie on title
, in descending order of the release_year
column.
By the way, you can ORDER BY on a column which is not present in the SELECT clause. Example:
SELECT title FROM film ORDER BY release_year;
Let's also build the analogy of this with a pseudocode.
answer = []
for each row in film:
if row.matches(conditions in where clause) # new line from above
answer.append(row)
answer.sort(column_names in order by clause)
filtered_answer = []
for each row in answer:
filtered_answer.append(row['rating'], row['release_year'])
return filtered_answer
If you see, the ORDER BY
clause is applied after the WHERE
clause. So, first the rows are filtered based on the WHERE
clause and then they are sorted based on the ORDER BY
clause. And only after that are the columns that have to be printed taken out. And that's why you can sort based on columns not even in the SELECT
clause.
ORDER BY Clause with DISTINCT keyword
If you also have DISTINCT in the SELECT clause, then you can only sort by columns that are present in the SELECT clause. Example:
SELECT DISTINCT title FROM film ORDER BY release_year;
The above query will give an error. You can only sort by title
column. Example:
SELECT DISTINCT title FROM film ORDER BY title;
Why this? Because without this the results can be ambiguous. Example:
SELECT DISTINCT title FROM film ORDER BY release_year;
The above query will return all the distinct titles from the film
table. But which release_year
should be used to sort them? There can be multiple release_year
for a particular title
. So, the results will be ambiguous.
Joins
Every SQL query we had written till now was only finding data from 1 table. Most of the queries we had written in the previous classes were on the film
table where we applied multiple filters etc. But do you think being able to query data from a single table is enough? Let's take a scenario of Scaler. Let's say we have 2 tables as follows in the Scaler's database:
batches
batch_id | batch_name |
---|---|
1 | Batch A |
2 | Batch B |
3 | Batch C |
students
student_id | first_name | last_name | batch_id |
---|---|---|---|
1 | John | Doe | 1 |
2 | Jane | Doe | 1 |
3 | Jim | Brown | 2 |
4 | Jenny | Smith | 3 |
5 | Jack | Johnson | 2 |
Suppose, someone asks you to print the name of every student, along with the name of their batch. The output should be something like:
student_name | batch_name |
---|---|
John | Batch A |
Jane | Batch A |
Jim | Batch B |
Jenny | Batch C |
Jack | Batch B |
Will you be able to get all of this data by querying over a single table? No. The student_name
is there in the students table, while the batch_name
is in the batches table! We somehow need a way to combine the data from both the tables. This is where joins come in. What does the word join
mean to you?
Joins, as the name suggests, are a way to combine data from multiple tables. For example, if I want to combine the data from the students
and batches
table, I can use joins for that. Think of joins as a way to stitch rows of 2 tables together, based on the condition you specify. Example: In our case, we would want to stitch a row of students table with a row of batches table based on what? Imagine that every row of students
I try to match with every row of batches
. Based on what condition to be true between those will I stitch them?
We would want to stitch a row of students table with a row of batches table based on the batch_id
column. This is what we call a join condition
. A join condition is a condition that must be true between the rows of 2 tables for them to be stitched together. Let's see how we can write a join query for our example.
SELECT students.first_name, batches.batch_name
FROM students
JOIN batches
ON students.batch_id = batches.batch_id;
Let's break down this query. The first line is the same as what we have been writing till now. We are selecting the first_name
column from the students
table and the batch_name
column from the batches
table. The next line is where the magic happens. We are using the JOIN
keyword to tell SQL that we want to join the students
table with the batches
table. The next line is the join condition. We are saying that we want to join the rows of students
table with the rows of batches
table where the batch_id
column of students
table is equal to the batch_id
column of batches
table. This is how we write a join query.
Let's take an example of this on the Sakila database. Let's say for every film, we want to print its name and the language. How can we do that?
SELECT film.title, language.name
FROM film
JOIN language
ON film.language_id = language.language_id;
Now, sometimes typing name of tables in the query can become difficult. For example, in the above query, we have to type film
and language
multiple times. To make this easier, we can give aliases to the tables. For example, we can give the alias f
to the film
table and l
to the language
table. We can then use these aliases in our query. Let's see how we can do that:
SELECT f.title, l.name
FROM film AS f
JOIN language AS l
ON f.language_id = l.language_id;