- For example, name is a noun. But it's an attribute of a class and not a separate entity in itself.
## Step 2: Relationships
1. Point number 2 in the requirement tells us there is a relationship between batch and current instructor.
Cardinality: m:1
Hence, we add current_instructor_id column to batches table.
2. Point number 3 tells us that each batch can have multiple students, and a student is in one batch at a time. They can move, but at any given time they are in exactly one batch. So, there is a relationship between batch and students.
Cardinality: 1:m
Hence, we can add batch_id as a column in students table.
However, here comes the tricky part. Suppose we want to track every date on which a student moved. The relationship then becomes many:many (current + historical batches). We can still keep `batch_id` in the `students` table to indicate the current batch, but we need a separate table to maintain all historical batches along with their move dates.
When a student is moved from one batch to another, this date is an attribute of the relation between `students` and `batches`. So, we will create a new table like this:
`student_batches`
| student_id | batch_id | move_date |
|------------|----------|-----------|
As we have included `batch_id` here, we could remove it from the `students` table, but that would hurt performance, because every lookup of a student's current batch would also have to query this new table. So, for ease, we will keep `batch_id` in `students` as well.
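The "current batch plus history" design above can be sketched as follows. This is a minimal illustration, not the course's reference implementation: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the helper `move_student` and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE batches (batch_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,
    name TEXT,
    batch_id INTEGER REFERENCES batches(batch_id)  -- current batch only
);
CREATE TABLE student_batches (                      -- full move history
    student_id INTEGER REFERENCES students(student_id),
    batch_id INTEGER REFERENCES batches(batch_id),
    move_date TEXT
);
INSERT INTO batches VALUES (1, 'b1'), (2, 'b2');
INSERT INTO students VALUES (1, 's1', 1);
INSERT INTO student_batches VALUES (1, 1, '2023-01-10');
""")

def move_student(conn, student_id, new_batch_id, move_date):
    """Hypothetical helper: record the move and update the current batch together."""
    with conn:  # one transaction: both statements commit, or neither does
        conn.execute(
            "INSERT INTO student_batches VALUES (?, ?, ?)",
            (student_id, new_batch_id, move_date),
        )
        conn.execute(
            "UPDATE students SET batch_id = ? WHERE student_id = ?",
            (new_batch_id, student_id),
        )

move_student(conn, 1, 2, "2023-06-01")

# The current batch comes from students alone, without touching the history table.
current_batch = conn.execute(
    "SELECT batch_id FROM students WHERE student_id = 1"
).fetchone()[0]
history = conn.execute(
    "SELECT batch_id, move_date FROM student_batches "
    "WHERE student_id = 1 ORDER BY move_date"
).fetchall()
```

The cost of keeping `batch_id` in both places is that every move must update both tables, which is why the sketch wraps the two statements in one transaction.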
3. Point number 4 tells us that each batch has multiple classes. Whether a class has only one batch or multiple batches is not specified. Let's assume that we want the ability to have multiple batches attend a class.
Cardinality: m:m
We will need a separate batch_classes table.
4. Point number 5 tells us there is a relationship between classes and instructor. What is the cardinality between `class` and `instructor`? It is m:1, so `instructor_id` will be included in `classes`.
5. Point number 7 tells us that every student has a buddy.
Here, the cardinality of the buddy relation between a student and another student is m:1.
| student | --- buddy --- | student |
| ------- | ------------- | ------- |
| 1 | --> | 1 |
| m | <-- | 1 |
So, the `students` table will have one more column called `buddy_id`.
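A self-referencing column like `buddy_id` can be sketched as below. This is only an illustration under assumptions: `sqlite3` stands in for MySQL, and the three sample students are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,
    name TEXT,
    buddy_id INTEGER REFERENCES students(student_id)  -- points back at the same table
);
-- s2 and s3 share the same buddy (m:1); s1's buddy is s2.
INSERT INTO students VALUES (1, 's1', 2), (2, 's2', 1), (3, 's3', 1);
""")

# Self-join: alias the table twice, once as the student and once as the buddy.
buddy_pairs = conn.execute("""
SELECT s.name, b.name
FROM students s
JOIN students b ON b.student_id = s.buddy_id
ORDER BY s.student_id
""").fetchall()
```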
6. Point number 10 tells us that there is a relationship between student and mentor. A student has one mentor, but a mentor can have many students.
Cardinality: m:1
So, we will include mentor_id as a column in the students table.
7. Finally, point number 12 tells us mentor_sessions has a relationship with `mentors` and `students`.
Cardinality between `mentor_sessions` and `students` : m:1
Cardinality between `mentor_sessions` and `mentors` : m:1
So, we add `student_id` and `mentor_id` columns to `mentor_sessions` table.
**Hence, the final table structure after step 1 and 2:**
`batches`
| batch_id | name | start_month | curr_inst_id |
|----------|------|-------------|--------------|
`instructors`
| instructor_id | name |
| ------------- | ---- |
`classes`
| class_id | name | schedule_time | instructor_id |
| -------- | ---- | ------------- | ------------- |
Cons: Storing enums this way takes a lot of space, and comparisons are slow because they are string comparisons.
Pros: Readability. No joins are required.
`batches`
| batch_id | name | type |
| -------- | ---- | ------- |
| 1 | b1 | DSML |
| 2 | b2 | Academy |
| 3 | b3 | Academy |
| 4 | b4 | DSML |
2. Using integers
Here, 0 means DSML type batch and 1 means Academy type batch.
Cons: No readability. We cannot add or delete values (enums) in between, as it will cause discrepancies. Also, what a particular value represents is not stored in the database.
`batches`
| batch_id | name | type_id |
| -------- | ---- | ------- |
| 1 | b1 | 0 |
| 2 | b2 | 1 |
| 3 | b3 | 1 |
| 4 | b4 | 0 |
3. Lookup table
It will have `id` and `value` columns, with each type stored as a separate row. The `type_id` of `batches` will refer to the `id` column of `batch_types`. All the above cons are solved with this method.
**batch_types**
| id | value |
| -- | ---------- |
| 1 | Academy |
| 2 | DSML |
| 3 | Neovarsity |
| 4 | SST |
So, the best way to represent enums is to use a lookup table.
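As a rough sketch of the lookup-table approach, the snippet below stores the `batch_types` rows from above and resolves the readable name with a join. It assumes `sqlite3` in place of MySQL; the `NOT NULL UNIQUE` constraint on `value` is an added assumption, not something stated in the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE batch_types (
    id INTEGER PRIMARY KEY,
    value TEXT NOT NULL UNIQUE   -- assumption: each type name appears once
);
CREATE TABLE batches (
    batch_id INTEGER PRIMARY KEY,
    name TEXT,
    type_id INTEGER REFERENCES batch_types(id)
);
INSERT INTO batch_types VALUES (1, 'Academy'), (2, 'DSML'), (3, 'Neovarsity'), (4, 'SST');
INSERT INTO batches VALUES (1, 'b1', 2), (2, 'b2', 1), (3, 'b3', 1), (4, 'b4', 2);
""")

# The meaning of each type_id lives in the database and is recovered with a join.
readable = conn.execute("""
SELECT b.name, t.value
FROM batches b
JOIN batch_types t ON t.id = b.type_id
ORDER BY b.batch_id
""").fetchall()
```

The trade-off compared with storing strings directly is the extra join; in exchange, `batches` stores a small integer per row and a type can be renamed by updating a single row in `batch_types`.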
## Step 4: Deciding Primary Keys of a mapping table
### Example from previous discussion:
For `student_batches` the primary key will be (student_id, batch_id).
`student_batches`
| student_id | batch_id | move_date |
|------------|----------|-----------|
If instead we give the table a separate `id` column like this, the primary key will be `id`. The size of the primary key index will be smaller here.
`student_batches`
| id | student_id | batch_id | move_date |
| -- |------------|----------|-----------|
### Example 2
1. Scaler has exams.
2. For each batch a student joins, they will have to take exams of that batch.
3. Each exam is associated with a batch.
`exams`
| id | name | start_date | end_date |
| -- | ---- | ---------- | ---------- |
Since each exam is associated with a batch, we need to represent the relationship between batch and exam. One batch can have multiple exams, and one exam can be present for multiple batches, so we will have to create a mapping table.
`exam_batches`
| exam_id | batch_id |
| ------- | -------- |
Similarly we also have a table called `student_batches`.
`student_batches`
| student_id | batch_id | date |
|------------|----------|------|
To figure out which student went through which exams, we will need to join `student_batches` with `exam_batches`. Basically, we are forming a relation between two mapping tables.
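The join between the two mapping tables might look like the sketch below. It assumes `sqlite3` as a stand-in for MySQL, composite primary keys on both mapping tables, and made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exam_batches (
    exam_id INTEGER, batch_id INTEGER,
    PRIMARY KEY (exam_id, batch_id)
);
CREATE TABLE student_batches (
    student_id INTEGER, batch_id INTEGER, date TEXT,
    PRIMARY KEY (student_id, batch_id)
);
-- Exam 10 is only for batch 1; exam 11 is for batches 1 and 2.
INSERT INTO exam_batches VALUES (10, 1), (11, 1), (11, 2);
-- Student 1 is in batch 1, student 2 is in batch 2.
INSERT INTO student_batches VALUES (1, 1, '2023-01-10'), (2, 2, '2023-01-10');
""")

# batch_id is the shared column that connects the two mapping tables.
student_exams = conn.execute("""
SELECT DISTINCT sb.student_id, eb.exam_id
FROM student_batches sb
JOIN exam_batches eb ON eb.batch_id = sb.batch_id
ORDER BY sb.student_id, eb.exam_id
""").fetchall()
```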
### Example 3
1. One student can belong to multiple batches.
2. Every batch has exams.
3. The same exam may happen for different batches on different dates.
4. If a student moves to another batch, they may have to take some exams again.
`student_batches`
| student_id | batch_id | date |
|------------|----------|------|
Cardinality between batches and exams is m:m. So, we will have a `batch_exams` table. Date is also an attribute of this relation.
`batch_exams`
| batch_id | exam_id | date |
| -------- | ------- | ---- |
Between students and exams, the cardinality is also m:m. But if we use (student_id, exam_id) as the primary key of the new `student_exams` table, it will not allow one student to take a particular exam twice. So, we will have to add `batch_id` to the PK as well. The `student_batch_exams` table below will be our new table.
`student_batch_exams`
| student_id | batch_id | exam_id | marks |
| ---------- | -------- | ------- | ----- |
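To see what the three-column key permits and forbids, here is a small sketch, assuming `sqlite3` in place of MySQL and invented marks. The same student can take the same exam again in a different batch, while an exact repeat of (student, batch, exam) is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE student_batch_exams (
    student_id INTEGER,
    batch_id INTEGER,
    exam_id INTEGER,
    marks INTEGER,
    PRIMARY KEY (student_id, batch_id, exam_id)
)
""")

conn.execute("INSERT INTO student_batch_exams VALUES (1, 1, 10, 55)")
# Same student, same exam, but a different batch: allowed by the composite key.
conn.execute("INSERT INTO student_batch_exams VALUES (1, 2, 10, 80)")

try:
    # Same student, same batch, same exam: violates the primary key.
    conn.execute("INSERT INTO student_batch_exams VALUES (1, 2, 10, 90)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

attempts = conn.execute(
    "SELECT COUNT(*) FROM student_batch_exams WHERE student_id = 1 AND exam_id = 10"
).fetchone()[0]
```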
Hence, we can see that sometimes a mapping may also have a relation with another entity. In these cases, not having a separate primary key can cause problems.
**Advantages of a separate key:**
If the mapping is itself referenced by another entity or relation, a separate key saves space, because the referencing table stores a single id instead of the full composite key.
**Advantages of NO separate key:**
Queries on the first column of the composite key become faster because the table is sorted by that column. A mapping table is mostly used to resolve relationships and thus appears in joins, so having no separate key makes those joins faster.
## Step 5: Representing Foreign keys and indexes
There are 2 steps here: first establishing foreign key relationships, and then identifying indexes for frequent use cases.
### Step 5.1: Foreign Keys
Typically, when you have a relationship table, which exists because there is a relationship between entity1 and entity2, then it's usually recommended to have a foreign key relationship between the relationship table and the entity it references.
For example, consider the following table which exists due to m:m cardinality relationship between `batches` and `classes`.
`batch_classes`
| batch_id | class_id |
| -------- | -------- |
If we expect that whenever we query this table, we will need to get the associated batch and class details (which is almost always the case), then it makes sense to have a foreign key relationship with `batches` and `classes` table.
Similarly, all the other relationships we discussed in step 2 are eligible for a foreign key constraint.
*Note, however, that it is not mandatory to specify foreign key relationships. Foreign key constraints help with data consistency: if you expect every value in the referencing column to match an existing row in the referenced table, a foreign key constraint will enforce that. (MySQL then automatically creates an index on the foreign key column of the referencing table, unless one already exists.)*
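The consistency check that a foreign key provides can be sketched as follows. The snippet assumes `sqlite3` rather than MySQL; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, which is a SQLite-specific detail and not part of the schema itself. The sample ids are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforcement is off by default
conn.executescript("""
CREATE TABLE batches (batch_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE classes (class_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE batch_classes (
    batch_id INTEGER NOT NULL REFERENCES batches(batch_id),
    class_id INTEGER NOT NULL REFERENCES classes(class_id),
    PRIMARY KEY (batch_id, class_id)
);
INSERT INTO batches VALUES (1, 'b1');
INSERT INTO classes VALUES (100, 'c1');
INSERT INTO batch_classes VALUES (1, 100);
""")

try:
    # batch_id 2 does not exist in batches, so this row would be an orphan.
    conn.execute("INSERT INTO batch_classes VALUES (2, 100)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

mapping_rows = conn.execute("SELECT COUNT(*) FROM batch_classes").fetchone()[0]
```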
### Step 5.2: Identify Indexes
The second step is to list the frequent use cases and check whether the queries we have to write for them are fast.
If they are not, then it warrants creating an index to avoid full table scans.
Let's say that learners often search for a mentor by name. This is a use case. On which column of which table will you create an index for this? You have to create an index on the `name` column of the `mentors` table.
`mentors`
| mentor_id | name | company_name |
|-----------|------|--------------|
As a rule of thumb, given a query, look at the join conditions and `WHERE` conditions, followed by `ORDER BY` with `LIMIT`. If your query is slow, creating an index on those columns helps. **Please refer to the quizzes done during the class to understand this better.**
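One way to check whether a query uses an index is to look at the query plan before and after creating it. The sketch below assumes `sqlite3` and its `EXPLAIN QUERY PLAN` statement (MySQL has its own `EXPLAIN`, with different output); the index name `idx_mentors_name`, the helper `plan`, and the generated rows are all illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mentors (mentor_id INTEGER PRIMARY KEY, name TEXT, company_name TEXT)"
)
conn.executemany(
    "INSERT INTO mentors (name, company_name) VALUES (?, ?)",
    [(f"mentor{i}", f"company{i % 10}") for i in range(1000)],
)

query = "SELECT * FROM mentors WHERE name = ?"

def plan(conn, sql, params):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

# Without an index on name, SQLite has to scan the whole table.
plan_before = plan(conn, query, ("mentor42",))

conn.execute("CREATE INDEX idx_mentors_name ON mentors (name)")

# With the index, the plan should mention a search using idx_mentors_name.
plan_after = plan(conn, query, ("mentor42",))
```

The exact wording of the plan text varies between SQLite versions, so it is safer to look for the index name in the output than to compare the full string.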
After drawing the complete schema, mention the indexes.
Design a database schema for a system like Netflix with the following use cases.
**Use Cases**
1. Netflix has users.
2. Every user has an email and a password.
3. Users can create profiles to have separate independent environments.
4. Each profile has a name and a type. Type can be KID or ADULT.
5. There are multiple videos on Netflix.
6. For each video, there will be a title, description and a cast.
7. A cast is a list of actors who were a part of the video. For each actor we need to know their name and list of videos they were a part of.
8. For every video, for any profile who watched that video, we need to know the status (COMPLETED/ IN PROGRESS).
9. For every profile for whom a video is in progress, we want to know their last watch timestamp.
Let's approach this problem as one should in an interview.
1. Finding all the nouns to create tables.
- `users`
- `profiles`
- `videos`
- `actors` (cast is nothing but a mapping between videos and actors)
1.2. Enums:
- `profile_type` (lookup table)
- `watch_status_type` (enum; it is an attribute of the relation between profiles and videos)
1.3. Finding attributes of particular entities.
`users`
| id | email | password |
| -- | ----- | -------- |
`profiles`
| id | name |
| -- | ---- |
`profile_type`
| id | value |
| -- | ----- |
`videos`
| id | name | description |
| -- | ---- | ----------- |
`actors`
| id | name |
| -- | ---- |
`watch_status_type`
| id | value |
| -- | ----- |
2. Representing relationships.
Now, there are no relationships in the first and second use cases. Moving forward, what is the cardinality between `users` and `profiles`? One user can have multiple profiles, but one profile is associated with one user. Therefore, it is 1:m, and the id of the user will be stored in the `profiles` table.
`profiles`
| id | name | user_id |
| -- | ---- | ------- |
What is the cardinality between `profiles` and `profile_type`? It is m:1, `profiles` will have another column `profile_type_id`.
`profiles`
| id | name | user_id | profile_type_id |
| -- | ---- | ------- | --------------- |
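The `users`, `profiles` and `profile_type` tables so far could be created and queried roughly as below. This is a sketch under assumptions: `sqlite3` stands in for MySQL, and the sample user and profile rows are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, password TEXT);
CREATE TABLE profile_type (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE profiles (
    id INTEGER PRIMARY KEY,
    name TEXT,
    user_id INTEGER REFERENCES users(id),                 -- 1:m from users
    profile_type_id INTEGER REFERENCES profile_type(id)   -- m:1 to the lookup table
);
INSERT INTO users VALUES (1, 'u1@example.com', 'placeholder-hash');
INSERT INTO profile_type VALUES (1, 'KID'), (2, 'ADULT');
INSERT INTO profiles VALUES (1, 'p1', 1, 2), (2, 'p2', 1, 1);
""")

# All profiles of one user, with the readable type resolved through the lookup table.
profiles_of_user = conn.execute("""
SELECT p.name, t.value
FROM profiles p
JOIN profile_type t ON t.id = p.profile_type_id
WHERE p.user_id = ?
ORDER BY p.id
""", (1,)).fetchall()
```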
What is the cardinality between `videos` and `actors`? One video can have multiple actors and one actor could be in multiple videos. So, it is m:m.
`video_actors`
| video_id | actor_id |
| -------- | -------- |
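With `video_actors` in place, both directions of use case 7 (the cast of a video, and the videos of an actor) become simple joins. The sketch below assumes `sqlite3` in place of MySQL; the helpers `cast_of` and `videos_of` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE videos (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
CREATE TABLE actors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE video_actors (
    video_id INTEGER REFERENCES videos(id),
    actor_id INTEGER REFERENCES actors(id),
    PRIMARY KEY (video_id, actor_id)
);
INSERT INTO videos VALUES (1, 'v1', ''), (2, 'v2', '');
INSERT INTO actors VALUES (1, 'a1'), (2, 'a2');
INSERT INTO video_actors VALUES (1, 1), (1, 2), (2, 1);
""")

def cast_of(conn, video_id):
    """Hypothetical helper: names of all actors in a video."""
    rows = conn.execute("""
        SELECT a.name FROM video_actors va
        JOIN actors a ON a.id = va.actor_id
        WHERE va.video_id = ? ORDER BY a.id
    """, (video_id,))
    return [name for (name,) in rows]

def videos_of(conn, actor_id):
    """Hypothetical helper: names of all videos an actor was part of."""
    rows = conn.execute("""
        SELECT v.name FROM video_actors va
        JOIN videos v ON v.id = va.video_id
        WHERE va.actor_id = ? ORDER BY v.id
    """, (actor_id,))
    return [name for (name,) in rows]
```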
Status is information about the relation between `videos` and `profiles`. Hence, a new table is created. The last watch timestamp is also an attribute of this relation.