mirror of
https://github.com/robindhole/fundamentals.git
synced 2025-03-15 19:00:19 +00:00
Adds notes for schema design.
This commit is contained in:
parent
70c3573f0a
commit
f114791398
439
database/notes/02-schema-design.md
Normal file
439
database/notes/02-schema-design.md
Normal file
@ -0,0 +1,439 @@
|
||||
# Schema Design: A case study
|
||||
|
||||
- [Schema Design: A case study](#schema-design-a-case-study)
|
||||
- [Key Terms](#key-terms)
|
||||
- [Schema](#schema)
|
||||
- [SQL data types](#sql-data-types)
|
||||
- [String types](#string-types)
|
||||
- [Numeric types](#numeric-types)
|
||||
- [Schema design](#schema-design)
|
||||
- [Case study: Requirements](#case-study-requirements)
|
||||
- [Initial design](#initial-design)
|
||||
- [Cardinality](#cardinality)
|
||||
- [Caveat 1: NULL values](#caveat-1-null-values)
|
||||
- [Caveat 2: Relations with attributes](#caveat-2-relations-with-attributes)
|
||||
- [Recap](#recap)
|
||||
- [Final Design](#final-design)
|
||||
- [Reading list](#reading-list)
|
||||
|
||||
|
||||
## Key Terms
|
||||
### Schema
|
||||
> refers to the organization of data as a blueprint of how the database is constructed
|
||||
|
||||
> In a relational database, the schema defines the tables, fields, relationships, views, indexes, packages, procedures, functions, queues, triggers, types, sequences, materialized views, synonyms, database links, directories, XML schemas, and other elements
|
||||
|
||||
|
||||
## SQL data types
|
||||
|
||||
MySQL supports SQL data types in several categories: numeric types, date and time types, string (character and byte) types, spatial types, and the JSON data type. The following are the string and numeric types:
|
||||
|
||||
### String types
|
||||
|
||||
| Type | Description | Size | Range | Example |
|
||||
| --- | --- | --- | --- | --- |
|
||||
| `CHAR(n)` | Fixed-length string | 0-255 | 0-65,535 | `CHAR(10)` |
|
||||
| `VARCHAR(n)` | Variable-length string | 0-255 | 0-65,535 | `VARCHAR(10)` |
|
||||
| `TINYTEXT` | Variable-length string | 0-255 | 0-65,535 | `VARCHAR(10)` |
|
||||
| `TEXT` | Variable-length string | 0-65,535 | 0-4,294,967,295 | `TEXT` |
|
||||
| `MEDIUMTEXT` | Variable-length string | 0-16,777,215 | 0-4,294,967,295 | `MEDIUMTEXT` |
|
||||
| `LONGTEXT` | Variable-length string | 0-4,294,967,295 | 0-4,294,967,295 | `LONGTEXT` |
|
||||
|
||||
### Numeric types
|
||||
|
||||
| Type | Description | Size | Range | Example |
|
||||
| --- | --- | --- | --- | --- |
|
||||
| `TINYINT` | Integer | 1 byte | -128 to 127 | `TINYINT` |
|
||||
| `SMALLINT` | Integer | 2 bytes | -32,768 to 32,767 | `SMALLINT` |
|
||||
| `MEDIUMINT` | Integer | 3 bytes | -8,388,608 to 8,388,607 | `MEDIUMINT` |
|
||||
| `INT` | Integer | 4 bytes | -2,147,483,648 to 2,147,483,647 | `INT` |
|
||||
| `BIGINT` | Integer | 8 bytes | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | `BIGINT` |
|
||||
| `DECIMAL` | Fixed-point number | 0-65 | -10^38+1 to 10^38-1 | `DECIMAL(10, 2)` |
|
||||
| `FLOAT` | Floating-point number | 4 bytes | -3.402823466E+38 to -1.175494351E-38, 0, and 1.175494351E-38 to 3.402823466E+38 | `FLOAT` |
|
||||
|
||||
## Schema design
|
||||
|
||||
A schema is a blueprint of a database. It is created before you actually construct the database so that the schema design can be reviewed. Schema diagrams are also a great way to document the database structure in one place.
|
||||
|
||||
Remember our student's database from the [previous lesson](01-database-fundamentals.md)? We had the three following tables
|
||||
* `students` (id, name, age, address, phone, email, batch ID)
|
||||
* `mentors` (id, name, age, address, phone, email)
|
||||
* `batches` (id, name, mentor, start date, type, mentor ID)
|
||||
|
||||
So each table has `ID` as primary key. The `students` table has a `batch ID` field that references the `batches` table and the `batches` table has a `mentor ID` field that references the `mentors` table. These are examples of foreign keys. These are some the items that are present in a schema. A schema will also contain indexes, constraints, and other items that are present in a table.
|
||||
|
||||
Following is a schema diagram for the above database. Note that the primary key is not highlighted here, which ideally should be.
|
||||
|
||||

|
||||
|
||||
> **Note**
|
||||
>
|
||||
> Try it yourself. \
|
||||
> Go to [this](https://diagramplus.com/) website and import [this](../media/schema.diagram) diagram. \
|
||||
> Try adding a new column or even a new table.
|
||||
|
||||
### Case study: Requirements
|
||||
|
||||
* There are several batches at Scaler. Each batch has an ID, name, current instructor.
|
||||
* Each batch has multiple classes. Each class has an ID, name, instructor
|
||||
* Every Student has a name, ID, grad year, university, email, phone number, ctc, current batch.
|
||||
* Every student also has a student buddy.
|
||||
* A student may have been moved from one batch to another due to pausing the course. So we need to know the entry date and leaving date of a student on every batch they were a part of..
|
||||
* Every student has a mentor. Every mentor has a name, dob.
|
||||
|
||||
You can also find these [here](https://docs.google.com/document/d/1uNhwSuzskCa_mw4TECtGfHFl8C6bxSHuystu99rST3A/edit).
|
||||
|
||||
### Initial design
|
||||
The first step of designing a schema is to identify the entities.
|
||||
This can be done by identifying the nouns in the requirements.
|
||||
For example, take the first requirement.
|
||||
> There are several batches at Scaler. Each batch has an ID, name, current instructor.
|
||||
|
||||
We can identify the following entities:
|
||||
* Batches
|
||||
* Instructor
|
||||
|
||||
Running through the requirements, we can identify the following entities:
|
||||
* Batches
|
||||
* Instructor
|
||||
* Students
|
||||
* Classes
|
||||
* Mentor
|
||||
|
||||
The next step would be to identify the attributes of each entity. For example, the `Batches` entity has the following attributes:
|
||||
* ID
|
||||
* Name
|
||||
* Current Instructor
|
||||
* Classes
|
||||
|
||||
The initial set of tables would look like this:
|
||||
```mermaid
|
||||
classDiagram
|
||||
class Batches{
|
||||
+ ID
|
||||
+ Name
|
||||
+ Current Instructor
|
||||
+ Classes
|
||||
}
|
||||
class Instructor{
|
||||
+ ID
|
||||
+ Name
|
||||
+ DOB
|
||||
}
|
||||
class Students{
|
||||
+ ID
|
||||
+ Name
|
||||
+ Grad Year
|
||||
+ University
|
||||
+ Email
|
||||
+ Phone Number
|
||||
+ CTC
|
||||
+ Current Batch
|
||||
+ Student Buddy
|
||||
+ Mentor
|
||||
+ Previous Batches
|
||||
}
|
||||
class Classes{
|
||||
+ ID
|
||||
+ Name
|
||||
+ Instructor
|
||||
}
|
||||
class Mentor{
|
||||
+ ID
|
||||
+ Name
|
||||
+ DOB
|
||||
}
|
||||
```
|
||||
|
||||
The initial design has the following issues:
|
||||
* Foreign keys are not present
|
||||
* Attributes are not atomic
|
||||
* There is no way to know the entry and leaving date of a student on a batch
|
||||
|
||||
The next step would be to identify the relationships between the entities to add the foreign keys. To identify relationships, we need to find the cardinality of the relations.
|
||||
|
||||
### Cardinality
|
||||
> Cardinality is the maximum times an entity can relate to an instance with another entity or entity set.
|
||||
|
||||
> the number of interactions entities have with each other.
|
||||
|
||||
**One to One (1:1)**
|
||||
> A "one-to-one" relationship is seen when one instance of entity 1 is related to only one instance of entity 2 and vice-versa
|
||||
|
||||
A student can only have one email address and one email address can be associated with only one student.
|
||||
|
||||

|
||||
|
||||
An attribute shared by both entities can be added to either of the entities.
|
||||
|
||||
**One to Many or Many to one (1:m or m:1)**
|
||||
> When one instance of entity 1 is related to more than one instance of entity 2, the relationship is referred to as "one-to-many.
|
||||
|
||||
A student can only be associated with one batch, but a batch can have many students.
|
||||
|
||||

|
||||
|
||||
An attribute shared by both entities can only be added to the entity which has multiple instances i.e. the M side.
|
||||
|
||||
**Many to Many (m:n)**
|
||||
|
||||
> When multiple instances of entity 1 are linked to multiple instances of entity 2, we have a "many-to-many" relationship. Imagine a scenario where an employee is assigned more than one project.
|
||||
|
||||
A student can attend multiple classes and a class can have multiple students.
|
||||
|
||||

|
||||
|
||||
An attribute shared by both entities has to be added to the relationship.
|
||||
|
||||
#### Caveat 1: NULL values
|
||||
|
||||
Often, when a relationship is not present, we use `NULL` values. For example, a student may not have a mentor. In this case, the `mentor_id` field in the `students` table will be `NULL`. This is a valid value for a foreign key.
|
||||
|
||||
If a table has a lot of NULL values for a foreign key, it is a good idea to create a new mapping table. For example, a student can have a `mentor_id` field with NULL values, we can create a new table `student_mentor`:
|
||||
```mermaid
|
||||
classDiagram
|
||||
class Student{
|
||||
+id: int
|
||||
+name: string
|
||||
}
|
||||
class Mentor{
|
||||
+id: int
|
||||
+name: string
|
||||
}
|
||||
class StudentMentor{
|
||||
+student_id: int
|
||||
+mentor_id: int
|
||||
}
|
||||
```
|
||||
|
||||
#### Caveat 2: Relations with attributes
|
||||
|
||||
Sometimes, a relationship has attributes. For example, a student can have multiple batches. In this case, we can add a `joining_date` and `leaving_date` field to the `student_batch` table.
|
||||
|
||||
If the relation attributes are added to the main table, it can get polluted and add to the latency of the table. A better approach is to create a new table with the relation attributes.
|
||||
|
||||
```mermaid
|
||||
classDiagram
|
||||
class Student{
|
||||
+id: int
|
||||
+name: string
|
||||
}
|
||||
class Batch{
|
||||
+id: int
|
||||
+name: string
|
||||
}
|
||||
class StudentBatch{
|
||||
+student_id: int
|
||||
+batch_id: int
|
||||
+joining_date: date
|
||||
+leaving_date: date
|
||||
}
|
||||
```
|
||||
|
||||
#### Recap
|
||||
|
||||
| Cardinality | Normal Relation | Sparse Relation | Relation with Attributes | Example |
|
||||
| --- | --- | --- | --- | --- |
|
||||
| 1:1 | Add foreign key on any table | Mapping Table | Mapping Table | Student - Email |
|
||||
| 1:M | Add foreign key on M side referencing the other| Mapping Table | Mapping Table | Student - Batch |
|
||||
| M:N | Mapping Table | Mapping Table | Mapping Table | Student - Class |
|
||||
|
||||
### Final Design
|
||||
|
||||
Now that we know about cardinality, we can go ahead and identify the various relationships between the entities.
|
||||
|
||||
* A batch and an instructor have a Many to One relationship. An instructor can teach multiple batches, but a batch can only have one instructor. So we add a foreign key `instructor_id` to the `batches` table.
|
||||
|
||||
```mermaid
|
||||
classDiagram
|
||||
class Batches{
|
||||
+ ID
|
||||
+ Name
|
||||
+ instructor_id
|
||||
+ Classes
|
||||
}
|
||||
class Instructor{
|
||||
+ ID
|
||||
+ Name
|
||||
+ DOB
|
||||
}
|
||||
Batches "M" -- "1" Instructor
|
||||
```
|
||||
|
||||
* A batch can have multiple classes and a class can be a part of multiple batches i.e. Many to Many relationship. So we create a mapping table `batch_classes` with the attributes `batch_id` and `class_id`.
|
||||
|
||||
```mermaid
|
||||
classDiagram
|
||||
class Batches{
|
||||
+ ID
|
||||
+ Name
|
||||
+ instructor_id
|
||||
}
|
||||
class Classes{
|
||||
+ ID
|
||||
+ Name
|
||||
+ Instructor
|
||||
}
|
||||
class BatchClass{
|
||||
+ batch_id
|
||||
+ class_id
|
||||
}
|
||||
```
|
||||
* A class can have one instructor and an instructor can teach multiple classes i.e. Many to One relationship. So we add a foreign key `instructor_id` to the `classes` table.
|
||||
|
||||
```mermaid
|
||||
classDiagram
|
||||
class Classes{
|
||||
+ ID
|
||||
+ Name
|
||||
+ instructor_id
|
||||
}
|
||||
class Instructor{
|
||||
+ ID
|
||||
+ Name
|
||||
+ DOB
|
||||
}
|
||||
Classes "M" -- "1" Instructor
|
||||
```
|
||||
|
||||
* A student can have one mentor and a mentor can have multiple students i.e. Many to One relationship. So we add a foreign key `mentor_id` to the `students` table.
|
||||
|
||||
```mermaid
|
||||
classDiagram
|
||||
class Students{
|
||||
+ ID
|
||||
+ Name
|
||||
+ Grad Year
|
||||
+ University
|
||||
+ Email
|
||||
+ Phone Number
|
||||
+ CTC
|
||||
+ Current Batch
|
||||
+ Student Buddy
|
||||
+ mentor_id
|
||||
+ Previous Batches
|
||||
}
|
||||
class Mentor{
|
||||
+ ID
|
||||
+ Name
|
||||
+ DOB
|
||||
}
|
||||
Students "M" -- "1" Mentor
|
||||
```
|
||||
|
||||
* A student can be a part of only one batch at a time, but a batch can have multiple students i.e. Many to One relationship. So we add a foreign key `batch_id` to the `students` table.
|
||||
|
||||
```mermaid
|
||||
classDiagram
|
||||
class Students{
|
||||
+ ID
|
||||
+ Name
|
||||
+ Grad Year
|
||||
+ University
|
||||
+ Email
|
||||
+ Phone Number
|
||||
+ CTC
|
||||
+ Student Buddy
|
||||
+ mentor_id
|
||||
+ Previous Batches
|
||||
+ batch_id
|
||||
}
|
||||
class Batches{
|
||||
+ ID
|
||||
+ Name
|
||||
+ instructor_id
|
||||
}
|
||||
Students "M" -- "1" Batches
|
||||
```
|
||||
|
||||
* A student can have multiple previous batches and a batch can have multiple students i.e. Many to Many relationship. So we create a mapping table `student_batch` with the attributes `student_id` and `batch_id`. Also, we need to store the `joining_date` and `leaving_date` for each student-batch relationship.
|
||||
|
||||
```mermaid
|
||||
classDiagram
|
||||
class Students{
|
||||
+ ID
|
||||
+ Name
|
||||
+ Grad Year
|
||||
+ University
|
||||
+ Email
|
||||
+ Phone Number
|
||||
+ CTC
|
||||
+ Student Buddy
|
||||
+ mentor_id
|
||||
+ batch_id
|
||||
}
|
||||
class Batches{
|
||||
+ ID
|
||||
+ Name
|
||||
+ instructor_id
|
||||
}
|
||||
class StudentBatch{
|
||||
+ student_id
|
||||
+ batch_id
|
||||
+ joining_date
|
||||
+ leaving_date
|
||||
}
|
||||
```
|
||||
|
||||
The complete final design is as follows:
|
||||
|
||||
```mermaid
|
||||
classDiagram
|
||||
class Students{
|
||||
+ id
|
||||
+ name
|
||||
+ grad_year
|
||||
+ university
|
||||
+ email
|
||||
+ phone_number
|
||||
+ ctc
|
||||
+ student_buddy_id
|
||||
+ mentor_id
|
||||
+ batch_id
|
||||
}
|
||||
class Mentor{
|
||||
+ id
|
||||
+ name
|
||||
+ dob
|
||||
}
|
||||
|
||||
class Batches{
|
||||
+ id
|
||||
+ name
|
||||
+ instructor_id
|
||||
}
|
||||
|
||||
class Classes{
|
||||
+ id
|
||||
+ name
|
||||
+ instructor_id
|
||||
}
|
||||
|
||||
class Instructor{
|
||||
+ id
|
||||
+ name
|
||||
+ dob
|
||||
}
|
||||
|
||||
class StudentBatch{
|
||||
+ student_id
|
||||
+ batch_id
|
||||
+ joining_date
|
||||
+ leaving_date
|
||||
}
|
||||
|
||||
class BatchClass{
|
||||
+ batch_id
|
||||
+ class_id
|
||||
}
|
||||
|
||||
Students "M" -- "1" Mentor
|
||||
Students "M" -- "1" Batches
|
||||
Batches "M" -- "1" Instructor
|
||||
Classes "M" -- "1" Instructor
|
||||
```
|
||||
|
||||
## Reading list
|
||||
* [Problems with floats](https://dev.mysql.com/doc/refman/8.0/en/problems-with-float.html)
|
||||
* [Char and Varchar](https://dev.mysql.com/doc/refman/8.0/en/char.html)
|
||||
* [IEEE 754 - Floating Point Arithmetic](https://en.wikipedia.org/wiki/IEEE_754)
|
Loading…
x
Reference in New Issue
Block a user