From f114791398aa022ebb4b851242b23f156d09b4ff Mon Sep 17 00:00:00 2001 From: Tanmay Date: Tue, 13 Sep 2022 14:27:20 +0100 Subject: [PATCH] Adds notes for schema design. --- database/notes/02-schema-design.md | 439 +++++++++++++++++++++++++++++ 1 file changed, 439 insertions(+) create mode 100644 database/notes/02-schema-design.md diff --git a/database/notes/02-schema-design.md b/database/notes/02-schema-design.md new file mode 100644 index 0000000..9a7885d --- /dev/null +++ b/database/notes/02-schema-design.md @@ -0,0 +1,439 @@ +# Schema Design: A case study + +- [Schema Design: A case study](#schema-design-a-case-study) + - [Key Terms](#key-terms) + - [Schema](#schema) + - [SQL data types](#sql-data-types) + - [String types](#string-types) + - [Numeric types](#numeric-types) + - [Schema design](#schema-design) + - [Case study: Requirements](#case-study-requirements) + - [Initial design](#initial-design) + - [Cardinality](#cardinality) + - [Caveat 1: NULL values](#caveat-1-null-values) + - [Caveat 2: Relations with attributes](#caveat-2-relations-with-attributes) + - [Recap](#recap) + - [Final Design](#final-design) + - [Reading list](#reading-list) + + +## Key Terms +### Schema +> refers to the organization of data as a blueprint of how the database is constructed + +> In a relational database, the schema defines the tables, fields, relationships, views, indexes, packages, procedures, functions, queues, triggers, types, sequences, materialized views, synonyms, database links, directories, XML schemas, and other elements + + +## SQL data types + +MySQL supports SQL data types in several categories: numeric types, date and time types, string (character and byte) types, spatial types, and the JSON data type. The following are the string and numeric types: + +### String types + +| Type | Description | Size | Range | Example | +| --- | --- | --- | --- | --- | +| `CHAR(n)` | Fixed-length string | 0-255 | 0-65,535 | `CHAR(10)` | +| `VARCHAR(n)` | Variable-length string | 0-255 | 0-65,535 | `VARCHAR(10)` | +| `TINYTEXT` | Variable-length string | 0-255 | 0-65,535 | `VARCHAR(10)` | +| `TEXT` | Variable-length string | 0-65,535 | 0-4,294,967,295 | `TEXT` | +| `MEDIUMTEXT` | Variable-length string | 0-16,777,215 | 0-4,294,967,295 | `MEDIUMTEXT` | +| `LONGTEXT` | Variable-length string | 0-4,294,967,295 | 0-4,294,967,295 | `LONGTEXT` | + +### Numeric types + +| Type | Description | Size | Range | Example | +| --- | --- | --- | --- | --- | +| `TINYINT` | Integer | 1 byte | -128 to 127 | `TINYINT` | +| `SMALLINT` | Integer | 2 bytes | -32,768 to 32,767 | `SMALLINT` | +| `MEDIUMINT` | Integer | 3 bytes | -8,388,608 to 8,388,607 | `MEDIUMINT` | +| `INT` | Integer | 4 bytes | -2,147,483,648 to 2,147,483,647 | `INT` | +| `BIGINT` | Integer | 8 bytes | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | `BIGINT` | +| `DECIMAL` | Fixed-point number | 0-65 | -10^38+1 to 10^38-1 | `DECIMAL(10, 2)` | +| `FLOAT` | Floating-point number | 4 bytes | -3.402823466E+38 to -1.175494351E-38, 0, and 1.175494351E-38 to 3.402823466E+38 | `FLOAT` | + +## Schema design + +A schema is a blueprint of a database. It is created before you actually construct the database so that the schema design can be reviewed. Schema diagrams are also a great way to document the database structure in one place. + +Remember our student's database from the [previous lesson](01-database-fundamentals.md)? We had the three following tables +* `students` (id, name, age, address, phone, email, batch ID) +* `mentors` (id, name, age, address, phone, email) +* `batches` (id, name, mentor, start date, type, mentor ID) + +So each table has `ID` as primary key. The `students` table has a `batch ID` field that references the `batches` table and the `batches` table has a `mentor ID` field that references the `mentors` table. These are examples of foreign keys. These are some the items that are present in a schema. A schema will also contain indexes, constraints, and other items that are present in a table. + +Following is a schema diagram for the above database. Note that the primary key is not highlighted here, which ideally should be. + +![Schema](../media/schema.png) + +> **Note** +> +> Try it yourself. \ +> Go to [this](https://diagramplus.com/) website and import [this](../media/schema.diagram) diagram. \ +> Try adding a new column or even a new table. + +### Case study: Requirements + +* There are several batches at Scaler. Each batch has an ID, name, current instructor. +* Each batch has multiple classes. Each class has an ID, name, instructor +* Every Student has a name, ID, grad year, university, email, phone number, ctc, current batch. +* Every student also has a student buddy. +* A student may have been moved from one batch to another due to pausing the course. So we need to know the entry date and leaving date of a student on every batch they were a part of.. +* Every student has a mentor. Every mentor has a name, dob. + +You can also find these [here](https://docs.google.com/document/d/1uNhwSuzskCa_mw4TECtGfHFl8C6bxSHuystu99rST3A/edit). + +### Initial design +The first step of designing a schema is to identify the entities. +This can be done by identifying the nouns in the requirements. +For example, take the first requirement. +> There are several batches at Scaler. Each batch has an ID, name, current instructor. + +We can identify the following entities: +* Batches +* Instructor + +Running through the requirements, we can identify the following entities: +* Batches +* Instructor +* Students +* Classes +* Mentor + +The next step would be to identify the attributes of each entity. For example, the `Batches` entity has the following attributes: +* ID +* Name +* Current Instructor +* Classes + +The initial set of tables would look like this: +```mermaid +classDiagram + class Batches{ + + ID + + Name + + Current Instructor + + Classes + } + class Instructor{ + + ID + + Name + + DOB + } + class Students{ + + ID + + Name + + Grad Year + + University + + Email + + Phone Number + + CTC + + Current Batch + + Student Buddy + + Mentor + + Previous Batches + } + class Classes{ + + ID + + Name + + Instructor + } + class Mentor{ + + ID + + Name + + DOB + } +``` + +The initial design has the following issues: +* Foreign keys are not present +* Attributes are not atomic +* There is no way to know the entry and leaving date of a student on a batch + +The next step would be to identify the relationships between the entities to add the foreign keys. To identify relationships, we need to find the cardinality of the relations. + +### Cardinality +> Cardinality is the maximum times an entity can relate to an instance with another entity or entity set. + +> the number of interactions entities have with each other. + +**One to One (1:1)** +> A "one-to-one" relationship is seen when one instance of entity 1 is related to only one instance of entity 2 and vice-versa + +A student can only have one email address and one email address can be associated with only one student. + +![One to One](../media/one-to-one.png) + +An attribute shared by both entities can be added to either of the entities. + +**One to Many or Many to one (1:m or m:1)** +> When one instance of entity 1 is related to more than one instance of entity 2, the relationship is referred to as "one-to-many. + +A student can only be associated with one batch, but a batch can have many students. + +![One to Many](../media/one-to-many.png) + +An attribute shared by both entities can only be added to the entity which has multiple instances i.e. the M side. + +**Many to Many (m:n)** + +> When multiple instances of entity 1 are linked to multiple instances of entity 2, we have a "many-to-many" relationship. Imagine a scenario where an employee is assigned more than one project. + +A student can attend multiple classes and a class can have multiple students. + +![Many to Many](../media/many-to-many.png) + +An attribute shared by both entities has to be added to the relationship. + +#### Caveat 1: NULL values + +Often, when a relationship is not present, we use `NULL` values. For example, a student may not have a mentor. In this case, the `mentor_id` field in the `students` table will be `NULL`. This is a valid value for a foreign key. + +If a table has a lot of NULL values for a foreign key, it is a good idea to create a new mapping table. For example, a student can have a `mentor_id` field with NULL values, we can create a new table `student_mentor`: +```mermaid +classDiagram + class Student{ + +id: int + +name: string + } + class Mentor{ + +id: int + +name: string + } + class StudentMentor{ + +student_id: int + +mentor_id: int + } +``` + +#### Caveat 2: Relations with attributes + +Sometimes, a relationship has attributes. For example, a student can have multiple batches. In this case, we can add a `joining_date` and `leaving_date` field to the `student_batch` table. + +If the relation attributes are added to the main table, it can get polluted and add to the latency of the table. A better approach is to create a new table with the relation attributes. + +```mermaid +classDiagram + class Student{ + +id: int + +name: string + } + class Batch{ + +id: int + +name: string + } + class StudentBatch{ + +student_id: int + +batch_id: int + +joining_date: date + +leaving_date: date + } +``` + +#### Recap + +| Cardinality | Normal Relation | Sparse Relation | Relation with Attributes | Example | +| --- | --- | --- | --- | --- | +| 1:1 | Add foreign key on any table | Mapping Table | Mapping Table | Student - Email | +| 1:M | Add foreign key on M side referencing the other| Mapping Table | Mapping Table | Student - Batch | +| M:N | Mapping Table | Mapping Table | Mapping Table | Student - Class | + +### Final Design + +Now that we know about cardinality, we can go ahead and identify the various relationships between the entities. + +* A batch and an instructor have a Many to One relationship. An instructor can teach multiple batches, but a batch can only have one instructor. So we add a foreign key `instructor_id` to the `batches` table. + +```mermaid +classDiagram + class Batches{ + + ID + + Name + + instructor_id + + Classes + } + class Instructor{ + + ID + + Name + + DOB + } + Batches "M" -- "1" Instructor +``` + +* A batch can have multiple classes and a class can be a part of multiple batches i.e. Many to Many relationship. So we create a mapping table `batch_classes` with the attributes `batch_id` and `class_id`. + +```mermaid +classDiagram + class Batches{ + + ID + + Name + + instructor_id + } + class Classes{ + + ID + + Name + + Instructor + } + class BatchClass{ + + batch_id + + class_id + } +``` +* A class can have one instructor and an instructor can teach multiple classes i.e. Many to One relationship. So we add a foreign key `instructor_id` to the `classes` table. + +```mermaid +classDiagram + class Classes{ + + ID + + Name + + instructor_id + } + class Instructor{ + + ID + + Name + + DOB + } + Classes "M" -- "1" Instructor +``` + +* A student can have one mentor and a mentor can have multiple students i.e. Many to One relationship. So we add a foreign key `mentor_id` to the `students` table. + +```mermaid +classDiagram + class Students{ + + ID + + Name + + Grad Year + + University + + Email + + Phone Number + + CTC + + Current Batch + + Student Buddy + + mentor_id + + Previous Batches + } + class Mentor{ + + ID + + Name + + DOB + } + Students "M" -- "1" Mentor +``` + +* A student can be a part of only one batch at a time, but a batch can have multiple students i.e. Many to One relationship. So we add a foreign key `batch_id` to the `students` table. + +```mermaid +classDiagram + class Students{ + + ID + + Name + + Grad Year + + University + + Email + + Phone Number + + CTC + + Student Buddy + + mentor_id + + Previous Batches + + batch_id + } + class Batches{ + + ID + + Name + + instructor_id + } + Students "M" -- "1" Batches +``` + +* A student can have multiple previous batches and a batch can have multiple students i.e. Many to Many relationship. So we create a mapping table `student_batch` with the attributes `student_id` and `batch_id`. Also, we need to store the `joining_date` and `leaving_date` for each student-batch relationship. + +```mermaid +classDiagram + class Students{ + + ID + + Name + + Grad Year + + University + + Email + + Phone Number + + CTC + + Student Buddy + + mentor_id + + batch_id + } + class Batches{ + + ID + + Name + + instructor_id + } + class StudentBatch{ + + student_id + + batch_id + + joining_date + + leaving_date + } +``` + +The complete final design is as follows: + +```mermaid +classDiagram + class Students{ + + id + + name + + grad_year + + university + + email + + phone_number + + ctc + + student_buddy_id + + mentor_id + + batch_id + } + class Mentor{ + + id + + name + + dob + } + + class Batches{ + + id + + name + + instructor_id + } + + class Classes{ + + id + + name + + instructor_id + } + + class Instructor{ + + id + + name + + dob + } + + class StudentBatch{ + + student_id + + batch_id + + joining_date + + leaving_date + } + + class BatchClass{ + + batch_id + + class_id + } + + Students "M" -- "1" Mentor + Students "M" -- "1" Batches + Batches "M" -- "1" Instructor + Classes "M" -- "1" Instructor +``` + +## Reading list +* [Problems with floats](https://dev.mysql.com/doc/refman/8.0/en/problems-with-float.html) +* [Char and Varchar](https://dev.mysql.com/doc/refman/8.0/en/char.html) +* [IEEE 754 - Floating Point Arithmetic](https://en.wikipedia.org/wiki/IEEE_754) \ No newline at end of file