# Schema Design: A case study - [Schema Design: A case study](#schema-design-a-case-study) - [Key Terms](#key-terms) - [Schema](#schema) - [SQL data types](#sql-data-types) - [String types](#string-types) - [Numeric types](#numeric-types) - [Schema design](#schema-design) - [Case study: Requirements](#case-study-requirements) - [Initial design](#initial-design) - [Cardinality](#cardinality) - [Caveat 1: NULL values](#caveat-1-null-values) - [Caveat 2: Relations with attributes](#caveat-2-relations-with-attributes) - [Recap](#recap) - [Final Design](#final-design) - [Reading list](#reading-list) ## Key Terms ### Schema > refers to the organization of data as a blueprint of how the database is constructed > In a relational database, the schema defines the tables, fields, relationships, views, indexes, packages, procedures, functions, queues, triggers, types, sequences, materialized views, synonyms, database links, directories, XML schemas, and other elements ## SQL data types MySQL supports SQL data types in several categories: numeric types, date and time types, string (character and byte) types, spatial types, and the JSON data type. The following are the string and numeric types: ### String types | Type | Description | Size | Range | Example | | --- | --- | --- | --- | --- | | `CHAR(n)` | Fixed-length string | 0-255 | 0-65,535 | `CHAR(10)` | | `VARCHAR(n)` | Variable-length string | 0-255 | 0-65,535 | `VARCHAR(10)` | | `TINYTEXT` | Variable-length string | 0-255 | 0-65,535 | `VARCHAR(10)` | | `TEXT` | Variable-length string | 0-65,535 | 0-4,294,967,295 | `TEXT` | | `MEDIUMTEXT` | Variable-length string | 0-16,777,215 | 0-4,294,967,295 | `MEDIUMTEXT` | | `LONGTEXT` | Variable-length string | 0-4,294,967,295 | 0-4,294,967,295 | `LONGTEXT` | ### Numeric types | Type | Description | Size | Range | Example | | --- | --- | --- | --- | --- | | `TINYINT` | Integer | 1 byte | -128 to 127 | `TINYINT` | | `SMALLINT` | Integer | 2 bytes | -32,768 to 32,767 | `SMALLINT` | | `MEDIUMINT` | Integer | 3 bytes | -8,388,608 to 8,388,607 | `MEDIUMINT` | | `INT` | Integer | 4 bytes | -2,147,483,648 to 2,147,483,647 | `INT` | | `BIGINT` | Integer | 8 bytes | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | `BIGINT` | | `DECIMAL` | Fixed-point number | 0-65 | -10^38+1 to 10^38-1 | `DECIMAL(10, 2)` | | `FLOAT` | Floating-point number | 4 bytes | -3.402823466E+38 to -1.175494351E-38, 0, and 1.175494351E-38 to 3.402823466E+38 | `FLOAT` | ## Schema design A schema is a blueprint of a database. It is created before you actually construct the database so that the schema design can be reviewed. Schema diagrams are also a great way to document the database structure in one place. Remember our student's database from the [previous lesson](01-database-fundamentals.md)? We had the three following tables * `students` (id, name, age, address, phone, email, batch ID) * `mentors` (id, name, age, address, phone, email) * `batches` (id, name, mentor, start date, type, mentor ID) So each table has `ID` as primary key. The `students` table has a `batch ID` field that references the `batches` table and the `batches` table has a `mentor ID` field that references the `mentors` table. These are examples of foreign keys. These are some the items that are present in a schema. A schema will also contain indexes, constraints, and other items that are present in a table. Following is a schema diagram for the above database. Note that the primary key is not highlighted here, which ideally should be.  > **Note** > > Try it yourself. \ > Go to [this](https://diagramplus.com/) website and import [this](../media/schema.diagram) diagram. \ > Try adding a new column or even a new table. ### Case study: Requirements * There are several batches at Scaler. Each batch has an ID, name, current instructor. * Each batch has multiple classes. Each class has an ID, name, instructor * Every Student has a name, ID, grad year, university, email, phone number, ctc, current batch. * Every student also has a student buddy. * A student may have been moved from one batch to another due to pausing the course. So we need to know the entry date and leaving date of a student on every batch they were a part of.. * Every student has a mentor. Every mentor has a name, dob. You can also find these [here](https://docs.google.com/document/d/1uNhwSuzskCa_mw4TECtGfHFl8C6bxSHuystu99rST3A/edit). ### Initial design The first step of designing a schema is to identify the entities. This can be done by identifying the nouns in the requirements. For example, take the first requirement. > There are several batches at Scaler. Each batch has an ID, name, current instructor. We can identify the following entities: * Batches * Instructor Running through the requirements, we can identify the following entities: * Batches * Instructor * Students * Classes * Mentor The next step would be to identify the attributes of each entity. For example, the `Batches` entity has the following attributes: * ID * Name * Current Instructor * Classes The initial set of tables would look like this: ```mermaid classDiagram class Batches{ + ID + Name + Current Instructor + Classes } class Instructor{ + ID + Name + DOB } class Students{ + ID + Name + Grad Year + University + Email + Phone Number + CTC + Current Batch + Student Buddy + Mentor + Previous Batches } class Classes{ + ID + Name + Instructor } class Mentor{ + ID + Name + DOB } ``` The initial design has the following issues: * Foreign keys are not present * Attributes are not atomic * There is no way to know the entry and leaving date of a student on a batch The next step would be to identify the relationships between the entities to add the foreign keys. To identify relationships, we need to find the cardinality of the relations. ### Cardinality > Cardinality is the maximum times an entity can relate to an instance with another entity or entity set. > the number of interactions entities have with each other. **One to One (1:1)** > A "one-to-one" relationship is seen when one instance of entity 1 is related to only one instance of entity 2 and vice-versa A student can only have one email address and one email address can be associated with only one student.  An attribute shared by both entities can be added to either of the entities. **One to Many or Many to one (1:m or m:1)** > When one instance of entity 1 is related to more than one instance of entity 2, the relationship is referred to as "one-to-many. A student can only be associated with one batch, but a batch can have many students.  An attribute shared by both entities can only be added to the entity which has multiple instances i.e. the M side. **Many to Many (m:n)** > When multiple instances of entity 1 are linked to multiple instances of entity 2, we have a "many-to-many" relationship. Imagine a scenario where an employee is assigned more than one project. A student can attend multiple classes and a class can have multiple students.  An attribute shared by both entities has to be added to the relationship. #### Caveat 1: NULL values Often, when a relationship is not present, we use `NULL` values. For example, a student may not have a mentor. In this case, the `mentor_id` field in the `students` table will be `NULL`. This is a valid value for a foreign key. If a table has a lot of NULL values for a foreign key, it is a good idea to create a new mapping table. For example, a student can have a `mentor_id` field with NULL values, we can create a new table `student_mentor`: ```mermaid classDiagram class Student{ +id: int +name: string } class Mentor{ +id: int +name: string } class StudentMentor{ +student_id: int +mentor_id: int } ``` #### Caveat 2: Relations with attributes Sometimes, a relationship has attributes. For example, a student can have multiple batches. In this case, we can add a `joining_date` and `leaving_date` field to the `student_batch` table. If the relation attributes are added to the main table, it can get polluted and add to the latency of the table. A better approach is to create a new table with the relation attributes. ```mermaid classDiagram class Student{ +id: int +name: string } class Batch{ +id: int +name: string } class StudentBatch{ +student_id: int +batch_id: int +joining_date: date +leaving_date: date } ``` #### Recap | Cardinality | Normal Relation | Sparse Relation | Relation with Attributes | Example | | --- | --- | --- | --- | --- | | 1:1 | Add foreign key on any table | Mapping Table | Mapping Table | Student - Email | | 1:M | Add foreign key on M side referencing the other| Mapping Table | Mapping Table | Student - Batch | | M:N | Mapping Table | Mapping Table | Mapping Table | Student - Class | ### Final Design Now that we know about cardinality, we can go ahead and identify the various relationships between the entities. * A batch and an instructor have a Many to One relationship. An instructor can teach multiple batches, but a batch can only have one instructor. So we add a foreign key `instructor_id` to the `batches` table. ```mermaid classDiagram class Batches{ + ID + Name + instructor_id + Classes } class Instructor{ + ID + Name + DOB } Batches "M" -- "1" Instructor ``` * A batch can have multiple classes and a class can be a part of multiple batches i.e. Many to Many relationship. So we create a mapping table `batch_classes` with the attributes `batch_id` and `class_id`. ```mermaid classDiagram class Batches{ + ID + Name + instructor_id } class Classes{ + ID + Name + Instructor } class BatchClass{ + batch_id + class_id } ``` * A class can have one instructor and an instructor can teach multiple classes i.e. Many to One relationship. So we add a foreign key `instructor_id` to the `classes` table. ```mermaid classDiagram class Classes{ + ID + Name + instructor_id } class Instructor{ + ID + Name + DOB } Classes "M" -- "1" Instructor ``` * A student can have one mentor and a mentor can have multiple students i.e. Many to One relationship. So we add a foreign key `mentor_id` to the `students` table. ```mermaid classDiagram class Students{ + ID + Name + Grad Year + University + Email + Phone Number + CTC + Current Batch + Student Buddy + mentor_id + Previous Batches } class Mentor{ + ID + Name + DOB } Students "M" -- "1" Mentor ``` * A student can be a part of only one batch at a time, but a batch can have multiple students i.e. Many to One relationship. So we add a foreign key `batch_id` to the `students` table. ```mermaid classDiagram class Students{ + ID + Name + Grad Year + University + Email + Phone Number + CTC + Student Buddy + mentor_id + Previous Batches + batch_id } class Batches{ + ID + Name + instructor_id } Students "M" -- "1" Batches ``` * A student can have multiple previous batches and a batch can have multiple students i.e. Many to Many relationship. So we create a mapping table `student_batch` with the attributes `student_id` and `batch_id`. Also, we need to store the `joining_date` and `leaving_date` for each student-batch relationship. ```mermaid classDiagram class Students{ + ID + Name + Grad Year + University + Email + Phone Number + CTC + Student Buddy + mentor_id + batch_id } class Batches{ + ID + Name + instructor_id } class StudentBatch{ + student_id + batch_id + joining_date + leaving_date } ``` The complete final design is as follows: ```mermaid classDiagram class Students{ + id + name + grad_year + university + email + phone_number + ctc + student_buddy_id + mentor_id + batch_id } class Mentor{ + id + name + dob } class Batches{ + id + name + instructor_id } class Classes{ + id + name + instructor_id } class Instructor{ + id + name + dob } class StudentBatch{ + student_id + batch_id + joining_date + leaving_date } class BatchClass{ + batch_id + class_id } Students "M" -- "1" Mentor Students "M" -- "1" Batches Batches "M" -- "1" Instructor Classes "M" -- "1" Instructor ``` ## Reading list * [Problems with floats](https://dev.mysql.com/doc/refman/8.0/en/problems-with-float.html) * [Char and Varchar](https://dev.mysql.com/doc/refman/8.0/en/char.html) * [IEEE 754 - Floating Point Arithmetic](https://en.wikipedia.org/wiki/IEEE_754)