diff --git a/Non-DSA Notes/LLD1 Notes/01 Introduction to Object Oriented Programming.md b/Non-DSA Notes/LLD1 Notes/01 Introduction to Object Oriented Programming.md
new file mode 100644
index 0000000..839684a
--- /dev/null
+++ b/Non-DSA Notes/LLD1 Notes/01 Introduction to Object Oriented Programming.md
@@ -0,0 +1,403 @@
+# Introduction to Object Oriented Programming
+---
+## Programming Paradigms
+Programming paradigms are different ways or styles in which a program or programming language can be organized. Each paradigm consists of certain structures, features, and opinions about how common programming problems should be tackled.
+
+The question of why there are many different programming paradigms is similar to why there are many programming languages. Certain paradigms are better suited to certain types of problems, so it makes sense to use different paradigms for different kinds of projects. They're more like a set of ideals and guidelines that many people have agreed on, followed, and expanded upon.
+
+Programming languages aren't always tied to a specific paradigm. Some languages have been built with a certain paradigm in mind and have features that facilitate that kind of programming more than others (Haskell and functional programming is a good example).
+But there are also "multi-paradigm" languages, meaning you can adapt your code to fit one paradigm or another (JavaScript and Python are good examples).
+
+Broadly speaking, paradigms can be classified into two major types:
+
+**Imperative** - an imperative program consists of commands for the computer to perform in order to change state, e.g. C, Java, Python, etc.
+
+**Declarative** - focuses on what the program should accomplish without specifying all the details of how the program should achieve the result, e.g. SQL, Lisp, and Java (through its Stream API).
+
+
+### Popular Programming Paradigms
+Now that we have introduced what programming paradigms are and are not, let's go through the most popular ones, explain their main characteristics, and compare them.
+
+- Imperative Programming
+- Procedural Programming
+- Object Oriented Programming
+- Functional Programming
+- Reactive Programming
+
+### 1. Imperative Programming
+Imperative programming consists of sets of detailed instructions that are given to the computer to execute in a given order. It's called "imperative" because as programmers we dictate exactly what the computer has to do, in a very specific way.
+Imperative programming focuses on describing how a program operates, step by step.
+Say you want to bake a cake. Your imperative program to do this might look like this:
+```
+1- Pour flour in a bowl
+2- Pour a couple eggs in the same bowl
+3- Pour some milk in the same bowl
+4- Mix the ingredients
+5- Pour the mix in a mold
+6- Cook for 35 minutes
+7- Let chill
+```
+
+Using an actual code example, let's say we want to filter an array of numbers to only keep the elements bigger than 5. Our imperative code might look like this:
+
+```java
+import java.util.ArrayList;
+import java.util.List;
+
+int[] nums = {1, 4, 3, 6, 7, 8, 9, 2};
+List<Integer> result = new ArrayList<>();
+
+// Visit every element in order and keep the ones greater than 5
+for (int i = 0; i < nums.length; i++) {
+    if (nums[i] > 5) result.add(nums[i]);
+}
+```
+
+### 2. Procedural Programming
+Procedural programming is a derivation of imperative programming, adding to it the feature of functions (also known as "procedures" or "subroutines").
+
+In procedural programming, the programmer is encouraged to subdivide the program execution into functions, as a way of improving modularity and organization.
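As a rough sketch of this idea in Java, the filtering loop from the imperative section can be moved into a named procedure. The class and method names used here (`FilterDemo`, `keepGreaterThan`) are made up for illustration and are not part of the original notes.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterDemo {

    // A procedure: the filtering steps get a name and parameters,
    // but the body is still ordinary imperative code.
    static List<Integer> keepGreaterThan(int[] nums, int threshold) {
        List<Integer> result = new ArrayList<>();
        for (int num : nums) {
            if (num > threshold) {
                result.add(num);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] nums = {1, 4, 3, 6, 7, 8, 9, 2};
        // The call site now reads as "what" is being done
        System.out.println(keepGreaterThan(nums, 5)); // [6, 7, 8, 9]
    }
}
```

The caller only sees the procedure name and its arguments, which is the modularity benefit described above.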
+
+Following our cake example, procedural programming may look like this:
+```java
+static void pourIngredients() {
+    // - Pour flour in a bowl
+    // - Pour a couple eggs in the same bowl
+    // - Pour some milk in the same bowl
+}
+
+static void mixAndTransferToMold() {
+    // - Mix the ingredients
+    // - Pour the mix in a mold
+}
+
+static void cookAndLetChill() {
+    // - Cook for 35 minutes
+    // - Let chill
+}
+
+pourIngredients();
+mixAndTransferToMold();
+cookAndLetChill();
+```
+You can see that, thanks to the use of functions, we could just read the three function calls at the end of the file and get a good idea of what our program does.
+
+That simplification and abstraction is one of the benefits of procedural programming. But within the functions, we still have the same old imperative code.
+
+### 3. Object-Oriented Programming
+One of the most popular programming paradigms is object-oriented programming (OOP).
+
+The core concept of OOP is to separate concerns into entities which are coded as objects. Each entity groups a given set of information (properties) and actions (methods) that can be performed by the entity.
+
+OOP makes heavy use of classes (which are a way of creating new objects starting out from a blueprint or boilerplate that the programmer sets). Objects that are created from a class are called instances.
+
+Following our pseudo-code cooking example, now let's say in our bakery we have a main cook (called Frank) and an assistant cook (called Anthony) and each of them will have certain responsibilities in the baking process. If we used OOP, our program might look like this.
+
+```java
+// Create the two classes corresponding to each entity
+class Cook {
+    String name;
+
+    Cook(String name) {
+        this.name = name;
+    }
+
+    void mixAndBake() {
+        // - Mix the ingredients
+        // - Pour the mix in a mold
+        // - Cook for 35 minutes
+    }
+}
+
+class AssistantCook {
+    String name;
+
+    AssistantCook(String name) {
+        this.name = name;
+    }
+
+    void pourIngredients() {
+        // - Pour flour in a bowl
+        // - Pour a couple eggs in the same bowl
+        // - Pour some milk in the same bowl
+    }
+
+    void chillTheCake() {
+        // - Let chill
+    }
+}
+
+// Instantiate an object from each class
+Cook frank = new Cook("Frank");
+AssistantCook anthony = new AssistantCook("Anthony");
+
+// Call the corresponding methods from each instance
+anthony.pourIngredients();
+frank.mixAndBake();
+anthony.chillTheCake();
+```
+
+What's nice about OOP is that it makes a program easier to understand through a clear separation of concerns and responsibilities. Languages like C++, Java, Python etc. support Object Oriented Programming.
+
+### 4. Declarative Programming / Functional Programming
+Declarative programming is all about hiding away complexity and bringing programming languages closer to human language and thinking. It's the direct opposite of imperative programming in the sense that the programmer doesn't give instructions about how the computer should execute the task, but rather states what result is needed.
+
+The basic objective of this style of programming is to make code more concise, less complex, more predictable, and easier to test compared to the legacy style of coding.
+
+Until Java 8, Java supported the imperative and object-oriented styles of programming. With the Java 8 release, Java also started supporting the functional style of programming.
+
+The functional style of programming is a form of declarative programming. In the imperative style of coding, we define both what task to do and how to do it.
Whereas, in the declarative style of coding, we only specify what to do. Let’s understand this with an example. Given a list of numbers, let’s find the sum of the doubles of the even numbers in the list using an imperative and a declarative style of coding.
+
+```java
+// Java program to find the sum
+// using imperative style of coding
+import java.util.Arrays;
+import java.util.List;
+public class TestImperative {
+    public static void main(String[] args)
+    {
+        List<Integer> numbers
+            = Arrays.asList(11, 22, 33, 44,
+                            55, 66, 77, 88,
+                            99, 100);
+
+        int result = 0;
+        for (Integer n : numbers) {
+            if (n % 2 == 0) {
+                result += n * 2;
+            }
+        }
+        System.out.println(result);
+    }
+}
+```
+The first issue with the above code is that we are mutating the variable result again and again. Mutability is one of the biggest issues in an imperative style of coding. The second issue with the imperative style is that we spend our effort telling not only what to do but also how to do the processing. Now let’s rewrite the above code in a declarative style.
+```java
+// Java program to find the sum
+// using declarative style of coding
+import java.util.Arrays;
+import java.util.List;
+public class TestDeclarative {
+    public static void main(String[] args)
+    {
+        List<Integer> numbers
+            = Arrays.asList(11, 22, 33, 44,
+                            55, 66, 77, 88,
+                            99, 100);
+
+        System.out.println(
+            numbers.stream()
+                .filter(number -> number % 2 == 0)
+                .mapToInt(e -> e * 2)
+                .sum());
+    }
+}
+```
+The above example uses the Java 8 Stream API. We will learn about Java 8 Streams in a later part of this LLD module. Languages like Scala, Haskell, C# and Java support this paradigm, as seen in the above example.
+
+### 5. Reactive Programming
+Reactive programming is a programming paradigm that focuses on constructing responsive and robust software applications that can handle asynchronous data streams and change propagation, allowing developers to create scalable and more easily maintainable applications that can adapt to a dynamic environment.
In a reactive system, data flows through streams, which can be thought of as sequences of events over time.
+Reactive programming revolves around the following principles - Data Streams, Observables, Observers and Operators. Java also supports this paradigm.
+
+-----
+## Motivation - Object Oriented Programming
+
+**Problem Statement** Once upon a time in a software shop, two programmers were given the same spec and told to “build it”. The Project Manager forced the two coders to compete. The problem statement is as follows: There will be shapes on a GUI: a square, a circle and a triangle. When the user clicks a shape, it will rotate clockwise 360 degrees and play a .mp3 sound corresponding to that shape.
+
+**Coder 1**
+Focuses on writing procedures. He wrote the `rotate()` and `playSound()` procedures in no time.
+
+```java
+void rotate(int shapeNum) {
+    // code to rotate the shape about its center
+}
+
+void playSound(int shapeNum) {
+    if (shapeNum == 1) {
+        // play the square sound
+    }
+    else if (shapeNum == 2) {
+        // play the circle sound
+    }
+    ...
+}
+```
+
+
+**Coder 2**
+Coder 2 wrote a class for each of the three shapes.
+
+![](https://i.ibb.co/bNCXypG/Screenshot-2024-01-14-at-8-51-07-AM.png)
+
+**Another Requirement comes...**
+Now the manager added another requirement: There will be an amoeba shape on the screen, with the others. When the user clicks on the amoeba, it will rotate like the others, and play a .hif sound file.
+
+**Coder 1:** has to make changes in the playSound method; rotate would still work!
+```java
+void playSound(int shapeNum) {
+    // if the shape is not an amoeba,
+    //     use shapeNum to look up which
+    //     .mp3 sound to play, and play it
+    // else
+    //     play amoeba .hif sound
+}
+```
+It turned out not to be such a big deal, but it still made him queasy to touch previously-tested code. Of all people, he should know that no matter what the project manager says, the spec always changes.
+
+**Coder 2**: smiled, sipped his coffee, and wrote one new class.
Sometimes the thing he loved most about OO was that he didn’t have to touch code he’d already tested and delivered. “Flexibility, extensibility,...” he mused, reflecting on the benefits of Object Oriented Programming.
+
+```java
+class Amoeba {
+    void playSound() {
+        ...
+    }
+    void rotate() {
+        ...
+    }
+}
+```
+
+But even that's not the best design! We’ve got duplicated code! The rotate method is in all four Shape classes. We haven't seen the final design yet. Let us come back to it when we discuss inheritance in an upcoming class.
+
+---
+## What is Object Oriented Programming?
+Object-oriented programming (OOP) is a programming paradigm that uses objects to model real-world things and aims to implement state and behavior using objects.
+
+Object-Oriented Programming is based on implementing the state and behaviour concepts together. State and behaviour are combined into one new concept: an Object. An OO application can therefore produce some output by calling an Object, without needing to pass data structures.
+
+**Procedural vs OOPS World**
+The focus of procedural programming is to break down a programming task into a collection of variables, data structures, and subroutines, whereas in object-oriented programming it is to break down a programming task into objects that expose behavior (methods) and data (members or attributes). The most important distinction is that while procedural programming uses procedures to operate on data structures, object-oriented programming bundles the two together, so an "object", which is an instance of a class, operates on its "own" data structure.
+## Classes and Objects
+Object-oriented programming bundles the data and behaviour related to one entity inside a class. Let's understand classes and objects, followed by the pillars of OOPS (Abstraction, Encapsulation, Inheritance and Polymorphism), in more detail.
+
+### Classes
+Classes are the starting point of all objects, and we may consider them as the template for creating objects.
A class would typically contain member fields, member methods, and a special constructor method.
+
+```java
+public class Player {
+    // Data members
+    String name;
+    int guess;
+
+
+    // Member methods
+    void makeGuess() {
+        this.guess = (int) (Math.random() * 9.0) + 1;
+        System.out.println(this.name + " guessed " + this.guess);
+    }
+    ...
+}
+```
+### Objects
+Objects are created from classes and are called instances of the class. We create objects from classes using their constructors.
+```java
+    Player p1 = new Player();
+    Player p2 = new Player();
+    p1.name = "Prateek";
+    p2.name = "Naman";
+    p1.makeGuess(); // e.g. Prateek guessed 9
+    p2.makeGuess(); // e.g. Naman guessed 6
+```
+Now, we’ve created different Player objects, all from a single class. These objects are created from the Player class at runtime. This is the point of it all: to define the blueprint in one place, and then to reuse it many times in many places.
+
+![](https://i.ibb.co/zGvKBWc/Screenshot-2024-01-14-at-12-37-04-PM.png)
+
+---
+
+### Pillars of Object Oriented Programming
+OOPS has 4 foundational concepts, also called the Pillars of Object Oriented Programming. These are Abstraction, Encapsulation, Inheritance & Polymorphism.
+#### Abstraction
+Abstraction is hiding complexities of implementation and exposing simpler interfaces.
+If we think about a typical computer, one can only see the external interface, which is most essential for interacting with it, while internal chips and circuits are hidden from the user. In OOP, abstraction means hiding the complex implementation details of a program, exposing only the API required to use the implementation. In Java, we achieve abstraction by using interfaces and abstract classes.
+
+#### Encapsulation
+Encapsulation is hiding the state or internal representation of an object from the end-user and providing publicly accessible methods bound to the object for read-write access.
This allows for hiding specific information and controlling access to internal implementation.
+
+Encapsulation is used to hide the values or state of a structured data object inside a class, preventing direct access to them by clients in a way that could expose hidden implementation details or violate state invariants maintained by the methods.
+
+```java
+public class Player {
+    // Data members
+    String name;
+    private int guess;
+
+    // Provide a method to read guess
+    int getGuess() {
+        return guess;
+    }
+
+    // Member methods
+    void makeGuess() {
+        // Randomly generated from 1-9
+        this.guess = (int) (Math.random() * 9.0) + 1;
+        System.out.println(this.name + " guessed " + this.guess);
+    }
+    ...
+}
+```
+For example, private member fields like ```guess``` in the class are hidden from other classes, and they can be accessed using the member method ```getGuess()```. This provides a way to read the value of guess but prevents write access from other classes.
+
+#### Inheritance
+Inheritance is the mechanism that allows one class to acquire all the properties from another class by inheriting the class. We call the inheriting class a child class and the inherited class the superclass or parent class.
+
+In Java, we do this by extending the parent class. Thus, the child class gets all the properties from the parent. We will talk about inheritance in detail later.
+
+```java
+public class User {
+    String username;
+    String email;
+}
+```
+Student inherits all properties and methods from User.
+```java
+public class Student extends User {
+    int marks;
+}
+```
+
+#### Polymorphism
+Polymorphism is the ability of an OOP language to process data differently depending on the types of its inputs. In Java, this can be the same method name having different method signatures and performing different functions.
+```java
+public class TextFile {
+    //... 
+
+    public String read() {
+        return this.getContent()
+            .toString();
+    }
+
+    public String read(int limit) {
+        return this.getContent()
+            .toString()
+            .substring(0, limit);
+    }
+
+    public String read(int start, int stop) {
+        return this.getContent()
+            .toString()
+            .substring(start, stop);
+    }
+}
+
+```
+In this example, we can see that the method ```read()``` has three different forms with different functionalities. This type of polymorphism is static or compile-time polymorphism and is also called method overloading. There is also runtime or dynamic polymorphism, where the child class overrides the parent’s method. We will study runtime polymorphism in later classes.
+
+---
+## Advantages & Disadvantages of OOPS
+
+### Advantages
+**1. Reusability**: Through classes and objects, and inheritance of common attributes and functions.
+
+**2. Security**: Hiding and protecting information through encapsulation.
+
+**3. Maintenance**: Easy to make changes without affecting existing objects much.
+
+**4. Inheritance**: Easy to import required functionality from libraries and customize them, thanks to inheritance.
+
+### Disadvantages
+
+1. The entities that should be modeled as classes have to be planned beforehand.
+
+2. OOPS programs are usually larger than those of other paradigms.
+
+3. Banana-gorilla problem - You wanted a banana but what you got was a gorilla holding the banana and the entire jungle. [Read More](https://dev.to/efpage/what-s-wrong-with-the-gorilla-2l4j#:~:text=Joe%20Armstrong%2C%20the%20principal%20inventor,and%20the%20entire%20jungle.%22.)
+
+----- End ----
+
+
+
diff --git a/Non-DSA Notes/LLD1 Notes/02 OOPS - Access Modifiers & Constructors.md b/Non-DSA Notes/LLD1 Notes/02 OOPS - Access Modifiers & Constructors.md
new file mode 100644
index 0000000..6c5f717
--- /dev/null
+++ b/Non-DSA Notes/LLD1 Notes/02 OOPS - Access Modifiers & Constructors.md
@@ -0,0 +1,372 @@
+# OOPS II - Access Modifiers & Constructors
+---
+In this tutorial, we will learn about
+- Access Modifiers
+- Getters & Setters
+- Constructors
+- Shallow & Deep Copy
+- Java Memory Model - Objects & References
+- Life & Death of Objects on Heap
+- Project : Guess Game using OOPS Concepts
+
+## Access Modifiers
+The access modifiers in Java specify the accessibility or scope of a field, method, constructor, or class. We can change the access level of fields, constructors, methods, and classes by applying an access modifier to them. Let's understand it through an example.
+```java
+public class Player {
+    // Data members
+    String name;
+    private int guess;
+    public String handle;
+
+    // Example of a private method:
+    // can't be called from outside the class
+    private void assignTeam() {
+        ...
+    }
+    // Example of a public method:
+    // can be called from anywhere
+    public void setTeam(int teamId) {
+        ...
+    }
+}
+```
+In the above class, `guess` is private, `name` is default, and `handle` is public. What does that mean? Let's understand the meaning of each of these access modifiers.
+
+There are four types of access modifiers in Java:
+
+```public``` - The access level of a public modifier is everywhere. It can be accessed from within the class, outside the class, within the package and outside the package.
+
+```protected``` - The access level of a protected modifier is within the package and outside the package through a child class. If you do not make the child class, it cannot be accessed from outside the package.
+
+```private``` - The access level of a private modifier is only within the class. It cannot be accessed from outside the class.
+
+```default``` - The access level of a default modifier is only within the package. It cannot be accessed from outside the package. If you do not specify any access level, it will be the default.
+
+![](https://camo.githubusercontent.com/5d78da7eb91ae3ef8b046901c1ebd8d38cf4826a80b0b9d1cd4a002664a9301e/687474703a2f2f6e65742d696e666f726d6174696f6e732e636f6d2f6a6176612f6261736963732f696d672f6163636573732d6d6f6469666965722e706e67)
+
+**Getters & Setters**
+
+If you try to read or write a private data member outside the class, you will get a compile error. In order to work with private data members, you might need to create special public methods called getters and setters in the class for specific data members, as shown below.
+
+```java
+public class Player {
+    // Data members
+    String name;
+    private int guess;
+    public String handle;
+
+    // Setter method
+    public void setGuess(int guess) {
+        // Setters can run validation logic before updating the class member
+        if (guess >= 0) {
+            this.guess = guess;
+        }
+    }
+    // Getter method
+    public int getGuess() {
+        return this.guess;
+    }
+}
+```
+The advantage of this approach is that the value is set only if it satisfies the class-specific validation logic.
+
+
+### Constructors
+A constructor is a special method that is called automatically when an object is created. It is used to initialize the object, for example by setting initial values for the object's attributes.
+
+Constructors are the gatekeepers of object-oriented design. Let us create a class for students:
+
+**Student.java**
+```java
+public class Student {
+
+    private String name;
+    private String email;
+    private Integer age;
+    private String address;
+    private String batchName;
+    private Integer psp;
+
+    public void changeBatch(String batchName) {
+        ...
+    }
+}
+```
+The above class can be used to create objects of type Student.
This is done by using the new keyword:
+```java
+Student student = new Student();
+// Compiles only where the private field is accessible, e.g. inside Student itself
+student.name = "Eklavya";
+```
+Notice that we did not define a constructor for the Student class. This brings us to our first type of constructor.
+
+### Default constructor
+A default constructor is a constructor created by the compiler if we do not define any constructor(s) for a class.
+
+In Java, a default constructor takes no parameters. If no user-defined constructor exists for a class, the compiler implicitly declares a default parameterless constructor.
+
+A default constructor is also known as a no-argument constructor or a nullary constructor. All fields are left at their initial value of 0 (integer types), 0.0 (floating-point types), false (boolean type), or null (reference types). An example of a no-argument constructor is:
+
+```java
+public class Student {
+    private String name;
+    private String email;
+    private Integer age;
+    private String address;
+    private String batchName;
+    private Integer psp;
+
+    public Student() {
+        // no-argument constructor
+    }
+}
+```
+Notice a few things about the constructor which we just wrote. First, it looks like a method, but it has no return type. That's because a constructor implicitly returns the type of the object that it creates. Calling new Student() now will call the constructor above. Secondly, it takes no arguments. This particular kind of constructor is called a no-argument constructor.
+
+**Syntax of a constructor**
+In Java, every class must have a constructor. Its structure looks similar to a method, but it has a different purpose. A constructor declaration is made up of modifiers, a declarator, and a body.
+
+Constructor declarations begin with access modifiers: They can be public, private, protected, or package access. Unlike methods, a constructor can't be abstract, static, final, native, or synchronized.
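To make the modifier rules concrete, here is a small sketch. It assumes a hypothetical `Account` class that is not part of these notes, and shows a public constructor alongside a private constructor that is reached only through a static factory method.

```java
public class Account {
    private final String owner;
    private final int balance;

    // public: any class can call new Account("...")
    public Account(String owner) {
        this(owner, 0);
    }

    // private: only code inside Account can call this constructor
    private Account(String owner, int balance) {
        if (balance < 0) {
            throw new IllegalArgumentException("balance must not be negative");
        }
        this.owner = owner;
        this.balance = balance;
    }

    // A static factory method is one way to expose a private constructor
    public static Account withOpeningBalance(String owner, int balance) {
        return new Account(owner, balance);
    }

    // Not allowed: writing "static Account() {}" or "final Account() {}"
    // is rejected by the compiler, because those modifiers cannot be
    // applied to a constructor.

    public String getOwner() {
        return owner;
    }

    public int getBalance() {
        return balance;
    }

    public static void main(String[] args) {
        Account a = new Account("Asha");
        Account b = Account.withOpeningBalance("Ravi", 100);
        System.out.println(a.getOwner() + " " + a.getBalance()); // Asha 0
        System.out.println(b.getOwner() + " " + b.getBalance()); // Ravi 100
    }
}
```

Keeping the two-argument constructor private means every `Account` with an opening balance is created through the factory method, so the validation cannot be bypassed from outside the class.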
+
+The declarator is the name of the class, followed by a parameter list. The parameter list is a comma-separated list of parameters enclosed in parentheses. The body is a block of code that defines the constructor's behavior.
+
+```Constructor Name (Parameter List)```
+
+### Parameterised Constructor
+Now, a real benefit of constructors is that they help us maintain encapsulation when injecting state into the object. The constructor above is a no-argument constructor and hence values have to be set after the instance is created.
+
+```java
+Student student = new Student();
+student.name = "prateek";
+student.email = "prateek@gmail.in";
+...
+```
+The above approach works (as long as the fields are accessible from the calling code) but requires setting the values of all the fields after the instance is created. Also, we won't be able to validate or sanitize the values. We can add the validation and sanitization logic in the getters and setters, but we won't be able to fail instance creation. Hence, we need to add a parameterised constructor. A parameterised constructor has the same syntax as the constructors before; the only change is that it has a parameter list.
+
+```java
+public class Student {
+    private String name;
+    private String email;
+
+    public Student(String name, String email) {
+        this.name = name;
+        this.email = email;
+    }
+}
+```
+
+```java
+Student s1 = new Student("prateek", "prateek@gmail.in");
+Student s2 = new Student("rahul", "Rahul@gmail.in");
+```
+In Java, constructors differ from other methods in that:
+
+- Constructors never have an explicit return type.
+- Constructors cannot be directly invoked (the keyword “new” invokes them).
+- Constructors cannot have non-access modifiers.
+
+### Copy constructor
+A copy constructor is a member function that initializes an object using another object of the same class.
A copy constructor has the following general function prototype: + +```java +class Student { + private String name; + private String email; + + public Student(String name, String email) { + this.name = name; + this.email = email; + } + + //Copy Constructor + public Student(Student student) { + this.name = student.name; + this.email = student.email; + } +} +``` + +### Shallow & Deep Copy +When we do a copy of some entity to create two or more than two entities such that changes in one entity are reflected in the other entities as well, then we can say we have done a shallow copy. In shallow copy, new memory allocation never happens for the other entities, and the only reference is copied to the other entities. The following example demonstrates the same. + +```java +public class Test { + public int n; + String name; + public int arr[]; + + Test(int n,String name){ + this.n = n; + this.name = name; + this.arr = new int[n]; + for(int i=0;i { + + // same as before + @Override + public int compareTo(Player otherPlayer) { + return Integer.compare(getRanking(), otherPlayer.getRanking()); + } +} +``` + +```java +public static void main(String[] args) { + List footballTeam = new ArrayList<>(); + Player player1 = new Player(59, "John", 20); + Player player2 = new Player(67, "Roger", 22); + Player player3 = new Player(45, "Steven", 24); + footballTeam.add(player1); + footballTeam.add(player2); + footballTeam.add(player3); + + System.out.println("Before Sorting : " + footballTeam); + Collections.sort(footballTeam); + System.out.println("After Sorting : " + footballTeam); +} +``` + +#### 2. Multiple Inheritances +Java classes support singular inheritance. However, by using interfaces, we’re also able to implement multiple inheritances. +For instance, in the example below, we notice that the Car class implements the Fly and Transform interfaces. 
By doing so, it inherits the methods fly and transform:
+
+```java
+public interface Transform {
+    void transform();
+}
+
+public interface Fly {
+    void fly();
+}
+
+public class Car implements Fly, Transform {
+
+    @Override
+    public void fly() {
+        System.out.println("I can Fly!!");
+    }
+
+    @Override
+    public void transform() {
+        System.out.println("I can Transform!!");
+    }
+}
+
+```
+
+#### 3. Polymorphism
+Polymorphism is the ability of an object to take different forms at runtime. To be more specific, it’s the execution of the overriding method that belongs to the object's actual type at runtime.
+
+In Java, we can achieve polymorphism using interfaces. For example, the Shape interface can take different forms: it can be a Circle or a Square.
+
+Let’s start by defining the Shape interface:
+
+```java
+public interface Shape {
+    String name();
+}
+```
+Now let’s also create the Circle class:
+```java
+public class Circle implements Shape {
+
+    @Override
+    public String name() {
+        return "Circle";
+    }
+}
+```
+And also the Square class:
+```java
+public class Square implements Shape {
+
+    @Override
+    public String name() {
+        return "Square";
+    }
+}
+```
+
+Finally, it’s time to see polymorphism in action using our Shape interface and its implementations. Let’s instantiate some Shape objects, add them to a List, and, finally, print their names in a loop:
+```java
+List<Shape> shapes = new ArrayList<>();
+Shape circleShape = new Circle();
+Shape squareShape = new Square();
+
+shapes.add(circleShape);
+shapes.add(squareShape);
+
+// Each call dispatches to the implementation of the object's actual class
+for (Shape shape : shapes) {
+    System.out.println(shape.name());
+}
+```
+
+#### 4. Default Methods in Interfaces
+Traditional interfaces in Java 7 and below don’t offer backward compatibility.
+What this means is that if you have legacy code written in Java 7 or earlier, and you decide to add an abstract method to an existing interface, then all the classes that implement that interface must override the new abstract method.
Otherwise, the code will break.
+
+Java 8 solved this problem by introducing the default method, which is implemented at the interface level and is optional for implementing classes to override.
+
+```java
+public interface Shape {
+    default void draw() {
+        System.out.println("Drawing a Shape");
+    }
+}
+```
+
+#### Summary
+Interfaces are used in the following scenarios:
+* To achieve abstraction.
+* To achieve loose coupling, since a class can implement multiple interfaces.
+* To define a common behavior for unrelated classes.
+
+
+
+## Abstract Classes
+There are many cases when implementing a contract where we want to postpone some parts of the implementation to be completed later. We can easily accomplish this in Java through abstract classes.
+Before diving into when to use an abstract class, let’s look at their most relevant characteristics:
+
+- We define an abstract class with the abstract modifier preceding the class keyword
+- An abstract class can be subclassed, but it can’t be instantiated
+- If a class defines one or more abstract methods, then the class itself must be declared abstract
+- An abstract class can declare both abstract and concrete methods
+- A subclass derived from an abstract class must either implement all the base class’s abstract methods or be abstract itself
+
+To better understand these concepts, we’ll create a simple example.
+
+### How to create an abstract class?
+
+Let us create an abstract class for a Person.
+You can create an abstract class by using the abstract keyword.
+Similarly, you can create an abstract method by using the abstract keyword.
+
+```java
+public abstract class Person {
+
+    public abstract String getName();
+    public abstract String getEmail();
+}
+```
+
+Now let's create a class that extends the Person abstract class:
+
+```java
+public class User extends Person {
+    private String name;
+    private String email;
+
+    public User(String name, String email) {
+        this.name = name;
+        this.email = email;
+    }
+
+    @Override
+    public String getName() {
+        return name;
+    }
+
+    @Override
+    public String getEmail() {
+        return email;
+    }
+}
+```
+
+
+### Why use an abstract class?
+
+Now, let’s analyze a few typical scenarios where we should prefer abstract classes over interfaces and concrete classes:
+
+* It is used to achieve abstraction.
+* It can have abstract methods and non-abstract methods.
+* When you don't want to provide the implementation of a method, you can make it abstract.
+* When you don't want to allow the instantiation of a class, you can make it abstract.
+* We want to encapsulate some common functionality in one place (code reuse) that multiple, related subclasses will share.
+* We need to partially define an API that our subclasses can easily extend and refine.
+* The subclasses need to inherit one or more common methods or fields with protected access modifiers.
+* Moreover, since the use of abstract classes implicitly deals with base types and subtypes, we’re also taking advantage of Polymorphism.
+
+**Sample Code**
+Let’s have our base abstract class define the abstract API of a board game:
+
+```java
+public abstract class BoardGame {
+
+    //... field declarations, constructors
+
+    public abstract void play();
+
+    //... concrete methods
+}
+```
+Then, we can create a subclass that implements the play method:
+```java
+public class Checkers extends BoardGame {
+
+    public void play() {
+        //... 
implementation + } +} +``` + +## Reading List +* [Comparators & Comparable Interface](https://www.baeldung.com/java-comparator-comparable) +* [Interface Inheritance](https://www.baeldung.com/java-8-functional-interfaces) +* [Functional Interfaces in Java 8](https://www.baeldung.com/java-8-functional-interfaces) +* [Sample Java Collections Interface - Queue Interface](https://docs.oracle.com/javase/8/docs/api/java/util/Queue.html) +* [Duck Typing](https://realpython.com/lessons/duck-typing/#:~:text=Duck%20typing%20is%20a%20concept,a%20given%20method%20or%20attribute.) +* [OOP in Python](https://gist.github.com/kanmaytacker/e6ed49131970c67588fba9164fbc45d4) + diff --git a/Non-DSA Notes/LLD1 Notes/Advanced Java 01 - Generics.md b/Non-DSA Notes/LLD1 Notes/Advanced Java 01 - Generics.md new file mode 100644 index 0000000..bacf5a5 --- /dev/null +++ b/Non-DSA Notes/LLD1 Notes/Advanced Java 01 - Generics.md @@ -0,0 +1,252 @@ +# Adv Java 01 - Generics +---- +## Agenda +- Intro to Generics + - Generic Classes + - Generic Methods + - Wildcards in Generics + - Bounded Generics + - Generic Interfaces +- Additional Concepts + - Type Erasure + +## Introduction +Generics in Java provide a way to create classes, interfaces, and methods with a type parameter. This allows you to write code that can work with different types while providing compile-time type safety. In this beginner-friendly tutorial, we'll explore the basics of Java generics. + +Generics offer several benefits: + +1. Type Safety: Generics provide compile-time type checking, reducing the chances of runtime errors. + +2. Code Reusability: You can write code that works with different types without duplicating it. + +3. Elimination of Type Casting: Generics eliminate the need for explicit type casting, making the code cleaner. + +## Generic Classes +A generic class is a class that has one or more type parameters. Here's a simple example of a generic class. 
+```java +public class Box<T> { + private T content; + + public void addContent(T content) { + this.content = content; + } + + public T getContent() { + return content; + } +} +``` +In this example, T is a type parameter. You can create instances of Box for different types: +```java +Box<Integer> intBox = new Box<>(); +intBox.addContent(42); +System.out.println("Box Content: " + intBox.getContent()); // Output: Box Content: 42 + +Box<String> stringBox = new Box<>(); +stringBox.addContent("Hello, Generics!"); +System.out.println("Box Content: " + stringBox.getContent()); // Output: Box Content: Hello, Generics! + +``` +## Generic Methods +You can also create generic methods within non-generic classes. The type parameter is declared before the return type. Here's an example: +```java +public class Util { + public <E> void printArray(E[] array) { + for (E element : array) { + System.out.print(element + " "); + } + System.out.println(); + } +} +``` +You can use this method with different types: +```java +Integer[] intArray = {1, 2, 3, 4, 5}; +String[] stringArray = {"apple", "banana", "orange"}; + +Util util = new Util(); +util.printArray(intArray); // Output: 1 2 3 4 5 +util.printArray(stringArray); // Output: apple banana orange +``` + +## Wildcard in Generics +The wildcard (?) is used to represent an unknown type. Let's see an example. +```java +import java.util.List; + +public class Printer { + public static void printList(List<?> list) { + for (Object item : list) { + System.out.print(item + " "); + } + System.out.println(); + } +} +``` +You can use this method with lists of different types: +```java +List<Integer> intList = Arrays.asList(1, 2, 3); +List<String> stringList = Arrays.asList("apple", "banana", "orange"); + +Printer.printList(intList); // Output: 1 2 3 +Printer.printList(stringList); // Output: apple banana orange +``` + +## Bounded Generics +Remember that type parameters can be bounded. 
Bounded means “restricted,” and we can restrict the types that a method accepts. 
+ +For example, we can specify that a method accepts a type and all its subclasses (upper bound) or a type and all its superclasses (lower bound). + +Type bounds restrict the types that can be used as arguments in a generic class or method. A type parameter takes an upper bound with extends; a wildcard can take either an upper bound with extends or a lower bound with super. + +```java +public class NumberBox<T extends Number> { + private T content; + + public void addContent(T content) { + this.content = content; + } + + public T getContent() { + return content; + } +} +``` +In this example, T must be Number or a subclass of Number. + + +To declare an upper-bounded type, we use the keyword extends after the type, followed by the upper bound that we want to use: +```java +public <T extends Number> List<T> fromArrayToList(T[] a) { + ... +} +``` + +We use the keyword extends here to mean that the type T extends the upper bound in case of a class or implements an upper bound in case of an interface. + +There are two types of bounded wildcards: `? extends T` and `? super T`. The former is an upper-bounded wildcard, and the latter is a lower-bounded wildcard. + +Consider this example: +```java +public static void paintAllBuildings(List<Building> buildings) { + buildings.forEach(Building::paint); +} +``` + +If we imagine a subtype of Building, such as a House, we can’t use this method with a list of House, even though House is a subtype of Building. + +If we need to use this method with type Building and all its subtypes, the bounded wildcard can do the magic: +```java +public static void paintAllBuildings(List<? extends Building> buildings) { + ... +} +``` +Now this method will work with type Building and all its subtypes. This is called an upper-bounded wildcard, where type Building is the upper bound. + +We can also specify wildcards with a lower bound, where the unknown type has to be a supertype of the specified type. 
Lower bounds can be specified using the super keyword followed by the specific type. For example, `<? super T>` means an unknown type that is T or a superclass of T (= T and all its parents). 
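
A lower-bounded wildcard is easiest to see on a method that writes into a list. The following is a minimal sketch, not part of the original notes; the class name `LowerBoundDemo` and the method `addIntegers` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, introduced only for this sketch.
public class LowerBoundDemo {

    // Accepts List<Integer>, List<Number>, or List<Object>:
    // any list whose element type is Integer or a supertype of Integer.
    public static void addIntegers(List<? super Integer> target, int count) {
        for (int i = 1; i <= count; i++) {
            target.add(i); // writing an Integer is always safe here
        }
    }

    public static void main(String[] args) {
        List<Number> numbers = new ArrayList<>();
        List<Object> objects = new ArrayList<>();

        addIntegers(numbers, 3);
        addIntegers(objects, 2);

        System.out.println(numbers); // [1, 2, 3]
        System.out.println(objects); // [1, 2]
    }
}
```

Reading from a `List<? super Integer>` only gives you `Object`, which is why lower bounds tend to suit methods that put values into a collection, while upper bounds suit methods that read from one.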
+ + +## Generic Interfaces + +Interfaces can also be generic. For example: + +```java +public interface Pair<K, V> { + K getKey(); + V getValue(); +} +``` +You can implement this interface with different types: + +```java +public class OrderedPair<K, V> implements Pair<K, V> { + private K key; + private V value; + + public OrderedPair(K key, V value) { + this.key = key; + this.value = value; + } + + @Override + public K getKey() { + return key; + } + + @Override + public V getValue() { + return value; + } +} +``` + +## Additional Concepts +### Type Erasure +Type erasure is a feature in Java generics where the type parameters used in generic code are removed (or erased) during compilation. This means that the generic type information is not available at runtime, and the type parameters are replaced with their upper bounds or with Object. + +#### How Type Erasure Works: +##### 1. Compilation Phase: + +During the compilation phase, Java generics are type-checked to ensure type safety. +The compiler replaces all type parameters with their upper bounds or with Object if no bound is specified. + +##### 2. Type Erasure: + +The compiler removes all generic type information and inserts casts where they are needed to preserve type safety. +This process is known as type erasure, and it allows Java to maintain backward compatibility with non-generic code. +Example: +Consider the following generic class: + +```java +public class Box<T> { + private T content; + + public void setContent(T content) { + this.content = content; + } + + public T getContent() { + return content; + } +} +``` +After compilation, the type parameter T is replaced with Object: +```java +public class Box { + private Object content; + + public void setContent(Object content) { + this.content = content; + } + + public Object getContent() { + return content; + } +} +``` +#### Implications of Type Erasure: +1. Loss of Type Information at Runtime: + +Type information about generic types is not available at runtime due to type erasure. 
+For example, you can't determine the actual type parameter used for a generic class or method at runtime. + +2. Bridge Methods: + +When a class extends a generic class or implements a generic interface, the compiler may generate synthetic bridge methods so that method overriding keeps working after erasure. + +3. Arrays and Generics: + +Due to type erasure, you can't create an array of a type parameter. For example, `T[] array = new T[5];` does not compile. + +4. Casting and Unchecked Warnings: + +Type casts may be necessary when working with generic types, and this can lead to unchecked warnings. For example, when casting to a generic type, the compiler issues a warning because it can't verify the type at runtime. +```java +Box<Integer> integerBox = new Box<>(); +integerBox.setContent(42); + +Object obj = integerBox; + +// Warning: unchecked cast. At runtime the JVM can only check that obj is a Box, +// not that it is a Box<Integer>. +Box<Integer> sameBox = (Box<Integer>) obj; +int value = sameBox.getContent(); +``` +#### Summary +Type erasure is a mechanism in Java generics that removes generic type information during compilation to maintain compatibility with non-generic code. While this approach allows for seamless integration with existing code, it also means that certain generic type information is not available at runtime. Developers need to be aware of the implications of type erasure, such as potential unchecked warnings and limitations on working with arrays of generic types. 
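
The loss of type information at runtime can be observed directly. The sketch below is an illustration rather than part of the original notes; `ErasureDemo` and `sameRuntimeClass` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical demo class, introduced only for this sketch.
public class ErasureDemo {

    // Returns true when both lists share the same runtime class.
    public static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Integer> integers = new ArrayList<>();

        // Both are plain ArrayList at runtime; the type arguments were erased.
        System.out.println(sameRuntimeClass(strings, integers));               // true
        System.out.println(strings.getClass().getName());                      // java.util.ArrayList

        // Different implementation classes are still distinguishable.
        System.out.println(sameRuntimeClass(strings, new LinkedList<String>())); // false
    }
}
```

The first comparison is `true` even though the two lists have different type arguments in the source, which is exactly the information erasure discards.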
+-- End -- + + diff --git a/Non-DSA Notes/LLD1 Notes/Advanced Java 02 - Collections.md b/Non-DSA Notes/LLD1 Notes/Advanced Java 02 - Collections.md new file mode 100644 index 0000000..18b3d5e --- /dev/null +++ b/Non-DSA Notes/LLD1 Notes/Advanced Java 02 - Collections.md @@ -0,0 +1,287 @@ +# Adv Java 02 - Collections +---- +## Agenda +- Intro to Collections + - Common Collection Interfaces + - List Interface + - Queue Interface + - Set Interface + - Map Interface + +- Intro to Iterators + - Using Iterators + - Iterator Methods + +- Additional Concepts + - Using Custom Object as Key with HashMap etc + +## Introduction +Java Collections Framework provides a set of interfaces and classes to store and manipulate groups of objects. Collections make it easier to work with groups of objects, such as lists, sets, and maps. In this beginner-friendly tutorial, we'll explore the basics of Java Collections and how to use iterators to traverse through them. + +### 1. Introduction to Java Collections +Java Collections provide a unified architecture for representing and manipulating groups of objects. The Collections Framework includes interfaces, implementations, and algorithms that simplify the handling of groups of objects. + +[Collection PlayList - Video Tutorial](https://drive.google.com/drive/folders/1lLcfZzmKSa0bq_1--OOya0Xk7-y5A9SN?usp=drive_link) + +### 2. Common Collection Interfaces +There are several core interfaces in the Collections Framework: + +![](https://miro.medium.com/v2/resize:fit:822/1*qgcrVwo8qzF4muOQ-kKB8A.jpeg) + +**Collection:** The root interface for all collections. It represents a group of objects, and its subinterfaces include List, Set, and Queue. + +**List:** An ordered collection that allows duplicate elements. Implementations include ArrayList, LinkedList, and Vector. + +**Queue:** The Queue interface in Java is part of the Java Collections Framework and extends the Collection interface. 
Queues typically, but do not necessarily, order elements in a FIFO (first-in-first-out) manner. Among the exceptions are priority queues, which order elements according to a supplied comparator, or the elements' natural ordering. Implementations include ArrayDeque, LinkedList, PriorityQueue etc. + +**Set:** A collection that does not allow duplicate elements. Implementations include HashSet (no ordering guarantee), LinkedHashSet (insertion order), and TreeSet (sorted order). + +**Map:** An object that maps keys to values. Map does not extend Collection. Implementations include HashMap, LinkedHashMap, TreeMap, and Hashtable. + +### Example-1 List Interface and ArrayList +The List interface extends the Collection interface and represents an ordered collection of elements. One of the common implementations is ArrayList. Let's see a simple example: + +```java +import java.util.ArrayList; +import java.util.List; + +public class ListExample { + public static void main(String[] args) { + List<String> myList = new ArrayList<>(); + myList.add("Java"); + myList.add("Python"); + myList.add("C++"); + + System.out.println("List elements: " + myList); + } +} +``` + +### Example-2 Set Interface and HashSet +The Set interface represents a collection of unique elements. One of the common implementations is HashSet, which does not guarantee any iteration order. Here's a simple example: + +```java +import java.util.HashSet; +import java.util.Set; + +public class SetExample { + public static void main(String[] args) { + Set<String> mySet = new HashSet<>(); + mySet.add("Apple"); + mySet.add("Banana"); + mySet.add("Orange"); + + System.out.println("Set elements: " + mySet); + } +} +``` + +### Example-3 Map Interface and HashMap +The Map interface represents a collection of key-value pairs. One of the common implementations is HashMap. 
Let's see an example: + +```java +import java.util.HashMap; +import java.util.Map; + +public class MapExample { + public static void main(String[] args) { + Map<String, Integer> myMap = new HashMap<>(); + myMap.put("Java", 20); + myMap.put("Python", 15); + myMap.put("C++", 10); + + System.out.println("Map elements: " + myMap); + } +} +``` +## Introduction to Iterators +Iterator is an interface that provides a way to access the elements of a collection one at a time. It includes methods for checking whether more elements remain and for retrieving the next element. + +Let's see how to use iterators with a simple example using a List: +### Example-1 List Iterator +```java +import java.util.ArrayList; +import java.util.Iterator; +import java.util.List; + +public class IteratorExample { + public static void main(String[] args) { + List<String> myList = new ArrayList<>(); + myList.add("Java"); + myList.add("Python"); + myList.add("C++"); + + // Getting an iterator + Iterator<String> iterator = myList.iterator(); + + // Iterating through the elements + while (iterator.hasNext()) { + String element = iterator.next(); + System.out.println("Element: " + element); + } + } +} +``` + +### Example-2 Iterating Over Priority Queue +```java +import java.util.Iterator; +import java.util.PriorityQueue; + +public class PriorityQueueIteratorExample { + public static void main(String[] args) { + // Creating a PriorityQueue with Integer elements + PriorityQueue<Integer> priorityQueue = new PriorityQueue<>(); + + // Adding elements to the PriorityQueue + priorityQueue.offer(30); + priorityQueue.offer(10); + priorityQueue.offer(20); + + // Using Iterator to iterate over elements in PriorityQueue + System.out.println("Elements in PriorityQueue using Iterator:"); + + Iterator<Integer> iterator = priorityQueue.iterator(); + while (iterator.hasNext()) { + System.out.println(iterator.next()); + } + } +} +``` +In this example, we create a PriorityQueue of integers and add three elements to it. 
We then use an Iterator to iterate over the elements and print them. 
+ +Keep in mind that when elements are removed from a PriorityQueue (for example with poll()), the order of retrieval is based on the natural order (if the elements are comparable) or a provided comparator. The head of the queue is the least element with respect to that ordering, so with natural ordering the smallest number comes out first. + +It's important to note that the iterator does not guarantee any specific order when iterating over the elements of a PriorityQueue. + +#### Iterator Methods +The Iterator interface provides several methods, including: + +- hasNext(): Returns true if the iteration has more elements. +- next(): Returns the next element in the iteration. +- remove(): Removes the last element returned by next() from the underlying collection (optional operation). + +Here's an example demonstrating the use of these methods: + +```java +import java.util.ArrayList; +import java.util.Iterator; +import java.util.List; + +public class IteratorMethodsExample { + public static void main(String[] args) { + List<Integer> numbers = new ArrayList<>(); + numbers.add(1); + numbers.add(2); + numbers.add(3); + + Iterator<Integer> iterator = numbers.iterator(); + + // Using hasNext() and next() methods + while (iterator.hasNext()) { + Integer number = iterator.next(); + System.out.println("Number: " + number); + + // Using remove() method (optional operation) + iterator.remove(); + } + + System.out.println("Updated List: " + numbers); + } +} +``` +## Additional Concepts + +### HashMap with Custom Objects +Using a HashMap with custom objects in Java involves a few steps. Let's go through the process step by step. Suppose you have a custom object called Person with attributes like id, name, and age. 
+ +- Step 1: Create the Custom Object +```java +public class Person { + private int id; + private String name; + private int age; + + public Person(int id, String name, int age) { + this.id = id; + this.name = name; + this.age = age; + } + + // Getters and setters (not shown for brevity) + + @Override + public String toString() { + return "Person{id=" + id + ", name='" + name + "', age=" + age + '}'; + } +} +``` +- Step 2: Use Person as a Key in HashMap +Now, you can use Person objects as keys in a HashMap. For example: + +```java +import java.util.HashMap; +import java.util.Map; + +public class HashMapExample { + public static void main(String[] args) { + // Create a HashMap with Person objects as keys + Map<Person, String> personMap = new HashMap<>(); + + // Add entries + Person person1 = new Person(1, "Alice", 25); + Person person2 = new Person(2, "Bob", 30); + + personMap.put(person1, "Employee"); + personMap.put(person2, "Manager"); + + // Retrieve values using Person objects as keys + Person keyToLookup = new Person(1, "Alice", 25); + String position = personMap.get(keyToLookup); + + System.out.println("Position for " + keyToLookup + ": " + position); + } +} +``` +In this example, Person objects are used as keys, and the associated values represent their positions. Note that for keys to work correctly in a HashMap, the custom class (Person in this case) should override the hashCode() and equals() methods. Without those overrides, the lookup above returns null, because keyToLookup is a different instance from person1. + +- Step 3: Override hashCode() and equals() + +```java +import java.util.Objects; + +public class Person { + // ... 
existing code + + @Override + public int hashCode() { + return Objects.hash(id, name, age); + } + + @Override + public boolean equals(Object obj) { + if (this == obj) return true; + if (obj == null || getClass() != obj.getClass()) return false; + + Person person = (Person) obj; + + return id == person.id && age == person.age && Objects.equals(name, person.name); + } +} +``` +By overriding these methods, you ensure that the HashMap correctly handles collisions and identifies when two Person objects are considered equal. + +**Important Considerations** + +**Immutability:** +It's often a good practice to make the custom objects used as keys immutable. This helps in maintaining the integrity of the HashMap because the keys should not be modified after being used. + +**Consistent hashCode():** + +Ensure that the hashCode() method returns the same value for two objects that are considered equal according to the equals() method. This ensures proper functioning of the HashMap. + +**Performance:** +Consider the performance implications when using complex objects as keys. If the hashCode() and equals() methods are computationally expensive, it might affect the performance of the HashMap. +By following these steps and considerations, you can effectively use custom objects as keys in a HashMap in Java. + +### Summary +Java Collections and Iterators are fundamental concepts for handling groups of objects efficiently. Understanding the different collection interfaces, implementing classes, and utilizing iterators will empower you to work with collections effectively in your Java applications. Practice and explore the various methods available in the Collections Framework to enhance your programming skills. 
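
To see the effect of `hashCode()` and `equals()` on key lookup in isolation, here is a minimal, self-contained sketch. It is an illustration and not part of the original notes: `KeyLookupDemo`, `PlainKey`, and `ValueKey` are hypothetical names, and the key classes are reduced to a single `id` field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical demo class, introduced only for this sketch.
public class KeyLookupDemo {

    // Key type that keeps the default Object equals()/hashCode() (identity-based).
    static class PlainKey {
        final int id;
        PlainKey(int id) { this.id = id; }
    }

    // Key type with value-based equals()/hashCode().
    static class ValueKey {
        final int id;
        ValueKey(int id) { this.id = id; }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj == null || getClass() != obj.getClass()) return false;
            return id == ((ValueKey) obj).id;
        }

        @Override
        public int hashCode() {
            return Objects.hash(id);
        }
    }

    public static String lookupPlain() {
        Map<PlainKey, String> map = new HashMap<>();
        map.put(new PlainKey(1), "Employee");
        return map.get(new PlainKey(1)); // different instance, so no match
    }

    public static String lookupValue() {
        Map<ValueKey, String> map = new HashMap<>();
        map.put(new ValueKey(1), "Employee");
        return map.get(new ValueKey(1)); // equal by value, so the entry is found
    }

    public static void main(String[] args) {
        System.out.println(lookupPlain()); // null
        System.out.println(lookupValue()); // Employee
    }
}
```

Both key fields are `final` here, in line with the immutability advice above: a key whose hash code changes after insertion can no longer be found in its bucket.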
+ +-- End -- + + diff --git a/Non-DSA Notes/LLD1 Notes/Advanced Java 03- Lambdas & Streams.md b/Non-DSA Notes/LLD1 Notes/Advanced Java 03- Lambdas & Streams.md new file mode 100644 index 0000000..d9da89c --- /dev/null +++ b/Non-DSA Notes/LLD1 Notes/Advanced Java 03- Lambdas & Streams.md @@ -0,0 +1,587 @@ +# Adv Java 03 - Lambdas & Streams +---- +## Agenda +- Key Terms + - Lambdas + - Streams + - Functional Interfaces + +- Lambda Expressions + - Motivation + - Examples + - Runnable + - Addition + - Passing Lambdas as Arguments + - Lambdas in Collections + - Sorting Example + +- Streams + - Basics + - Creation + - Intermediate Operations + - Filtering, Mapping, Sorting etc. + - Terminal Operations + - Iterating, Reducing, Collecting etc + - Examples + - Advantages of Streams + - Sequential Streams & Parallel Streams + + +- Additional Reading + - collect() Method + - Collectors Interface + +--- +## Key Terms +**Lambdas** +```A lambda expression is a block of code that gets passed around, like an anonymous method. It is a way to pass behavior as an argument to a method invocation and to define a method without a name.``` + +**Streams** +```A stream is a sequence of data. It is a way to write code that is more declarative and less imperative to process collections of objects.``` + +**Functional Interfaces** +```A functional interface is an interface that contains one and only one abstract method. It defines a contract for behavior that can be passed as an argument to a method invocation.``` +--- + + +## Lambda Expressions +Lambda expressions, also known as anonymous functions, provide a way to create concise and expressive code by allowing the definition of a function in a more compact form. + +The basic syntax of a lambda expression consists of the parameter list, the arrow (->), and the body. The body can be either an expression or a block of statements. 
+```java +(parameters) -> expression +(parameters) -> { statements } +``` + +**Parameter List:** This represents the parameters passed to the lambda expression. It can be empty or contain one or more parameters enclosed in parentheses. If there's only one parameter and its type is inferred, you can omit the parentheses. + +**Arrow Operator (->):** This separates the parameter list from the body of the lambda expression. + +**Lambda Body:** This contains the code that makes up the implementation of the abstract method of the functional interface. The body can be a single expression or a block of code enclosed in curly braces. + +A lambda expression always targets a functional interface, that is, an interface containing only one abstract method. Java 8 introduced the @FunctionalInterface annotation to mark such interfaces. + +```java +@FunctionalInterface +interface MyFunctionalInterface { + void myMethod(); +} +``` + +### Examples +Let's start with some simple examples to illustrate the basic syntax: + +#### 1. Hello World Runnable +To understand the motivation behind lambdas, remember how we create a thread in Java. We create a class that implements the Runnable interface and overrides the run() method. Then we create a new instance of the class and pass it to the Thread constructor. +```java +// Traditional approach +Runnable traditionalRunnable = new Runnable() { + @Override + public void run() { + System.out.println("Hello, World!"); + } +}; +``` +This is a lot of code to write just to print a simple message. Here, the Runnable interface is a functional interface. It contains only one abstract method, run(). An interface with a single abstract method (SAM) is called a functional interface. Such interfaces can be implemented using lambdas. + +Using Lambda Expression. +```java +// Lambda expression +Runnable lambdaRunnable = () -> System.out.println("Hello, World!"); +``` + +#### 2. 
Add Numbers +```java +// MathOperation is assumed to be a custom functional interface: +// interface MathOperation { int operate(int a, int b); } + +// Traditional approach +MathOperation traditionalAddition = new MathOperation() { + @Override + public int operate(int a, int b) { + return a + b; + } +}; +``` + +Using Lambda expression +```java +MathOperation lambdaAddition = (a, b) -> a + b; +``` +#### 3. Lambda Expressions with Parameters +Lambda expressions can take parameters, making them versatile for various use cases. +```java +// NumberChecker is assumed to be a custom functional interface: +// interface NumberChecker { boolean check(int number); } + +NumberChecker traditionalChecker = new NumberChecker() { + @Override + public boolean check(int number) { + return number % 2 == 0; + } +}; +``` + + +Using Lambda expression +```java +NumberChecker lambdaChecker = number -> number % 2 == 0; +``` + + +#### 4. Lambda Expressions in Collections +Lambda expressions are commonly used with collections for concise iteration and processing. + +Filtering a List Example (Traditional Approach) +```java +List<String> fruits = Arrays.asList("Apple", "Banana", "Orange", "Mango"); + +// Traditional approach +List<String> filteredTraditional = new ArrayList<>(); +for (String fruit : fruits) { + if (fruit.startsWith("A")) { + filteredTraditional.add(fruit); + } +} +``` + +Using Lambda expression & Java Stream API +```java +List<String> filteredLambda = fruits.stream() + .filter(fruit -> fruit.startsWith("A")) + .collect(Collectors.toList()); +``` + +#### 5. Sorting Example +Method references provide a shorthand notation for lambda expressions, making the code even more concise. 
+```java +List<String> names = Arrays.asList("Alice", "Bob", "Charlie", "David"); + +// Lambda expression for sorting +Collections.sort(names, (a, b) -> a.compareTo(b)); + +// Method reference for sorting +Collections.sort(names, String::compareTo); +``` + +## Java 8 Streams +### Streams +A stream in Java is simply a wrapper around a data source, allowing us to perform bulk operations on the data in a convenient way. 
The Java Stream API, introduced in Java 8, is a powerful abstraction for processing sequences of elements, such as collections or arrays, in a functional and declarative way. + +Streams are designed to be used in a chain of operations, allowing you to create complex data processing pipelines. + +In this tutorial we will learn about sequential streams, parallel streams, and the collect() method of streams. + +### 1. Creating Streams +**Example 1: Creating a Stream from a Collection** +```java +List<String> fruits = Arrays.asList("Apple", "Banana", "Orange", "Mango"); + +// Creating a stream from a collection +Stream<String> fruitStream = fruits.stream(); +``` + +**Example 2: Creating a Stream from an Array** +```java +String[] cities = {"New York", "London", "Tokyo", "Paris"}; + +// Creating a stream from an array +Stream<String> cityStream = Arrays.stream(cities); +``` + + +**Example 3: Creating a Stream of Integers** +```java +// Creating a stream of integers +IntStream intStream = IntStream.rangeClosed(1, 5); +intStream.forEach(System.out::println); // Output: 1 2 3 4 5 (one number per line) +``` +### 2. Intermediate Operations +There are two types of operations that you can perform on a stream: intermediate operations and terminal operations. Intermediate operations transform a stream into another stream. They are lazy, meaning they don't execute until a terminal operation is invoked. +Some examples of intermediate operations are filter(), map(), sorted(), distinct(), limit(), and skip(). + +Filtering and Mapping Example: +```java +List<String> fruits = Arrays.asList("Apple", "Banana", "Orange", "Mango"); + +// Filtering fruits starting with 'A' and converting to uppercase +List<String> result = fruits.stream() + .filter(fruit -> fruit.startsWith("A")) + .map(String::toUpperCase) + .collect(Collectors.toList()); + +System.out.println(result); // Output: [APPLE] +``` +**Filtering** +The filter() method is used to filter elements from a stream based on a predicate. 
It takes a predicate as an argument and returns a stream that contains only those elements that match the predicate. For example, let's keep only the even numbers from a stream of numbers: + +```java +Stream<Integer> stream = Stream.of(1, 2, 3, 4, 5); +Stream<Integer> evenNumbers = stream.filter(number -> number % 2 == 0); +``` +Here, we have created a stream of numbers and kept only the even numbers from the stream. The `filter()` method takes a predicate as an argument. A predicate is a functional interface that takes an argument and returns a boolean result. It is defined in the java.util.function package. It contains the test() method that takes an argument of type T and returns a boolean result. For example, let's create a predicate that checks if a number is even: +```java +Predicate<Integer> isEven = number -> number % 2 == 0; +``` +Here, we have created a predicate called isEven that checks if a number is even. We can use this predicate to keep only the even numbers from a stream of numbers as follows: +```java +Stream<Integer> stream = Stream.of(1, 2, 3, 4, 5); +Stream<Integer> evenNumbers = stream.filter(isEven); +``` +**Mapping** +The map() method is used to transform elements in a stream. It takes a function as an argument and returns a stream that contains the results of applying the function to each element in the stream. For example, let's convert a stream of numbers to a stream of their squares: +```java +Stream<Integer> stream = Stream.of(1, 2, 3, 4, 5); +Stream<Integer> squares = stream.map(number -> number * number); +``` +Here, we have created a stream of numbers and converted it to a stream of their squares. The map() method takes a function as an argument. A function is a functional interface that takes an argument and returns a result. It is defined in the java.util.function package. It contains the apply() method that takes an argument of type T and returns a result of type R. 
For example, let's create a function that converts a number to its square: +```java +Function<Integer, Integer> square = number -> number * number; +``` +Here, we have created a function called square that converts a number to its square. We can use this function to convert a stream of numbers to a stream of their squares as follows: +```java +Stream<Integer> stream = Stream.of(1, 2, 3, 4, 5); +Stream<Integer> squares = stream.map(square); +``` + +**Sorting** +The sorted() method is used to sort elements in a stream. Called without arguments, it sorts the elements by their natural order; it can also take a comparator and returns a stream that contains the elements sorted according to that comparator. For example, let's sort a stream of numbers in ascending order: +```java +Stream<Integer> stream = Stream.of(5, 3, 1, 4, 2); +Stream<Integer> sortedNumbers = stream.sorted(); +``` +Here, we have created a stream of numbers and sorted it in ascending order using natural ordering. The overloaded sorted() method takes a comparator as an argument. A comparator is a functional interface that compares two objects of the same type. It is defined in the `java.util` package. It contains the compare() method that takes two arguments of type T and returns an integer result. For example, let's create a comparator that compares two numbers: +```java +// Integer.compare avoids the overflow that number1 - number2 can cause +Comparator<Integer> comparator = (number1, number2) -> Integer.compare(number1, number2); +``` +Here, we have created a comparator called comparator that compares two numbers. We can use this comparator to sort a stream of numbers in ascending order as follows: +```java +Stream<Integer> stream = Stream.of(5, 3, 1, 4, 2); +Stream<Integer> sortedNumbers = stream.sorted(comparator); +``` + +### 3. Terminal operations +Terminal operations trigger the processing of elements and produce a result or a side effect. They are the final step in a stream pipeline. 
They are eager, which means that they are executed immediately. Some examples of terminal operations are forEach(), count(), collect(), reduce(), min(), max(), anyMatch(), allMatch(), and noneMatch(). 
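
The matching operations and `max()` from this list can be sketched as follows. This is a minimal illustration, not part of the original notes; the class `TerminalOpsDemo` and its helper methods are hypothetical names. Each helper builds a fresh stream, because a stream can be consumed by only one terminal operation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical demo class, introduced only for this sketch.
public class TerminalOpsDemo {

    // anyMatch: true if at least one element satisfies the predicate.
    public static boolean anyEven(List<Integer> numbers) {
        return numbers.stream().anyMatch(n -> n % 2 == 0);
    }

    // allMatch: true if every element satisfies the predicate.
    public static boolean allPositive(List<Integer> numbers) {
        return numbers.stream().allMatch(n -> n > 0);
    }

    // noneMatch: true if no element satisfies the predicate.
    public static boolean noneAbove(List<Integer> numbers, int limit) {
        return numbers.stream().noneMatch(n -> n > limit);
    }

    // max returns an Optional because the stream may be empty.
    public static Optional<Integer> largest(List<Integer> numbers) {
        return numbers.stream().max(Integer::compare);
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);

        System.out.println(anyEven(numbers));          // true
        System.out.println(allPositive(numbers));      // true
        System.out.println(noneAbove(numbers, 10));    // true
        System.out.println(largest(numbers));          // Optional[5]
        System.out.println(numbers.stream().count());  // 5
    }
}
```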
+ +**Iterating** +The forEach() method is used to iterate over the elements in a stream. It takes a consumer as an argument and invokes the consumer for each element in the stream. For example, let's iterate over a stream of numbers and print each number: +```java +Stream<Integer> stream = Stream.of(1, 2, 3, 4, 5); +stream.forEach(number -> System.out.println(number)); +``` + +**Reducing** +The reduce() method is used to reduce the elements in a stream to a single value. It takes an identity value and a binary operator as arguments and returns the result of applying the binary operator to the identity value and the elements in the stream. For example, let's find the sum of all the numbers in a stream: +```java +Stream<Integer> stream = Stream.of(1, 2, 3, 4, 5); +int sum = stream.reduce(0, (number1, number2) -> number1 + number2); +``` +Here, we have created a stream of numbers and found the sum of all the numbers in the stream. The reduce() method takes an identity value and a binary operator as arguments. A binary operator is a functional interface that takes two arguments of the same type and returns a result of the same type. It is defined in the java.util.function package. It contains the apply() method that takes two arguments of type T and returns a result of type T. For example, let's create a binary operator that adds two numbers: +```java +BinaryOperator<Integer> add = (number1, number2) -> number1 + number2; +``` +Here, we have created a binary operator called add that adds two numbers. We can use this binary operator to find the sum of all the numbers in a stream as follows: +```java +Stream<Integer> stream = Stream.of(1, 2, 3, 4, 5); +int sum = stream.reduce(0, add); +``` + +**Collecting** +The collect() method is used to collect the elements in a stream into a collection. It takes a collector as an argument and returns the result of applying the collector to the elements in the stream. 
For example, let's collect the elements in a stream into a list: +```java +Stream<Integer> stream = Stream.of(1, 2, 3, 4, 5); +List<Integer> numbers = stream.collect(Collectors.toList()); +``` +Since Java 16, you can also call the toList() method directly on a stream. It returns an unmodifiable list. +```java +Stream<Integer> stream = Stream.of(1, 2, 3, 4, 5); +List<Integer> numbers = stream.toList(); +``` +There is no corresponding toSet() method on Stream. To collect the elements in a stream into a set, use Collectors.toSet(). +```java +Stream<Integer> stream = Stream.of(1, 2, 3, 4, 5); +Set<Integer> numbers = stream.collect(Collectors.toSet()); +``` +**Finding the first element** +The findFirst() method is used to find the first element in a stream. It returns an Optional that contains the first element in the stream, or an empty Optional if the stream has no elements. For example, let's find the first even number in a stream of numbers: +```java +Stream<Integer> stream = Stream.of(1, 2, 3, 4, 5); +Optional<Integer> firstEvenNumber = stream.filter(number -> number % 2 == 0).findFirst(); +``` + +## More Examples + +**Example1: Collecting into a List** +```java +List<String> fruits = Arrays.asList("Apple", "Banana", "Orange", "Mango"); + +// Collecting filtered fruits into a new list +List<String> result = fruits.stream() + .filter(fruit -> fruit.length() > 5) + .collect(Collectors.toList()); + +System.out.println(result); // Output: [Banana, Orange] +``` +**Example2: Counting Elements** +```java +List<String> fruits = Arrays.asList("Apple", "Banana", "Orange", "Mango"); + +// Counting the fruits whose names are longer than 5 characters +long count = fruits.stream() + .filter(fruit -> fruit.length() > 5) + .count(); + +System.out.println("Number of fruits: " + count); // Output: Number of fruits: 2 +``` + +**Example3: Joining Strings** +```java +List<String> words = Arrays.asList("Hello", " ", "Stream", " ", "API"); + +// Concatenating strings +String result = words.stream() + .collect(Collectors.joining()); + +System.out.println("Concatenated String: " + result); // Output: Concatenated String: Hello Stream API +``` + +### Advantages of Streams +The motivation for introducing streams in Java was to provide a more 
concise, readable, and expressive way to process sequences of data elements, such as collections or arrays. Streams were designed to address several challenges and limitations that traditional imperative programming with loops and conditionals +presented: +**Readability and Expressiveness:** Traditional loops often involve low-level details like index manipulation and explicit iteration, which can make the code harder to read and understand. Streams provide a higher-level, declarative approach that focuses on expressing the operations you want to perform on the data rather than the mechanics of how to perform them. + +**Code Reduction:** Streams allow you to perform complex operations on data elements in a more concise and compact manner compared to traditional loops. This leads to fewer lines of code and improved code maintainability. + +**Parallelism:** Streams can be easily converted to parallel streams, allowing you to take advantage of multi-core processors and perform operations concurrently. This can lead to improved performance for certain types of data processing tasks. + +**Separation of Concerns:** With traditional loops, you often mix the concerns of iterating over elements, filtering, mapping, and aggregation within a single loop. Streams encourage a separation of concerns by providing distinct operations that can be chained together in a more modular way. + +**Lazy Evaluation:** Streams introduce lazy evaluation, which means that operations are only performed when the results are actually needed. This can lead to improved performance by avoiding unnecessary computations. + +**Functional Programming:** Streams embrace functional programming concepts by providing +operations that transform data in a functional and immutable manner. This makes it easier to reason about the behavior of your code and reduces the potential for side effects. 
+ +**Data Abstraction:** Streams abstract away the underlying data source, allowing you to work with different data sources (collections, arrays, I/O channels) in a consistent way. This makes your code more flexible and reusable. + +In summary, the motivation behind introducing streams in Java was to provide a modern, expressive, and functional programming paradigm for processing data elements, enabling developers to write more readable, maintainable, and efficient code. Streams simplify complex data manipulations, encourage separation of concerns, and support parallel processing, contributing to improved code quality and developer productivity. + + +## Multithreading using Java Streams +### Sequential Streams +By default, any stream operation in Java is processed sequentially, unless explicitly specified as parallel. + +Sequential streams use a single thread to process the pipeline: +```java +List listOfNumbers = Arrays.asList(1, 2, 3, 4); +listOfNumbers.stream().forEach(number -> + System.out.println(number + " " + Thread.currentThread().getName()) +); +``` +The output of this sequential stream is predictable. The list elements will always be printed in an ordered sequence: + +``` +1 main +2 main +3 main +4 main +``` + +### Parallel Streams +Stream API also simplifies multithreading by providing the `parallelStream()` method that runs operations over stream’s elements in parallel mode. Any stream in Java can easily be transformed from sequential to parallel. 
+ +We can achieve this by calling the parallel() method on a sequential stream or by creating a stream using the parallelStream() method of a collection. + +The code below runs the method doWork() in parallel for every element of the stream: +```java +list.parallelStream().forEach(element -> doWork(element)); +``` +For the sequential example above, the code will look like this: + +```java +List<Integer> listOfNumbers = Arrays.asList(1, 2, 3, 4); +listOfNumbers.parallelStream().forEach(number -> + System.out.println(number + " " + Thread.currentThread().getName()) +); +``` +Parallel streams enable us to execute code in parallel on separate cores. The final result is the combination of each individual outcome. + +However, the order of execution is out of our control. It may change every time we run the program: +``` +4 ForkJoinPool.commonPool-worker-3 +2 ForkJoinPool.commonPool-worker-5 +1 ForkJoinPool.commonPool-worker-7 +3 main +``` +Parallel streams make use of the fork-join framework and its common pool of worker threads. Parallel processing may be beneficial to fully utilize multiple cores. But we also need to consider the overhead of managing multiple threads, memory locality, splitting the source and merging the results. +Refer to this [Article](https://www.baeldung.com/java-when-to-use-parallel-stream) to learn more about when to use parallel streams. + +## Additional Topics +### Collect() Method +A stream represents a sequence of elements and supports different kinds of operations that lead to the desired result. The source of a stream is usually a Collection or an Array, from which the data is streamed. + +Streams differ from collections in several ways; most notably in that the streams are not a data structure that stores elements. They're functional in nature, and it's worth noting that operations on a stream produce a result and typically return another stream, but do not modify its source. 
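
As a small illustration of the point that stream operations do not modify their source, consider the sketch below. The class name `SourceUnchangedDemo` and the helper `squares()` are hypothetical names chosen for this example. It maps a list to its squares and shows that the original list is left untouched.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class: a stream pipeline reads its source but never changes it.
class SourceUnchangedDemo {

    // Returns a new list of squares; the input list is only read.
    static List<Integer> squares(List<Integer> source) {
        return source.stream()
                .map(n -> n * n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3);
        List<Integer> squared = squares(numbers);

        System.out.println(numbers); // [1, 2, 3]
        System.out.println(squared); // [1, 4, 9]
    }
}
```

The squared values exist only in the new list returned by `collect()`.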
+ +To "solidify" the changes, you **collect** the elements of a stream back into a Collection. + +The `stream.collect()` method is used to perform a mutable reduction operation on the elements of a stream. It returns the result container produced by the reduction, for example a new List. + +This method can be used to perform several different types of reduction operations, such as: + +- Computing the sum of numeric values in a stream. +- Finding the minimum or maximum value in a stream. +- Constructing a new String by concatenating the contents of a stream. +- Collecting elements into a new List or Set. + +```java +import java.util.Arrays; +import java.util.List; + +// Static import so that toList(), summingInt(), maxBy() etc. can be used directly +import static java.util.stream.Collectors.*; + +public class CollectExample { + public static void main(String[] args) { + Integer[] intArray = {1, 2, 3, 4, 5}; + + // Creating a List from an array of elements + // using Arrays.asList() method + List<Integer> list = Arrays.asList(intArray); + + // Demo1: Collecting the even elements of the list into a new + // list using collect() method + List<Integer> evenNumbersList = list.stream() + .filter(i -> i%2 == 0) + .collect(toList()); + System.out.println(evenNumbersList); + + // Demo2: finding the sum of all the values + // in the stream + Integer sum = list.stream() + .collect(summingInt(i -> i)); + System.out.println(sum); + + // Demo3: finding the maximum of all the values + // in the stream + Integer max = list.stream() + .collect(maxBy(Integer::compare)).get(); + System.out.println(max); + + // Demo4: finding the minimum of all the values + // in the stream + Integer min = list.stream() + .collect(minBy(Integer::compare)).get(); + System.out.println(min); + + // Demo5: counting the values in the stream + Long count = list.stream() + .collect(counting()); + System.out.println(count); + } +} +``` + +In Demo1: We use the stream() method to get a stream from the list. We filter the even elements and collect them into a new list using the collect() method. + +In Demo2: We pass summingInt(ToIntFunction) as the argument to the collect() method. 
The summingInt() method returns a collector that sums the integer values extracted from the stream elements by applying an int-producing mapping function to each element. + +In Demo3: We use the collect() method with maxBy(Comparator) as an argument. The maxBy() method accepts a Comparator and returns a collector that extracts the maximum element from the stream according to the given Comparator. + +Let's learn more about Collectors. + + +### Collectors Class + +The Collectors class provides implementations of the Collector interface for various useful reduction operations, such as accumulating elements into collections, summarizing elements based on a specific parameter, etc. + +All predefined implementations can be found within the [Collectors](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html) class. + + +Within the Collectors class itself, we find an abundance of methods that cover the different needs of a user. One such group is made of summing methods - `summingInt()`, `summingDouble()` and `summingLong()`. + + + +Let's start off with a basic example with a List of Integers: + +```java +List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5); +Integer sum = numbers.stream().collect(Collectors.summingInt(Integer::intValue)); +System.out.println("Sum: " + sum); +``` +We apply the .stream() method to create a stream of Integer instances, after which we use the previously discussed `.collect()` method to collect the elements using `summingInt()`. The method itself, again, accepts a `ToIntFunction`, which is used to map each instance to an int that can be summed. + +Since we're using Integers already, we can simply pass in a method reference denoting their `intValue`, as no further mapping is needed. + +More often than not, you'll be working with lists of custom objects and would like to sum some of their fields. For instance, we can sum the quantities of each product in a list of products, giving the total inventory we have. 
+ +Let us try to understand one of these methods using a custom class example. +```java +public class Product { + private String name; + private Integer quantity; + private Double price; + private Long productNumber; + + // Constructor, getters and setters + ... +} +... +List<Product> products = Arrays.asList( + new Product("Milk", 37, 3.60, 12345600L), + new Product("Carton of Eggs", 50, 1.20, 12378300L), + new Product("Olive oil", 28, 37.0, 13412300L), + new Product("Peanut butter", 33, 4.19, 15121200L), + new Product("Bag of rice", 26, 1.70, 21401265L) +); + +``` + +In such a case, we can use a method reference, such as `Product::getQuantity`, as our `ToIntFunction` to map each object to a single integer, and then sum these integers: + +```java +Integer sumOfQuantities = products.stream().collect(Collectors.summingInt(Product::getQuantity)); +System.out.println("Total number of products: " + sumOfQuantities); +``` +This results in: + +``` +Total number of products: 174 +``` + +You can also implement your own collector and use it instead of the predefined ones. That said, you can get pretty far with the built-in collectors, as they cover the vast majority of cases in which you might want to use them. 
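
For the rare case where a custom collector is needed, one option is the `Collector.of()` factory method. The sketch below is only an illustration: the class `ProductCollectorDemo` and the helper `multiplying()` are hypothetical names, and a single-element `long[]` is assumed as the mutable container. It builds a collector that multiplies all Integer elements of a stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

// Hypothetical demo class: a hand-written collector built with Collector.of().
class ProductCollectorDemo {

    // Returns a collector that multiplies all Integer elements together.
    static Collector<Integer, long[], Long> multiplying() {
        return Collector.of(
                () -> new long[] {1L},      // supplier: container holding the running product
                (acc, n) -> acc[0] *= n,    // accumulator: fold one element into the container
                (left, right) -> {          // combiner: merge two partial results (parallel streams)
                    left[0] *= right[0];
                    return left;
                },
                acc -> acc[0]);             // finisher: extract the final value
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
        Long product = numbers.stream().collect(multiplying());
        System.out.println("Product: " + product); // Product: 24
    }
}
```

The four arguments correspond to the supplier, accumulator, combiner and finisher of the Collector interface. For a simple product like this, `reduce()` would work as well; the collector form is shown only to illustrate the mechanism.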
+ +The following are examples of using the predefined collectors to perform common mutable reduction tasks: +```java + + // Accumulate names into a List + List<String> list = people.stream().map(Person::getName).collect(Collectors.toList()); + + // Accumulate names into a TreeSet + Set<String> set = people.stream().map(Person::getName).collect(Collectors.toCollection(TreeSet::new)); + + // Convert elements to strings and concatenate them, separated by commas + String joined = things.stream() + .map(Object::toString) + .collect(Collectors.joining(", ")); + + // Compute sum of salaries of employees + int total = employees.stream() + .collect(Collectors.summingInt(Employee::getSalary)); + + // Group employees by department + Map<Department, List<Employee>> byDept + = employees.stream() + .collect(Collectors.groupingBy(Employee::getDepartment)); + + // Compute sum of salaries by department + Map<Department, Integer> totalByDept + = employees.stream() + .collect(Collectors.groupingBy(Employee::getDepartment, + Collectors.summingInt(Employee::getSalary))); + + // Partition students into passing and failing + Map<Boolean, List<Student>> passingFailing = + students.stream() + .collect(Collectors.partitioningBy(s -> s.getGrade() >= PASS_THRESHOLD)); + +``` +You can look at the official documentation for more details on these methods. +https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html + +--- End --- + + diff --git a/Non-DSA Notes/LLD1 Notes/Advanced Java 04 - Exception Handling.md b/Non-DSA Notes/LLD1 Notes/Advanced Java 04 - Exception Handling.md new file mode 100644 index 0000000..493ca9a --- /dev/null +++ b/Non-DSA Notes/LLD1 Notes/Advanced Java 04 - Exception Handling.md @@ -0,0 +1,296 @@ +# Adv Java 04 - Exception Handling +---- +Exception handling is a critical aspect of programming in Java. It allows developers to manage and respond to unexpected errors that may occur during program execution. In this tutorial, we'll cover the basics of exception handling in Java for beginners. 
+ +## Agenda +- Introduction to Exceptions +- Types of Exceptions + - Checked Exceptions + - Unchecked Exceptions +- Handling Exceptions + - The try-catch Block + - Multiple catch Blocks + - The finally Block + +- Throwing Exceptions +- Custom Exceptions +- Best Practices + - for Checked Exceptions + - Unchecked Exceptions +- Additional Reading + - More on Checked & Unchecked Exceptions + - Exception Hierarchy in Java + +## 1. Introduction to Exceptions +An exception is an event that disrupts the normal flow of a program. When an exceptional situation occurs, an object representing the exception is thrown. Exception handling allows you to catch and handle these exceptions, preventing your program from crashing. + +## 2. Types of Exceptions +In Java, exceptions are broadly categorized into two types: checked exceptions and unchecked exceptions. + +### Checked Exceptions +These are checked at compile-time, and the programmer is required to handle them explicitly using try-catch blocks or declare them in the method signature using the throws keyword. + +Checked exceptions extend the Exception class (directly or indirectly) but do not extend RuntimeException. They are subject to the compile-time checking by the Java compiler, meaning the compiler ensures that these exceptions are either caught or declared. + +Some common examples of checked exceptions include: + +- IOException +- SQLException +- ClassNotFoundException +- InterruptedException + +Handling checked exceptions involves taking appropriate actions to address the exceptional conditions that may arise during program execution. There are two primary ways to handle checked exceptions: using the try-catch block and the throws clause. + +The try-catch block is used to catch and handle exceptions. When a block of code is placed inside a try block, any exceptions that occur within that block are caught and processed by the corresponding catch block. 
+ +Example +```java +import java.io.FileNotFoundException; +import java.io.FileReader; + +public class FileReaderMethodExample { + public static void main(String[] args) { + try { + readFile("example.txt"); + } catch (FileNotFoundException e) { + System.err.println("FileNotFoundException: " + e.getMessage()); + } + } + + // Method with a throws clause + static void readFile(String fileName) throws FileNotFoundException { + FileReader fileReader = new FileReader(fileName); + // Code to read from the file + } +} +``` + +### Unchecked Exceptions +These are not checked at compile-time, and they are subclasses of RuntimeException. They usually indicate programming errors, and it's not mandatory to handle them explicitly. +Unchecked exceptions also known as runtime exceptions, are exceptions that occur during the execution of a program. + +Unchecked exceptions can occur at runtime due to unexpected conditions, such as division by zero, accessing an array index out of bounds, or trying to cast an object to an incompatible type. + +Some common examples of unchecked exceptions include: + +`ArithmeticException`: Occurs when an arithmetic operation encounters an exceptional condition, such as division by zero. + +`NullPointerException`: Occurs when trying to access a member (field or method) on an object that is null. + +`ArrayIndexOutOfBoundsException`: Occurs when trying to access an array element with an index that is outside the bounds of the array. + +`ClassCastException`: Occurs when attempting to cast an object to a type that is not compatible with its actual type. 
+ +```java +public class UncheckedExceptionExample { + public static void main(String[] args) { + try { + int result = divide(10, 0); // This may throw an ArithmeticException + System.out.println("Result: " + result); + } catch (ArithmeticException e) { + System.out.println("Error: " + e.getMessage()); + } + } + + static int divide(int a, int b) { + return a / b; + } +} +``` +In this example, the divide method may throw an ArithmeticException if the divisor b is zero. The try-catch block catches the exception and handles it, preventing the program from terminating abruptly. + + + +## 3. Handling Exceptions +### The try-catch Block +The try-catch block is used to handle exceptions. The code that might throw an exception is placed inside the try block, and the code to handle the exception is placed inside the catch block. + + +```java +try { + // Code that might throw an exception + // ... +} catch (ExceptionType e) { + // Code to handle the exception + // ... +} +``` +Example: +```java +public class ExceptionHandlingExample { + public static void main(String[] args) { + try { + int result = divide(10, 0); // This may throw an ArithmeticException + System.out.println("Result: " + result); + } catch (ArithmeticException e) { + System.out.println("Error: " + e.getMessage()); + } + } + + static int divide(int a, int b) { + return a / b; + } +} +``` + +### Multiple catch Blocks +You can have multiple catch blocks to handle different types of exceptions that may occur within the try block. + +```java +try { + // Code that might throw an exception + // ... +} catch (ExceptionType1 e1) { + // Code to handle ExceptionType1 + // ... +} catch (ExceptionType2 e2) { + // Code to handle ExceptionType2 + // ... 
+} +``` +Example: +```java +public class MultipleCatchExample { + public static void main(String[] args) { + try { + String str = null; + System.out.println(str.length()); // This may throw a NullPointerException + } catch (ArithmeticException e) { + System.out.println("ArithmeticException: " + e.getMessage()); + } catch (NullPointerException e) { + System.out.println("NullPointerException: " + e.getMessage()); + } catch (Exception e) { + System.out.println("Generic Exception: " + e.getMessage()); + } + } +} +``` + +### The finally Block +The finally block contains code that will be executed regardless of whether an exception is thrown or not. It is often used for cleanup operations, such as closing resources. + + +```java +try { + // Code that might throw an exception + // ... +} catch (ExceptionType e) { + // Code to handle the exception + // ... +} finally { + // Code that will be executed regardless of exceptions + // ... +} +``` +Example: +```java +public class FinallyBlockExample { + public static void main(String[] args) { + try { + System.out.println("Inside try block"); + int result = divide(10, 2); + System.out.println("Result: " + result); + } catch (ArithmeticException e) { + System.out.println("ArithmeticException: " + e.getMessage()); + } finally { + System.out.println("Inside finally block"); + } + } + + static int divide(int a, int b) { + return a / b; + } +} +``` + +### Throwing Exceptions +You can use the throw keyword to explicitly throw an exception in your code. This is useful when you want to signal an exceptional condition. +```java +public ReturnType methodName() throws ExceptionType1, ExceptionType2 { + // Method implementation + ... + if(condition){ + throw new ExceptionType1("Error message"); + } + ... +} +``` +The `throws` clause is used in a method signature to declare that the method may throw checked exceptions. 
It informs the caller that the method might encounter certain exceptional conditions, and the caller is responsible for handling these exceptions. + +Example +```java +public class ThrowExample { + public static void main(String[] args) { + try { + validateAge(15); // This may throw an InvalidAgeException + } catch (InvalidAgeException e) { + System.out.println("Error: " + e.getMessage()); + } + } + + static void validateAge(int age) throws InvalidAgeException { + if (age < 18) { + throw new InvalidAgeException("Age must be 18 or older"); + } + System.out.println("Valid age"); + } +} + +class InvalidAgeException extends Exception { + public InvalidAgeException(String message) { + super(message); + } +} +``` +## 5. Custom Exceptions +You can create your own custom exceptions by extending the Exception class or one of its subclasses. + +Example: +```java +public class CustomExceptionExample { + public static void main(String[] args) { + try { + throw new CustomException("Custom exception message"); + } catch (CustomException e) { + System.out.println("Caught custom exception: " + e.getMessage()); + } + } +} + +class CustomException extends Exception { + public CustomException(String message) { + super(message); + } +} +``` + +## 6. Best Practices +- Catch specific exceptions rather than using a generic catch (Exception e) block whenever possible. +- Handle exceptions at an appropriate level in your application. Don't catch exceptions if you can't handle them effectively. +- Clean up resources (e.g., closing files or database connections) in the finally block. +- Log exceptions or relevant information to aid in debugging. + + +### Best Practices for Checked Exceptions + +- Handle or Declare: Always handle checked exceptions using the try-catch block or declare them in the method signature using the throws clause. + +- Provide Meaningful Messages: When catching or throwing checked exceptions, include meaningful messages to aid in debugging. 
+ +- Close Resources in a finally Block: If a method opens resources (e.g., files or database connections), close them in a finally block to ensure proper resource management. + +### Best Practices for Handling Unchecked Exceptions +- Use Defensive Programming: Validate inputs and conditions to avoid common causes of unchecked exceptions. + +- Catch Specific Exceptions: When using a try-catch block, catch specific exceptions rather than using a generic catch (RuntimeException e) block. This allows for more targeted handling. + +- Avoid Suppressing Exceptions: Avoid using empty catch blocks that suppress exceptions without any meaningful action. Log or handle exceptions appropriately. + +- Logging: Consider logging exceptions using logging frameworks (e.g., SLF4J) to record information that can aid in debugging. + +### Conclusion +Exception handling is a crucial aspect of Java programming, allowing developers to gracefully handle unexpected errors and improve the robustness of their applications. By understanding the basics of exception handling and following best practices, you can write more resilient and reliable Java code. As you gain experience, you'll become proficient in anticipating and addressing potential issues in your programs + +--- End --- + + diff --git a/Non-DSA Notes/LLD1 Notes/Concurrency 01 - Processes & Threads.md b/Non-DSA Notes/LLD1 Notes/Concurrency 01 - Processes & Threads.md new file mode 100644 index 0000000..8bab196 --- /dev/null +++ b/Non-DSA Notes/LLD1 Notes/Concurrency 01 - Processes & Threads.md @@ -0,0 +1,721 @@ +# Concurrency-1 Introduction to Processes and Threads +--- +In this tutorial, we will cover the following concepts. 
+ +- How Computer Applications Run +- Concurrency: Real-World Applications of Threads + - Google Docs + - Music Player + - Adobe Lightroom + +- Processes and Threads + - Benefits of Multithreading + - Challenges of Multithreading +- Concurrent vs Parallel Execution +- Multithreading in Java + - Thread Creation + - Subclass of Thread Class + - Using Runnable + - Starting a Thread + - Problem Statement 1 : Number Printer +- Additional Concepts + - Commonly used Methods on Threads + - Problem Statement 2: Factorial Computation Task + - Thread Lifecycle & States + +## Understanding How Computer Applications Run +Computer applications are complex systems that run on a computer's operating system, interacting with hardware and software components to perform various tasks. To comprehend how these applications operate efficiently, it's essential to delve into fundamental concepts like processes, threads, CPU scheduling, multithreading, and parallel execution. Let us understand how a program runs. + +### 1. Programs / Processes +Programs: These are sets of instructions for the computer. Each application you use, such as a web browser or word processor, is a program. +Processes: When you open a program, it becomes a process. A process is an instance of a program in execution. + +### 2. Memory Allocation +When you start a program, the operating system (OS) allocates memory to it. This memory contains the program's code, data, and other necessary information. + +### 3. Processor (CPU) Execution +The Central Processing Unit (CPU) is the brain of the computer. It fetches instructions from memory and executes them. +Each process takes turns using the CPU. The OS manages this by employing a technique called CPU Scheduling. + +### 4. Context Switching +The CPU rapidly switches between different processes. This is known as context switching. +The OS saves the current state of a process, loads the state of the next process, and hands control to it. + +### 5. 
Multitasking +The ability of a computer to execute multiple processes concurrently is called multitasking. +While it may seem like everything is happening at once, on a single core the CPU is actually rapidly switching between processes. + +### 6. Parallel Execution +In some systems, especially those with multiple processors or cores, true parallel execution can occur. This means multiple processes genuinely run simultaneously. + +### 7. Threads +A process can be further divided into threads. Threads within a process share the same resources but can execute independently. Multithreading allows for concurrent, and on multi-core systems parallel, execution within a single process. + +### 8. Synchronization +When multiple processes or threads share resources (like data), synchronization mechanisms are employed to avoid conflicts. This ensures that data remains consistent. We will discuss synchronization in great detail in coming lectures. + +### 9. Task Management by the Operating System +The OS keeps track of all running processes and manages their execution. +It assigns priorities, allocates resources, and ensures fair access to the CPU. This is done by scheduling algorithms. Some of the popular scheduling algorithms are as follows - + +- First-Come-First-Serve (FCFS): Processes are executed in the order they arrive. +- Shortest Job Next (SJN): The process with the shortest execution time is selected. +- Round Robin (RR): Each process gets a fixed time slice, then moves to the back of the queue. +- Priority Scheduling: Processes are assigned priorities, and the highest priority process is executed first. + +Modern operating systems often employ a mix of static and dynamic scheduling strategies. +Advanced techniques, like predictive algorithms, may be used to anticipate the next process to run. + +### 10. Interrupts +The CPU can be interrupted to handle external events, like input from a user or data arriving from a network. Interrupts are crucial for maintaining responsiveness in a multitasking environment. + +### 11. 
Termination of Processes +When a program finishes its task or is closed by the user, the associated process is terminated. The OS reclaims the allocated resources and frees up memory. + +### 12. Efficient Resource Utilization +The goal is to efficiently utilize the available resources, ensuring that each running application gets its fair share of CPU time. In summary, the execution of multiple applications or processes involves careful management by the operating system, with the CPU rapidly switching between tasks, allocating resources, and ensuring that everything runs smoothly. Multitasking and, in some cases, parallel execution contribute to the efficiency and responsiveness of modern computer systems. + + +### Conclusion +Understanding how computer applications run involves grasping the intricacies of processes, threads, CPU scheduling, multithreading, and parallel execution. As technology evolves, mastering these concepts becomes increasingly important for developing efficient and responsive applications. Experimenting with these concepts in programming languages and frameworks will deepen your understanding and proficiency in building robust and high-performance software. + +---- +## Concurrency: Real-World Applications of Threads +Concurrent programming, which involves the execution of multiple tasks simultaneously, is a fundamental concept in modern software development. One powerful mechanism for achieving concurrency is the use of threads. Let's explore how threads are employed in real-world applications, focusing on Google Docs, Music Players, and Adobe Lightroom. + +Concurrent programming enables multiple operations to progress in overlapping time intervals. Threads, the smallest units of execution within a process, are instrumental in achieving concurrency. They allow different parts of a program to run concurrently, enhancing efficiency and responsiveness. + +### 1. 
Google Docs: Collaborative Editing +Google Docs exemplifies the power of concurrency through its collaborative editing feature. When multiple users are editing a document simultaneously, threads come into play. Each user's edits are handled by a separate thread, ensuring that changes made by one user do not disrupt the editing experience of others. + +**Threads in Google Docs** +- Thread per User: Each user's editing actions are processed by an individual thread. +- Conflict Resolution: Threads synchronize to resolve conflicts and merge edits seamlessly. +- Auto-Suggest/Auto-complete: A separate thread can run spell check for the words you write. +- UI Thread: A separate thread can continuously update UI for the users. + +### 2. Music Players: Smooth Playback and User Interaction +In music players like Spotify or iTunes, threads are crucial for delivering a smooth user experience during playback while allowing users to interact with the application concurrently. + +**How Threads Work in Music Players** +- Playback Thread: A dedicated thread manages audio playback, ensuring uninterrupted streaming. +- User Interface Thread: Another thread handles user interactions, such as browsing playlists or adjusting settings. +- Parallel Execution: Threads allow simultaneous playback and user interactions without one affecting the other. + +### 3. Adobe Lightroom: Image Processing +In photo editing applications like Adobe Lightroom, where resource-intensive tasks like image processing are common, threads are employed to maintain responsiveness and reduce processing times. + +**How Threads Work in Lightroom** +- Image Processing Threads: Multiple threads handle the processing of different parts of an image concurrently. +- Background Tasks: Threads enable background tasks like importing photos while allowing users to continue editing. +- Responsive UI: Threads ensure that the user interface remains responsive even during computationally intensive operations. 
+ +--- + +### Processes and Threads - Deep Dive +A **process** is an independent program in execution. It has its own memory space called heap, code, data, and system resources. The heap isn't shared between two applications or two processes, they each have their own. The terms process and application are often used interchangeably. Processes enable multiple tasks to run concurrently, offering isolation and independence. + + +**Process Lifecycle** +- Creation: When a program is launched, it is loaded into memory, a process is created. +- Execution: The process runs its instructions. +- Termination: The process completes its execution or is terminated. + + +A **thread** is the smallest unit of execution within a process. Multiple threads can exist within a single process, sharing the same resources like memory but executing independently. + +![](https://www.javamex.com/tutorials/threads/ThreadDiagram.png) + + +**Benefits of Multithreading - Why use multiple threads?** +- Performance: Threads can execute concurrently, enhancing performance. +- Responsiveness: Multithreading allows a program to remain responsive during time-consuming tasks. This is especially helpful in applications with user interfaces. +- Efficiency: Exploiting parallelism improves overall system performance. +- One of the most common reasons, is to offload long running tasks. +Instead of tying up the main thread, we can create additional threads, to execute tasks that might take a long time. This frees up the main thread so that it can continue working, and executing, and being responsive to the user. +- You also might use multiple threads to process large amounts of data, which can improve performance, of data intensive operations. +- A web server, is another use case for many threads, allowing multiple connections and requests to be handled, simultaneously. +- Resource Sharing: Threads within a process share resources, reducing overhead. 
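The "offload long running tasks" benefit from the list above can be sketched as follows. This is a minimal illustration rather than code from the class: the class name `OffloadExample`, the thread name `worker`, the shared `log` list and the 200 ms sleep that stands in for a slow task are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class OffloadExample {
    // Hypothetical helper: records which thread handled each step.
    static final List<String> log = Collections.synchronizedList(new ArrayList<>());

    public static void main(String[] args) throws InterruptedException {
        // The slow work runs on a separate thread named "worker".
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(200); // stand-in for a long-running task
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            log.add("slow task done on " + Thread.currentThread().getName());
        }, "worker");

        worker.start();

        // The main thread is free to keep working while the worker is busy.
        log.add("main stays responsive on " + Thread.currentThread().getName());

        worker.join(); // wait for the worker before reading the log
        log.forEach(System.out::println);
    }
}
```

The main thread only blocks at `join()`, which is where it actually needs the worker to have finished.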
+ + +**Challenges** +- Data Synchronization: Threads may need to synchronize access to shared data to prevent conflicts. +- Deadlocks: Concurrent threads might lead to situations where each is waiting for the other to release a resource. + +We will address these challenges in the upcoming classes. + +--- +## Concurrent Execution vs Parallel Execution +- Concurrent execution refers to the ability of a system to make progress on multiple tasks or processes during overlapping time periods, so that they appear to run at the same time. Concurrent execution can happen on a single core as well, by interleaving the tasks. + +- Parallel execution involves the simultaneous execution of multiple tasks or processes using multiple processors or cores. Multiple cores are a must for truly parallel execution. + +![](https://miro.medium.com/v2/resize:fit:409/1*_4B2PKsJn9pUz3jbTnBnYw.png) +Data Parallelism: Dividing a task into subtasks that are processed at the same time on different parts of the data. +Task Parallelism: Assigning multiple independent tasks to separate processors/cores. + +#### Key Differences +**Concurrent Execution** - Tasks may overlap in time but do not necessarily execute simultaneously. +**Parallel Execution** - Tasks are actively running at the same time on separate processors. + + +#### Resource Utilization +**Concurrent Execution** - Can utilize a single processor by interleaving tasks. +**Parallel Execution** - Utilizes multiple processors, so more tasks can be completed in the same time frame. + +#### Hardware Requirement +**Concurrent Execution** - Can occur on a system with a single processor. +**Parallel Execution** - Requires multiple processors or cores. + +#### Example +**Concurrent Execution** - Multiple applications running on a single-core processor. +**Parallel Execution** - Image processing tasks being performed simultaneously on different cores of a multi-core processor. + +--- + +## Multithreading in Java + +In Java, multithreading is driven by the core concept of a Thread. There are two ways to create threads in Java.
+ +**Thread Class**: Java provides the Thread class, which serves as the foundation for creating and managing threads. + +**Runnable Interface**: The Runnable interface is often implemented to define the code that a thread will execute. + +Let's write some logic that runs in a separate thread using the Thread class. In the code example below, we create two threads and run them concurrently. + +**Way-1 : Subclassing the Thread Class** +```java +public class NewThread extends Thread { + @Override + public void run() { + // business logic goes here + } +} +``` +Class to initialize and start our threads. + +```java +public class MultipleThreadsExample { + public static void main(String[] args) { + NewThread t1 = new NewThread(); + t1.setName("MyThread-1"); + NewThread t2 = new NewThread(); + t2.setName("MyThread-2"); + t1.start(); + t2.start(); + } +} +``` + +**Way-2 : Using Runnable (Preferred Way)** +```java +class SimpleRunnable implements Runnable { + @Override + public void run() { + // business logic goes here + } +} +``` + +```java +public class Main { + public static void main(String[] args) { + Thread t = new Thread(new SimpleRunnable()); + t.start(); + } +} + +``` + +The above SimpleRunnable is just a task that we want to run in a separate thread. +There are various approaches we can use for running it; one of them is to use the Thread class. + +Simply put, we generally encourage the use of Runnable over Thread: +When extending the Thread class, we're not specializing any of Thread's own behaviour. We only override run(), which is the method of Runnable (an interface that Thread happens to implement).
+- This is a clear violation of the IS-A principle: our task is not really a special kind of Thread +- Creating an implementation of Runnable and passing it to the Thread class utilizes composition and not inheritance, which is more flexible +- After extending the Thread class, we can't extend any other class +- From Java 8 onwards, Runnables can be represented as lambda expressions + +```java +public class ThreadWithLambdaExample { + public static void main(String[] args) { + // Creating a thread with a Runnable implemented as a lambda expression + Thread myThread = new Thread(() -> { + System.out.println(Thread.currentThread().getName()); + }); + + // Starting the thread + myThread.start(); + } +} +``` +We create a new Thread and pass a Runnable as a lambda expression directly to its constructor. The lambda expression defines the code to be executed in the new thread. In this case, it simply prints the name of the thread that runs it. + + +## Starting a Thread - Behind the Scenes +When you call `thread.start()` in Java, it asks the JVM to begin executing the thread, and the JVM then invokes the thread's run method on that new thread. Here's a step-by-step explanation of what happens: + +### 1. Thread Initialization + +If you have a class that extends the Thread class, or if you have a class that implements the Runnable interface, you create an instance of that class, which represents the thread. +```java +Thread myThread = new MyThread(); // or Thread myThread = new Thread(new MyRunnable()); +``` + +The thread is in the "new" state after initialization. When you call start(), the thread transitions to the "runnable" state. It is ready to run but is waiting for its turn to be scheduled. +```java +myThread.start(); +``` +### 2. Thread Scheduling: + +The thread scheduler determines when the thread gets CPU time for execution. The actual timing is managed by the operating system, so it may vary from run to run. + +### 3. 
run() Method Execution: + +Once the thread is scheduled, the JVM calls the run method of the thread. The run method contains the code that will be executed in the new thread. +```java +class MyThread extends Thread { + public void run() { + // Code to be executed by the thread + } +} +``` +If you implemented Runnable instead: + +```java +class MyRunnable implements Runnable { + public void run() { + // Code to be executed by the thread + } +} +``` +### 4. Concurrent Execution: + +If there are multiple threads in the program, they may execute concurrently, with each thread running independently, potentially interleaving their execution. +### 5. Thread Termination: +The run method completes its execution, and the thread transitions to the "terminated" state. The thread is no longer active. + + +### Important Notes +- Direct run Method Invocation: Calling the run method directly (myThread.run()) will not start a new thread; it will execute the run method in the current thread (ie main thread). + +- One-Time Execution: The start method can only be called once for a thread. Subsequent calls will result in an IllegalThreadStateException. + +- In summary, calling thread.start() initiates the execution of a new thread, and the JVM takes care of the thread scheduling and execution of the run method in a separate concurrent context. +---- +## Problem Statement - 1 Number Printer +Write a program to print numbers from 1 to 100 using 100 different threads. Since you can't control the order of execution of threads, it is okay to get these numbers in any order. +Hint: Create a Runnable Task, which prints a single number. 
+### Solution + +**NumberPrinter.java** +```java +public class NumberPrinter implements Runnable { + int number; + NumberPrinter(int number){ + this.number = number; + } + @Override + public void run(){ + System.out.println("Printing "+number + " from "+Thread.currentThread().getName()); + } +} +``` + +**Main.java** +```java +public class Main { + public static void main(String[] args) { + for(int i=0; i<100;i++){ + Thread t = new Thread(new NumberPrinter(i)); + t.start(); + } + } +} + +``` +Sample Output +``` +Printing 3 from Thread-3 +Printing 19 from Thread-19 +Printing 14 from Thread-14 +Printing 6 from Thread-6 +Printing 21 from Thread-21 +Printing 22 from Thread-22 +Printing 0 from Thread-0 +Printing 10 from Thread-10 +... +Printing 94 from Thread-94 +Printing 95 from Thread-95 +Printing 96 from Thread-96 +Printing 97 from Thread-97 +Printing 98 from Thread-98 +Printing 99 from Thread-99 +``` +--- +## Additonal Concepts (Optional) +Lets cover some more advanced concepts related to Threads. + + +### Commonly used Methods on Threads + +In Java, the Thread class provides several commonly used methods for managing and controlling threads. Here are some of the key methods: + +#### 1. start() +Initiates the execution of the thread, causing the run method to be called. +Usage ```myThread.start();``` + +#### 2. run() +Contains the code that will be executed by the thread. This method needs to be overridden when extending the Thread class or implementing the Runnable interface. +Usage: Defined by the user based on the specific task. + +#### 3. sleep(long milliseconds) +Description: Causes the thread to sleep for the specified number of milliseconds, pausing its execution. +Usage:```Thread.sleep(1000);``` + +#### 4. join() +Waits for the thread to complete its execution before the current thread continues. It is often used for synchronization between threads. +Usage: ```myThread.join();``` + +#### 5. 
interrupt() +Interrupts the thread, causing it to stop or throw an InterruptedException. The thread must handle interruptions appropriately. +Usage: +```myThread.interrupt();``` +#### 6. isAlive(): +Returns true if the thread has been started and has not yet completed its execution, otherwise returns false. +Usage: `boolean alive = myThread.isAlive();` + +#### 7. setName(String name) +Sets the name of the thread. +Usage: `myThread.setName("MyThread");` + +#### 8. getName() +Returns the name of the thread. +Usage: `String threadName = myThread.getName();` + +#### 9. setPriority(int priority) +Sets the priority of the thread. Priorities range from Thread.MIN_PRIORITY to Thread.MAX_PRIORITY. +Usage: ```myThread.setPriority(Thread.MAX_PRIORITY);``` + +#### 10. getPriority() +Returns the priority of the thread. +Usage: ```int priority = myThread.getPriority();``` + +#### 11. currentThread() +Returns a reference to the currently executing thread object. +Usage: `Thread currentThread = Thread.currentThread();` + +These methods provide essential functionality for managing thread execution, synchronization, and interaction. When working with threads, it's crucial to understand and use these methods effectively to create robust and efficient concurrent programs. + +--- +## Problem Statement - 2 Factorial Computation Task +Write a program that computes Factorial of a list of numbers. Each factorial should be computed on a separate thread. For each factorial calculation, do not wait for more than 2 seconds. +Hint: Use the join() method on each factorial thread, before main starts executing again. 
+ +### Solution +**FactorialThread.java** +```java +import java.math.BigInteger; + +public class FactorialThread extends Thread { + private long number; + private BigInteger result; + private boolean isFinished; + + FactorialThread(long number){ + this.number = number; + result = BigInteger.valueOf(0); //Or BigInteger.ZERO; + isFinished = false; + } + + @Override + public void run() { + //Business Logic + result = factorial(number); + isFinished = true; + } + BigInteger factorial(long n){ + BigInteger ans = BigInteger.ONE; + for(long i=2; i<=n; i++){ + ans = ans.multiply(BigInteger.valueOf(i)); + } + return ans; + } + + BigInteger getResult(){ + return result; + } + boolean isFinished(){ + return isFinished; + } +} +``` + +```java +import java.math.BigInteger; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; + +public class Main { + + // Task calculate Factorial of List of Numbers + public static void main(String[] args) throws InterruptedException { + + List inputNumbers = Arrays.asList(100000000L, 3435L, 35435L, 2324L, 4656L, 23L, 5556L); + List threads = new ArrayList<>(); + for(long number:inputNumbers){ + FactorialThread t = new FactorialThread(number); + //System.out.println(t.getState()); + threads.add(t); + } + + for(Thread t:threads){ + t.start(); + } + + for(Thread t:threads){ + t.join(2000); + } + + //--------------------// + for(int i=0;i { + V call() throws Exception; +} +``` + +The `call` method returns a result of type `V`. The `call` method can throw an exception. The `Callable` interface is used to execute tasks that return a result. +For instance we can use the `Callable` interface to execute a task that returns the sum of two numbers: + +```java +Callable sumTask = () -> 2 + 3; +``` + +In order to execute a task that returns a result, we can use the `submit` method of the `ExecutorService` interface. The `submit` method takes a `Callable` object as a parameter. The `submit` method returns a `Future` object. 
The `Future` interface has a method called `get` that returns the result of the task. The `get` method is a blocking method. It waits until the task is completed and then returns the result of the task. + +```java +ExecutorService executorService = Executors.newCachedThreadPool(); +Future future = executorService.submit(() -> 2 + 3); +Integer result = future.get(); +``` + +Futures can be used to cancel tasks. The `Future` interface has a method called `cancel` that can be used to cancel a task. The `cancel` method takes a boolean parameter. If the boolean parameter is `true`, the task is cancelled even if the task is already running. If the boolean parameter is `false`, the task is cancelled only if the task is not running. + +```java +ExecutorService executorService = Executors.newCachedThreadPool(); +Future future = executorService.submit(() -> 2 + 3); +future.cancel(false); +``` + +--- +## Coding Problem 1 : Merge Sort +Implement multi-threaded merge sort. + +**Solution** +**Sorter.java** +```java +import java.util.ArrayList; +import java.util.concurrent.Callable; +import java.util.List; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Future; + +public class Sorter implements Callable> { + private List arr; + private ExecutorService executor; + Sorter(List arr,ExecutorService executor){ + this.arr = arr; + this.executor = executor; + + } + @Override + public List call() throws Exception { + //Business Logic + //base case + if(arr.size()<=1){ + return arr; + } + + //recursive case + int n = arr.size(); + int mid = n/2; + + List leftArr = new ArrayList<>(); + List rightArr = new ArrayList<>(); + + //Division of array into 2 parts + for(int i=0;i> leftFuture = executor.submit(leftSorter); + Future> rightFuture = executor.submit(rightSorter); + + leftArr = leftFuture.get(); + rightArr = rightFuture.get(); + + + //Merge + List output = new ArrayList<>(); + int i=0; + int j=0; + while(i l = List.of(7,3,1,2,4,6,17,12); + ExecutorService 
executorService = Executors.newCachedThreadPool(); + + Sorter sorter = new Sorter(l,executorService); + Future> output = executorService.submit(sorter); + System.out.println(output.get()); //Blocking Code + executorService.shutdown(); + } +} +``` +## Coding Problem 2 : Download Manager (Homework) +Consider a simple download manager application that needs to download multiple files concurrently. Implement the download manager using the Java Executor Framework. +Requirements: +- The download manager should be able to download multiple files simultaneously. +- Each file download is an independent task that can be executed concurrently. +- The download manager should use a thread pool from the Executor Framework to manage and execute the download tasks. +- Implement a mechanism to track the progress of each download task and display it to the user. + +```java +import java.util.List; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; + +class DownloadManager { + private ExecutorService executorService; + + public DownloadManager(int threadPoolSize) { + // TODO: Initialize the ExecutorService with a fixed-size thread pool. + } + + public void downloadFiles(List fileUrls) { + // TODO: Implement a method to submit download tasks for each file URL. + } + + // TODO: Implement a method to track and display the progress of each download task. + + public void shutdown() { + // TODO: Shutdown the ExecutorService when the download manager is done. + } +} +``` +```java +public class DownloadManagerApp { + public static void main(String[] args) { + // TODO: Create a DownloadManager instance with an appropriate thread pool size. + // TODO: Test the download manager by downloading multiple files concurrently. + // TODO: Display the progress of each download task. + // TODO: Shutdown the download manager after completing the downloads. + } +} +``` + +**Tasks for Implementation** +- Initialize the ExecutorService in the DownloadManager constructor. 
+- Implement the downloadFiles method to submit download tasks for each file URL using the ExecutorService. +- Implement a mechanism to track and display the progress of each download task. +- Test the download manager in the DownloadManagerApp by downloading multiple files concurrently. +- Shutdown the ExecutorService when the download manager is done. +- Feel free to adapt and extend the code as needed. This example focuses on using the Executor Framework for concurrent file downloads in a download manager application. + +**Solution** +```java +import java.util.List; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; +import java.util.concurrent.Future; + +class DownloadTask implements Runnable { + private String fileUrl; + + public DownloadTask(String fileUrl) { + this.fileUrl = fileUrl; + } + + @Override + public void run() { + // Simulate file download + System.out.println("Downloading file from: " + fileUrl); + + // Simulate download progress + for (int progress = 0; progress <= 100; progress += 10) { + System.out.println("Progress for " + fileUrl + ": " + progress + "%"); + try { + Thread.sleep(500); // Simulate download time + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + } + } + + System.out.println("Download complete for: " + fileUrl); + } +} + +class DownloadManager { + private ExecutorService executorService; + + public DownloadManager(int threadPoolSize) { + executorService = Executors.newFixedThreadPool(threadPoolSize); + } + + public void downloadFiles(List fileUrls) { + for (String fileUrl : fileUrls) { + DownloadTask downloadTask = new DownloadTask(fileUrl); + executorService.submit(downloadTask); + } + } + + public void shutdown() { + executorService.shutdown(); + } +} + +public class DownloadManagerApp { + public static void main(String[] args) { + DownloadManager downloadManager = new DownloadManager(3); // Use a thread pool size of 3 + + List filesToDownload = List.of("file1", 
"file2", "file3", "file4", "file5"); + + downloadManager.downloadFiles(filesToDownload); + + // Display progress (simulated) + // Note: In a real-world scenario, you might need to implement a more sophisticated progress tracking mechanism. + for (int i = 0; i < 10; i++) { + System.out.println("Main thread is doing some work..."); + try { + Thread.sleep(1000); + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + } + } + + downloadManager.shutdown(); + } +} +``` + +## Coding Problem 3 : Image Processing +Many image processing applications like Lightroom & Photoshop use multiple threads to process an image quickly. In this problem, you will build a simplified image repainting task using multiple threads, the repainting task here simply doubles the value of every pixel stored in the form of a 2D array. Take Input a NXN matrix and repaint it by using 4 threads, one for each quadrant. + +**Solution** +Repainting a 2D array using four threads can be achieved by dividing the array into quadrants, and assigning each quadrant to a separate thread for repainting. + +This example divides the 2D array into four quadrants and assigns each quadrant to a separate thread for repainting. The ArrayRepainterTask class represents the task for repainting a specific quadrant. The program then uses an ExecutorService with a fixed thread pool to concurrently execute the tasks. Finally, it prints the repainted 2D array. 
+ +Below is an example code using the Java Executor Framework + +```java +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; + +class ArrayRepainterTask implements Runnable { + private final int[][] array; + private final int startRow; + private final int endRow; + private final int startCol; + private final int endCol; + + public ArrayRepainterTask(int[][] array, int startRow, int endRow, int startCol, int endCol) { + this.array = array; + this.startRow = startRow; + this.endRow = endRow; + this.startCol = startCol; + this.endCol = endCol; + } + + @Override + public void run() { + // Simulate repainting for the specified quadrant + for (int i = startRow; i <= endRow; i++) { + for (int j = startCol; j <= endCol; j++) { + array[i][j] = array[i][j] * 2; // Repaint by doubling the values (simulated) + } + } + } +} + +public class ArrayRepaintingExample { + public static void main(String[] args) { + int[][] originalArray = { + {1, 2, 3, 4}, + {5, 6, 7, 8}, + {9, 10, 11, 12}, + {13, 14, 15, 16} + }; + + int rows = originalArray.length; + int cols = originalArray[0].length; + + ExecutorService executorService = Executors.newFixedThreadPool(4); + + // Divide the array into four quadrants + int midRow = rows / 2; + int midCol = cols / 2; + + // Create tasks for each quadrant + ArrayRepainterTask task1 = new ArrayRepainterTask(originalArray, 0, midRow - 1, 0, midCol - 1); + ArrayRepainterTask task2 = new ArrayRepainterTask(originalArray, 0, midRow - 1, midCol, cols - 1); + ArrayRepainterTask task3 = new ArrayRepainterTask(originalArray, midRow, rows - 1, 0, midCol - 1); + ArrayRepainterTask task4 = new ArrayRepainterTask(originalArray, midRow, rows - 1, midCol, cols - 1); + + // Submit tasks to the ExecutorService + executorService.submit(task1); + executorService.submit(task2); + executorService.submit(task3); + executorService.submit(task4); + + // Shutdown the ExecutorService + executorService.shutdown(); + + // Wait for all tasks to 
complete + while (!executorService.isTerminated()) { + // Wait + } + + // Print the repainted array + for (int[] row : originalArray) { + for (int value : row) { + System.out.print(value + " "); + } + System.out.println(); + } + } +} +``` +## Coding Problem 4: Scheduled Executor +Write a Java program that uses ScheduledExecutorService to schedule a task to run periodically. Implement a task that prints a message “Hello” at fixed intervals of 5s. +```java +import java.util.concurrent.Executors; +import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.TimeUnit; + +public class ScheduledExecutorExample { + + public static void main(String[] args) { + // Create a ScheduledExecutorService with a single thread + ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(); + + // Schedule the task to run periodically every 5 seconds + scheduledExecutorService.scheduleAtFixedRate(() -> { + System.out.println("Hello"); + }, 0, 5, TimeUnit.SECONDS); + + // Sleep for a while to allow the task to run multiple times + try { + Thread.sleep(20000); + } catch (InterruptedException e) { + e.printStackTrace(); + } + + // Shutdown the ScheduledExecutorService + scheduledExecutorService.shutdown(); + } +} + +``` + + + +--- + +## Synchronisation + +Whenever we have multiple threads that access the same resource, we need to make sure that the threads do not interfere with each other. This is called synchronisation. + +Synchronisation can be seen in the adder and subtractor example. The adder and subtractor threads access the same counter variable. If the adder and subtractor threads do not synchronise, the counter variable can be in an inconsistent state. + +* Create a count class that has a count variable. +* Create two different classes `Adder` and `Subtractor`. +* Accept a count object in the constructor of both the classes. +* In `Adder`, iterate from 1 to 10000 and increment the count variable by 1 on each iteration. 
+* In `Subtractor`, iterate from 1 to 10000 and decrement the count variable by 1 on each iteration. +* Print the final value of the count variable. +* What would the ideal value of the count variable be? +* What is the actual value of the count variable? +* Try to add some delay in the `Adder` and `Subtractor` classes using inspiration from the code below. What is the value of the count variable now? + + +**Steps to implement** +- Create a package called addersubtractor +- Implement the two tasks, Adder and Subtractor, in that package +- Pass the shared counter to both tasks via their constructors + +Adder +```java +package addersubtractor; + +public class Adder implements Runnable { + private Count count; + public Adder(Count count) { + this.count = count; + } + @Override + public void run() { + for (int i = 1; i <= 10000; ++i) { + count.value += 1; // read-modify-write on shared data + } + } +} +``` +Subtractor +```java +package addersubtractor; + +public class Subtractor implements Runnable { + private Count count; + public Subtractor(Count count) { + this.count = count; + } + @Override + public void run() { + for (int i = 1; i <= 10000; ++i) { + count.value -= 1; // read-modify-write on shared data + } + } +} +``` +Count class +```java +package addersubtractor; + +public class Count { + int value = 0; +} +``` +Now, let's make our client class here: + +```java +package addersubtractor; + +public class Client { + public static void main(String[] args) throws InterruptedException { + Count count = new Count(); + Adder adder = new Adder(count); + Subtractor subtractor = new Subtractor(count); + Thread t1 = new Thread(adder); + Thread t2 = new Thread(subtractor); + t1.start(); + t2.start(); + t1.join(); // wait for both threads to finish + t2.join(); + + System.out.println(count.value); + } +} +``` +The ideal output is 0, but the actual output can be a different number every time we run the code. + +This particular problem is known as a data synchronization problem. +It happens because the same data object is shared between multiple threads, and both threads are trying to modify it at the same time. This is an unexpected result, and we will continue with it in the next tutorial.
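The experiment above can also be put into a single runnable file. What follows is a sketch under stated assumptions: the class name `RaceDemo` and the helper `runOnce` are made up for illustration, and each thread performs 10000 updates as described in the steps. No expected output is shown because the printed values vary between runs.

```java
public class RaceDemo {
    // Shared, unsynchronised state (mirrors the Count class from the steps above).
    static class Count {
        int value = 0;
    }

    // Runs one adder and one subtractor thread and returns the final count.
    static int runOnce() throws InterruptedException {
        Count count = new Count();

        Thread adder = new Thread(() -> {
            for (int i = 1; i <= 10000; i++) {
                count.value += 1; // read, add, write back: not atomic
            }
        });
        Thread subtractor = new Thread(() -> {
            for (int i = 1; i <= 10000; i++) {
                count.value -= 1; // can overwrite an update made by the adder
            }
        });

        adder.start();
        subtractor.start();
        adder.join();
        subtractor.join();
        return count.value;
    }

    public static void main(String[] args) throws InterruptedException {
        // Repeat the experiment to see that the result is not stable.
        for (int run = 1; run <= 5; run++) {
            System.out.println("Run " + run + ": count = " + runOnce());
        }
    }
}
```

Because some updates overwrite each other, the final value can land anywhere between -10000 and 10000 instead of the ideal 0.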
+ diff --git a/Non-DSA Notes/LLD1 Notes/Concurrency 03 - Synchronisation, Mutex, Atomic Data Types etc..md b/Non-DSA Notes/LLD1 Notes/Concurrency 03 - Synchronisation, Mutex, Atomic Data Types etc..md new file mode 100644 index 0000000..88e95d2 --- /dev/null +++ b/Non-DSA Notes/LLD1 Notes/Concurrency 03 - Synchronisation, Mutex, Atomic Data Types etc..md @@ -0,0 +1,883 @@ +# Concurrency-3 Introduction to Synchronisation, Mutex, Synchronized, Atomic Data-types +---- + +### Agenda +- Synchronisation Problem + - Adder Subtracter Recap + - Conditions for Synchronisation Problem + - Properties for a Good Solution + - Solutions for Synchronisation + - Mutex Locks + - Synchronised Keyword + - Semaphores(Next Class) + - Coding Problems + - Thread Safe Counter + - ReentrantLock Basics + +- Addtional Topics + - Atomic Datatypes + - Volatile Keyword + - Concurrent Hashmap (Interviews) + +- Coding Projects (Homework) + - Ticket Booking System (Project) + - Thread Safe Bank Transactions +- Additional Reading + +# Synchronisation Problem + +## Adder Subtracter Recap + +The adder and subtractor problem is a sample problem that is used to demonstrate the need for synchronisation in a system. The problem is as follows: + +- Create a count class that has a count variable. +- Create two different classes Adder and Subtractor. +- Accept a count object in the constructor of both the classes. +- In Adder, iterate from 1 to 100 and increment the count variable by 1 on each iteration. +- In Subtractor, iterate from 1 to 100 and decrement the count variable by 1 on each iteration. +- Print the final value of the count variable. + +**What would the ideal value of the count variable be?** +**What is the actual value of the count variable?** +Try to add some delay in the Adder and Subtractor classes using inspiration from the code below. What is the value of the count variable now? 
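The classes below call `count.increment()`, `count.decrement()` and `count.getCount()`, but the `Count` class itself is not shown in this recap. A minimal unsynchronised sketch, assuming those three methods simply wrap an `int` field (the field name `count` is an assumption), could look like this:

```java
// Hypothetical Count class for the recap; deliberately NOT thread-safe.
public class Count {
    private int count = 0;

    // count++ is a read-modify-write, so two threads can interleave here.
    public void increment() {
        count++;
    }

    public void decrement() {
        count--;
    }

    public int getCount() {
        return count;
    }
}
```

Used from a single thread this behaves as expected; the synchronisation problem only appears once several threads share one instance.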
+ +**Adder.java** +```java +public class Adder implements Runnable { + private Count count; + + public Adder(Count count) { + this.count = count; + } + + @Override + public void run() { + for (int i = 0; i < 100; i++) { + count.increment(); + } + } +} +``` + +**Subtractor.java** +```java +public class Subtractor implements Runnable { + private Count count; + + public Subtractor(Count count) { + this.count = count; + } + + @Override + public void run() { + for (int i = 0; i < 100; i++) { + count.decrement(); + } + } +} +``` +**Runner.java** +```java +public class Runner { + // join() can throw InterruptedException, so main declares it + public static void main(String[] args) throws InterruptedException { + Count count = new Count(); + Adder adder = new Adder(count); + Subtractor subtractor = new Subtractor(count); + + Thread adderThread = new Thread(adder); + Thread subtractorThread = new Thread(subtractor); + + adderThread.start(); + subtractorThread.start(); + + // Wait for both threads before reading the result + adderThread.join(); + subtractorThread.join(); + + System.out.println(count.getCount()); + } +} +``` + +## Synchronisation Problem +In multithreaded environments, synchronization problems can arise due to concurrent execution of multiple threads, leading to unpredictable and undesirable behavior. There are certain conditions that can lead to synchronization problems. These conditions are: + +**1. Critical Section** +- A critical section is a part of the code that must be executed by only one thread at a time to avoid data inconsistency or corruption. +- If multiple threads access and modify shared data simultaneously within a critical section, it can lead to unpredictable results. +- Ensuring mutual exclusion by using synchronization mechanisms like locks or semaphores helps prevent multiple threads from entering the critical section simultaneously. + +**2. Race Conditions** +- A race condition occurs when the final outcome of a program depends on the relative timing of events, such as the order in which threads are scheduled to run. 
+- In a race condition, the correctness of the program depends on the timing of the thread executions, and different outcomes may occur depending on the interleaving of thread execution. +- Proper synchronization mechanisms, like locks or atomic operations, are needed to prevent race conditions by making sure that critical sections execute one at a time, without interleaving. + +**3. Preemption** +- Preemption refers to the interrupting of a currently executing thread to start or resume the execution of another thread. +- In multithreaded environments, preemption can lead to issues if not handled carefully. For example, a thread might be preempted while in the middle of updating shared data, leading to inconsistent or corrupted state. +- To avoid issues related to preemption, critical sections should be protected using mechanisms like locks (or, inside an operating system kernel, temporarily disabling interrupts) so that no other thread can observe a half-finished update. + + +To address these synchronization problems, various synchronization mechanisms are employed, such as locks, semaphores, and atomic operations. These tools help ensure that only one thread can access critical sections at a time, preventing race conditions and mitigating the impact of preemption on shared data. Additionally, proper design practices, like minimizing the use of shared mutable data and using thread-safe data structures, can contribute to reducing synchronization issues in multithreaded environments. + + +### Properties of a Good Synchronization Solution + +1. **Mutual Exclusion:** + - *Definition:* Only one thread should be allowed to execute its critical section at any given time. Suppose there are three threads waiting to enter the critical sections of the Adder, Subtractor, and Multiplier. A guard should be in place so that only one of them is inside a critical section at a time. 
+ + - *Importance:* Ensures that conflicting operations on shared resources do not occur simultaneously, preventing data corruption or inconsistency. + +2. **Progress:** + - *Definition:* The overall system should keep moving and making progress. It should not stop at any stage and be waiting for a long period. If no thread is in its critical section and some threads are waiting to enter the critical section, then the selection of the next thread to enter the critical section should be definite. + + + - *Importance:* Guarantees that the system makes progress and avoids deadlock situations where threads are unable to proceed. + +3. **Bounded Waiting:** + - *Definition:* There exists a limit on the number of times other threads are allowed to enter their critical sections after a thread has requested entry into its critical section and before that request is granted. No thread should be waiting infinitely. There should be a bound on how long they have to wait before they are allowed to enter the critical section. + + - *Importance:* Prevents the problem of starvation, where a thread is repeatedly delayed in entering its critical section by other threads. + +4. **No Deadlock:** + - *Definition:* A deadlock is a state where two or more threads are blocked forever, each waiting for the other to release a resource. + - *Importance:* A good synchronization solution should avoid deadlocks, as they can lead to a complete system halt and result in unresponsive behavior. + +5. **Efficiency:** + - *Definition:* The synchronization solution should introduce minimal overhead and allow non-conflicting threads to execute concurrently. + - *Importance:* Ensures that the system performs well and doesn't suffer from unnecessary delays or resource contention. + +6. **Adaptability:** + - *Definition:* The synchronization solution should be adaptable to different system configurations and workloads. 
+ - *Importance:* Facilitates the use of the synchronization mechanism in a variety of scenarios without requiring significant modifications. + +7. **Low Busy-Waiting:** + - *Definition:* Minimizes the use of busy-waiting (spinning in a loop while waiting for a condition to be satisfied) to conserve CPU resources. Busy-waiting happens when a thread has to continuously check whether it can now enter the critical section, and that checking is not a productive use of CPU time. The ideal solution should have some kind of notification system. + For example, suppose you have to check whether a person is available: + - In way 1, you knock on the person’s door every 2 minutes to check if they are free. This is busy waiting. + - In way 2, you tell the person that you are here and ask them to let you know when they are free. This is called a notification, and it makes better use of your time. + + - *Importance:* Reduces unnecessary CPU consumption, making the system more efficient and avoiding the negative impact of busy-waiting on power consumption. + + + +8. **Fairness:** + - *Definition:* All threads should have a fair chance to enter their critical sections. No thread should be unfairly delayed or granted preferential access. + - *Importance:* Ensures that the synchronization solution treats all threads fairly, preventing situations where some threads consistently get better access to shared resources. + +9. **Scalability:** + - *Definition:* The synchronization solution should scale well with an increasing number of threads and resources. + - *Importance:* Allows the system to efficiently handle a growing number of threads without a significant degradation in performance. + +10. **Portability:** + - *Definition:* The synchronization solution should be portable across different platforms and operating systems. + - *Importance:* Enables the synchronization mechanism to be used in diverse computing environments without requiring extensive modifications. 
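The difference between busy waiting and notification (property 7 above) can be sketched in Java. This is only an illustration under stated assumptions: the class `WaitingStyles` and its members (`freeFlag`, `door`, `free`, `runBoth`) are hypothetical names, not part of the lecture code, and the notification side uses the built-in `wait()` / `notifyAll()` monitor methods.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class: contrasts a spinning waiter with a notified waiter.
class WaitingStyles {
    // Way 1 (busy waiting): the waiter keeps re-checking this flag in a loop.
    static final AtomicBoolean freeFlag = new AtomicBoolean(false);

    // Way 2 (notification): the waiter sleeps on this monitor until notified.
    static final Object door = new Object();
    static boolean free = false; // only read or written while holding "door"

    // Spins until the flag flips; returns how many checks were needed.
    static long busyWait() {
        long checks = 0;
        while (!freeFlag.get()) {
            checks++; // every extra check is CPU time spent doing nothing useful
        }
        return checks;
    }

    // Blocks without spinning; the loop re-checks the condition after each wakeup.
    static void waitForNotification() throws InterruptedException {
        synchronized (door) {
            while (!free) {
                door.wait(); // releases the monitor while the thread sleeps
            }
        }
    }

    // Plays the "person" from the analogy: tells both waiters they may proceed.
    static void announceFree() {
        freeFlag.set(true);
        synchronized (door) {
            free = true;
            door.notifyAll();
        }
    }

    // Runs both waiters and reports whether each of them finished.
    static boolean runBoth() {
        Thread spinner = new Thread(() ->
                System.out.println("busy waiter checked " + busyWait() + " times"));
        Thread sleeper = new Thread(() -> {
            try {
                waitForNotification();
                System.out.println("notified waiter woke up");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        spinner.start();
        sleeper.start();
        try {
            Thread.sleep(100); // the person stays busy for a short while
            announceFree();
            spinner.join();
            sleeper.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("both waiters finished: " + runBoth());
    }
}
```

The spinning thread typically reports a very large number of checks for the same 100 ms wait, while the notified thread consumes no CPU until it is woken.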
+ +----- +## Solutions to Synchronisation Problem + +### 1. Mutex Lock +Mutex means Mutual Exclusion. A mutex lock (also known as a mutual exclusion lock) is one way to solve the synchronisation problem: it ensures that only one thread can access a critical section at a time. + +A thread can only access the critical section if it holds the lock. If a thread does not hold the lock, it has to wait. When the thread that holds the lock is done, it releases the lock and allows another thread to access the critical section. + +Take the example of Adder and Subtractor here, +Adder: +```java +print('Hi') +x <- read(count) +x = x + 1 +count = x +print('Bye') +``` + +Subtractor: +```java +print('Hello') +x <- read(count) +x = x - 1 +count = x +print('Bye') +``` + +The above adder and subtractor, if executed concurrently, can lead to wrong results due to interleaving of instructions. +So, MUTEX SAYS: +- A thread must take a lock before it enters its critical section. +- It must release the lock as soon as it leaves the critical section. + +Think of a room with a lock. Only one person can enter the room at a time. If a person has the key, they can enter the room. If a person does not have the key, they cannot enter the room. When the person with the key leaves the room, they give the key to another person. This is the same as a mutex lock. + +- So, a thread (person) must take a lock (have a key) before it enters its critical section (room). +- It must release the lock (key) as soon as it leaves the critical section (room). +- By default, the program cannot enforce synchronization. The developer has to do it. + +**So what do we have to do?** +Before entering a critical section, acquire the lock and, at exit, release it. 
+Example: + +Adder: +```java +print('Hi') +lock.lock() +x <- read(count) +x = x + 1 +count = x +lock.unlock() +print('Bye') +``` +Subtractor: +```java +print('Hello') +lock.lock() +x <- read(count) +x = x - 1 +count = x +lock.unlock() +print('Bye') +``` +Now let’s discuss the properties of a lock: +- Only one thread can hold the lock at a time; other threads have to wait until that thread unlocks. +- The lock automatically notifies a waiting thread to run when the current holder unlocks. +- It has no busy waiting. +- It provides mutual exclusion. +- It provides bounded waiting. +- The system makes overall progress. + +Code for reference: +```java +// Client class (main) +import java.util.concurrent.locks.Lock; +import java.util.concurrent.locks.ReentrantLock; + +public class Client { + // join() can throw InterruptedException, so main declares it + public static void main(String[] args) throws InterruptedException { + Count count = new Count(); + Lock lock = new ReentrantLock(); + Adder adder = new Adder(count, lock); + Subtractor subtractor = new Subtractor(count, lock); + Thread t1 = new Thread(adder); + Thread t2 = new Thread(subtractor); + t1.start(); + t2.start(); + t1.join(); + t2.join(); + + System.out.println(count.value); + } +} +``` +Now, let’s change the constructor of adder and subtractor +Adder: +```java +package addersubtractor; + +import java.util.concurrent.locks.Lock; + +public class Adder implements Runnable { + private Count count; + private Lock lock; + public Adder(Count count, Lock lock) { + this.count = count; + this.lock = lock; + } + @Override + public void run() { + for (int i = 1; i <= 100; ++i) { + lock.lock(); // enter the critical section + count.value += i; + lock.unlock(); // leave the critical section + } + } +} +``` +Subtractor: +```java +package addersubtractor; + +import java.util.concurrent.locks.Lock; + +public class Subtractor implements Runnable { + private Count count; + private Lock lock; + public Subtractor(Count count, Lock lock) { + this.count = count; + this.lock = lock; + } + @Override + public void run() { + for (int i = 1; i <= 100; ++i) { + lock.lock(); // enter the critical section + count.value -= i; + lock.unlock(); // leave the critical section + } + } +} +``` +Now, when we run this, we will always get 0 as the answer, which was not the case earlier. 
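The adder and subtractor walkthrough above can also be condensed into a single runnable file. The sketch below is one possible arrangement, not the class code itself: the names `LockedCounterDemo` and `runOnce` are assumptions made for illustration, and `unlock()` is placed in a `finally` block so that the lock is released even if the critical section throws.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical single-file version of the Adder/Subtractor example.
class LockedCounterDemo {
    // Shared state, mirroring a Count class with a plain "value" field.
    static class Count {
        int value = 0;
    }

    // Runs one adder thread and one subtractor thread and returns the final value.
    static int runOnce() {
        Count count = new Count();
        Lock lock = new ReentrantLock();

        Runnable adder = () -> {
            for (int i = 1; i <= 100; i++) {
                lock.lock();           // take the lock before the critical section
                try {
                    count.value += i;
                } finally {
                    lock.unlock();     // always release, even on an exception
                }
            }
        };
        Runnable subtractor = () -> {
            for (int i = 1; i <= 100; i++) {
                lock.lock();
                try {
                    count.value -= i;
                } finally {
                    lock.unlock();
                }
            }
        };

        Thread t1 = new Thread(adder);
        Thread t2 = new Thread(subtractor);
        t1.start();
        t2.start();
        try {
            t1.join();                 // join() makes the threads' writes visible here
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return count.value;
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints 0
    }
}
```

Because every read-modify-write of `count.value` happens while holding the same lock, the additions and subtractions cannot interleave, so the total of +5050 and -5050 is always 0.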
+ +#### Properties of a mutex lock +- A thread can only access the critical section if it has the lock. +- Only one thread can have the lock at a time. +- Other threads cannot access the critical section while a thread has the lock, and thus have to wait. +- The lock must be released when the thread exits the critical section; with an explicit `Lock` object, the thread does this by calling `unlock()`. + +### 2. Synchronised keyword +The synchronized keyword is another way to solve the synchronisation problem. It ensures that only one thread can access a critical section at a time. + +A synchronized method or block can only be executed by one thread at a time for a given lock object. If a thread is inside a synchronized method or block, other threads have to wait until that thread exits it. + +Following is an example of a synchronized method: +```java +public class Count { + private int count = 0; + + public synchronized void increment() { + count++; + } + + public synchronized void decrement() { + count--; + } + + public int getCount() { + return count; + } +} +``` +In the above example, the increment() and decrement() methods are synchronized. Both methods lock on the same Count object, so only one thread at a time can be inside either of them for that object. If a thread is executing increment(), other threads that call increment() or decrement() on the same object have to wait until it exits, and the same holds while a thread is executing decrement(). + +Similarly, the **synchronized keyword** can be used to synchronize a block of code. 
Following is an example of a synchronized block: +```java +public class Count { + private int count = 0; + + public void increment() { + synchronized (this) { + count++; + } + } + + public void decrement() { + synchronized (this) { + count--; + } + } + + public int getCount() { + return count; + } +} +``` +If you declare methods as synchronized, only one thread at a time can execute any synchronized instance method on the same object. This is because the synchronized keyword locks on the object (`this`), not on the individual method. + +--- +### Coding Problem 1 - Thread Safe Counter (Homework) +Implement a class that represents a counter and is accessed by multiple threads. +Ensure that the counter is updated in a thread-safe manner without using the synchronized keyword. + +```java +public class ThreadSafeCounter { + private int count = 0; + + // TODO: Implement a thread-safe method to increment the counter. + + public static void main(String[] args) { + // TODO: Create multiple threads that concurrently increment the counter. + // Ensure that the counter is updated in a thread-safe manner without using the synchronized keyword. + } +} +``` + + +### Coding Problem 2 - ReentrantLock Basics (Homework) +Implement a program that uses ReentrantLock to achieve thread safety. + +```java +import java.util.concurrent.locks.Lock; +import java.util.concurrent.locks.ReentrantLock; + +public class ReentrantLockExample { + private int value = 0; + private final Lock lock = new ReentrantLock(); + + // TODO: Implement a method to update the value using ReentrantLock. + + public static void main(String[] args) { + // TODO: Create multiple threads that concurrently update the value using ReentrantLock. + // Ensure that the value is updated in a thread-safe manner. + } +} +``` +--- +## Additional Topics +### 1. Atomic Datatypes in Java +In Java, the java.util.concurrent.atomic package provides a set of classes that support atomic operations on variables. 
These classes are designed to be thread-safe and eliminate the need for explicit synchronization in certain scenarios. One commonly used class is AtomicInteger. In this tutorial, we'll explore AtomicInteger and provide a simple code example. + +**Atomic Integer Basics** +The AtomicInteger class provides atomic operations on an integer variable. These operations are performed in a way that ensures atomicity, making them thread-safe without the need for explicit synchronization. + +Example Usage: + +```java +import java.util.concurrent.atomic.AtomicInteger; + +public class AtomicIntegerExample { + + public static void main(String[] args) { + // Create an AtomicInteger with an initial value + AtomicInteger atomicInteger = new AtomicInteger(0); + + // Perform atomic increment + int newValue = atomicInteger.incrementAndGet(); + System.out.println("Incremented Value: " + newValue); + + // Perform atomic decrement + newValue = atomicInteger.decrementAndGet(); + System.out.println("Decremented Value: " + newValue); + + // Perform atomic add + int addValue = 5; + newValue = atomicInteger.addAndGet(addValue); + System.out.println("After Adding " + addValue + ": " + newValue); + + // Perform compare-and-set operation + int expectedValue = 5; + int updateValue = 10; + boolean success = atomicInteger.compareAndSet(expectedValue, updateValue); + if (success) { + System.out.println("Value updated successfully. New Value: " + atomicInteger.get()); + } else { + System.out.println("Value was not updated. Current Value: " + atomicInteger.get()); + } + } +} +``` + +**Benefits of Atomic Datatypes:** +- Thread Safety: Operations on AtomicInteger are atomic, eliminating the need for explicit synchronization. +- Performance: Atomic operations are more efficient than using locks for simple operations on shared variables. +- Simplicity: Simplifies the development of thread-safe code in scenarios where simple atomic operations suffice. + +Lets use Atomic Integers in Adder-Subtracter Example. 
+**InventoryCounter.java** +```java +import java.util.concurrent.atomic.AtomicInteger; + +public class InventoryCounter { + // Shared counter; all updates go through atomic operations + AtomicInteger counter = new AtomicInteger(0); +} +``` +**Adder.java** +```java +public class Adder implements Runnable{ + private InventoryCounter ic; + + Adder(InventoryCounter ic){ + this.ic = ic; + } + + @Override + public void run() { + for(int i=0;i<=10000;i++){ + ic.counter.addAndGet(1); + } + } +} +``` +**Subtracter.java** +```java +public class Subtracter implements Runnable{ + private InventoryCounter ic; + + Subtracter(InventoryCounter ic){ + this.ic = ic; + } + + @Override + public void run() { + for(int i=0;i<=10000;i++){ + ic.counter.addAndGet(-1); + } + } +} +``` +**Main.java** +```java +public class Main { + public static void main(String[] args) throws InterruptedException { + + InventoryCounter ic = new InventoryCounter(); + Thread t1 = new Thread(new Adder(ic)); + Thread t2 = new Thread(new Subtracter(ic)); + t1.start(); + t2.start(); + + t1.join(); + t2.join(); + + System.out.println(ic.counter.get()); + } +} +``` +In summary, the AtomicInteger class in Java provides a convenient and efficient way to perform atomic operations on integer variables, making it a valuable tool for concurrent programming. Similar classes, such as AtomicLong and AtomicBoolean, exist for other primitive types. + +### 2. Volatile Keyword +[Video Tutorial on Volatile](https://drive.google.com/drive/folders/1NpxU-yvk-sgiRrsmIz05vjy2Iz8gYMED?usp=sharing) + +The volatile keyword addresses problems like memory inconsistency errors and data races. Let's understand this in more detail. + + +The operating system may read from heap variables and make a copy of the value in each thread's own storage. Each thread has its own small and fast memory storage that holds its own copy of the shared resource's value. + +One thread can modify a shared variable, but this change might not be immediately reflected or visible to other threads. Instead, it is first updated in the thread's local cache. 
The operating system may not flush the first thread's changes to the heap until the thread has finished executing, causing memory inconsistency errors. + +Let's see it in action through this code: +**SharedResource.java** +```java +public class SharedResource { + volatile private boolean flag; + SharedResource(){ + flag = false; + } + //Two More Methods + public void toggleFlag(){ + flag = !flag; + } + public boolean getFlag(){ + return flag; + } +} +``` + +**Main.java** +```java +public class Main { + public static void main(String[] args) { + SharedResource sharedResource = new SharedResource(); + System.out.println("Shared Resource Created, Flag Value " + sharedResource.getFlag()); + + Thread A = new Thread(()->{ + //After 2S, toggle the value + try{ + Thread.sleep(2000); + } + catch(InterruptedException e){ + e.printStackTrace(); + } + sharedResource.toggleFlag(); + System.out.println("Thread A is finished, Flag is "+sharedResource.getFlag()); + }); + + Thread B = new Thread(()->{ + while(!sharedResource.getFlag()){ + //...busy-wait... + // System.out.println("Inside Loop " + sharedResource.getFlag()); + + } + System.out.println("In Thread B, Flag is "+sharedResource.getFlag()); + }); + + A.start(); + B.start(); + + + } +} +``` + + +**Solution - Volatile Keyword** +- The volatile keyword is used as a modifier for class variables. +- It's an indicator that this variable's value may be changed by multiple threads. +- This modifier ensures that the variable is always read from, and written to, the main memory, rather than from any thread-specific cache. +- This provides memory consistency for this variable's value across threads. + +However, volatile does not guarantee atomicity or provide synchronization, so additional synchronization mechanisms should be used in conjunction with it when necessary. + +**When to use volatile** +- When a variable is used to track the state of a shared resource, such as a counter or a flag. 
+- When a variable is used to communicate between threads. + +**When not to use volatile** +- When the variable is used by a single thread. +- When a variable is used to store a large amount of data. + +### 3. Concurrent Data Structures +The Collections Framework contains several data structures that support concurrency, but we will limit our discussion to one of the most widely asked about: Concurrent Hashmap. +Java Collections provides various data structures for working with **key-value pairs**. The commonly used ones are - +- **Hashmap** (Non-Synchronised, Not Thread Safe) + - discuss the Synchronized Hashmap method + +- **Hashtable** (Synchronised, Thread Safe) + - locking over entire table + +- **Concurrent Hashmap** (Synchronised, Thread Safe, Higher Level of Concurrency, Faster) + - locking at bucket level, fine grained locking + +**Hashmap and Synchronised Hashmap Method** +Synchronization is the process of establishing coordination and ensuring proper communication between two or more activities. A HashMap is not synchronized, which may cause data inconsistency, so we need to synchronize it. The in-built method ‘Collections.synchronizedMap()’ is a convenient way of performing this task. + +A synchronized map is a map that can be safely accessed by multiple threads without causing concurrency issues. On the other hand, a HashMap is not synchronized, which means that in a multi-threading environment multiple threads can access and modify it at the same time without any coordination. This can lead to data inconsistency and unexpected behavior of elements. It may also affect the results of an operation. + +Therefore, we need to synchronize the access to the elements of the HashMap using ‘synchronizedMap()’. This method creates a wrapper around the original HashMap and locks it whenever a thread tries to access or modify it. 
+```java +Collections.synchronizedMap(instanceOfHashMap); +``` + +The `synchronizedMap()` method is a static method of the Collections class that takes a HashMap instance as a parameter and returns a synchronized Map backed by it. However, it is important to note that only individual method calls on the map are synchronized, not iteration over its views such as keySet and entrySet. Therefore, if we want to iterate over the synchronized map, we need to synchronize on the returned map (or use a lock) to ensure exclusive access. + +```java +import java.util.*; +public class Maps { + public static void main(String[] args) { + HashMap<String, Integer> cart = new HashMap<>(); + // Adding elements in the cart map + cart.put("Butter", 5); + cart.put("Milk", 10); + cart.put("Rice", 20); + cart.put("Bread", 2); + cart.put("Peanut", 2); + // printing synchronized map from HashMap + Map<String, Integer> mapSynched = Collections.synchronizedMap(cart); + System.out.println("Synchronized Map from HashMap: " + mapSynched); + } +} +``` + +**Hashtable vs Concurrent Hashmap** +HashMap is generally suitable for single-threaded applications and is faster than Hashtable; however, in multithreading environments we have to use **Hashtable** or **Concurrent Hashmap**. So let us talk about them. + +While both Hashtable and Concurrent Hashmap collections offer the advantage of thread safety, their underlying architectures and capabilities significantly differ. Whether we’re building a legacy system or working on modern, microservices-based cloud applications, understanding these nuances is critical for making the right choice. + +Let's see the differences between Hashtable and ConcurrentHashMap, delving into their performance metrics, synchronization features, and various other aspects to help us make an informed decision. + +**1. Hashtable** +Hashtable is one of the oldest collection classes in Java and has been present since JDK 1.0. 
It provides key-value storage and retrieval APIs: + +```java +Hashtable<String, String> hashtable = new Hashtable<>(); +hashtable.put("Key1", "1"); +hashtable.put("Key2", "2"); +hashtable.putIfAbsent("Key3", "3"); +String value = hashtable.get("Key2"); +``` +**The primary selling point of Hashtable is thread safety, which is achieved through method-level synchronization**. + +Methods like put(), putIfAbsent(), get(), and remove() are synchronized. Only one thread can execute any of these methods at a given time on a Hashtable instance, ensuring data consistency. + +**2. Concurrent Hashmap** +ConcurrentHashMap is a more modern alternative, introduced in Java 5 as part of the java.util.concurrent package. + +Both Hashtable and ConcurrentHashMap implement the Map interface, which accounts for the similarity in method signatures: +```java +ConcurrentHashMap<String, String> concurrentHashMap = new ConcurrentHashMap<>(); +concurrentHashMap.put("Key1", "1"); +concurrentHashMap.put("Key2", "2"); +concurrentHashMap.putIfAbsent("Key3", "3"); +String value = concurrentHashMap.get("Key2"); +``` + +ConcurrentHashMap, on the other hand, provides thread safety with a higher level of concurrency. It allows multiple threads to read and perform limited writes simultaneously **without locking the entire data structure**. This is especially useful in applications that have more read operations than write operations. + +**Performance Comparison** +Hashtable locks the entire table for every synchronized operation, reads included, thereby preventing other reads or writes in the meantime. This could be a bottleneck in a high-concurrency environment. + +ConcurrentHashMap, however, allows concurrent reads and limited concurrent writes, making it more scalable and often faster in practice. + + +--- +## Coding Projects on Synchronisation +### Coding Problem 3 - Ticket Booking System + +Consider an online reservation system for booking tickets to various events. The system needs to handle concurrent requests from multiple users trying to reserve seats. 
To ensure thread safety and prevent race conditions, a Reentrant Lock can be employed. +Requirements: +- The reservation system manages the availability of seats for different events. +- Multiple users can attempt to reserve seats concurrently. +- A user should be able to reserve multiple seats for the same event. +- The system should prevent overbooking and ensure the integrity of seat reservations. + + +```java +import java.util.concurrent.locks.Lock; +import java.util.concurrent.locks.ReentrantLock; + +class ReservationSystem { + private int availableSeats; + private final Lock lock = new ReentrantLock(); + + public ReservationSystem(int totalSeats) { + this.availableSeats = totalSeats; + } + + public void reserveSeats(String user, int numSeats) { + lock.lock(); + try { + if (numSeats > 0 && numSeats <= availableSeats) { + // Simulate the reservation process + System.out.println(user + " is reserving " + numSeats + " seats."); + + // Update available seats + availableSeats -= numSeats; + + // Simulate the ticket issuance + System.out.println(user + " reserved seats successfully."); + } else { + System.out.println(user + " could not reserve seats. 
Not enough available seats."); + } + } finally { + lock.unlock(); + } + } + + public int getAvailableSeats() { + return availableSeats; + } +} + +public class OnlineReservationSystem { + public static void main(String[] args) { + ReservationSystem reservationSystem = new ReservationSystem(50); + + // Simulate multiple users trying to reserve seats concurrently + Thread user1 = new Thread(() -> reservationSystem.reserveSeats("User1", 5)); + Thread user2 = new Thread(() -> reservationSystem.reserveSeats("User2", 10)); + Thread user3 = new Thread(() -> reservationSystem.reserveSeats("User3", 8)); + + user1.start(); + user2.start(); + user3.start(); + + try { + user1.join(); + user2.join(); + user3.join(); + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + } + + System.out.println("Remaining available seats: " + reservationSystem.getAvailableSeats()); + } +} +``` + +In this example, the ReservationSystem class utilizes a Reentrant Lock (lock) to ensure that the reservation process is thread-safe. The reserveSeats method is enclosed in a try-finally block to ensure that the lock is always released, even if an exception occurs. +This real-world problem demonstrates how Reentrant Locks can be used to synchronize access to shared resources in a multi-threaded environment, ensuring data consistency and preventing race conditions in a scenario like an online reservation system. + + +### Coding Problem 4 - Thread-safe Bank Transactions +**Problem Statement:** +You are tasked with implementing a simple bank system that supports concurrent transactions. The bank has multiple accounts, and customers can deposit and withdraw money from their accounts concurrently. Implement a program that ensures the integrity of bank transactions by using threads. +**Requirements:** +- Each account has a unique account number and an initial balance. +- Customers can concurrently deposit and withdraw money from their accounts. 
+- The bank should ensure that the account balance remains consistent and does not go below zero during concurrent transactions. +- Use threads to simulate multiple customers performing transactions simultaneously. + +**Solution** + +```java +import java.util.concurrent.locks.Lock; +import java.util.concurrent.locks.ReentrantLock; + +class BankAccount { + private final int accountNumber; + private int balance; + private final Lock lock = new ReentrantLock(); + + + public BankAccount(int accountNumber, int initialBalance) { + this.accountNumber = accountNumber; + this.balance = initialBalance; + } + + + public int getAccountNumber() { + return accountNumber; + } + + + public int getBalance() { + return balance; + } + + + public void deposit(int amount) { + lock.lock(); + try { + balance += amount; + System.out.println("Deposited $" + amount + " to account " + accountNumber + ". New balance: $" + balance); + } finally { + lock.unlock(); + } + } + + + public void withdraw(int amount) { + lock.lock(); + try { + if (amount <= balance) { + balance -= amount; + System.out.println("Withdrawn $" + amount + " from account " + accountNumber + ". 
New balance: $" + balance); + } else { + System.out.println("Insufficient funds for withdrawal from account " + accountNumber); + } + } finally { + lock.unlock(); + } + } +} + + +class BankTransaction implements Runnable { + private final BankAccount account; + private final int transactionAmount; + + + public BankTransaction(BankAccount account, int transactionAmount) { + this.account = account; + this.transactionAmount = transactionAmount; + } + + + @Override + public void run() { + // Simulate a bank transaction (deposit or withdrawal) + if (transactionAmount >= 0) { + account.deposit(transactionAmount); + } else { + account.withdraw(Math.abs(transactionAmount)); + } + } +} + + +public class BankSimulation { + public static void main(String[] args) { + BankAccount account1 = new BankAccount(101, 1000); + BankAccount account2 = new BankAccount(102, 1500); + + + // Simulate concurrent bank transactions using threads + Thread thread1 = new Thread(new BankTransaction(account1, 200)); + Thread thread2 = new Thread(new BankTransaction(account1, -300)); + Thread thread3 = new Thread(new BankTransaction(account2, 500)); + + + thread1.start(); + thread2.start(); + thread3.start(); + + + try { + thread1.join(); + thread2.join(); + thread3.join(); + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + } + + + // Display final account balances + System.out.println("Final balance for account " + account1.getAccountNumber() + ": Rs" + account1.getBalance()); + System.out.println("Final balance for account " + account2.getAccountNumber() + ": Rs" + account2.getBalance()); + } +} +``` + +In this example, BankAccount represents a bank account with deposit and withdraw methods protected by a ReentrantLock. The BankTransaction class simulates a bank transaction (deposit or withdrawal), and the BankSimulation class demonstrates how threads can be used to perform concurrent transactions on multiple accounts. 
The use of locks ensures the thread safety of the bank transactions. + +## Additional Reading +- [Reentrant Locks](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/ReentrantLock.html) +- [Concurrent DataStructures](https://docs.oracle.com/javase/tutorial/essential/concurrency/collections.html) +- [Fairness of Re-entrant Locks](https://docs.oracle.com/javase%2F7%2Fdocs%2Fapi%2F%2F/java/util/concurrent/locks/ReentrantLock.html) + + +-- End -- + + diff --git a/Non-DSA Notes/LLD1 Notes/Concurrency 04 - Synchronisation using Semaphores, Deadlocks.md b/Non-DSA Notes/LLD1 Notes/Concurrency 04 - Synchronisation using Semaphores, Deadlocks.md new file mode 100644 index 0000000..711ba10 --- /dev/null +++ b/Non-DSA Notes/LLD1 Notes/Concurrency 04 - Synchronisation using Semaphores, Deadlocks.md @@ -0,0 +1,865 @@ +# Concurrency-4 Synchronization with Semaphores +---- +## Agenda +- Synchronisation using Semaphores + - Producer Consumer Problem using Semaphores + - Producer Consumer Problem using Concurrent Data Structure (Queue) + - Print In Order LeetCode Problem +- Deadlocks +- Additional Topics + - wait(), notify() methods + - Producer Consumer Using wait() & notify() +- Coding Projects (Optional) + - Traffic Intersection Control + - Resource Pooling in Library +- LeetCode Problems +- Additional Resources + +## Synchronisation using Semaphores +### Producer Consumer Problem : A T-Shirt Store Example + +The Producer-Consumer problem is a classic synchronization problem where two processes, the producer and the consumer, share a common, fixed-size buffer or store. The producer produces items and adds them to the buffer, while the consumer consumes items from the buffer. Semaphores are synchronization primitives that can be used to solve this problem efficiently. + +**Problem Description** +Let's use a T-shirt store as an analogy for the Producer-Consumer problem. The T-shirt store has a limited capacity to store T-shirts. 
Producers can create T-shirts and add them to the store, and consumers can buy T-shirts from the store. The challenge is to ensure that the store doesn't overflow with T-shirts or run out of stock. + + + +**Java Implementation-1 Using Semaphores** +(simplified implementation than what is covered in class, here we don't maintain an actual queue for T-Shirts, just the count) + +```java +import java.util.concurrent.Semaphore; + +public class TShirtStore { + private static final int STORE_CAPACITY = 5; + private static Semaphore mutex = new Semaphore(1); // Controls access to critical sections + private static Semaphore empty = new Semaphore(STORE_CAPACITY); // Represents empty slots in the store + private static Semaphore full = new Semaphore(0); // Represents filled slots in the store + private static int tShirtCount = 0; + + static class Producer implements Runnable { + @Override + public void run() { + try { + while (true) { + empty.acquire(); // Wait for an empty slot + mutex.acquire(); // Enter critical section + + // Produce a T-shirt + System.out.println("Producer produces a T-shirt. Total T-shirts: " + ++tShirtCount); + + mutex.release(); // Exit critical section + full.release(); // Signal that a T-shirt is ready to be consumed + Thread.sleep(1000); // Simulate production time + } + } catch (InterruptedException e) { + e.printStackTrace(); + } + } + } + + static class Consumer implements Runnable { + @Override + public void run() { + try { + while (true) { + full.acquire(); // Wait for a T-shirt to be available + mutex.acquire(); // Enter critical section + + // Consume a T-shirt + System.out.println("Consumer buys a T-shirt. 
Total T-shirts: " + --tShirtCount); + + mutex.release(); // Exit critical section + empty.release(); // Signal that a slot is available for production + Thread.sleep(1500); // Simulate consumption time + } + } catch (InterruptedException e) { + e.printStackTrace(); + } + } + } + + public static void main(String[] args) { + Thread producerThread = new Thread(new Producer()); + Thread consumerThread = new Thread(new Consumer()); + + producerThread.start(); + consumerThread.start(); + } +} +``` + +Semaphores + +- **mutex**: Controls access to the critical sections (mutex stands for mutual exclusion). +- **empty**: Represents the number of empty slots in the store, initially set to the store's capacity. +- **full**: Represents the number of filled slots in the store, initially set to 0. + + +**Producer:** + +- The producer acquires an empty slot using empty.acquire() and enters the critical section with mutex.acquire(). + +- It produces a T-shirt, increments the count, releases the mutex, and signals that a T-shirt is ready for consumption using full.release(). + +**Consumer:** + +- The consumer acquires a filled slot using full.acquire() and enters the critical section with mutex.acquire(). +- It consumes a T-shirt, decrements the count, releases the mutex, and signals that an empty slot is available for production using empty.release(). + +**Simulated Production and Consumption:** +`Thread.sleep()` is used to simulate the time it takes to produce and consume T-shirts. +**Execution**: When you run this program, you will observe the producer producing T-shirts and the consumer buying T-shirts. The store's capacity is maintained, and semaphores ensure proper synchronization between the producer and the consumer. + +This example demonstrates how semaphores can be used to solve the Producer-Consumer problem efficiently, preventing issues such as overproduction or stockouts. 
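
The implementation above tracks only a count. As a minimal sketch of how the same three semaphores could guard an actual queue of T-shirts, the following assumes a hypothetical `QueueBackedStore` class (the class name, `put`, `take`, and `runOnce` are illustrative and not from the class notes) and runs for a fixed number of items so that it terminates.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: all names here are illustrative assumptions.
class QueueBackedStore {
    private final Queue<Integer> shirts = new ArrayDeque<>(); // not thread safe, guarded by mutex
    private final Semaphore mutex = new Semaphore(1);         // mutual exclusion for the queue
    private final Semaphore empty;                            // counts free slots
    private final Semaphore full = new Semaphore(0);          // counts stored T-shirts

    QueueBackedStore(int capacity) {
        this.empty = new Semaphore(capacity);
    }

    void put(int shirtId) throws InterruptedException {
        empty.acquire();              // wait for a free slot
        mutex.acquire();              // enter critical section
        try {
            shirts.add(shirtId);
        } finally {
            mutex.release();          // always leave the critical section
        }
        full.release();               // signal that a T-shirt is available
    }

    int take() throws InterruptedException {
        full.acquire();               // wait for a T-shirt
        mutex.acquire();
        int shirtId;
        try {
            shirtId = shirts.remove();
        } finally {
            mutex.release();
        }
        empty.release();              // signal that a slot is free again
        return shirtId;
    }

    // Runs one producer and one consumer for a fixed number of items
    // and returns the T-shirt ids in the order they were consumed.
    static List<Integer> runOnce(int items, int capacity) {
        QueueBackedStore store = new QueueBackedStore(capacity);
        List<Integer> consumed = new ArrayList<>(); // written only by the consumer thread
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= items; i++) store.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < items; i++) consumed.add(store.take());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(runOnce(10, 5));
    }
}
```

Releasing `mutex` in a `finally` block is a design choice of this sketch: it keeps the critical section from staying locked if the queue operation throws.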
+ +**Java Implementation -2 using Semaphores** +**Producer.java** +```java +import java.util.Queue; +import java.util.concurrent.Semaphore; + +public class Producer implements Runnable{ + private Queue<Object> queue; + private int maxSize; + private String name; + private Semaphore producerSemaphore; + private Semaphore consumerSemaphore; + + Producer(Queue<Object> queue, int maxSize, String name, Semaphore ps, Semaphore cs){ + this.queue = queue; + this.maxSize = maxSize; + this.name = name; + this.producerSemaphore = ps; + this.consumerSemaphore = cs; + } + @Override + public void run() { + while(true){ + try { + producerSemaphore.acquire(); + } catch (InterruptedException e) { + throw new RuntimeException(e); + } + if(queue.size() < maxSize){ + System.out.println(this.name + " adding to queue, Size " + queue.size()); + queue.add(new Object()); + } + consumerSemaphore.release(); + } + } +} +``` + +**Consumer.java** +```java +import java.util.Queue; +import java.util.concurrent.Semaphore; + +public class Consumer implements Runnable{ + private Queue<Object> queue; + private int maxSize; + private String name; + private Semaphore producerSemaphore; + private Semaphore consumerSemaphore; + + Consumer(Queue<Object> queue, int maxSize, String name, Semaphore ps, Semaphore cs){ + this.queue = queue; + this.maxSize = maxSize; + this.name = name; + this.producerSemaphore = ps; + this.consumerSemaphore = cs; + } + + @Override + public void run() { + while(true){ + try { + consumerSemaphore.acquire(); + } catch (InterruptedException e) { + throw new RuntimeException(e); + } + if(queue.size()>0){ + System.out.println(this.name + " removing from queue, Size " + queue.size()); + queue.remove(); + } + producerSemaphore.release(); + } + } +} +``` + + +**Client.java** + +Here we create multiple producers and multiple consumers. 
+```java +import java.util.Queue; +import java.util.concurrent.ConcurrentLinkedQueue; +import java.util.concurrent.Semaphore; + +public class Client { + public static void main(String[] args) { + Queue<Object> objects = new ConcurrentLinkedQueue<>(); + int maxSize = 6; + Semaphore producerSemaphore = new Semaphore(maxSize); + Semaphore consumerSemaphore = new Semaphore(0); + + Producer p1 = new Producer(objects,6,"p1",producerSemaphore,consumerSemaphore); + Producer p2 = new Producer(objects,6,"p2",producerSemaphore,consumerSemaphore); + Producer p3 = new Producer(objects,6,"p3",producerSemaphore,consumerSemaphore); + + Consumer c1 = new Consumer(objects,6,"c1",producerSemaphore,consumerSemaphore); + Consumer c2 = new Consumer(objects,6,"c2",producerSemaphore,consumerSemaphore); + Consumer c3 = new Consumer(objects,6,"c3",producerSemaphore,consumerSemaphore); + Consumer c4 = new Consumer(objects,6,"c4",producerSemaphore,consumerSemaphore); + Consumer c5 = new Consumer(objects,6,"c5",producerSemaphore,consumerSemaphore); + + Thread t1 = new Thread(p1); + Thread t2 = new Thread(p2); + Thread t3 = new Thread(p3); + Thread t4 = new Thread(c1); + Thread t5 = new Thread(c2); + Thread t6 = new Thread(c3); + Thread t7 = new Thread(c4); + Thread t8 = new Thread(c5); + + t1.start(); + t2.start(); + t3.start(); + t4.start(); + t5.start(); + t6.start(); + t7.start(); + t8.start(); // the fifth consumer was created but never started + } +} +``` +**Java Implementation -3 using Concurrent Data Structure** +Here is another implementation that drops the semaphores: the shared queue is a ConcurrentLinkedQueue, and a synchronized block on the queue makes the size check and the add or remove a single atomic step. 
+**Producer.java** +```java +import java.util.Queue; + +public class Producer implements Runnable{ + private Queue<Object> queue; + int maxSize; + String name; + + public Producer(Queue<Object> queue,int maxSize, String name){ + this.queue = queue; + this.maxSize = maxSize; + this.name = name; + } + + @Override + public void run() { + //Each producer continuously produces + // T-shirts and adds them to the queue if there is space available + + while(true){ + synchronized (queue){ + if(queue.size() < maxSize){ + System.out.println("Adding - "+ queue.size()); + queue.add(new Object()); + } + } + } + } +} +``` + +**Consumer.java** +```java +import java.util.Queue; + +public class Consumer implements Runnable{ + private Queue<Object> queue; + int maxSize; + String name; + + public Consumer(Queue<Object> queue,int maxSize, String name){ + this.queue = queue; + this.maxSize = maxSize; + this.name = name; + } + + @Override + public void run() { + //Each consumer continuously removes + // T-shirts from the queue if any are available + while(true){ + synchronized (queue) { + if (queue.size() > 0) { + System.out.println("Removing - "+ queue.size()); + queue.remove(); + } + } + } + } +} +``` + +**Main.java** +```java +import java.util.Queue; +import java.util.concurrent.ConcurrentLinkedQueue; + +public class Main { + public static void main(String[] args) { + //Shared Object + Queue<Object> q = new ConcurrentLinkedQueue<>(); + int maxSize = 6; + + Producer p1 = new Producer(q,maxSize,"p1"); + Producer p2= new Producer(q,maxSize,"p2"); + Producer p3 = new Producer(q,maxSize,"p3"); + + Consumer c1 = new Consumer(q,maxSize,"c1"); + Consumer c2 = new Consumer(q,maxSize,"c2"); + Consumer c3 = new Consumer(q,maxSize,"c3"); + Consumer c4 = new Consumer(q,maxSize,"c4"); + Consumer c5 = new Consumer(q,maxSize,"c5"); + + Thread t1 = new Thread(p1); + Thread t2 = new Thread(p2); + Thread t3 = new Thread(p3); + Thread t4 = new Thread(c1); + Thread t5 = new Thread(c2); + Thread t6 = new Thread(c3); + Thread t7 = new Thread(c4); + Thread t8 = new Thread(c5); + + t1.start(); + t2.start(); + t3.start(); + t4.start(); + t5.start(); + t6.start(); + t7.start(); + t8.start(); + } +} +``` + +### Time To Try - Print In Order (LeetCode) +Try to solve the following problem using semaphores. 
+- [Print In Order - LeetCode](https://leetcode.com/problems/print-in-order/description/) + +**Solution** +```java +import java.util.concurrent.Semaphore; + +class Foo { + Semaphore semaSecond = new Semaphore(0); // released once first() has printed + Semaphore semaThird = new Semaphore(0); // released once second() has printed + public Foo() { + + } + public void first(Runnable printFirst) throws InterruptedException { + printFirst.run(); + semaSecond.release(); + } + public void second(Runnable printSecond) throws InterruptedException { + semaSecond.acquire(); + printSecond.run(); + semaThird.release(); + } + public void third(Runnable printThird) throws InterruptedException { + semaThird.acquire(); + printThird.run(); + } +} +``` +## Deadlocks + +A deadlock in an operating system is a situation in which two or more processes are blocked because each one holds a resource while also waiting for a resource that is held by another process. + +### Conditions for a deadlock +* `Mutual exclusion` - The resource is held by only one process at a time and cannot be acquired by another process until it is released. +* `Hold and wait` - A process is **holding** a resource and **waiting** for another resource to be released by another process. +* `No preemption` - A resource cannot be taken away from a process forcibly; it is released only when the process holding it has finished with it. +* `Circular wait` - A set of processes are waiting for each other circularly. Process `P1` is waiting for process `P2` and process `P2` is waiting for process `P1`. + +![Deadlock](https://scaler.com/topics/images/deadlock-in-os-image1.webp) + +Process P1 and P2 are in a deadlock because: +* Resources are non-shareable. (Mutual exclusion) +* Process 1 holds "Resource 1" and is waiting for "Resource 2" to be released by process 2. (Hold and wait) +* None of the processes can be preempted. (No preemption) +* Process 1 holds "Resource 1" and needs "Resource 2" from Process 2 while Process 2 holds "Resource 2" and requires "Resource 1" from Process 1. (Circular wait) + +### Tackling deadlocks + +There are four ways to tackle deadlocks: +* Prevention - Implementing a mechanism to prevent the deadlock. 
+* Avoidance - Avoiding deadlocks by not allocating resources when deadlocks are possible. +* Detecting and recovering - Detecting deadlocks and recovering from them. +* Ignorance - Ignore deadlocks as they do not happen frequently. + +#### 1. Prevention and avoidance + +Deadlock prevention means ensuring that at least one of the four conditions required for a deadlock can never hold. If any one of them is blocked, a deadlock cannot occur. Spooling and non-blocking synchronization algorithms are used to prevent the above conditions. In deadlock prevention all the requests are granted in a finite amount of time. + +In deadlock avoidance we have to anticipate a deadlock before it occurs and ensure that the system never enters an unsafe state. It is possible to avoid deadlock if resources are allocated carefully. For deadlock avoidance we use the Banker’s algorithm, together with its safety algorithm, for resource allocation. In deadlock avoidance, each process states at the beginning the maximum number of resources of each type that it will need. + +#### 2. Detecting and recovering from deadlocks + +We let the system fall into a deadlock and if it happens, we detect it using a detection algorithm and try to recover. + +Some ways of recovery are as follows: + +* Aborting all the deadlocked processes. +* Abort one process at a time until the system recovers from the deadlock. +* Resource Preemption: Resources are taken one by one from a process and assigned to higher priority processes until the deadlock is resolved. + +#### 3. Ignorance + +The system assumes that deadlock never occurs. Since deadlocks are infrequent, some systems simply ignore them. Operating systems such as UNIX and Windows follow this approach. However, if a deadlock occurs we can reboot our system and the deadlock is resolved automatically. + +#### 4. Tackling deadlocks at an application level +* Set timeouts for all the processes. 
If a process does not respond within the timeout period, it is killed. +* Implementing with caution: Use interfaces that handle or provide callbacks if locks are held by other processes. +* Add timeout to locks: If a process requests a lock that is held by another process, it waits only until the timeout expires and then gives up instead of blocking forever. + +## Additional Topics +### Inter-thread Communication using wait() and notify() +[wait() & notify() - Recording Link](https://www.scaler.com/meetings/i/backend-lld-concurrency-callables-continued-3/archive) +In Java, the wait() and notify() methods are part of the built-in mechanism for inter-thread communication and synchronization. These methods are used to coordinate the activities of multiple threads, allowing them to work together effectively. Let's break down these concepts for beginners: + + +#### wait() Method: +- The wait() method is called on an object within a synchronized context (i.e., within a method or block synchronized on that object). +- It causes the current thread to release the lock on the object and enter a state of waiting. + +Purpose: +- wait() is used when a thread needs to wait for a certain condition to be met before proceeding. + +For example, if a thread is waiting for a shared resource to be available, it can call wait() until another thread notifies it that the resource is ready. + +Example: + +```java +synchronized (sharedObject) { + while (!conditionMet) { + try { + sharedObject.wait(); // Releases the lock and waits for notification + } catch (InterruptedException e) { + e.printStackTrace(); + } + } + // Continue with the critical section +} +``` +#### notify() Method: +- The notify() method is called on an object within a synchronized context. +- It wakes up one of the threads that are currently waiting on that object. + +Purpose: + +- notify() is used to signal that a condition (for which threads are waiting) has been met and that one of the waiting threads can proceed. 
+- It is essential to note that notify() only wakes up one waiting thread. If there are multiple waiting threads, it is not determined which one will be awakened. + +Example: + +```java +synchronized (sharedObject) { + // Perform some operations and change the condition + conditionMet = true; + + // Notify one of the waiting threads + sharedObject.notify(); +} +``` + +**Important Points:** +- Both wait() and notify() must be called within a synchronized context; otherwise an `IllegalMonitorStateException` is thrown. +- The calling thread must hold the lock on the object on which it is calling wait() or notify(). +- The wait() method releases the lock, allowing other threads to access the synchronized block or method. +- The notify() method signals a waiting thread to wake up; the awakened thread continues only after it has reacquired the lock, which happens once the notifying thread has left its synchronized block. + +**Example Scenario:** +Consider a scenario where multiple threads are working on a shared resource. If a thread finds that the resource is not yet available (e.g., a buffer is empty), it can call wait() to release the lock and wait until another thread populates the buffer and calls notify() to signal that the resource is ready for consumption. + +In summary, wait() and notify() are fundamental methods for thread synchronization in Java, enabling threads to communicate and coordinate their activities efficiently. + +**Producer Consumer using wait() and notify()** +[Recording Link](https://www.scaler.com/meetings/i/backend-lld-concurrency-callables-continued-3/archive) + +**Implementation -1 (Simplified)** +Here is a simplified version of Producer Consumer as discussed in the live class above. 
+**ProducerConsumer.java** +```java +public class ProducerConsumer { + + public void produce() throws InterruptedException { + synchronized (this){ + System.out.println("Produced - T-shirt"); + //release the lock on the shared resource and wait till some other invokes this + wait(); + System.out.println("Going to produce another T-Shirt"); + } + } + + public void consume() throws InterruptedException { + Thread.sleep(1000); + Scanner sc = new Scanner(System.in); + + synchronized (this) { + System.out.println("Take t-shirt? "); + sc.nextLine(); + System.out.println("Recieved T-shirt"); + notify(); + Thread.sleep(3000); + } + } +} +``` + +**PCDemo.java** +```java +public class PCDemo { + public static void main(String[] args) { + ProducerConsumer pc = new ProducerConsumer(); + + Thread t1 = new Thread(()->{ + try { + pc.produce(); + } catch (InterruptedException e) { + e.printStackTrace(); + } + }); + Thread t2 = new Thread(()->{ + try { + pc.consume(); + } catch (InterruptedException e) { + e.printStackTrace(); + } + }); + + t1.start(); + t2.start(); + } +} +``` + +**Implementation-2** +A more robust Implementation is as follows: +```java +import java.util.LinkedList; + +class SharedBuffer { + private final LinkedList buffer = new LinkedList<>(); + private final int capacity; + + public SharedBuffer(int capacity) { + this.capacity = capacity; + } + + public void produce() throws InterruptedException { + synchronized (this) { + while (buffer.size() == capacity) { + // Buffer is full, wait for consumer to consume + wait(); + } + + // Produce an item and add to the buffer + int newItem = (int) (Math.random() * 100); + buffer.add(newItem); + System.out.println("Produced: " + newItem); + + // Notify the consumer that an item is available + notify(); + } + } + + public void consume() throws InterruptedException { + synchronized (this) { + while (buffer.isEmpty()) { + // Buffer is empty, wait for producer to produce + wait(); + } + + // Consume an item from the buffer + 
int consumedItem = buffer.removeFirst(); + System.out.println("Consumed: " + consumedItem); + + // Notify the producer that a slot is available in the buffer + notify(); + } + } +} + +class Producer implements Runnable { + private final SharedBuffer sharedBuffer; + + public Producer(SharedBuffer sharedBuffer) { + this.sharedBuffer = sharedBuffer; + } + + @Override + public void run() { + try { + while (true) { + sharedBuffer.produce(); + Thread.sleep(1000); // Simulate production time + } + } catch (InterruptedException e) { + e.printStackTrace(); + } + } +} + +class Consumer implements Runnable { + private final SharedBuffer sharedBuffer; + + public Consumer(SharedBuffer sharedBuffer) { + this.sharedBuffer = sharedBuffer; + } + + @Override + public void run() { + try { + while (true) { + sharedBuffer.consume(); + Thread.sleep(1500); // Simulate consumption time + } + } catch (InterruptedException e) { + e.printStackTrace(); + } + } +} + +public class ProducerConsumerExample { + public static void main(String[] args) { + SharedBuffer sharedBuffer = new SharedBuffer(5); + + Thread producerThread = new Thread(new Producer(sharedBuffer)); + Thread consumerThread = new Thread(new Consumer(sharedBuffer)); + + producerThread.start(); + consumerThread.start(); + } +} +``` +In this example: + +- `SharedBuffer` is the shared buffer where the producer produces and the consumer consumes items. +- The Producer class produces items and adds them to the buffer. +- The Consumer class consumes items from the buffer. +- The main method creates instances of the shared buffer, producer, and consumer, and starts their respective threads. +- This solution uses the `wait()` and `notify()` methods to ensure that the producer waits when the buffer is full and the consumer waits when the buffer is empty, allowing for proper coordination and synchronization between the two threads. 
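
The example above coordinates exactly two threads. With several producers and consumers waiting on the same monitor, a single `notify()` can wake a thread of the same kind, which may leave every thread waiting, so `notifyAll()` is the commonly recommended choice in that case. The following is a minimal sketch of that variant under stated assumptions: `NotifyAllBuffer` and `runDemo` are hypothetical names, the capacity of 3 is arbitrary, and the run is bounded so that it terminates.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: class and method names are illustrative assumptions.
class NotifyAllBuffer {
    private final LinkedList<Integer> buffer = new LinkedList<>();
    private final int capacity;

    NotifyAllBuffer(int capacity) {
        this.capacity = capacity;
    }

    synchronized void produce(int item) throws InterruptedException {
        while (buffer.size() == capacity) {
            wait();                 // buffer full: release the monitor and wait
        }
        buffer.add(item);
        notifyAll();                // wake every waiter; each re-checks its own condition
    }

    synchronized int consume() throws InterruptedException {
        while (buffer.isEmpty()) {
            wait();                 // buffer empty: release the monitor and wait
        }
        int item = buffer.removeFirst();
        notifyAll();
        return item;
    }

    // Starts the given number of producers and consumers and returns how many items were consumed.
    // Assumes producers * itemsPerProducer divides evenly among the consumers.
    static int runDemo(int producers, int consumers, int itemsPerProducer) {
        NotifyAllBuffer shared = new NotifyAllBuffer(3);
        int perConsumer = producers * itemsPerProducer / consumers;
        AtomicInteger consumed = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            threads.add(new Thread(() -> {
                try {
                    for (int i = 0; i < itemsPerProducer; i++) shared.produce(i);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (int c = 0; c < consumers; c++) {
            threads.add(new Thread(() -> {
                try {
                    for (int i = 0; i < perConsumer; i++) {
                        shared.consume();
                        consumed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        threads.forEach(Thread::start);
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return consumed.get();
    }

    public static void main(String[] args) {
        System.out.println("Consumed: " + runDemo(2, 2, 50));
    }
}
```

The `while` loops around `wait()` matter here: after `notifyAll()` a woken thread may find that its condition is still not satisfied and must wait again.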
+ +## Coding Projects +### Coding Problem 1 : Traffic Intersection Control +Implement a program that simulates a traffic intersection control system using Java Semaphores. The intersection has two roads, each with its own traffic signal. The traffic lights control the flow of traffic through the intersection. +**Requirements:** +- There are two roads, Road A and Road B, crossing at the intersection. +- Each road has its own traffic signal (Semaphore) controlling the traffic flow. +- The traffic lights for Road A and Road B have a cycle of Green, Yellow, and Red signals. +Only one road should have a green light at a time, while the other road has a red light. +The intersection should allow a smooth transition between green lights for both roads. + +**Constraints:** +- The time duration for each signal (Green, Yellow, Red) can be adjusted based on the program's design. +- Use Semaphores to control access to the traffic signals and ensure a safe transition. +Each road signal should run in a separate thread. +- Implement a way to visually represent the current state of the traffic signals and indicate which road has the green light. + + +Sample Output: + +Road A: Green +Road B: Red + +[... Some time passes ...] + +Road A: Yellow +Road B: Red + +[... Some time passes ...] + +Road A: Red +Road B: Green + + +**Notes:** +- The program should demonstrate proper synchronization to ensure that only one road has a green light at any given time. +- You may choose to implement additional features such as a pedestrian signal or a button for road switching. +- Consider the safety and efficiency of the intersection control system. +- This problem statement reflects a real-world scenario where semaphores can be used to control access to shared resources (in this case, the green light for each road) in a concurrent environment. Students can implement the solution to gain hands-on experience with semaphore-based synchronization in a practical setting. 
+ + +**Solution:** + +```java +import java.util.concurrent.Semaphore; + +class TrafficIntersectionControl { + private Semaphore roadASemaphore = new Semaphore(1); // Road A starts with the right of way + private Semaphore roadBSemaphore = new Semaphore(0); // Road B waits until Road A hands over + + // Simulate time passing + private void sleep(int seconds) { + try { + Thread.sleep(seconds * 1000L); + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + } + } + + // Hand the right of way to the other road by releasing its semaphore + private void switchLights(Semaphore nextRoad) { + System.out.println("Switching lights..."); + nextRoad.release(); + } + + // Simulate traffic on Road A + private void trafficOnRoadA() { + while (true) { + try { + roadASemaphore.acquire(); // Wait until Road A has the right of way + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + return; + } + + System.out.println("Road A: Green"); + sleep(5); // Green light duration + + System.out.println("Road A: Yellow"); + sleep(2); // Yellow light duration + + System.out.println("Road A: Red"); + switchLights(roadBSemaphore); // Switch to Road B + } + } + + // Simulate traffic on Road B + private void trafficOnRoadB() { + while (true) { + try { + roadBSemaphore.acquire(); // Wait until Road B has the right of way + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + return; + } + + System.out.println("Road B: Green"); + sleep(5); // Green light duration + + System.out.println("Road B: Yellow"); + sleep(2); // Yellow light duration + + System.out.println("Road B: Red"); + switchLights(roadASemaphore); // Switch to Road A + } + } + + public static void main(String[] args) { + TrafficIntersectionControl control = new TrafficIntersectionControl(); + + // Create and start threads for traffic on Road A and Road B + Thread roadAThread = new Thread(control::trafficOnRoadA); + Thread roadBThread = new Thread(control::trafficOnRoadB); + + roadAThread.start(); + roadBThread.start(); + } +} 
+``` + +#### Coding Problem - 2 Real-Life Problem Statement: Resource Pooling in a Library +Imagine a library that has a collection of books, and multiple students want to borrow and return books from the library. Implement a program that uses Java Semaphores to control access to the books, ensuring that the available resources (books) are used efficiently. + +**Requirements:** +The library has a fixed number of books (resources) available for borrowing. +Students can borrow books from the library and return them after reading. +The library enforces a limit on the maximum number of students who can borrow books simultaneously. +When a student returns a book, another student can borrow it if there is an available slot. + +**Constraints:** +Use Semaphores to control access to the shared resource (books). +Each student should run in a separate thread. +The program should handle the borrowing and returning of books concurrently. +Implement a way to visually represent the current state of the library, indicating which books are borrowed and available. + +**Example Output:** + +Student 1 borrows Book A +Library: [Book A is borrowed, Book B is available, Book C is available] + +Student 2 borrows Book B +Library: [Book A is borrowed, Book B is borrowed, Book C is available] + +Student 1 returns Book A +Library: [Book A is available, Book B is borrowed, Book C is available] + + +… +**Notes:** +Ensure that the library's state is properly synchronized to avoid race conditions. +You may choose to implement additional features, such as a waitlist for students. +Consider scenarios where a student may need to wait if all books are currently borrowed. +This problem statement reflects a real-world scenario where semaphores can be used to control access to a limited set of resources, ensuring that they are utilized efficiently and concurrently by multiple entities. Students can implement the solution to gain practical experience with semaphores in a resource pooling scenario. 
+ +**Sample Code** +```java +import java.util.concurrent.Semaphore; + +class Library { + private static final int MAX_STUDENTS = 2; + private static final int MAX_BOOKS = 3; + + private Semaphore availableBooks = new Semaphore(MAX_BOOKS, true); + private Semaphore studentSlots = new Semaphore(MAX_STUDENTS, true); + + // Simulate time passing + private void sleep(int seconds) { + try { + Thread.sleep(seconds * 1000); + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + } + } + + // Borrow a book from the library + private void borrowBook(String book, int studentId) { + try { + availableBooks.acquire(); // Acquire a book + System.out.println("Student " + studentId + " borrows " + book); + sleep(2); // Simulate reading time + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + } + } + + // Return a book to the library + private void returnBook(String book, int studentId) { + System.out.println("Student " + studentId + " returns " + book); + availableBooks.release(); // Release the returned book + } + + // Simulate a student using the library + private void student(int studentId) { + while (true) { + try { + studentSlots.acquire(); // Acquire a student slot + String bookToBorrow = "Book " + (studentId % MAX_BOOKS + 1); + borrowBook(bookToBorrow, studentId); + returnBook(bookToBorrow, studentId); + studentSlots.release(); // Release the student slot + sleep(1); // Wait before the next operation + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + } + } + } + + public static void main(String[] args) { + Library library = new Library(); + + // Create and start multiple threads for students using the library + for (int i = 1; i <= MAX_STUDENTS; i++) { + int finalI = i; + new Thread(() -> library.student(finalI)).start(); + } + } +} +``` + +## LeetCode Multithreading Problems (HomeWork) +- [Dining Philoshpers - LeetCode](https://leetcode.com/problems/the-dining-philosophers/) +- [Fizz Buzz Multithreaded 
- LeetCode](https://leetcode.com/problems/fizz-buzz-multithreaded/) +- [Building H2O - LeetCode](https://leetcode.com/problems/building-h2o/) +- [Traffic Light Control](https://leetcode.com/problems/traffic-light-controlled-intersection/description/) + +## Additional Resources +- [Udemy - Concurrency, Multithreading and Parallel Computing in Java](https://www.udemy.com/course/multithreading-and-parallel-computing-in-java/) +- [Memory Management in Operating Systems - Thrashing, Paging etc Interview Topics](https://github.com/kanmaytacker/fundamentals/blob/master/os/notes/04-memory-management.md) + +-- End -- + + diff --git a/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Asynchronous Programming vs Multithreading.md b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Asynchronous Programming vs Multithreading.md new file mode 100644 index 0000000..afc2475 --- /dev/null +++ b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Asynchronous Programming vs Multithreading.md @@ -0,0 +1,172 @@ +# Asynchronous Programming vs Multithreading +--- + + +## Asynchronous Programming +Asynchronous programming is a programming paradigm that allows tasks to be executed independently without blocking the main thread. It focuses on managing the flow of the program by handling tasks concurrently and efficiently. It's commonly used to improve the responsiveness of applications by keeping long-running operations off the thread that would otherwise freeze the user interface. Java provides several mechanisms for asynchronous programming, and in this tutorial, we'll cover the basics using threads, FutureTask with the ExecutorService, and CompletableFuture. + +### 1. Threads +We can create a new thread to perform any operation asynchronously. With the release of lambda expressions in Java 8, it’s cleaner and more readable. 
+ +Let’s create a new thread that computes and prints the factorial of a number: +```java +public class ThreadExample { + public static int factorial(int n){ + + System.out.println(Thread.currentThread().getName() + " is running"); + + try{ + Thread.sleep(2000); + } + catch(InterruptedException e){ + e.printStackTrace(); + } + + int ans=1; + for(int i=1;i<=n;i++){ + ans *= i; + } + return ans; + } + + public static void main(String[] args) { + int number = 5; + Thread newThread = new Thread(()->{ + System.out.println("Factorial of 5 " + factorial(number)); + }); + newThread.start(); + System.out.println("Main is still running-1"); + } +} +``` + +### 2. FutureTask +Since Java 5, the Future interface provides a way to perform asynchronous operations using the FutureTask. We can use the submit method of the ExecutorService to perform the task asynchronously and return the instance of the FutureTask. +```java +ExecutorService threadpool = Executors.newCachedThreadPool(); +Future<Integer> futureTask = threadpool.submit(() -> factorial(number)); + +while (!futureTask.isDone()) { + System.out.println("FutureTask is not finished yet..."); +} +long result = futureTask.get(); //Blocking Code +threadpool.shutdown(); +``` +Here we’ve used the `isDone` method provided by the Future interface to check if the task is completed. Once finished, we can retrieve the result using the get method. + +### 3. CompletableFuture +CompletableFuture is a class introduced in Java 8 that provides a way to perform asynchronous operations and handle their results using a fluent API. It implements both the Future and CompletionStage interfaces, and provides various methods like supplyAsync, runAsync, and thenApplyAsync for asynchronous programming. 
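
The methods named above can be chained into a pipeline. As a minimal sketch, assuming a hypothetical `ChainingSketch` class with its own `factorial` helper (neither is from the notes), `supplyAsync` starts the computation, `thenApplyAsync` transforms the result on another asynchronous step, and `thenApply` formats it:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: class and method names are illustrative assumptions.
class ChainingSketch {
    static long factorial(int n) {
        long ans = 1;
        for (int i = 2; i <= n; i++) {
            ans *= i;
        }
        return ans;
    }

    static String describeFactorial(int n) {
        CompletableFuture<String> pipeline = CompletableFuture
                .supplyAsync(() -> factorial(n))       // compute asynchronously
                .thenApplyAsync(value -> value * 2)    // transform the result in a further async step
                .thenApply(value -> "Twice " + n + "! is " + value); // format once the value is ready
        return pipeline.join(); // blocks the caller; used here only to make the demo observable
    }

    public static void main(String[] args) {
        System.out.println(describeFactorial(5)); // prints "Twice 5! is 240"
    }
}
```

`join()` is used instead of `get()` in this sketch because it throws an unchecked `CompletionException`, which keeps the example free of checked-exception handling.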
+ +**Example-1** +```java +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.ExecutionException; + +public class CompletableFutureExample { + public static void main(String[] args) { + // Create a CompletableFuture + CompletableFuture future = CompletableFuture.supplyAsync(() -> { + // Simulate a time-consuming task + try { + Thread.sleep(2000); + } catch (InterruptedException e) { + e.printStackTrace(); + } + return "Hello, CompletableFuture!"; + }); + + // Attach a callback to handle the result + future.thenAccept(result -> System.out.println("Result: " + result)); + + // Wait for the CompletableFuture to complete (not recommended in real applications) + try { + future.get(); + } catch (InterruptedException | ExecutionException e) { + e.printStackTrace(); + } + } +} +``` +In this example, we use `CompletableFuture.supplyAsync` to perform a task asynchronously. The thenAccept method is used to attach a callback that will be executed when the asynchronous task completes. + +**Example-2** +``` +CompletableFuture completableFuture = CompletableFuture.supplyAsync(() -> factorial(number)); +while (!completableFuture.isDone()) { + System.out.println("CompletableFuture is not finished yet..."); +} +long result = completableFuture.get(); +``` +We don’t need to use the ExecutorService explicitly. The CompletableFuture internally uses ForkJoinPool to handle the task asynchronously. Thus, it makes our code a lot cleaner. + + +### Uses of Asynchronous Programming: + +- IO-Intensive Operations: Asynchronous programming is often used for tasks that involve waiting for external resources, such as reading from or writing to files, making network requests, or interacting with databases. +- Responsive UI: In GUI applications, asynchronous programming helps in maintaining a responsive user interface by executing time-consuming tasks in the background. 
+ +- Callback Mechanism: Asynchronous programming often uses callbacks or combinators to specify what should happen once a task is complete. + +- Composability: It emphasizes composability, allowing developers to chain together multiple asynchronous operations. + +```java +CompletableFuture future = CompletableFuture.supplyAsync(() -> "Hello, CompletableFuture!"); + +// Attach a callback to handle the result +future.thenAccept(result -> System.out.println("Result: " + result)); +``` + +### Multi-threading +Multithreading involves the concurrent execution of two or more threads to achieve parallelism. It is a fundamental concept for optimizing CPU-bound tasks and improving overall system performance. Key Components for achieving multithreading in Java are thread class and Executor Framework. + +- Thread Class: Java provides the Thread class for creating and managing threads. +- Executor Framework: The ExecutorService and related interfaces offer a higher-level abstraction for managing thread pools. + +### Use Case of Multithreading: + +- CPU-Intensive Operations: Multithreading is suitable for tasks that are CPU-bound and can benefit from parallel execution, such as mathematical computations. + +- Parallel Processing: Multithreading can be used to perform multiple tasks simultaneously, making efficient use of available CPU cores. + +### Shared State and Synchronization: + +- Shared State: In multithreading, threads may share data, leading to potential issues like race conditions and data corruption. + +- Synchronization: Techniques like synchronization, locks, and atomic operations are used to ensure proper coordination between threads. 
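
To make the shared-state point concrete, here is a minimal sketch of two of the techniques listed above: a `synchronized` method and an atomic operation via `AtomicInteger`. The `CounterSketch` class and its `run` method are hypothetical names chosen for illustration. Without either protection, a plain `count++` from several threads can lose updates because the read, add, and write steps interleave.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: class and method names are illustrative assumptions.
class CounterSketch {
    private int syncCount = 0;                                     // guarded by the object's monitor
    private final AtomicInteger atomicCount = new AtomicInteger(); // lock-free atomic counter

    synchronized void incrementSync() {
        syncCount++;                       // only one thread at a time executes this
    }

    void incrementAtomic() {
        atomicCount.incrementAndGet();     // atomic read-modify-write
    }

    synchronized int syncCount() {
        return syncCount;
    }

    // Returns {synchronized counter, atomic counter} after all threads have finished.
    static int[] run(int threads, int incrementsPerThread) {
        CounterSketch counter = new CounterSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.incrementSync();
                    counter.incrementAtomic();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return new int[] { counter.syncCount(), counter.atomicCount.get() };
    }

    public static void main(String[] args) {
        int[] totals = run(4, 10_000);
        System.out.println("synchronized: " + totals[0] + ", atomic: " + totals[1]);
    }
}
```

Both counters end at `threads * incrementsPerThread`; which of the two approaches to prefer depends on whether more than one variable has to change together.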
+
+Example using ExecutorService:
+```java
+ExecutorService executorService = Executors.newFixedThreadPool(2);
+
+// Submit a task for execution
+Future<String> future = executorService.submit(() -> "Hello, ExecutorService!");
+
+// Retrieve the result when ready
+try {
+    String result = future.get(); // This will block until the result is available
+    System.out.println("Result: " + result);
+} catch (Exception e) {
+    e.printStackTrace();
+}
+
+
+// Shutdown the ExecutorService
+executorService.shutdown();
+```
+
+### Summary
+### Asynchronous Programming:
+- Focuses on non-blocking execution.
+- Primarily used for IO-bound tasks and maintaining responsive applications.
+- Utilizes higher-level abstractions like CompletableFuture.
+- Emphasizes composability and chaining of asynchronous operations.
+
+### Multithreading:
+- Focuses on parallelism for CPU-bound tasks.
+- Suitable for tasks that can be executed concurrently.
+- Utilizes threads and thread pools, managed by the Thread class and ExecutorService.
+- Requires attention to synchronization and shared state management.
+
+In some scenarios, asynchronous programming and multithreading can be used together to achieve both parallelism and non-blocking execution, depending on the nature of the tasks in an application.
diff --git a/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Concurrent Hashmap.md b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Concurrent Hashmap.md
new file mode 100644
index 0000000..182c92f
--- /dev/null
+++ b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Concurrent Hashmap.md
@@ -0,0 +1,82 @@
+# Concurrent Hashmap
+
+Java Collections provides various data structures for working with key-value pairs.
The commonly used ones are -
+- **HashMap** (Non-Synchronised, Not Thread Safe)
+    - can be wrapped with `Collections.synchronizedMap()` (discussed below)
+
+- **Hashtable** (Synchronised, Thread Safe)
+    - locking over entire table
+
+- **ConcurrentHashMap** (Synchronised, Thread Safe, Higher Level of Concurrency, Faster)
+    - locking at bucket level, fine grained locking
+
+**HashMap and the Synchronised HashMap Method**
+Synchronization is the process of coordinating access between two or more concurrent activities. A HashMap is not synchronized, so concurrent use may cause data inconsistency; to share one between threads, we need to synchronize it. The built-in method `Collections.synchronizedMap()` is a convenient way of doing this.
+
+A synchronized map is a map that can be safely accessed by multiple threads without causing concurrency issues. A plain HashMap, on the other hand, is not synchronized: in a multi-threaded environment, multiple threads can access and modify it at the same time without any coordination. This can lead to data inconsistency, unexpected behavior, and incorrect results.
+
+Therefore, we synchronize access to the HashMap using `synchronizedMap()`. This method creates a wrapper around the original HashMap and acquires a lock whenever a thread calls a method on it.
+
+```java
+Collections.synchronizedMap(instanceOfHashMap);
+```
+
+`synchronizedMap()` is a static method of the Collections class that takes a Map (here, a HashMap) as a parameter and returns a synchronized Map backed by it. However, it is important to note that only individual method calls are synchronized; iterating over the map's views, such as `keySet()`, `entrySet()`, and `values()`, is not atomic. Therefore, if we want to iterate over the synchronized map, we need to do so inside a block synchronized on the returned map.
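A minimal sketch of the iteration case just described. The class name `SynchronizedMapIteration` and the helper `totalQuantity` are hypothetical names used only for illustration. The key point is that the lock is taken on the wrapper returned by `Collections.synchronizedMap()`, and held for the whole traversal.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class, not part of any library.
class SynchronizedMapIteration {
    // Sums all values while holding the wrapper's lock for the entire loop,
    // so no other thread can modify the map in the middle of the iteration.
    static int totalQuantity(Map<String, Integer> syncMap) {
        int total = 0;
        synchronized (syncMap) { // lock on the synchronized wrapper itself
            for (Map.Entry<String, Integer> entry : syncMap.entrySet()) {
                total += entry.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> cart = Collections.synchronizedMap(new HashMap<>());
        cart.put("Butter", 5);
        cart.put("Milk", 10);
        System.out.println(totalQuantity(cart)); // prints 15
    }
}
```

Single calls such as `put()` and `get()` need no extra locking; only compound actions like iteration do.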
+
+```java
+import java.util.*;
+public class Maps {
+    public static void main(String[] args) {
+        HashMap<String, Integer> cart = new HashMap<>();
+        // Adding elements in the cart map
+        cart.put("Butter", 5);
+        cart.put("Milk", 10);
+        cart.put("Rice", 20);
+        cart.put("Bread", 2);
+        cart.put("Peanut", 2);
+        // printing synchronized map from HashMap
+        Map<String, Integer> mapSynched = Collections.synchronizedMap(cart);
+        System.out.println("Synchronized Map from HashMap: " + mapSynched);
+    }
+}
+```
+
+**Hashtable vs Concurrent Hashmap**
+HashMap is generally suitable for single-threaded applications and is faster than Hashtable; in multithreaded environments, however, we have to use **Hashtable** or **ConcurrentHashMap**. So let us talk about them.
+
+While both Hashtable and ConcurrentHashMap offer the advantage of thread safety, their underlying architectures and capabilities differ significantly. Whether we’re building a legacy system or working on modern, microservices-based cloud applications, understanding these nuances is critical for making the right choice.
+
+Let's look at the differences between Hashtable and ConcurrentHashMap, covering their performance, synchronization features, and other aspects, to help us make an informed decision.
+
+**1. Hashtable**
+Hashtable is one of the oldest collection classes in Java and has been present since JDK 1.0. It provides key-value storage and retrieval APIs:
+
+```java
+Hashtable<String, String> hashtable = new Hashtable<>();
+hashtable.put("Key1", "1");
+hashtable.put("Key2", "2");
+hashtable.putIfAbsent("Key3", "3");
+String value = hashtable.get("Key2");
+```
+**The primary selling point of Hashtable is thread safety, which is achieved through method-level synchronization**.
+
+Methods like put(), putIfAbsent(), get(), and remove() are synchronized. Only one thread can execute any of these methods at a given time on a Hashtable instance, ensuring data consistency.
+
+**2. 
Concurrent Hashmap**
+ConcurrentHashMap is a more modern alternative, introduced in Java 5 as part of the `java.util.concurrent` package.
+
+Both Hashtable and ConcurrentHashMap implement the Map interface, which accounts for the similarity in method signatures:
+```java
+ConcurrentHashMap<String, String> concurrentHashMap = new ConcurrentHashMap<>();
+concurrentHashMap.put("Key1", "1");
+concurrentHashMap.put("Key2", "2");
+concurrentHashMap.putIfAbsent("Key3", "3");
+String value = concurrentHashMap.get("Key2");
+```
+
+ConcurrentHashMap, on the other hand, provides thread safety with a higher level of concurrency. It allows multiple threads to read and perform limited writes simultaneously **without locking the entire data structure**. This is especially useful in applications that have more read operations than write operations.
+
+**Performance Comparison**
+Hashtable locks the entire table for every operation, reads included, so only one thread can use it at a time. This can be a bottleneck in a high-concurrency environment.
+
+ConcurrentHashMap, however, allows concurrent reads and limited concurrent writes, making it more scalable and often faster in practice.
diff --git a/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Functional programming in Java.md b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Functional programming in Java.md
new file mode 100644
index 0000000..43273e6
--- /dev/null
+++ b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Functional programming in Java.md
@@ -0,0 +1,136 @@
+# Functional Programming in Java
+---
+Functional Programming (FP) is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids changing state and mutable data. In Java, functional programming features were introduced in Java 8 with the addition of lambda expressions, the `java.util.function` package, and the Stream API.
Here are the key concepts of Functional Programming in Java:
+- Lambda Expressions
+- Functional Interfaces
+- Stream API
+- Immutability
+- Higher Order Functions
+- Parallelism
+
+### 1. Lambda Expressions:
+Lambda expressions are a concise way to represent anonymous functions. They provide a compact syntax for implementing functional interfaces (interfaces with a single abstract method). Lambda expressions are the cornerstone of functional programming in Java.
+
+```java
+// Traditional anonymous class
+Runnable runnable1 = new Runnable() {
+    @Override
+    public void run() {
+        System.out.println("Hello, world!");
+    }
+};
+
+// Lambda expression
+Runnable runnable2 = () -> System.out.println("Hello, world!");
+```
+
+### 2. Functional Interfaces:
+Functional interfaces are interfaces with a single abstract method, often referred to as the functional method. They can have multiple default or static methods, but they must have exactly one abstract method.
+```java
+@FunctionalInterface
+interface MyFunctionalInterface {
+    void myMethod();
+}
+```
+Lambda expressions can be used to instantiate functional interfaces:
+```java
+MyFunctionalInterface myFunc = () -> System.out.println("My method implementation");
+```
+
+In Java, the `java.util.function` package provides several functional interfaces that represent different types of functions. These functional interfaces are part of the functional programming support introduced in Java 8 and are commonly used with lambda expressions. Here's an explanation of some commonly used functional interfaces in Java:
+
+##### Function
+Represents a function that takes one argument of type T and produces a result of type R.
+The method `apply(T t)` is used to apply the function.
+```java
+Function<String, Integer> stringLengthFunction = s -> s.length();
+int length = stringLengthFunction.apply("Java");
+```
+
+##### Consumer
+Represents an operation that accepts a single input argument of type T and returns no result.
The method `accept(T t)` is used to perform the operation.
+```java
+Consumer<String> printUpperCase = s -> System.out.println(s.toUpperCase());
+printUpperCase.accept("Java");
+```
+
+##### BiFunction
+Represents a function that takes two arguments of types T and U and produces a result of type R. The method `apply(T t, U u)` is used to apply the function.
+```java
+BiFunction<Integer, Integer, Integer> sumFunction = (a, b) -> a + b;
+int sum = sumFunction.apply(3, 5);
+```
+##### Predicate
+Represents a predicate (boolean-valued function) that takes one argument of type T.
+The method `test(T t)` is used to test the predicate.
+```java
+Predicate<Integer> isEven = n -> n % 2 == 0;
+boolean result = isEven.test(4); // true
+```
+##### Supplier
+Represents a supplier of results.
+The method `get()` is used to get the result.
+```java
+Supplier<Double> randomNumberSupplier = () -> Math.random();
+double randomValue = randomNumberSupplier.get();
+```
+
+These functional interfaces facilitate the use of lambda expressions and support the functional programming paradigm in Java. They can be used in various contexts, such as with the Stream API, to represent transformations, filters, and other operations on collections of data. The introduction of these functional interfaces in Java 8 enhances code readability and expressiveness.
+
+### 3. Streams
+Streams provide a functional approach to processing sequences of elements. They allow you to express complex data manipulations using a pipeline of operations, such as map, filter, and reduce. Streams are part of the `java.util.stream` package.
+
+```java
+List<String> strings = Arrays.asList("abc", "def", "ghi", "jkl");
+
+// Keep strings starting with "a", upper-case them, and join them
+String result = strings.stream()
+        .filter(s -> s.startsWith("a"))
+        .map(String::toUpperCase)
+        .collect(Collectors.joining(", "));
+
+System.out.println(result); // Output: ABC
+```
+### 4. Immutability
+Functional programming encourages immutability, where objects once created cannot be changed.
In Java, the `final` keyword prevents a variable from being reassigned; for an object to be truly immutable, its fields must also be `final` and its state must never be exposed for modification.
+
+Immutability matters a great deal in a multithreaded application. It allows a thread to act on an immutable object without worrying about other threads, because it knows that no one is modifying the object. Immutable objects are therefore safer to share between threads than mutable ones. If you are into concurrent programming, you know that immutability makes your life simpler.
+
+### 5. Higher-Order Functions:
+Functional programming supports higher-order functions, which are functions that can take other functions as parameters or return functions as results. Higher-order functions are a key concept in functional programming, enabling a more expressive and modular coding style. Java has supported them since version 8, through lambda expressions and the `java.util.function` package.
+
+
+```java
+// Function that takes a function as a parameter
+public static void processNumbers(List<Integer> numbers, Function<Integer, Integer> processor) {
+    for (int i = 0; i < numbers.size(); i++) {
+        numbers.set(i, processor.apply(numbers.get(i)));
+    }
+}
+
+// Usage of higher-order function
+List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
+processNumbers(numbers, x -> x * 2);
+System.out.println(numbers); // Output: [2, 4, 6, 8, 10]
+```
+
+### 6. Parallelism:
+Functional programming encourages writing code that can easily be parallelized. The Stream API provides methods for parallel execution of operations on streams.
+```java
+List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
+
+// Parallel stream processing
+int sum = numbers.parallelStream()
+        .mapToInt(Integer::intValue)
+        .sum();
+System.out.println(sum); // Output: 15
+```
+---
+## Benefits of Functional Programming in Java
+- **Conciseness**: Lambda expressions make code more concise and readable.
+- **Parallelism**: Easier to parallelize code due to immutability and statelessness.
+- **Predictability**: Immutability reduces side effects and makes code more predictable.
+- **Testability**: Functions with no side effects are easier to test.
+- **Modularity**: Encourages modular and reusable code.
+
+Functional programming in Java complements the existing object-oriented programming paradigm and provides developers with powerful tools to write more expressive, modular, and maintainable code. It promotes the use of pure functions, immutability, and higher-order functions, leading to code that is often more concise and easier to reason about.
diff --git a/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Garbage Collection in Java (1).md b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Garbage Collection in Java (1).md
new file mode 100644
index 0000000..2a174eb
--- /dev/null
+++ b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Garbage Collection in Java (1).md
@@ -0,0 +1,276 @@
+
+# Chapter - Garbage Collection in Java
+-----
+
+
+### Introduction
+
+One of the reasons Java is a robust programming language is its memory management. Memory management can be a difficult, tedious task in traditional programming environments. For example, in C/C++, the programmer will often manually allocate and free dynamic memory. This sometimes leads to problems, because programmers will either forget to free memory that has been previously allocated or, worse, try to free some memory that another part of their code is still using. Java virtually eliminates these problems by managing memory allocation and deallocation for you. In fact, deallocation is completely automatic, because Java provides garbage collection for unused objects. In this tutorial, we will study the following topics.
+
+**Part-I**
+- Java Memory Model (Stack and Heap)
+- Need for Garbage Collection (Operations, Benefits, Disadvantages)
+- Benefits and Disadvantages of GC
+
+**Part-II**
+- Memory Allocation, Defragmentation and Garbage Collection
+- Conditions for Garbage Collector to run
+- Garbage Collection for Java Objects
+- Handling unmanaged resources
+
+**Part-III**
+- Choosing a Garbage Collection Algorithm
+- Understanding Mark and Sweep
+- Garbage Collectors in Java 17
+
+### Java Memory Model - Stack vs Heap
+Applications need memory to run, because they need to create objects in memory and perform computational tasks. This data can live in stack memory or heap memory. Let us quickly discuss the features of each.
+
+Local primitive variables and references to objects are created in stack memory and cleared automatically when the stack frame is popped after the function call returns. Hence, everything stored in stack memory is cleared automatically, following the LIFO order of the call stack. There is no garbage collection involved in stack memory. Because of the simplicity of its allocation scheme (LIFO), stack memory is very fast compared to heap memory.
+
+```java
+void func() {
+    int a = 10;              // here 'a' is created on the stack
+    int[] arr = new int[10]; // the 'arr' reference is on the stack, but the array itself is allocated on the heap
+    // ...
+}
+```
+However, you need heap memory whenever you allocate any kind of object, such as arrays, user-defined objects, and dynamic data structures such as ArrayLists, strings, trees, etc.
+
+Whenever an object is created, it’s always stored in the heap space, and stack memory contains the reference to it. Objects stored in the heap are globally accessible, whereas one thread's stack memory can’t be accessed by other threads. Creating objects on the heap also allows passing large objects by reference across different functions, thus avoiding the need to create a copy of the object.
Unused objects on the heap must be de-allocated, which is done explicitly by invoking `delete` in languages like C++. Languages like Java and Python, however, provide automatic garbage collection.
+
+When stack memory is full, the Java runtime throws `java.lang.StackOverflowError`, whereas if heap memory is full, it throws `java.lang.OutOfMemoryError: Java heap space`. Stack memory is much smaller than heap memory. We can use the `-Xms` and `-Xmx` JVM options to define the startup size and maximum size of heap memory, and `-Xss` to define the stack memory size.
+
+
+### Need for Garbage Collection
+
+The garbage collector manages the allocation and release of memory for an application. Therefore, developers working with managed code don't have to write code to perform memory management tasks. Automatic memory management can eliminate common problems such as forgetting to free an object and causing a memory leak, or attempting to access memory for an object that has already been freed.
+
+**Operations performed by a Garbage Collector**
+- Allocates from and gives back memory to the operating system.
+- Hands out that memory to the application as it requests it.
+- Determines which parts of that memory are still in use by the application.
+- Reclaims the unused memory for reuse by the application.
+- Runs memory defragmentation.
+
+**Benefits of Garbage Collector**
+- Frees developers from having to manually release memory.
+- Allocates objects on the managed heap efficiently.
+- Reclaims objects that are no longer being used, clears their memory, and keeps the memory available for future allocations.
+- Provides memory safety by making sure that an object can't use for itself the memory allocated for another object.
+- No dangling pointers to handle.
+
+**Disadvantages of Garbage Collector**
+- Java garbage collection helps your Java environments and applications perform more efficiently.
However, you can still potentially run into issues with automatic garbage collection, including degraded application performance.
+- Since the JVM has to keep track of object reference creation/deletion, this bookkeeping consumes CPU time on top of what the application itself needs. It may affect the performance of requests which require large amounts of memory.
+- Programmers have no control over the scheduling of CPU time dedicated to freeing objects that are no longer needed.
+- Using some GC implementations might result in the application stopping unpredictably.
+
+
+While you can’t manually override automatic garbage collection, there are things you can do to optimize garbage collection in your application environment, such as changing the garbage collector you use, removing all references to unused Java objects, and tuning the garbage collector's parameters.
+
+**Memory Allocation, Defragmentation & Garbage Collection**
+We’ve seen how heap memory can provide a flexible way of allocating chunks of memory on-the-go. The chunks aren’t planned ahead of time; it’s a real-time thing: when the program, for whatever reason, needs more memory, the operating system finds an available chunk and allocates that chunk to the program. The program can use it until it’s done with that chunk, at which time it releases the chunk for later use by the same or a different program.
+![](https://www.insidetheiot.com/wp-content/uploads/2019/12/Memory-hole.png)
+
+After some time, the memory might look like this.
+![](https://www.insidetheiot.com/wp-content/uploads/2019/12/Full-memory.png)
+
+Now what? There’s enough free memory for the new allocation, but the problem is that it’s all broken up all over the place. Said another way, it’s fragmented. We really don’t want to break up the allocation and spread it over multiple holes. That would use more memory (for managing where all the pieces are), and it would slow things down.
+
+So we’re left with the problem: how do we allocate that new chunk?
We first need to reorganize the memory and move things around to get all of those holes together into a larger chunk of available memory. That means closing up the holes and “pushing” the holes to the end of the memory where they can be reused.
+
+That process of moving things around to bring the free memory chunks together is called **defragmentation**: multiple free “holes” in memory are moved together so that they can be allocated more effectively. And yeah, it takes some time to do. It’s also hard to predict when it will be needed, since it all depends on who needs memory and releases memory at what time. The process is usually fast enough that you may not notice it, but it can make a difference.
+
+![](https://www.insidetheiot.com/wp-content/uploads/2019/12/Final-allocation.png)
+
+**Pseudocode for New()**
+```java
+// Pseudocode in Java syntax: allocate() and GC.collect() stand in for JVM-internal routines
+Object newObject() {
+    Object obj = allocate();      // request memory
+    if (obj == null) {
+        GC.collect();             // trigger the garbage collector
+        obj = allocate();         // retry the allocation
+        if (obj == null) {        // no garbage was collected, or still not enough memory
+            throw new OutOfMemoryError();
+        }
+    }
+    return obj;
+}
+```
+
+**Important Note**
+Garbage collection only occurs sporadically (if at all) during the execution of your program. It will not occur simply because one or more objects exist that are no longer used. Furthermore, different Java run-time implementations take varying approaches to garbage collection, but most developers should not have to think about it while writing their programs. The classes in the **java.lang.ref** package allow a limited degree of interaction with the garbage collector.
+
+There are various ways in which the references to an object can be released to make it a candidate for Garbage Collection.
Some of them are:
+
+**By making a reference null**
+```java
+Student student = new Student();
+student = null; // the Student object is now unreachable
+```
+
+**By assigning a reference to another**
+```java
+Student studentOne = new Student();
+Student studentTwo = new Student();
+studentOne = studentTwo; // the object studentOne first pointed to is now unreachable
+```
+**Conditions for a Garbage Collector to run**
+Garbage collection occurs when one of the following conditions is true:
+
+- An allocation cannot be satisfied because the heap, or the part of it being allocated from, is full.
+
+- The memory that's used by allocated objects on the heap surpasses an acceptable threshold. This threshold is continuously adjusted as the process runs.
+
+- The `System.gc()` method is called. This is only a request, which the JVM is free to ignore. In almost all cases, you don't have to call this method because the JVM triggers collection on its own. It is primarily used for unusual situations and testing.
+
+**Handling unmanaged resources and finalize() Method**
+For most of the objects your application creates, you can rely on garbage collection to perform the necessary memory management tasks automatically. However, unmanaged resources require explicit cleanup. The most common type of unmanaged resource is an object that wraps an operating system resource, such as a file handle, window handle, or network connection. Although the garbage collector can track the lifetime of a managed object that encapsulates an unmanaged resource, it doesn't have specific knowledge about how to clean up the resource. The `finalize()` method is a method of the Object class intended for performing cleanup before an object is destroyed. In Java, the preferred approach is to implement `AutoCloseable` and release the resource in `close()`, typically via a `try`-with-resources statement; overriding `Object.finalize()` is the legacy alternative. The garbage collector calls `finalize()` at most once on an object before reclaiming it, but gives no guarantee about when, or whether, that happens.
This method was intended to let an object release the resources it holds before being collected, but it has been deprecated since Java 9, so new code should not rely on it.
+
+-----
+### Choice of a Garbage Collector Algorithm
+
+Any garbage collection algorithm must perform two basic operations. First, it should be able to detect all the unreachable objects; second, it must reclaim the heap space used by the garbage objects and make the space available again to the program.
+
+When does the choice of a garbage collector matter? For some applications, the answer is never. That is, the application can perform well in the presence of garbage collection with pauses of modest frequency and duration. However, this isn't the case for a large class of applications, particularly those with large amounts of data (multiple gigabytes), many threads, and high transaction rates. Garbage collectors make assumptions about the way applications use objects, and these are reflected in tunable parameters that can be adjusted for improved performance.
+
+
+Here are a few desirable properties of a Garbage Collector.
+
+##### 1. Safety
+A garbage collector is safe when it never reclaims the space of a LIVE object and always cleans up only the dead objects.
+Although this looks like an obvious requirement, some GC algorithms reclaim the space of LIVE objects just to gain that extra ounce of performance.
+
+##### 2. Throughput
+A garbage collector should spend as little time cleaning up the garbage as possible; this way the CPU is spent on doing actual work and not just cleaning up the mess.
+Most garbage collectors hence run small cycles frequently, while a major cycle does a deep clean once in a while. This way they maximize the overall throughput and ensure we spend more time doing actual work.
+
+
+##### 3. Completeness
+A garbage collector is said to be complete when it eventually reclaims all the garbage from the heap.
+It is not desirable to do a complete clean-up every time the GC is executed, but eventually, a GC should guarantee that the garbage is cleaned up, ensuring zero memory leaks.
+
+##### 4. Pause Time
+Some garbage collectors pause the program execution during the cleanup, and this induces a "pause". Long pauses affect the throughput of the system and may lead to unpredictable outcomes, so a GC is designed and tuned to minimize the pause time.
+The garbage collector may need to pause execution, for example to run defragmentation, where heap objects are moved to free up larger contiguous memory segments.
+
+
+##### 5. Space overhead
+Garbage collectors require auxiliary data structures to track objects efficiently, and the memory required to do so is pure overhead. An efficient GC should keep this space overhead as low as possible, leaving sufficient memory for the program execution.
+
+
+##### 6. Language Specific Optimizations
+Most GC algorithms are generic, but when bundled with a programming language the GC can exploit the language's patterns and object allocation nuances. So, it is important to pick a GC that can leverage these details and make its execution as efficient as possible.
+For example, in some programming languages, GC runs in constant time by exploiting how objects are allocated on the heap.
+
+##### 7. Scalability
+Most GCs are efficient at cleaning up a small chunk of memory, but a scalable GC would run efficiently even on a server with large RAM. Similarly, a GC should be able to leverage multiple CPU cores, if available, to speed up the execution.
+
+
+Amdahl's law (parallel speedup in a given problem is limited by the sequential portion of the problem) implies that most workloads can't be perfectly parallelized; some portion is always sequential and doesn't benefit from parallelism.
In the Java platform, there are currently four supported garbage collection alternatives and all but one of them, the serial GC, parallelize the work to improve performance. It's very important to keep the overhead of doing garbage collection as low as possible.
+
+# Garbage Collection Algorithm
+
+A theoretical, most straightforward garbage collection algorithm iterates over every reachable object every time it runs. Any leftover objects are considered garbage. The time this approach takes is proportional to the number of live objects, which is prohibitive for large applications maintaining lots of live data.
+
+
+## Mark-and-sweep Algorithm
+Over the lifetime of a Java application, new objects are created and released. Eventually, some objects are no longer needed. You can say that at any point in time, the heap memory consists of two types of objects:
+
+- Live - these objects are being used and referenced from somewhere else
+
+- Dead - these objects are no longer used or referenced from anywhere and can be deleted.
+
+
+The Java garbage collection process uses a mark-and-sweep algorithm. Here’s how that works.
+There are two phases in this algorithm: *mark followed by sweep*.
+
+- During the mark phase, the garbage collector traverses object trees starting at their roots. When an object is reachable from the root, the mark bit is set to 1 (true). Meanwhile, the mark bits for unreachable objects remain unchanged (false).
+
+
+![](https://www.freecodecamp.org/news/content/images/2021/01/image-76.png)
+
+
+- During the sweep phase, the garbage collector traverses the heap, reclaiming memory from all items with a mark bit of 0 (false).
+
+**What are Garbage Collection Roots?**
+Garbage collectors work on the concept of Garbage Collection Roots (GC Roots) to identify live and dead objects. The garbage collector traverses the whole object graph in memory, starting from those Garbage Collection Roots and following references from the roots to other objects.
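The mark and sweep phases, and the role of GC roots, can be sketched on a toy object graph. This is only an illustration under simplifying assumptions: the class `MarkSweepSketch`, its `Node` type, and the list standing in for the heap are all hypothetical, and real JVM collectors are far more elaborate (generations, compaction, concurrent phases).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical toy model, not how the JVM represents objects.
class MarkSweepSketch {
    // A toy heap object: a name, outgoing references, and a mark bit.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked; // false = mark bit 0, true = mark bit 1
        Node(String name) { this.name = name; }
    }

    // Mark phase: set the mark bit on everything reachable from 'node'.
    static void mark(Node node) {
        if (node.marked) return; // already visited; also terminates on cycles
        node.marked = true;
        for (Node ref : node.refs) mark(ref);
    }

    // Marks from every root, then sweeps the heap: unmarked nodes are removed,
    // survivors get their mark bit reset for the next cycle.
    // Returns the names of the reclaimed nodes, in heap order.
    static List<String> collect(List<Node> heap, List<Node> roots) {
        for (Node root : roots) mark(root);
        List<String> reclaimed = new ArrayList<>();
        Iterator<Node> it = heap.iterator();
        while (it.hasNext()) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;
            } else {
                reclaimed.add(n.name);
                it.remove();
            }
        }
        return reclaimed;
    }

    public static void main(String[] args) {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C"), d = new Node("D");
        a.refs.add(b); // A -> B, reachable from the root A
        c.refs.add(d); // C -> D and D -> C form a cycle
        d.refs.add(c); // that no root can reach
        List<Node> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        System.out.println(collect(heap, Collections.singletonList(a))); // prints [C, D]
    }
}
```

Note that C and D reference each other yet are still reclaimed: reachability from the roots, not the mere existence of a reference, decides liveness.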
+
+*Object graph* is basically a dependency graph between objects. In this graph, the nodes are Java objects, and the edges are the explicit or implied references that allow a running program to "reach" other objects from a given one. It is used to determine which objects are reachable and which are not, so that all unreachable objects can be made eligible for garbage collection.
+
+----
+#### Garbage Collectors in Java 17
+Java 17 supports several types of garbage collectors, including the Serial GC, Parallel GC, G1 GC, the Z Garbage Collector (ZGC), and Shenandoah GC; the last two became production features in JDK 15. The Concurrent Mark Sweep (CMS) collector, covered below for historical context, was deprecated in JDK 9 and removed in JDK 14. Each of these garbage collectors has unique characteristics and can be chosen based on the requirements of your Java application. The Java garbage collectors employ various techniques to improve the efficiency of these operations:
+
+- Most Java garbage collectors implement a generational garbage collection strategy that categorizes objects by age. Having to mark and compact all the objects in a JVM is inefficient. As more and more objects are allocated, the list of objects grows, leading to longer garbage collection times.
+![](https://www.freecodecamp.org/news/content/images/size/w2400/2021/01/image-70.png)
+
+- Use multiple threads to aggressively make operations parallel, or perform some long-running operations in the background concurrent to the application.
+
+- Try to recover larger contiguous free memory by compacting live objects.
+
+
+**1. Serial Garbage Collector**
+
+The Serial GC, also known as the ‘single-threaded’ GC, is the simplest form of garbage collection in Java. It uses just one CPU thread for garbage collection, which means it can be efficient for applications with a small heap size (up to approximately 100MB). However, during the garbage collection process, user threads are paused, which can lead to latency issues in larger applications. All garbage collection events are conducted serially in one thread.
Compaction is executed after each garbage collection. + +![](https://www.freecodecamp.org/news/content/images/size/w1600/2021/01/image-68.png) + + +Compacting describes the act of moving objects so that there are no holes between them. After a garbage collection sweep, there may be holes left between live objects; compacting moves the live objects together so that no holes remain. + To enable the Serial Garbage Collector, we can use the following argument: + +```java -XX:+UseSerialGC -jar Application.jar``` + +**2. Parallel Garbage Collector** +Unlike the Serial Garbage Collector, it uses multiple threads for managing heap space, but it also freezes other application threads while performing GC. The parallel collector is intended for applications with medium-sized to large-sized data sets that are run on multiprocessor or multithreaded hardware. It was the default GC implementation in the JVM through Java 8 (G1 has been the default since Java 9) and is also known as the Throughput Collector. Running the Parallel GC also causes a "stop the world" event and the application freezes. Since it is more suitable in a multi-threaded environment, it can be used when a lot of work needs to be done and long pauses are acceptable, for example when running a batch job. + +Multiple threads are used for minor garbage collection in the Young Generation. Older JDK releases used a single thread for major garbage collection in the Old Generation unless the parallel old collector was enabled; recent releases use multiple threads there as well. +If we use this GC, we can specify the maximum number of garbage collection threads as well as pause time, throughput, and footprint (heap size) goals using command line arguments. + +```java -XX:+UseParallelGC -jar Application.jar``` + +![](https://www.freecodecamp.org/news/content/images/size/w1600/2021/01/image-66.png) + +**3. Concurrent Mark and Sweep** +This is also known as the concurrent low pause collector. Multiple threads are used for minor garbage collection using the same algorithm as Parallel.
Major garbage collection is multi-threaded, like Parallel Old GC, but CMS runs concurrently alongside application processes to minimize “stop the world” events. Because of this, the CMS collector uses more CPU than other GCs. If you can allocate more CPU for better performance, then the CMS garbage collector is a better choice than the parallel collector. No compaction is performed in CMS GC. + +![](https://www.freecodecamp.org/news/content/images/size/w1600/2021/01/image-67.png) + + +On JDK 13 and earlier, the JVM argument to use the Concurrent Mark Sweep Garbage Collector is ```java -XX:+UseConcMarkSweepGC```. CMS was removed in JDK 14, so this option is not available on Java 17. + +**4. G1 Garbage Collector** +G1 (Garbage First) Garbage Collector is designed for applications running on multi-processor machines with large memory space. It’s available from JDK 7 Update 4 onwards and has been the default collector since Java 9. + + When performing garbage collections, G1 runs a concurrent global marking phase (i.e. phase 1, known as Marking) to determine the liveness of objects throughout the heap. + +After the mark phase is complete, G1 knows which regions are mostly empty. It collects in these areas first, which usually yields a significant amount of free space (i.e. phase 2, known as Sweeping). + +```java -XX:+UseG1GC -jar Application.jar``` + + +**5. Z Garbage Collector** +The Z Garbage Collector (ZGC) is a scalable low latency garbage collector. ZGC performs all expensive work concurrently, without stopping the execution of application threads for more than 10ms, which makes it suitable for applications which require low latency and/or use a very large heap (multi-terabytes). +ZGC was an experimental feature in JDK 11 to 14, where it is enabled with the command-line options +```java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC``` +Since JDK 15 it is a production feature, so on Java 17 `-XX:+UseZGC` alone is enough. + +#### Conclusion +Remember, there’s no one-size-fits-all when it comes to choosing a garbage collector. A GC that works great for one application might not be the best choice for another.
As with most aspects of system tuning, the best strategy often involves a mix of knowledge, experimentation, and a thorough understanding of your specific use case. + +If your application doesn't have strict pause-time requirements, you should just run your application and allow the JVM to select the right collector. + +Most of the time, the default settings should work just fine. If necessary, you can adjust the heap size to improve performance. If the performance still doesn't meet your goals, you can modify the collector as per your application requirements: + +**Serial** - If the application has a small data set (up to approximately 100 MB) and/or it will be run on a single processor with no pause-time requirements + +**Parallel** - If peak application performance is the priority and there are no pause-time requirements or pauses of one second or longer are acceptable + +**CMS/G1** - If response time is more important than overall throughput and garbage collection pauses must be kept shorter than approximately one second + +**ZGC** - If response time is a high priority, and/or you are using a very large heap + + + + + diff --git a/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Java - Hashmap, TreeMap, Linked Hashmap.md b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Java - Hashmap, TreeMap, Linked Hashmap.md new file mode 100644 index 0000000..3612371 --- /dev/null +++ b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Java - Hashmap, TreeMap, Linked Hashmap.md @@ -0,0 +1,248 @@ +## Java Collections - Hashmap, Linked Hashmap & Tree Map + +- Hashmap +- Linked Hashmap +- TreeMap + + +A **hash map** is good as a general-purpose map implementation that provides rapid storage and retrieval operations. However, it falls short because of its chaotic and unorderly arrangement of entries. + +A **linked hash map** possesses the good attributes of hash maps and adds order to the entries. 
It performs better where there is a lot of iteration because only the number of entries is taken into account regardless of capacity. + +A **tree map** takes ordering to the next level by providing complete control over how the keys should be sorted. On the flip side, it offers worse general performance than the other two alternatives. + +### 1. Hashmap +Let’s first look at what it means that HashMap is a map. A map is a key-value mapping, which means that every key is mapped to exactly one value and that we can use the key to retrieve the corresponding value from a map. + +The advantage of a HashMap is that the time complexity to insert and retrieve a value is O(1) on average. We have covered the internal workings in the video lectures already. +Before we proceed, let’s summarize how the put and get operations work. + +**Put()** +When we add an element to the map, HashMap calculates the bucket from the key’s hash code. If the bucket already contains entries, the new entry is added to the list (or tree) belonging to that bucket, or replaces the value of an existing entry with an equal key. If the load factor becomes bigger than the maximum load factor of the map, the capacity is doubled. + +**Get()** +When we want to get a value from the map, HashMap calculates the bucket and gets the value with the same key from the list (or tree). + +**Example Code** +Let’s try to create a hashmap of products. We will create a Product class first. +```java +public class Product { + + private String name; + private String description; + private List<String> tags; + + // standard getters/setters/constructors + + public Product addTagsOfOtherProduct(Product product) { + this.tags.addAll(product.getTags()); + return this; + } +} +``` +We can now create a HashMap with keys of type String and values of type Product: + +```java +Map<String, Product> productsByName = new HashMap<>(); +``` +**1. Put Method** +Adding to hashmap.
+```java +Product eBike = new Product("E-Bike", "A bike with a battery"); +Product roadBike = new Product("Road bike", "A bike for competition"); +//using the Put Method +productsByName.put(eBike.getName(), eBike); +productsByName.put(roadBike.getName(), roadBike); +``` + +**2. Get Method** +We can retrieve a value from the map by its key: +```java +Product nextPurchase = productsByName.get("E-Bike"); +assertEquals("A bike with a battery", nextPurchase.getDescription()); +``` + +If we try to find a value for a key that doesn’t exist in the map, we’ll get a null value: + +**3. Remove** + +We can remove a key-value mapping from the HashMap: +```java +productsByName.remove("E-Bike"); +assertNull(productsByName.get("E-Bike")); +``` + +**4. Contains Key** +To check if a key is present in the map, we can use the containsKey() method: +```java +productsByName.containsKey("E-Bike"); +``` + +----- +**Hashmap with Custom Key Class** +We can use any class as the key in our HashMap. However, for the map to work properly, we need to provide an implementation for equals() and hashCode(). +In most cases, we should use **immutable keys**. Or at least, we must be aware of the consequences of using mutable keys. If key changes after insertion, HashMap will be searching in the wrong bucket and leading to inconsistent behaviour. 
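To make the risk of mutable keys concrete, here is a minimal sketch. `MutableKey` and `MutableKeyDemo` are hypothetical names introduced only for this illustration; the key's hash code depends on a field that is changed after insertion.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key class whose equals()/hashCode() depend on a mutable field.
class MutableKey {
    String name;

    MutableKey(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && Objects.equals(name, ((MutableKey) o).name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name);
    }
}

class MutableKeyDemo {
    // Returns {foundBeforeChange, foundAfterChange, foundByOldName, entryStillStored}.
    static boolean[] run() {
        Map<MutableKey, Integer> priceByKey = new HashMap<>();
        MutableKey key = new MutableKey("E-Bike");
        priceByKey.put(key, 900);

        boolean foundBeforeChange = priceByKey.containsKey(key);

        // Mutating the key changes its hash code, but the map still holds the
        // entry under the hash that was computed at insertion time.
        key.name = "Road bike";

        boolean foundAfterChange = priceByKey.containsKey(key);
        // A fresh key with the old name hashes to the right place, but equals()
        // now fails because the stored key's name has changed.
        boolean foundByOldName = priceByKey.containsKey(new MutableKey("E-Bike"));
        boolean entryStillStored = priceByKey.size() == 1;

        return new boolean[] {foundBeforeChange, foundAfterChange, foundByOldName, entryStillStored};
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("found before change: " + r[0]);
        System.out.println("found after change:  " + r[1]);
        System.out.println("found by old name:   " + r[2]);
        System.out.println("entry still stored:  " + r[3]);
    }
}
```

The entry is still in the map but can no longer be looked up through either name, which is why immutable keys are the safer default.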
+ + +Let’s say we want to have a map with the product as the key and the price as the value: + +```java +HashMap priceByProduct = new HashMap<>(); +priceByProduct.put(eBike, 900); +``` +Let’s implement the equals() and hashCode() methods: + + +```java +//Override these methods in the Product Class +@Override +public boolean equals(Object o) { + if (this == o) { + return true; + } + if (o == null || getClass() != o.getClass()) { + return false; + } + + Product product = (Product) o; + return Objects.equals(name, product.name) && + Objects.equals(description, product.description); +} + +@Override +public int hashCode() { + return Objects.hash(name, description); +} +``` +Note that `hashCode()` and `equals()` need to be overridden only for classes that we want to use as map keys, not for classes that are only used as values in a map. + +----- + +### 2. Linked Hashmap +The LinkedHashMap class is very similar to HashMap in most aspects. However, the linked hash map is based on both hash table and linked list to enhance the functionality of hash map. + +It maintains a doubly-linked list running through all its entries in addition to an underlying array of default size 16. + +This linked list defines the order of iteration, which by default is the order of insertion of elements (insertion-order). + +Let’s have a look at a linked hash map instance which orders its entries according to how they’re inserted into the map. 
It also guarantees that this order will be maintained throughout the life cycle of the map: + +```java +public void givenLinkedHashMap_whenGetsOrderedKeyset_thenCorrect() { + LinkedHashMap<Integer, String> map = new LinkedHashMap<>(); + map.put(1, null); + map.put(2, null); + map.put(3, null); + map.put(4, null); + map.put(5, null); + + Set<Integer> keys = map.keySet(); + Integer[] arr = keys.toArray(new Integer[0]); + + for (int i = 0; i < arr.length; i++) { + assertEquals(Integer.valueOf(i + 1), arr[i]); + } +} +``` +We can guarantee that this test will always pass as the insertion order will always be maintained. We cannot make the same guarantee for a HashMap. + +**Access Order Linked Hashmap** +LinkedHashMap provides a special constructor which enables us to specify, along with a custom load factor (LF) and initial capacity, a different ordering mechanism/strategy called access-order: +```java +LinkedHashMap<Integer, String> map = new LinkedHashMap<>(16, .75f, true); +``` +The first parameter is the initial capacity, followed by the load factor, and the last parameter is the ordering mode. So, by passing in true, we turn on access-order, whereas the default is insertion-order. + +This mechanism ensures that the order of iteration of elements is the order in which the elements were last accessed, from least-recently accessed to most-recently accessed. + +**LRU using LinkedHashmap** +And so, building a Least Recently Used (LRU) cache is quite easy and practical with this kind of map.
A successful put or get operation results in an access for the entry: +```java +public void givenLinkedHashMap_whenAccessOrderWorks_thenCorrect() { + LinkedHashMap<Integer, String> map + = new LinkedHashMap<>(16, .75f, true); + map.put(1, null); + map.put(2, null); + map.put(3, null); + map.put(4, null); + map.put(5, null); + + Set<Integer> keys = map.keySet(); + assertEquals("[1, 2, 3, 4, 5]", keys.toString()); + + map.get(4); + assertEquals("[1, 2, 3, 5, 4]", keys.toString()); + + map.get(1); + assertEquals("[2, 3, 5, 4, 1]", keys.toString()); + + map.get(3); + assertEquals("[2, 5, 4, 1, 3]", keys.toString()); +} + +``` +Just like HashMap, the LinkedHashMap implementation is not synchronized. So if you are going to access it from multiple threads and at least one of these threads is likely to change it structurally, then it must be externally synchronized. +```java +Map<Integer, String> m = Collections.synchronizedMap(new LinkedHashMap<>()); +``` +We will learn more about concurrency in a separate tutorial. + +----- +### 3. TreeMap ### +TreeMap is a map implementation that keeps its entries sorted according to the natural ordering of its keys or, if one is provided by the user at construction time, according to a comparator. + +By default, TreeMap sorts all its entries according to their natural ordering. For an integer, this would mean ascending order and for strings, lexicographic (dictionary) order. A hash map does not guarantee the order of keys stored and specifically does not guarantee that this order will remain the same over time, but a tree map guarantees that the keys will always be sorted according to the specified order. + + +TreeMap, unlike a hash map and linked hash map, does not employ the hashing principle anywhere since it does not use an array to store its entries but uses a self-balancing tree, the **Red Black Tree** data structure, to store the entries. +A red-black tree is a self-balancing binary search tree.
This attribute guarantees that basic operations like search, get, put and remove take logarithmic time, O(log n): on every insertion and deletion, the tree rebalances itself so that its height stays at O(log n). + +Just like hash map and linked hash map, a tree map is not synchronized and therefore the rules for using it in a multi-threaded environment are similar to those in the other two map implementations. + + +A Tree Map example with Comparator +```java +public void givenTreeMap_whenOrdersEntriesByComparator_thenCorrect() { + TreeMap<Integer, String> map = + new TreeMap<>(Comparator.reverseOrder()); + map.put(3, "val"); + map.put(2, "val"); + map.put(1, "val"); + map.put(5, "val"); + map.put(4, "val"); + + assertEquals("[5, 4, 3, 2, 1]", map.keySet().toString()); +} +``` +Notice that we placed the integer keys in a non-orderly manner but on retrieving the key set, we confirm that they are indeed maintained in descending order, as dictated by the reverse-order comparator. Without the comparator, the keys would be kept in ascending order, which is the natural ordering of integers. + +We now know that TreeMap stores all its entries in sorted order. Because of this attribute of tree maps, we can perform queries like: find “largest”, find “smallest”, find all keys less than or greater than a certain value, etc.
+```java +public void givenTreeMap_whenPerformsQueries_thenCorrect() { + TreeMap<Integer, String> map = new TreeMap<>(); + map.put(3, "val"); + map.put(2, "val"); + map.put(1, "val"); + map.put(5, "val"); + map.put(4, "val"); + + Integer highestKey = map.lastKey(); + Integer lowestKey = map.firstKey(); + Set<Integer> keysLessThan3 = map.headMap(3).keySet(); + Set<Integer> keysGreaterThanEqTo3 = map.tailMap(3).keySet(); + + assertEquals(Integer.valueOf(5), highestKey); + assertEquals(Integer.valueOf(1), lowestKey); + assertEquals("[1, 2]", keysLessThan3.toString()); + assertEquals("[3, 4, 5]", keysGreaterThanEqTo3.toString()); +} +``` + + + + + + + + + diff --git a/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Java Architecture - JDK,JRE,JVM,JIT.md b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Java Architecture - JDK,JRE,JVM,JIT.md new file mode 100644 index 0000000..204fbc2 --- /dev/null +++ b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Java Architecture - JDK,JRE,JVM,JIT.md @@ -0,0 +1,101 @@ +## Java Architecture + +Java is a platform-independent language. To see why, we need to understand the steps of compiling and executing code. + +- Code written in Java is converted into bytecode by the Java compiler. +- The bytecode is converted into machine code by the JVM. +- The machine code is executed directly by the machine. + +![](https://www.oreilly.com/api/v2/epubs/0596009208/files/httpatomoreillycomsourceoreillyimages2248099.png.jpg) + +![](https://www.oreilly.com/api/v2/epubs/0596009208/files/httpatomoreillycomsourceoreillyimages2248101.png.jpg) + + + +Bytecodes are effectively *platform-independent*. The same bytecode runs everywhere; each platform’s Java virtual machine takes care of translating it for that platform. This makes compiled Java code platform independent. + +![](https://cdn.programiz.com/sites/tutorial2program/files/how-java-program-runs.jpg) + +There are three main components of Java architecture: JVM, JRE, and JDK.
+These stand for Java Virtual Machine, Java Runtime Environment and Java Development Kit respectively. Let’s understand them one by one. + +# JVM +JVM (Java Virtual Machine) is an abstract machine (software) that enables your computer to run a Java program. + +Before a Java program can run, the Java compiler `javac` first compiles your Java code to bytecode. Then, the JVM translates bytecode into native machine code (the set of instructions that a computer's CPU executes directly). JVM comes with a **JIT (Just-in-Time) compiler** that converts frequently executed bytecode into native machine code at run time. Hence, that code runs much faster than it would if it were only interpreted. + + +# JRE +JRE (Java Runtime Environment) is a software package that provides Java class libraries, Java Virtual Machine (JVM), and other components that are required to run Java applications. JRE is a superset of the JVM. + +Any program needs some environment to run in. Usually, this is an operating system, for example Unix, Linux, Microsoft Windows, or macOS. Here the JRE acts as a translator and facilitator between the Java program and the operating system. + +![](https://cdn.programiz.com/sites/tutorial2program/files/java-realtime-enviornment_0.jpg) + +### JDK +JDK (Java Development Kit) is a software development kit required to develop applications in Java. When you download JDK, JRE is also downloaded with it. + +In addition to JRE, JDK also contains a number of development tools (compiler, Javadoc, Java Debugger, etc). + +![](https://cdn.programiz.com/sites/tutorial2program/files/jdk-jre-jvm.jpg) + +---- +### JVM Deep Dive +Java applications are platform independent - write once, run anywhere. This is because of JVM which performs the following tasks - +- Loads the code +- Verifies the code +- Executes the code +- Provides runtime environment + +Here are the important components of JVM architecture: + +**1. Class Loader** +The class loader is a subsystem used for loading class files.
It performs three major functions viz. Loading, Linking, and Initialization.Whenever we run the java program, class loader loads it first. + +**2. Method Area** +It is one of the Data Area in JVM, in which Class data will be stored. Static Variables, Static Blocks, Static Methods, Instance Methods are stored in this area. +JVM Method Area stores structure of class like metadata, the code for Java methods, and the constant runtime pool. + +**3. Heap** +A heap is created when the JVM starts up. It may increase or decrease in size while the application runs. All the Objects, arrays, and instance variables are stored in a heap. This memory is shared across multiple threads. + +**4. JVM language Stacks** +Java language Stacks store local variables, and its partial results. Each and every thread has its own JVM language stack, created concurrently as the thread is created. A new stack frame is created when method is invoked, and it is removed when method invocation process is complete. JVM stack is known as a thread stack. + + +**5. PC Registers** +PC registers store the address of the Java virtual machine instruction, which is currently executing. In Java, each thread has its separate PC register. + +**6. Native Method Stacks** + +Native method stacks hold the instruction of native code depends on the native library. It allocates memory on native heaps or uses any type of stack. + +**7) Execution Engine** +Execution Engine is the brain of JVM. It has two components. +- JIT compiler +- Garbage collector + +**JIT compiler**: The Just-In-Time (JIT) compiler is a part of the runtime environment. It helps in improving the performance of Java applications by compiling bytecodes to machine code at run time. The JIT compiler is enabled by default. When a method is compiled, the JVM calls the compiled code of that method directly. The JIT compiler compiles the bytecode of that method into machine code, compiling it “just in time” to run. 
+ +**Garbage collector**: As the name suggests, the garbage collector collects unused material. In the JVM, it tracks each and every object available in the JVM heap space and removes the ones that are no longer needed. +The garbage collector works in two simple steps known as Mark and Sweep: + +Mark – this is where the garbage collector identifies which pieces of memory are in use and which are not + + +Sweep – this removes the objects identified as unused during the “mark” phase. + +**8) Native Method interface** + +The Native Method Interface is a programming framework. It allows Java code running in a JVM to call, and to be called by, native libraries and applications. + +**9) Native Method Libraries** + +Native Method Libraries are a collection of native libraries (C, C++) which are needed by the Execution Engine. + +![](https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2019/07/JVM-768x454.png) + +---- + + + diff --git a/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Java Static and Final - Scaler Notes.md b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Java Static and Final - Scaler Notes.md new file mode 100644 index 0000000..cf7c586 --- /dev/null +++ b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Java Static and Final - Scaler Notes.md @@ -0,0 +1,263 @@ +![]() +------ +# Understanding Static and Final Keyword in Java + +## Static Keyword +Java uses the static keyword in 4 different places. Let’s learn about each of them. +- Static Variables +- Static Methods +- Static Block +- Static Nested Classes + +**Static Members and Methods** +There will be times when you will want to define a class member that will be used independently of any object of that class. Normally, a class member must be accessed only in conjunction with an object of its class. However, it is possible to create a member that can be used by itself, without reference to a specific instance. + +To create such a member, precede its declaration with the keyword static.
When a member is declared `static`, it can be accessed before any objects of its class are created, and without reference to any object. You can declare both methods and variables to be static. + +The most common example of a static member is main( ). main( ) is declared as static because it must be called before any objects exist. + +Variables declared as static are, essentially, global variables. When objects of its class are created, no copy of a static variable is made. Instead, all instances of the class share the same static variable. + +**Math.java** +```java +public class Math { + static String author = "Prateek"; + + static int area(int l,int b){ + return l*b; + } +} +``` +In the above example the static keyword has been used to create a static data member and a static method. We don't need to create a Math object to access the methods and data members; instead we can directly refer to `Math.author` and `Math.area(v1,v2)` from main. +```java + public static void main(String[] args) { + // static members are accessed through the class name; no Math object is created + + System.out.println("Area of Rectangle " + Math.area(10,20)); + System.out.println(Math.author); + } +``` + +**Restrictions on Static Methods**: + +• They can only directly call other static methods of their class. + +• They can only directly access static variables of their class. + +• They cannot refer to `this` or `super` in any way. (The super keyword is used in inheritance.) + + +**Static Block** +If you need to do computation in order to initialize your static variables, you can declare a static block that gets executed exactly once, when the class is first loaded. +As soon as the class is loaded, all of the static statements are run.
+ +```java +public class StaticDemoExample { + static int a = 3; + static int b; + + static{ + System.out.println("Inside Static Block"); + b = a*4; + printData(); + } + + static void printData(){ + System.out.println(a); + System.out.println(b); + } + + public static void main(String[] args) { + //static block will execute when the class is loaded + // it is used to init static variables + } +} +``` +In the above code, the static block runs automatically: as soon as you launch the program, the class is loaded and the static block is executed. Despite main being empty, the code will output the following, as the static block is still executed. + +*Code Output* +``` +Inside Static Block +3 +12 +``` + +## Nested Classes and 'Static' Modifier +It is possible to define a class within another class; such classes are known as nested classes. There are two types of nested classes: static and non-static. Let’s learn about them. + +**Static Nested Class (Static Inner Class)**: +- A static nested class is a nested class that is declared as static. +- It does not have direct access to the instance-specific members of the outer class. Because it is static, it must access the non-static members of its enclosing class through an object. +- You can create an instance of a static nested class without creating an instance of the outer class. +- Static nested classes are often used for grouping related utility methods or encapsulating code within a class. +- Note: In Java, only nested classes are allowed to be static. + + +```java +public class OuterClass { + // Outer class members + + static class StaticNestedClass { + // Static nested class members + } +} +``` +**Inner Class (Non-static Nested Class)** + +- An inner class is a nested class that is not declared as static. +- It can access both static and instance-specific members of the outer class. +- An instance of an inner class can only be created in association with an instance of the outer class.
+- Inner classes are often used for implementing complex data structures or for achieving better encapsulation. + +```java +public class OuterClass { + // Outer class members + + class InnerClass { + // Inner class members + } +} +``` +**Nested Static Classes in Builder Design Pattern** +This kind of class design is particularly useful in Builder Design Pattern which you will study later as a part of Low Level Design Course. In short, The Builder Design Pattern is a creational design pattern that allows you to create complex objects step by step. It's especially useful when you have an object with many optional parameters or configurations. Here's an example of how you can implement the Builder pattern in Java: + +Suppose you want to create a Person class with optional attributes like name, age, address, and phone number using the Builder pattern: + +```java +public class Person { + private String name; + private int age; + private String address; + private String phoneNumber; + + // Private constructor to prevent direct instantiation + private Person() { + } + + // Nested Builder class + public static class Builder { + private String name; + private int age; + private String address; + private String phoneNumber; + + public Builder(String name) { + this.name = name; + } + + public Builder age(int age) { + this.age = age; + return this; + } + + public Builder address(String address) { + this.address = address; + return this; + } + + public Builder phoneNumber(String phoneNumber) { + this.phoneNumber = phoneNumber; + return this; + } + + public Person build() { + Person person = new Person(); + person.name = this.name; + person.age = this.age; + person.address = this.address; + person.phoneNumber = this.phoneNumber; + return person; + } + } + + // Getter methods for Person class + public String getName() { + return name; + } + + public int getAge() { + return age; + } + + public String getAddress() { + return address; + } + + public String getPhoneNumber() { + 
return phoneNumber; + } + + @Override + public String toString() { + return "Name: " + name + ", Age: " + age + ", Address: " + address + ", Phone: " + phoneNumber; + } +} +``` + +__PersonDemo.java__ +```java +public class PersonDemo { + public static void main(String[] args) { + Person person1 = new Person.Builder("John") + .age(30) + .address("123 Main St") + .phoneNumber("555-1234") + .build(); + + Person person2 = new Person.Builder("Alice") + .age(25) + .phoneNumber("555-5678") + .build(); + + System.out.println(person1); + System.out.println(person2); + } +} +``` +This allows you to create Person objects with various combinations of attributes while keeping the code clean and readable. + + + +------------------ +### Final Keyword +The keyword final has three uses. First, it can be used to create the equivalent of a named constant. The other two uses of final apply to inheritance as discussed below. + +**1. Final in Variables** +A field can be declared as final. Doing so prevents its contents from being modified, making it, essentially, a constant. This means that you must initialize a final field when it is declared. You can do this in one of two ways: First, you can give it a value when it is declared. Second, you can assign it a value within a constructor. The first approach is probably the most common. Here is an example: +``` +final int FILE_NEW = 1; +final int FILE_OPEN = 2; +final int FILE_SAVE = 3; +final int FILE_SAVEAS = 4; +final int FILE_QUIT = 5; +``` + +Subsequent parts of your program can now use FILE_OPEN, etc., as if they were constants, without fear that a value has been changed. It is a common coding convention to choose all **uppercase identifiers** for final fields, as this example shows. + +In addition to fields, both method parameters and local variables can be declared final. Declaring a parameter final prevents it from being changed within the method. 
Declaring a local variable final prevents it from being assigned a value more than once. + +The keyword final can also be applied to methods, but its meaning is substantially different than when it is applied to variables. + + +**2. Using final to Prevent Method Overriding** +While method overriding is one of Java’s most powerful features, there will be times when you will want to prevent it from occurring. To disallow a method from being overridden, specify final as a modifier at the start of its declaration. Methods declared as final cannot be overridden. The following fragment illustrates final. + +![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781260463422/files/ch08-0194.jpg) +Because meth( ) is declared as final, it cannot be overridden in B. If you attempt to do so, a compile-time error will result. + +Methods declared as final can sometimes provide a performance enhancement: The compiler is free to inline calls to them because it “knows” they will not be overridden by a subclass. When a small final method is called, often the Java compiler can copy the bytecode for the subroutine directly inline with the compiled code of the calling method, thus eliminating the costly overhead associated with a method call. Inlining is an option only with final methods. Normally, Java resolves calls to methods dynamically, at run time. This is called late binding. However, since final methods cannot be overridden, a call to one can be resolved at compile time. This is called early binding. + +**3. Using final to Prevent Inheritance** +Sometimes you will want to prevent a class from being inherited. To do this, precede the class declaration with final. Declaring a class as final implicitly declares all of its methods as final, too. As you might expect, it is illegal to declare a class as both abstract and final since an abstract class is incomplete by itself and relies upon its subclasses to provide complete implementations. 
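The rules for final methods and final classes can be sketched in text form as follows. This is a minimal illustration with hypothetical class names (`Base`, `Derived`, `Leaf`, `FinalDemo`); the lines the compiler would reject are left as comments so the block still compiles.

```java
import java.lang.reflect.Modifier;

class Base {
    // final: subclasses inherit and may call this method, but cannot override it
    final String meth() {
        return "This is a final method.";
    }

    // not final: subclasses are free to override this one
    String greet() {
        return "Hello from Base";
    }
}

class Derived extends Base {
    // String meth() { return "Illegal!"; }  // compile-time error: meth() is final in Base

    @Override
    String greet() {
        return "Hello from Derived";
    }
}

// final: no class may extend Leaf
final class Leaf {
    String describe() {
        return "Leaf cannot be subclassed";
    }
}

// class Twig extends Leaf { }  // compile-time error: cannot inherit from final Leaf

class FinalDemo {
    public static void main(String[] args) {
        Derived d = new Derived();
        System.out.println(d.meth());   // This is a final method.
        System.out.println(d.greet());  // Hello from Derived
        System.out.println(Modifier.isFinal(Leaf.class.getModifiers())); // true
    }
}
```

Uncommenting either of the marked lines makes compilation fail, which is the compile-time protection that `final` provides.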
+ +Here is an example of a final class: +![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781260463422/files/ch08-0195.jpg) +As the comments imply, it is illegal for B to inherit A since A is declared as final. + + + + + + + + diff --git a/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Java Streams, Parallel Streams and Collectors.md b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Java Streams, Parallel Streams and Collectors.md new file mode 100644 index 0000000..156f9f9 --- /dev/null +++ b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Java Streams, Parallel Streams and Collectors.md @@ -0,0 +1,226 @@ +### Java Streams +A stream in Java is simply a wrapper around a data source, allowing us to perform bulk operations on the data in a convenient way. + +It doesn’t store data or make any changes to the underlying data source. Rather, it adds support for functional-style operations on data pipelines. + +In this tutorial we will learn about Sequential Streams, Parallel Streams and Collect() Method of stream. + +### Sequential Streams +By default, any stream operation in Java is processed sequentially, unless explicitly specified as parallel. + +Sequential streams use a single thread to process the pipeline: +```java +List listOfNumbers = Arrays.asList(1, 2, 3, 4); +listOfNumbers.stream().forEach(number -> + System.out.println(number + " " + Thread.currentThread().getName()) +); +``` +The output of this sequential stream is predictable. The list elements will always be printed in an ordered sequence: + +``` +1 main +2 main +3 main +4 main +``` + +### Multithreading using Parallel Streams +Stream API also simplifies multithreading by providing the `parallelStream()` method that runs operations over stream’s elements in parallel mode. Any stream in Java can easily be transformed from sequential to parallel. 
+ +We can achieve this by adding the parallel method to a sequential stream or by creating a stream using the parallelStream method of a collection: + +The code below runs the method doWork() in parallel for every element of the stream: +```java +list.parallelStream().forEach(element -> doWork(element)); +``` +For the above sequential example, the code will look like this - + +```java +List<Integer> listOfNumbers = Arrays.asList(1, 2, 3, 4); +listOfNumbers.parallelStream().forEach(number -> + System.out.println(number + " " + Thread.currentThread().getName()) +); +``` +Parallel streams enable us to execute code in parallel on separate cores. The final result is the combination of each individual outcome. + +However, the order of execution is out of our control. It may change every time we run the program: +``` +4 ForkJoinPool.commonPool-worker-3 +2 ForkJoinPool.commonPool-worker-5 +1 ForkJoinPool.commonPool-worker-7 +3 main +``` +Parallel streams make use of the fork-join framework and its common pool of worker threads. Parallel processing may be beneficial to fully utilize multiple cores. But we also need to consider the overhead of managing multiple threads, memory locality, splitting the source and merging the results. +Refer to this [article](https://www.baeldung.com/java-when-to-use-parallel-stream) to learn more about when to use parallel streams. + + +### Collect() Method + +A stream represents a sequence of elements and supports different kinds of operations that lead to the desired result. The source of a stream is usually a Collection or an Array, from which data is streamed. + +Streams differ from collections in several ways; most notably, streams are not a data structure that stores elements. They're functional in nature, and it's worth noting that operations on a stream produce a result and typically return another stream, but do not modify its source. 
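The claim that stream operations do not modify their source can be checked with a small sketch. The class name `StreamSourceDemo` and the helper `evens` are hypothetical; `collect(Collectors.toList())` is the collect step discussed in the rest of this section.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamSourceDemo {
    // Returns the even numbers of the input in a new list;
    // the input list itself is only read, never changed.
    static List<Integer> evens(List<Integer> source) {
        return source.stream()
                .filter(n -> n % 2 == 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        List<Integer> result = evens(numbers);
        System.out.println(result);   // the filtered copy
        System.out.println(numbers);  // the source, still holding all five elements
    }
}
```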
+ +To "solidify" the changes, you **collect** the elements of a stream back into a Collection. + +The `stream.collect()` method is used to perform a mutable reduction operation on the elements of a stream. It returns a new mutable object containing the results of the reduction operation. + +This method can be used to perform several different types of reduction operations, such as: + +- Computing the sum of numeric values in a stream. +- Finding the minimum or maximum value in a stream. +- Constructing a new String by concatenating the contents of a stream. +- Collecting elements into a new List or Set. + +```java +import java.util.Arrays; +import java.util.List; +import static java.util.stream.Collectors.*; + +public class CollectExample { + public static void main(String[] args) { + Integer[] intArray = {1, 2, 3, 4, 5}; + + // Creating a List from an array of elements + // using Arrays.asList() method + List<Integer> list = Arrays.asList(intArray); + + // Demo1: Collecting the even elements of the list into a new + // list using collect() method + List<Integer> evenNumbersList = list.stream() + .filter(i -> i%2 == 0) + .collect(toList()); + System.out.println(evenNumbersList); + + // Demo2: finding the sum of all the values + // in the stream + Integer sum = list.stream() + .collect(summingInt(i -> i)); + System.out.println(sum); + + // Demo3: finding the maximum of all the values + // in the stream + Integer max = list.stream() + .collect(maxBy(Integer::compare)).get(); + System.out.println(max); + + // Demo4: finding the minimum of all the values + // in the stream + Integer min = list.stream() + .collect(minBy(Integer::compare)).get(); + System.out.println(min); + + // Demo5: counting the values in the stream + Long count = list.stream() + .collect(counting()); + System.out.println(count); + } +} +``` + +In Demo1: We use the stream() method to get a stream from the list. We filter the even elements and collect them into a new list using the collect() method. + +In Demo2: We pass summingInt(ToIntFunction) to the collect() method as an argument. 
The summingInt() method returns a collector that sums the integer values extracted from the stream elements by applying an int producing mapping function to each element. + +In Demo 3: We use the collect() method with maxBy(Comparator) as an argument. The maxBy() accepts a Comparator and returns a collector that extracts the maximum element from the stream according to the given Comparator. + +Lets learn more about Collectors. + + +### Collectors and Stream.Collect() + +Collectors represent implementations of the Collector interface, which implements various useful reduction operations, such as accumulating elements into collections, summarizing elements based on a specific parameter, etc. + +All predefined implementations can be found within the [Collectors](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html) class. + + +Within the Collectors class itself, we find an abundance of unique methods that deliver on the different needs of a user. One such group is made of summing methods - `summingInt()`, `summingDouble()` and `summingLong()`. + + + +Let's start off with a basic example with a List of Integers: + +```java +List numbers = Arrays.asList(1, 2, 3, 4, 5); +Integer sum = numbers.stream().collect(Collectors.summingInt(Integer::intValue)); +System.out.println("Sum: " + sum); +``` +We apply the .stream() method to create a stream of Integer instances, after which we use the previously discussed `.collect()` method to collect the elements using `summingInt()`. The method itself, again, accepts the `ToIntFunction`, which can be used to reduce instances to an integer that can be summed. + +Since we're using Integers already, we can simply pass in a method reference denoting their `intValue`, as no further reduction is needed. + +More often than not - you'll be working with lists of custom objects and would like to sum some of their fields. 
For instance, we can sum the quantities of each product in the productList, denoting the total inventory we have. + +Let us try to understand one of these methods using a custom class example. +``` java +public class Product { + private String name; + private Integer quantity; + private Double price; + private Long productNumber; + + // Constructor, getters and setters + ... +} +... +List products = Arrays.asList( + new Product("Milk", 37, 3.60, 12345600L), + new Product("Carton of Eggs", 50, 1.20, 12378300L), + new Product("Olive oil", 28, 37.0, 13412300L), + new Product("Peanut butter", 33, 4.19, 15121200L), + new Product("Bag of rice", 26, 1.70, 21401265L) +); + +``` + +In such a case, the we can use a method reference, such as `Product::getQuantity` as our `ToIntFunction`, to reduce the objects into a single integer each, and then sum these integers: + +```java +Integer sumOfQuantities = products.stream().collect(Collectors.summingInt(Product::getQuantity)); +System.out.println("Total number of products: " + sumOfQuantities); +``` +This results in: + +``` +Total number of products: 174 +``` + +You can also very easily implement your own collector and use it instead of the predefined ones, though - you can get pretty far with the built-in collectors, as they cover the vast majority of cases in which you might want to use them. 
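As a hedged sketch of what implementing your own collector can look like, the example below builds one with `Collector.of`. The class name `CustomCollectorDemo` and the methods `lengthSum` and `totalLength` are hypothetical, and the collector simply sums string lengths, so it does the same job as `Collectors.summingInt(String::length)`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

class CustomCollectorDemo {
    // A hand-written collector that sums the lengths of the strings in a stream.
    static Collector<String, int[], Integer> lengthSum() {
        return Collector.of(
                () -> new int[1],                     // supplier: a one-slot mutable container
                (acc, s) -> acc[0] += s.length(),     // accumulator: add one element's length
                (left, right) -> {                    // combiner: merge two partial results
                    left[0] += right[0];
                    return left;
                },
                acc -> acc[0]);                       // finisher: unwrap the final sum
    }

    // Convenience wrapper so the collector is easy to call.
    static int totalLength(List<String> names) {
        return names.stream().collect(lengthSum());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Milk", "Olive oil");
        System.out.println("Total characters: " + totalLength(names));
    }
}
```

The four arguments map onto the four parts of the `Collector` interface: how to create the container, how to fold one element in, how to merge containers from parallel runs, and how to produce the final value.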
+ +The following are examples of using the predefined collectors to perform common mutable reduction tasks: +```java + + // Accumulate names into a List + List<String> list = people.stream().map(Person::getName).collect(Collectors.toList()); + + // Accumulate names into a TreeSet + Set<String> set = people.stream().map(Person::getName).collect(Collectors.toCollection(TreeSet::new)); + + // Convert elements to strings and concatenate them, separated by commas + String joined = things.stream() + .map(Object::toString) + .collect(Collectors.joining(", ")); + + // Compute sum of salaries of employee + int total = employees.stream() + .collect(Collectors.summingInt(Employee::getSalary)); + + // Group employees by department + Map<Department, List<Employee>> byDept + = employees.stream() + .collect(Collectors.groupingBy(Employee::getDepartment)); + + // Compute sum of salaries by department + Map<Department, Integer> totalByDept + = employees.stream() + .collect(Collectors.groupingBy(Employee::getDepartment, + Collectors.summingInt(Employee::getSalary))); + + // Partition students into passing and failing + Map<Boolean, List<Student>> passingFailing = + students.stream() + .collect(Collectors.partitioningBy(s -> s.getGrade() >= PASS_THRESHOLD)); + +``` +You can look at the official documentation for more details on these methods. +https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html + + + + diff --git a/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Java Strings - String Pool, Immutablility, String Builder.md b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Java Strings - String Pool, Immutablility, String Builder.md new file mode 100644 index 0000000..97fce89 --- /dev/null +++ b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Java Strings - String Pool, Immutablility, String Builder.md @@ -0,0 +1,173 @@ +# Java Strings - Advanced Concepts +In this tutorial we discuss 3 important concepts related to Strings. +- String Pool +- String Immutability +- String Builder + + +Strings are handled differently in Java. 
There are two ways to store strings - one as string literals stored in String Pool and as string objects stored in regular heap space. Lets discuss about them. + +### 1. Strings in String Pool +Each time you create a string literal, the JVM checks the "string constant pool" first. If the string already exists in the pool, a reference to the pooled instance is returned. If the string doesn't exist in the pool, a new string instance is created and placed in the pool. For example: + +**String Literal Syntax** +```java +String s1 = "Hello World"; +String s2 = "Hello World";//It doesn't create a new instance +``` +![](https://www.baeldung.com/wp-content/uploads/2018/08/Why_String_Is_Immutable_In_Java.jpg) + +In the above example, only one object will be created. Firstly, JVM will not find any string object with the value "Hello World" in string constant pool that is why it will create a new object. After that it will find the string with the value "Hello World" in the pool, it will not create a new object but will return the reference to the same instance. + +**Java String Pool** is the special memory region where Strings are stored by the JVM. Since Strings are immutable in Java, the JVM optimizes the amount of memory allocated for them by storing only one copy of each literal String in the pool. This process is called interning + +### 2. String Allocated Using the Constructor +When we create a String via the new operator, the Java compiler will create a new object and store it in the heap space reserved for the JVM. + +Every String created like this will point to a different memory region with its own address. + +Let’s see how this is different from the previous case: + +```java +String s1 = new String("Welcome"); +String s2 = new String("Welcome"); +//creates two objects and two reference variables point to different addresses +``` +--- +### Big Question - String Literal vs String Object? 
+We have just seen that when we create a String object using the `new()` operator, it always creates a new object in heap memory. On the other hand, if we create an object using String literal syntax e.g. “Hello World”, it may return an existing object from the String pool, if it already exists. Otherwise, it will create a new String object and put in the string pool for future re-use. + +At a high level, both are the String objects, but the main difference comes from the point that new() operator always creates a new String object. Also, when we create a String using literal – it is interned. + +In general, we should use the String literal notation when possible. It is easier to read and it gives the compiler a chance to optimize our code. + +---- + +## Immutablibity of Java Strings + +*Immutable* simply means unmodifiable or unchangeable. This means that once the object has been assigned to a variable, we can neither update the reference nor change the internal state by any means. + +In Java, Strings are immutable. An obvious question that is quite prevalent in interviews is “Why Strings are designed as immutable in Java?” The key benefits of keeping this class as immutable are caching, security, synchronization, and performance. + +Let’s discuss how these things work. + +#### Why String objects are immutable in Java? +As Java uses the concept of String literal. Suppose there are 5 reference variables, all refer to one object "Sachin". If one reference variable changes the value of the object, it will be affected by all the reference variables. That is why String objects are immutable in Java. + +Following are some more features of String which makes String objects immutable. + +**1. Heap Space** +The immutability of String helps to minimize the usage in the heap memory. When we try to declare a new String object, the JVM checks whether the value already exists in the String pool or not. If it exists, the same value is assigned to the new object. 
This feature allows Java to use the heap space efficiently. + +Java String Pool is the special memory region where Strings are stored by the JVM. Since Strings are immutable in Java, the JVM optimizes the amount of memory allocated for them by storing only one copy of each literal String in the pool. This process is called **interning** + +**2. Security** +The String is widely used in Java applications to store sensitive pieces of information like usernames, passwords, connection URLs, network connections, etc. It’s also used extensively by JVM class loaders while loading classes. + +Hence securing String class is crucial regarding the security of the whole application in general. For example, consider this simple code snippet: + +```java +void criticalMethod(String userName) { + // perform security checks + if (!isAlphaNumeric(userName)) { + throw new SecurityException(); + } + + // do some secondary tasks + initializeDatabase(); + + // critical task + connection.executeUpdate("UPDATE Customers SET Status = 'Active' " + + " WHERE UserName = '" + userName + "'"); +} +``` + +In the above code snippet, let’s say that we received a String object from an untrustworthy source. We’re doing all necessary security checks initially to check if the String is only alphanumeric, followed by some more operations. + +Remember that our unreliable source caller method still has reference to this userName object. + +If Strings were mutable, then by the time we execute the update, we can’t be sure that the String we received, even after performing security checks, would be safe. The untrustworthy caller method still has the reference and can change the String between integrity checks. Thus making our query prone to SQL injections in this case. So mutable Strings could lead to degradation of security over time. + +It could also happen that the String userName is visible to another thread, which could then change its value after the integrity check. + +**3. 
Synchronization** +Being immutable automatically makes the String thread safe since they won’t be changed when accessed from multiple threads. + +Hence immutable objects, in general, can be shared across multiple threads running simultaneously. They’re also thread-safe because if a thread changes the value, then instead of modifying the same, a new String would be created in the String pool. Hence, Strings are safe for multi-threading. + +**4. Hashcode Caching** +Since String objects are abundantly used as a data structure, they are also widely used in hash implementations like HashMap, HashTable, HashSet, etc. When operating upon these hash implementations, hashCode() method is called quite frequently for bucketing. + +The immutability guarantees Strings that their value won’t change. So the hashCode() method is overridden in String class to facilitate caching, such that the hash is calculated and cached during the first hashCode() call and the same value is returned ever since. + +This, in turn, improves the performance of collections that uses hash implementations when operated with String objects. + +On the other hand, mutable Strings would produce two different hashcodes at the time of insertion and retrieval if contents of String was modified after the operation, potentially losing the value object in the Map. + +----- + +# String Builder Class + +String builder is a class that represents a *mutable sequence* of characters. +Both StringBuilder and StringBuffer create objects that hold a mutable sequence of characters. Let’s see how this works, and how it compares to an immutable String class: + +```java +String immutable = "abc"; +immutable = immutable + "def"; +``` +Even though it may look like that we’re modifying the same object by appending “def”, we are creating a new one because String instances can’t be modified. 
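The difference can be observed directly by keeping a second reference to the original object. This is a minimal sketch with a hypothetical class name, `StringImmutabilityDemo`; it relies only on reference comparison with `==` and `!=`.

```java
class StringImmutabilityDemo {
    // Concatenation leaves the old String untouched and yields a new object.
    static boolean concatCreatesNewObject() {
        String original = "abc";
        String alias = original;        // second reference to the same object
        original = original + "def";    // builds a new String at run time
        return alias != original && alias.equals("abc");
    }

    // append() mutates the one StringBuilder that both references point to.
    static boolean appendKeepsSameObject() {
        StringBuilder sb = new StringBuilder("abc");
        StringBuilder alias = sb;
        sb.append("def");
        return alias == sb && alias.toString().equals("abcdef");
    }

    public static void main(String[] args) {
        System.out.println("String concat made a new object: " + concatCreatesNewObject());
        System.out.println("StringBuilder append reused the object: " + appendKeepsSameObject());
    }
}
```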
+ +When using either StringBuffer or StringBuilder, we can use the append() method: +```java +StringBuffer sb = new StringBuffer("abc"); +sb.append("def"); +``` +In this case, there was no new object created. We have called the append() method on the sb instance and modified its content. StringBuffer and StringBuilder are mutable objects. + +You can look at more of the methods available in StringBuilder in the [official documentation](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/StringBuilder.html). + +Some of the commonly used methods are `toString()`, `insert()`, `delete()`, `append()`, `getChars()` etc. + + +**String Builder Demo** +```java +public class StringBuilderExample { + + static String generateString(){ + String s = ""; + // Adding to a String object + // Inefficient: runs in O(n*n), each + copies the whole string + for(int i=0; i<100000;i++){ + s = s + (char)('A' + i % 26); //inefficient + } + return s; + } + + static String generateStringUsingSB(){ + StringBuilder sb = new StringBuilder(); + // Efficient + // Runs in O(N) + for(int i=0; i<100000;i++){ + sb.append((char)('A' + i % 26)); //efficient + } + return sb.toString(); + } + + public static void main(String[] args) { + // you can do a time comparison for both + long start = System.currentTimeMillis(); + generateStringUsingSB(); + long end = System.currentTimeMillis(); + System.out.println(end-start); + } +} +``` + + +**String Buffer vs String Builder** +StringBuffer is synchronized and therefore thread-safe. StringBuilder is compatible with the StringBuffer API but with no guarantee of synchronization. Because it’s not a thread-safe implementation, it is faster, and it is recommended in places where there’s no need for thread safety. + + +Simply put, the StringBuffer is a thread-safe implementation and therefore slower than the StringBuilder. In single-threaded programs, we can take advantage of the StringBuilder. Yet, the performance gain of StringBuilder over StringBuffer may be too small to justify replacing it everywhere. 
+ + diff --git a/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Java Thread Lifecycle - Complete.md b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Java Thread Lifecycle - Complete.md new file mode 100644 index 0000000..03c27c8 --- /dev/null +++ b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Java Thread Lifecycle - Complete.md @@ -0,0 +1,283 @@ +### Threads in Java Recap +In Java, multithreading is driven by the core concept of a Thread. Let's recap how to run some logic in a parallel thread by using the Thread framework. In the code example below we create two threads and run them in parallel. + +**Using Thread Class** +```java +public class NewThread extends Thread { + public void run() { + // business logic + ... + } +} +``` +Class to initialize and start our thread. + +```java +public class MultipleThreadsExample { + public static void main(String[] args) { + NewThread t1 = new NewThread(); + t1.setName("MyThread-1"); + NewThread t2 = new NewThread(); + t2.setName("MyThread-2"); + t1.start(); + t2.start(); + } +} +``` + +**Using Runnable** +```java +class SimpleRunnable implements Runnable { + public void run() { + // business logic + } +} +``` +The above SimpleRunnable is just a task which we want to run in a separate thread. +There are various approaches we can use for running it; one of them is to use the Thread class: + +```java +public void test() throws InterruptedException { + Thread thread = new Thread(new SimpleRunnable()); + thread.start(); + thread.join(); // join() can throw InterruptedException +} +``` + +Simply put, we generally encourage the use of Runnable over Thread: + +When extending the Thread class, we’re not overriding any of its methods. Instead, we override the method of Runnable (which Thread happens to implement). 
+- This is a clear violation of IS-A Thread principle +- Creating an implementation of Runnable and passing it to the Thread class utilizes composition and not inheritance – which is more flexible +- After extending the Thread class, we can’t extend any other class +- From Java 8 onwards, Runnables can be represented as lambda expressions + +### Thread Life Cycle +During thread lifecycle, threads go through various states. The `java.lang.Thread` class contains a static State enum – which defines its potential states. During any given point of time, the thread can only be in one of these states: + +- **NEW** – a newly created thread that has not yet started the execution +- **RUNNABLE** – either running or ready for execution but it’s waiting for resource allocation +- **BLOCKED** – waiting to acquire a monitor lock to enter or re-enter a synchronized block/method +- **WAITING** – waiting for some other thread to perform a particular action without any time limit +- **TIMED_WAITING** – waiting for some other thread to perform a specific action for a specified period +- **TERMINATED** – has completed its execution + +#### 1.NEW +A NEW Thread (or a Born Thread) is a thread that’s been created but not yet started. It remains in this state until we start it using the start() method. + +The following code snippet shows a newly created thread that’s in the NEW state: + +```java +Runnable runnable = new NewState(); +Thread t = new Thread(runnable); +System.out.println(t.getState()); +``` + +Since we’ve not started the mentioned thread, the method `t.getState()` prints: +``` +NEW +``` + +### 2. Runnable +When we’ve created a new thread and called the start() method on that, it’s moved from NEW to RUNNABLE state. Threads in this state are either running or ready to run, but they’re waiting for resource allocation from the system. + +In a multi-threaded environment, the Thread-Scheduler (which is part of JVM) allocates a fixed amount of time to each thread. 
So it runs for a particular amount of time, then leaves the control to other RUNNABLE threads. + +For example, let’s add `t.start()` method to our previous code and try to access its current state: + +```java +Runnable runnable = new NewState(); +Thread t = new Thread(runnable); +t.start(); +System.out.println(t.getState()); +``` + +This code is most likely to return the output as: +``` +RUNNABLE +``` +Note that in this example, it’s not always guaranteed that by the time our control reaches `t.getState()`, it will be still in the RUNNABLE state. + +It may happen that it was immediately scheduled by the Thread-Scheduler and may finish execution. In such cases, we may get a different output. + +### 3. BLOCKED +A thread is in the BLOCKED state when it’s currently not eligible to run. It enters this state when it is waiting for a monitor lock and is trying to access a section of code that is locked by some other thread. + +Let’s try to reproduce this state: +```java +public class BlockedState { + public static void main(String[] args) throws InterruptedException { + Thread t1 = new Thread(new DemoBlockedRunnable()); + Thread t2 = new Thread(new DemoBlockedRunnable()); + + t1.start(); + t2.start(); + + Thread.sleep(1000); //pause so that t2 states changes during this time + System.out.println(t2.getState()); + System.exit(0); + } +} + +class DemoBlockedRunnable implements Runnable { + @Override + public void run() { + commonResource(); + } + + public static synchronized void commonResource() { + while(true) { + // Infinite loop to mimic heavy processing + // 't1' won't leave this method + // when 't2' try to enter this + } + } +} +``` +In this code: + +We’ve created two different threads – t1 and t2, t1 starts and enters the synchronized commonResource() method; this means that only one thread can access it; all other subsequent threads that try to access this method will be blocked from the further execution until the current one will finish the processing. 
+ +When t1 enters this method, it is kept in an infinite while loop; this is just to imitate heavy processing so that all other threads cannot enter this method. + +Now when we start t2, it tries to enter the commonResource() method, which is already being accessed by t1; thus, t2 will be kept in the BLOCKED state. +Being in this state, we call `t2.getState()` and get the output as: +``` +BLOCKED +``` + +### 4. WAITING +A thread is in WAITING state when it’s waiting for some other thread to perform a particular action. According to JavaDocs, any thread can enter this state by calling any one of the following three methods: + +- object.wait() +- thread.join() or +- LockSupport.park() + +Note that in wait() and join() – we do not define any timeout period as that scenario is covered in the next section. + + +In this example, thread-1 starts thread-2 and waits for thread-2 to finish using the `thread.join()` method. During this time t1 is in the `WAITING` state. + +**Simple Runnable.java - Thread 1** +```java +public class SimpleRunnable implements Runnable{ + + @Override + public void run(){ + Thread t2 = new Thread(new SimpleRunnableTwo()); + t2.start(); + try { + t2.join(); + } catch (InterruptedException e) { + e.printStackTrace(); + } + } +} + +``` + +**Simple Runnable 2 - Thread 2** +```java +public class SimpleRunnableTwo implements Runnable { + @Override + public void run() { + try{ + Thread.sleep(5000); + } + catch(InterruptedException e){ + Thread.currentThread().interrupt(); + e.printStackTrace(); + } + + } +} +``` +**Main** +```java +public class Main { + public static void main(String[] args) throws InterruptedException { + Thread t1 = new Thread(new SimpleRunnable()); + t1.start(); + + Thread.sleep(1000); // 1 second (1000 ms) pause + System.out.println("T1 :"+ t1.getState()); // T1 is in the WAITING state + System.out.println("Main :" + Thread.currentThread().getState()); + } +} +``` + +### 5. 
TIMED WAITING +A thread is in `TIMED_WAITING` state when it’s waiting for another thread to perform a particular action within a stipulated amount of time. + +According to JavaDocs, there are five ways to put a thread in the TIMED_WAITING state: + +- Thread.sleep(long millis) +- wait(long timeout) or wait(long timeout, int nanos) +- thread.join(long millis) +- LockSupport.parkNanos +- LockSupport.parkUntil + +Here, we’ve created and started a thread t1 which enters the sleep state with a timeout period of 5 seconds; the output will be `TIMED_WAITING`. + +```java +public class SimpleRunnable implements Runnable{ + @Override + public void run() { + try{ + Thread.sleep(5000); + } + catch(InterruptedException e){ + e.printStackTrace(); + } + } +} + +``` +In Main, if you check the state of t1 after 2s it will be `TIMED_WAITING`: +```java +public class Main { + public static void main(String[] args) throws InterruptedException { + Thread t1 = new Thread(new SimpleRunnable()); + t1.start(); + + Thread.sleep(2000); + System.out.println(t1.getState()); + } +} +``` + + +### 6. TERMINATED +This is the state of a dead thread. It’s in the `TERMINATED` state when it has either finished execution or was terminated abnormally. There are different ways of terminating a thread. 
+ +Let’s try to achieve this state in the following example: +```java +public class TerminatedState implements Runnable { + public static void main(String[] args) throws InterruptedException { + Thread t1 = new Thread(new TerminatedState()); + t1.start(); + + // The following sleep method will give enough time for + // thread t1 to complete + Thread.sleep(1000); + System.out.println(t1.getState()); + } + + @Override + public void run() { + // No processing in this block + + } +} +``` +Here, while we’ve started thread t1, the very next statement Thread.sleep(1000) gives enough time for t1 to complete and so this program gives us the output as: +``` +TERMINATED +``` + + + + + + diff --git a/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Volatile.md b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Volatile.md new file mode 100644 index 0000000..9a974d8 --- /dev/null +++ b/Non-DSA Notes/LLD1 Notes/Miscellaneous Topics/Volatile.md @@ -0,0 +1,54 @@ +# Volatile Keyword +--- + +## Recap of Basics +- Interleaving: When threads start and pause, in the same blocks as other threads, this is called interleaving. +- The execution of multiple threads happens in arbitrary order. The order in which threads execute can't be guranteed. + +## Atomic Action +An action that effectively happens all at once - either it happens completely or doesn't happen at all. + +Even increments and decrements aren't atomic, nor are all primitive assignments. +Example - Long and double assignments may not be atomic on all virtual machines. + +## Thread Safe Code +An object or a block of code is thread-safe, if it is not comprised by execution of concurrent threads. +This means the correctness and consistency of program's output or its visible state, is unaffected by other threads. +Atomic Operations and immutable objects are examples of thread-safe code. + +In real life, there are shared resources which are available to multiple concurrent threads in real time. 
We have techniques to control access to these resources and prevent the effects of interleaving threads. These techniques are - +1) Synchronisation/Locking +2) Volatile Keyword + +### Problem 1 - Atomicity/Synchronization +... already seen ... + + +### Problem 2 - Memory Inconsistency Errors, Data Races +The operating system may read from heap variables and make a copy of the value in each thread's own storage. Each thread has its own small and fast memory storage that holds its own copy of a shared resource's value. + +One thread can modify a shared variable, but this change might not be immediately reflected or visible to other threads. Instead, it is first updated in the thread's local cache. The operating system may not flush the first thread's changes to the heap until the thread has finished executing, causing memory inconsistency errors. + +### Solution - Volatile Keyword +- The volatile keyword is used as a modifier for class variables. +- It's an indicator that this variable's value may be changed by multiple threads. +- This modifier ensures that the variable is always read from, and written to, the main memory, rather than from any thread-specific cache. +- This provides memory consistency for this variable's value across threads. + +However, volatile does not guarantee atomicity or synchronization, so additional synchronization mechanisms should be used in conjunction with it when necessary. + +**When to use volatile** +- When a variable is used to track the state of a shared resource, such as a flag, or a counter that only one thread updates. +- When a variable is used to communicate between threads. + +**When not to use volatile** +- When the variable is used by a single thread. +- When a variable is used to store a large amount of data. 
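The "flag used to communicate between threads" case can be sketched as follows. The class name `VolatileFlagDemo`, the field `running`, and the timings are hypothetical choices for illustration; the sketch assumes one thread writes the flag while another only reads it.

```java
class VolatileFlagDemo {
    // volatile: a write by one thread is visible to later reads by other threads.
    private static volatile boolean running = true;

    // Starts a worker that spins until the flag is cleared, then clears it.
    // Returns true if the worker noticed the change and stopped in time.
    static boolean stopWorker() {
        running = true;
        Thread worker = new Thread(() -> {
            while (running) {
                // busy-wait: re-reads the volatile flag on every iteration
            }
        });
        worker.setDaemon(true);   // do not keep the JVM alive if something goes wrong
        worker.start();
        try {
            Thread.sleep(100);    // let the worker spin for a moment
            running = false;      // single writer: no atomicity problem here
            worker.join(2000);    // wait up to 2 seconds for it to exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("Worker stopped: " + stopWorker());
    }
}
```

Without `volatile`, the worker would be permitted to keep using a stale value of `running` and might never leave the loop. A compound update such as `count++` would still need synchronization or an atomic class, since volatile only covers visibility.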
+ + + + + + + diff --git a/Non-DSA Notes/SQL Notes/01 Notes_ Intro to DBMS and Relational Model.md b/Non-DSA Notes/SQL Notes/01 Notes_ Intro to DBMS and Relational Model.md new file mode 100644 index 0000000..c205198 --- /dev/null +++ b/Non-DSA Notes/SQL Notes/01 Notes_ Intro to DBMS and Relational Model.md @@ -0,0 +1,328 @@ + +## Agenda + +- What is a Database +- What, What Not, Why, How of Scaler SQL Curriculum +- Types of Databases +- Intro to Relational Databases +- Intro to Keys + + + + + + + +## What is a Database + +In your day to day life whenever you have a need to save some information, where do you save it? Especially when you may need to refer to it later, maybe something like your expenses for the month, or your todo or shopping list? + + + +Many of us use softwares like Excel, Google Sheets, Notion, Notes app etc to keep a track of things that are important for us and we may need to refer to it in future. Everyone, be it humans or organizations, have need to store a lot of data that me useful for them later. Example, let's think about Scaler. At Scaler, we would want to keep track of all of your's attendance, assignments solved, codes written, coins, mentor session etc! We would also need to store details about instructors, mentors, TAs, batches, etc. And not to forget all of your's email, phone number, password. Now, where will we do this? + +For now, forget that you know anything about databases. Imagine yourself to be a new programmer who just knows how to write code in a programming language. Where will you store data so that you are able to retrieve it later and process that? + + + +You will store it in files. You will write code to read data from files, and write data to files. And you will write code to process that data. For example you may create separate CSV (comma separated values, you will understand as we proceed) files to store information about let's say students, instructors, batches. 
+ +--- +Examples of CSV +--- + +``` +students.csv +name, batch, psp, attendance, coins, rank +Naman, 1, 94, 100, 0, 1 +Amit, 2, 81, 70, 400, 1 +Aditya, 1, 31, 100, 100, 2 +``` + +```instructors.csv +name, subjects, average_rating +Rachit, C++, 4.5 +Rishabh, Java, 4.8 +Aayush, C++, 4.9 +``` + +```batches.csv +id, name, start_date, end_date +1, AUG 22 Intermediate, 2022-08-01, 2023-08-01 +2, AUG 22 Beginner, 2022-08-01, 2023-08-01 +``` + +--- +What happens if we want to find average now? Finding average is cumbersome in CSV +--- + +Now, let's say you want to find out the average attendance of students in each batch. How will you do that? You will have to write code to read data from students.csv, and batches.csv, and then process it to find out the average attendance of students in each batch. Right? + + + +# Question +Do you think this will be very cumbersome? + +# Choices +- [ ] Yes +- [ ] No + +--- +Issues with using files as a database +--- + + +Okay, let's think out problems that can happen writing such code. Before that, take a while and think about what all can go wrong? + + +Correct! There are a lot of issues that can happen. Let's discuss these: + + +1. Inefficient + +While the above set of data is very small in size, let's think of actual Scaler scale. We have 2M+ users in our system. Imagine going through a file with 2M lines, reading each line, processing it to find your relevant information. Even a very simple task like finding the psp of a student named Rahul will require you to open the file, read each line, check if the name is Rahul, and then return the psp. Time complexity wise, this is O(N) and very slow. + +2. Integrity + +Is there anyone stopping you from putting a new line in above file `students.csv` as ```Rahul, 1, Hello, 100, 0, 1``` . If you see that `Hello` that is unexpected. The psp can't be a string. But there is no one to validate and this can lead to very bad situations. 
This is known as data integrity issue, where the data is not as expected. + + +3. Concurrency + +Later in the course, you will learn about multi-threading and multi-processing. It is possible for more than 1 people to query about the same data at the same time. Similarly, 2 people may update the same data at the same time. On save, whose version should you save? Imagine you give same Google Doc to 2 people and both make changes on the same line and send to you. Whose version will you consider to be correct? This is known as concurrency issue. + +4. Security + +Earlier we talked about storing password of users. Imagine them being stored on files. Anyone who has access to the file can see the password of all users. Also anyone who has access to the file can update it as well. There is no authorization at user level. Eg: a particular person may be only allowed to read, not write. + +--- +What's a Database +--- + + +Now let's get back to our main topic. What is a database? A database is nothing but a collection of related data. Example, Scaler will have a Database that stores information about our students, users, batches, classes, instructors, and everything else. Similarly, Facebook will have a database that stores information about all of it's users, their posts, comments, likes, etc. The above way of storing data into files was also nothing but a database, though not the easiest one to use and with a lot of issues. 
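
To make the first two issues concrete, here is a rough sketch of what "files as a database" looks like in code. It is an assumption-laden illustration: the class `CsvLookup` and the method `findPsp` are hypothetical, and the lines are kept in memory instead of being read from an actual `students.csv` so the example stays self-contained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

public class CsvLookup {
    // Lines mirror students.csv from these notes, plus the invalid "Rahul" row.
    static final List<String> LINES = Arrays.asList(
        "name, batch, psp, attendance, coins, rank",
        "Naman, 1, 94, 100, 0, 1",
        "Amit, 2, 81, 70, 400, 1",
        "Aditya, 1, 31, 100, 100, 2",
        "Rahul, 1, Hello, 100, 0, 1");

    /** Linear scan: each lookup walks the lines until the name matches, which is O(N). */
    static OptionalInt findPsp(List<String> lines, String name) {
        for (String line : lines.subList(1, lines.size())) { // skip the header line
            String[] cells = line.split(",\\s*");
            if (cells[0].equals(name)) {
                try {
                    return OptionalInt.of(Integer.parseInt(cells[2]));
                } catch (NumberFormatException e) {
                    // Nothing validated the file on write, so bad data surfaces only on read.
                    return OptionalInt.empty();
                }
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(findPsp(LINES, "Amit"));  // psp found after scanning earlier rows
        System.out.println(findPsp(LINES, "Rahul")); // empty: "Hello" is not a number
    }
}
```

Every lookup pays for a scan, and the invalid psp is noticed only when somebody reads it. A DBMS moves both concerns (efficient lookups and validation) out of application code.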
+ +Analogy to understand Databases: +--- +- Like we have army personnels at Army base: + +![Screenshot 2024-02-07 at 12.54.55 PM](https://hackmd.io/_uploads/S1JX-hxjp.jpg) +pic credits: Unknown + +--- +- We have airforce personnel at Airbase: + +![Screenshot 2024-02-07 at 12.55.02 PM](https://hackmd.io/_uploads/r1fUWhxiT.jpg) +pic credits: Unknown + +--- +Similarly we have data at Database: +![Screenshot 2024-02-07 at 12.55.11 PM](https://hackmd.io/_uploads/SJwjZhxja.jpg) +pic credits: Unknown + + + + + + +--- +What's DBMS +--- + +### What's a Database Management System (DBMS) + + +A DBMS as the name suggests is a software system that allows to efficiently manage a database. A DBMS allows us to create, retrieve, update, and delete data (often called CRUD operations). It also allows to define rules to ensure data integrity, security, and concurrency. It also provides ways to query the data in the database efficiently. + +Eg: find all students with psp > 50, find all students in batch 1, find all students with rank 1 in their batch, etc. +There are many database management systems, each with their tradeoffs. We will talk about the types of databases later. + + + + + +--- +Types of Databases +--- + + +Welcome back after the break. Hope you had a good rest and had some water, etc. Now let's start with the next topic for the day and discuss different types of databases that exist. Okay, tell me one thing, when you have to store some data, for example, let's say you are an instructor at Scaler and want to keep a track of attendance and psp of every student of you, in what form will you store that? + + +Correct! Often one of the easiest and most intuitive way to store data can be in forms of tables. Example for the mentioned use case, we may create a table with 3 columns: name, attendance, psp and fill values for each of my students there. This is very intuitive and simple and is also how relational databases work. 
+ +Databases can be broadly divided into 2 categories: +1. Relational Databases +2. Non-Relational Databases + +### Relational Databases + +Relational Databases allow you to represent a database as a collection of multiple related tables. Each table has a set of columns and rows. Each row represents a record and each column represents a field. Example, in the above case, we may have a table with 3 columns: name, attendance, psp and fill values for each of my students there. We will learn some properties of relational databases shortly. + +### Non-Relational Databases + +Now that we have introduced relational databases, let's talk about non-relational databases. Non-relational databases are those databases that don't follow the relational model. They don't store data in the form of tables. Instead, they store data in the form of documents, key-value pairs, graphs, etc. In the DBMS module, we will not be talking about them. We will talk about them in the HLD Module. + +In the DBMS module, our goal is to cover the working of relational databases and how to work with them, that is via SQL queries. + + + +--- +Property of RDBMS - 1 +--- + +1. Relational Databases represent a database as a collection of tables with each table storing information about something. This something can be an entity or a relationship between entities. Example: We may have a table called students to store information about students of a batch (an entity). Similarly we may have a table called student_batches to store information about which student is in which batch (a relationship between entities). + +--- +Property of RDBMS - 2 +--- + +2. Every row is unique. This means that in a table, no 2 rows can have the same values for all columns. Example: In the students table, no 2 students can have the same name, attendance and psp. There will be something different. For example, we might also want to store their roll number to distinguish 2 students having the same name. + +--- +Property of RDBMS - 3 +--- + +3. 
All of the values present in a column hold the same data type. Example: In the students table, the name column will have string values, attendance column will have integer values and psp column will have float values. It cannot happen that for some students psp is a String. + +--- +Property of RDBMS - 4 +--- + +4. Values are atomic. What does atomic mean? What does the word `atom` mean to you? + + + +Correct. Similarly, atomic means indivisible. So, in a relational database, every value in a column is indivisible. Example: If we have to store multiple phone numbers for a student, we cannot store them in a single column as a list. How to store those, we will learn at the end of the course when we do Schema Design. Having said that, there are some SQL databases that allow you to store a list of values in a column. But support for this varies between databases, and even those that support it usually aren't optimal for queries on such columns. + +--- +Property of RDBMS - 5 +--- + +5. The columns sequence is not guaranteed. This is very important. SQL standard doesn't guarantee that the columns will be stored in the same sequence as you define them. So, if you have a table with 3 columns: name, attendance, psp, it is not guaranteed that the data will be stored in the same sequence. So it is recommended to not rely on the sequence of columns and always use column names while writing queries. While MySQL guarantees that the order of columns shall be the same as defined at the time of creating the table, this is not a part of the SQL standard and hence not guaranteed by all databases. Relying on order can also cause issues if a new column is added in between in the future. + +--- +Property of RDBMS - 6 +--- + +6. The rows sequence is not guaranteed. Similar to columns, SQL doesn't guarantee the order in which rows shall be returned after any query. 
So, if you want to get rows in a particular order, you should always use `ORDER BY` clause in your query which we will learn about in the next class. So when you write a SQL query, don't assume that the first row will always be the same. The order of rows may change across multiple runs of same query. Having said that, MySQL does return rows in order of their primary key (we will learn about this later on), but again, don't rely on that as not guaranteed by SQL standard. + +--- +Property of RDBMS - 7 +--- + +7. The name of every column is unique. This means that in a table, no 2 columns can have same name. Example: In the students table, we cannot have 2 columns with name `name`. This is because if I have to write a query to get the name of a student, we will have to write `SELECT name FROM students`. Now if there are 2 columns with name `name`, how will the database know which one to return? Hence, the name of every column is unique. + +--- +Keys in Relational Databases +--- + + + +Now we are moving to probably the most important foundational concept of Relational Databases: Keys. let's say you are working at Scaler and are maintaining a table of every students' details. Someone tells you to update the psp of Rahul to 100. How will you do that? What can go wrong? + + +Correct. If there are 2 Rahuls, how will you know which one to update? This is where keys come into picture. Keys are used to uniquely identify a row in a table. There are 2 important types of keys: +1. Primary Key and +2. Foreign Key. + +There are also other types of keys like Super Key, Candidate Key etc. Let's learn about them one by one. + +--- +Super Keys +--- + + +To understand this, let's take an example of a students table at scaler with following columns. 
+ +| name | psp | email | batch | phone number | +| - | - |- | - | - | +|Rahul | 1 | 94 | 100 | 0 | 1 | +|Amit | 2 | 81 | 70 | 400 | 1 | +|Aditya| 1| 31| 100| 100| 2 | + +Which are the columns that can be used to uniquely identify a row in this table? + + + +--- +Can name be Super Key? +--- + +Let's start with name. Do you think name can be used to uniquely identify a row in this table? + + + +Correct. Name is not a good idea to recognize a row. Why? Because there can be multiple students with same name. So, if we have to update the psp of a student, we cannot use name to uniquely identify the student. Email, phone number on the other hand are a great idea, assuming no 2 students have same email, or same phone number. + +--- +Can a combination of columns be Super Key? +--- + +Do you think the value of combination of columns (name, email) can uniquely identify a student? Do you think there will be only 1 student with a particular combination of name and email. Eg: will there be only 1 student like (Rahul, rahul@scaler.com)? + + +Correct, similarly do you think (name, phone number) can uniquely identify a student? What about (name, email, phone number)? What about (name, email, psp)? What about (email, psp)? + +The answer to each of the above is Yes. Each of these can be considered a `Super Key`. A super key is a combination of columns whose values can uniquely identify a row in a table. What do you think are other such super keys in the students table? + + +In the above keys, did you ever feel something like "but this column was useless to uniquely identify a row.." ? Let's take example of (name, email, psp). Do you think psp is required to uniquely identify a row? Similarly, do you think name is required as you anyways have email right? This means a Super key can have redundant/extra columns. + +--- +Time for a few quizzes +--- + + + +--- +Quiz 1 +--- + + +Which of the following is a Super Key for the Student table? 
+> Consider StudentID to be unique in students table. + +### Choices + +- [ ] {StudentID, CourseID} +- [ ] {FirstName, LastName} +- [ ] {Age, CourseName} +- [ ] {LastName, CourseID} + +--- +Quiz 2 +--- + + + +Which of these combinations could also be a Super Key for the Student table? +> Consider StudentID to be unique in students table. + +### Choices + +- [ ] {StudentID, CourseName} +- [ ] {FirstName, Age} +- [ ] {LastName, Age} +- [ ] {CourseID, CourseName} + + +--- +Quiz 3 +--- + +Given the uniqueness of the StudentID, which of these could be a potential Super Key for the Student table? + +### Choices + +- [ ] {StudentID, FirstName} +- [ ] {StudentID, Age} +- [ ] {StudentID, LastName} +- [ ] All of the above + + + +> Answers for Quizzes: +> 1. Option 1 +> 2. Option 1 +> 3. Option 4 \ No newline at end of file diff --git a/Non-DSA Notes/SQL Notes/02 Notes_ Keys .md b/Non-DSA Notes/SQL Notes/02 Notes_ Keys .md new file mode 100644 index 0000000..1655ad5 --- /dev/null +++ b/Non-DSA Notes/SQL Notes/02 Notes_ Keys .md @@ -0,0 +1,423 @@ + + +## Agenda + +- Keys + - Candidate key + - Primary key + - Composite key + - Foreign key +- Introduction to SQL + +--- +Candidate Keys +--- + + +Now let's re-consider **Super Keys**. let's remove the columns that weren't necessary in Super Keys. + +Also, let's say we were an offline school, and students don't have email or phone number. In that case, what do you think schools use to uniquely identify a student? Eg: If we remove redundant columns from (name, email, psp), we will be left with (email). Similarly, if we remove redundant columns from (name, email, phone number), we will be left with (phone number) or (email). These are known as `candidate keys`. + +***A candidate key is a super key from which no column can be removed and still have the property of uniquely identifying a row.*** +If any more column is removed from a candidate key, it will no longer be able to uniquely identify a row. 
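
The two definitions can be checked mechanically on sample data. The sketch below is only an illustration: the class `KeyCheck` and its methods are hypothetical names, and the rows mirror the attendance table (student_id, class_id, attendance) discussed in these notes. Keep in mind that sample rows can only show that a column set is not a key; whether something really is a key depends on the rules of the domain, not on one snapshot of data.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class KeyCheck {
    // Sample rows: {student_id, class_id, attendance}
    static final int[][] ROWS = {
        {1, 2, 100}, {1, 3, 90}, {2, 2, 89}, {2, 5, 100}, {2, 3, 87}
    };

    /** Super key: no two rows share the same values in the chosen columns. */
    static boolean isSuperKey(int[][] rows, int... cols) {
        Set<List<Integer>> seen = new HashSet<>();
        for (int[] row : rows) {
            List<Integer> projection = new ArrayList<>();
            for (int c : cols) {
                projection.add(row[c]);
            }
            if (!seen.add(projection)) {
                return false; // duplicate combination found
            }
        }
        return true;
    }

    /** Candidate key: a super key that stops being one when any single column is dropped. */
    static boolean isCandidateKey(int[][] rows, int... cols) {
        if (!isSuperKey(rows, cols)) {
            return false;
        }
        for (int skip = 0; skip < cols.length; skip++) {
            int[] smaller = new int[cols.length - 1];
            for (int i = 0, j = 0; i < cols.length; i++) {
                if (i != skip) {
                    smaller[j++] = cols[i];
                }
            }
            if (smaller.length > 0 && isSuperKey(rows, smaller)) {
                return false; // a column was redundant
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSuperKey(ROWS, 0));           // false: student_id repeats
        System.out.println(isCandidateKey(ROWS, 0, 1));    // true: (student_id, class_id)
        System.out.println(isCandidateKey(ROWS, 0, 1, 2)); // false: has a redundant column
    }
}
```

Dropping one column at a time is enough to test minimality, because any superset of a super key is itself a super key.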
+ +--- +Example of Candidate Keys using a table +--- + +Let's take another example. Consider a table Scaler has for storing student's attendance for every class. + +`batches` + +| student_id | class_id | attendance | +|----------|------------|------------| +| 1 | 2 | 100 | +| 1 | 3 | 90 | +| 2 | 2 | 89 | +| 2 | 5 | 100 | +| 2 | 3 | 87 | + + +| student_id | class_id | attendance | + +What do you think are the candidate keys for this table? Do you think (student_id) is a candidate key? Will there be only 1 row with a particular student_id? The student can attend multiple classes at Scaler ex: DBMS1, Keys etc.. in all these cases student_id is same for that particular Student hence it is not unique. + + + +Is (class_id) a candidate key? Will there be only 1 row with a particular class_id? Multiple students can attend multiple classes at Scaler ex: DBMS is having a class_id and multiple students can attend that class hence it is not unique. + + + +Is (student_id, class_id) a candidate key? Will there be only 1 row with a particular combination of student_id and class_id? Yes, a student can attend a class only one time example: Rahul can attend class DBMS once only hence this combination is going to be unique. + +Yes! (student_id, class_id) is a candidate key. If we remove any of the columns of this, the remanining part is not a candidate key. Eg: If we remove student_id, we will be left with (class_id). But there can be multiple rows with same class_id. Similarly, if we remove class_id, we will be left with (student_id). But there can be multiple rows with same student_id. Hence, (student_id, class_id) is a candidate key. + +> Activity: Please try to make these pairs from table above to verify this concept. + +Is (student_id, class_id, attendance) a candidate key? Will there be only 1 row with a particular combination of student_id, class_id and attendance? + + +Yes there is only one row, but can we remove any column from this and still have a candidate key? 
Eg: If we remove attendance, we will be left with (student_id, class_id), which is still able to uniquely identify a row. Hence, (student_id, class_id, attendance) is a super key but not a candidate key. + +Now let's have a few quizzes: + +--- +Quiz 1 +--- + +Is a candidate key always a super key? + +### Choices + +- [ ] Yes +- [ ] No + +--- +Quiz 2 +--- + +Is a super key always a candidate key? + +### Choices + +- [ ] Yes +- [ ] No + + +--- +Quiz 3 +--- + +Which of the following is a Candidate Key for the Employee table? + +### Choices + +- [ ] {EmployeeID, Department} +- [ ] {Email} +- [ ] {FirstName, LastName} +- [ ] {LastName, Department} + +--- +Quiz 4 +--- + +If both EmployeeID and Email are unique for each employee, which of these could be a Candidate Key for the Employee table? + +### Choices + +- [ ] {EmployeeID, Email} +- [ ] {EmployeeID} +- [ ] {Email} +- [ ] Both B and C + +--- +Quiz 5 +--- + +Which of these combinations is NOT a Candidate Key for the Employee table? + +### Choices + +- [ ] {EmployeeID} +- [ ] {Email} +- [ ] {LastName, Department} + +--- +Primary Key +--- + +### Primary Key + +We just learnt about super keys and candidate keys. Can 1 table have multiple candidate keys? Yes. The students table earlier had both `email` and `phone number` as candidate keys. A key in MySQL plays a very important role. For example, MySQL's default storage engine (InnoDB) orders the data on disk by the key. As a result, simple queries often return rows ordered by that key, although this is not guaranteed without an `ORDER BY` clause. Thus, it is important that one key is chosen as the main key of the table. That key is called the primary key. A primary key is a candidate key that is chosen to be the key for the table. In the students table, we can choose `email` or `phone number` as the primary key. Let's choose `email` as the primary key. + +> Note: Internally (in MySQL with InnoDB), +> 1. Database stores the data sorted by primary key. +> 2. Database often outputs query results in primary key order, but this is not guaranteed unless you use `ORDER BY`. +> 3. Database creates an index as well on primary key. + + +Sometimes, we may have to or want to create a new column to be the primary key. 
Eg: If we have a students table with columns (name, email, phone number), we may have to create a new column called roll number or studentId to be the primary key. This may be because, let's say, a user can change their email or phone number if they want. Something that is used to uniquely identify a row should ideally never change. Hence, we create a new column called roll number or studentId to be the primary key. + +> A good primary key should: +> 1. be fast to sort on. +> 2. have smaller size (to reduce the space required for behind the scene indexing). +> 3. not get changed. + +Therefore, it is preferred to have a primary key with single integer column. + + +We will see later on how MySQL allows to create primary keys etc. Before we go to foreign keys and composite keys, let's actually get our hands dirty with SQL, post that it will be easy to understand how to create a PK. + +Now let's have a quizz: + +--- +Quiz 6 +--- + +Which of the following can be a good PK in students table? + +### Choices + +- [ ] {Email} +- [ ] {Email, Phone_number} +- [ ] {Phone_number} +- [ ] {Student_Id} + +--- +Introduction to SQL +--- + + + +First of all, what is SQL? SQL stands for Structured Query Language. It is a language used to interact with relational databases. It allows you to create tables, fetch data from them, update data, manage user permissions etc. Today we will just focus on creation of data. Remaining things will be covered over the coming classes. Why "Structured Query" because it allows to query over data arranged in a structured way. Eg: In Relational databases, data is structured into tables. 
+ +### Create table in MySQL + +A simple query to create a table in MySQL looks like this: + +```sql +CREATE TABLE students ( + id INT AUTO_INCREMENT, + firstName VARCHAR(50) NOT NULL, + lastName VARCHAR(50) NOT NULL, + email VARCHAR(100) UNIQUE NOT NULL, + dateOfBirth DATE NOT NULL, + enrollmentDate TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + -- DECIMAL(5, 2) allows up to 999.99, which is enough to hold 100.00 + psp DECIMAL(5, 2) CHECK (psp BETWEEN 0.00 AND 100.00), + batchId INT, + isActive BOOLEAN DEFAULT TRUE, + -- We can add the primary key separately, as a table-level constraint like this + PRIMARY KEY (id) +); +``` + + +Here we are creating a table called students. +- Inside brackets, we mention the different columns that this table has. Along with each column, we mention the data type of that column. Eg: firstName is of type VARCHAR(50). +Please do watch the video on SQL Data Types attached at the bottom of these notes to understand what VARCHAR, TIMESTAMP etc. mean. For this discussion, it suffices to know that these are different data types supported by MySQL. +- After the data type, we mention any constraints on that column. Eg: NOT NULL means that this column cannot be null. +In the next notes, when we learn how to insert data, we will see that leaving out the value of such a column gives an error. +- UNIQUE means that this column cannot have duplicate values. If we insert a new row in a table, or update an existing row, in a way that leads to 2 rows having the same value of this column, the query will fail and we will get an error. +- DEFAULT specifies that if no value is provided for this column, it will take the given value. Example, for enrollmentDate, it will take the value of current_timestamp, which is the time when you are inserting the row. +- CHECK (psp BETWEEN 0.00 AND 100.00) means that the value of this column should be between 0.00 and 100.00. If some other value is inserted, the query will fail. MySQL enforces CHECK constraints from version 8.0.16 onwards; older versions parse them but ignore them. + +- Resources for Data_Type videos are shared at the bottom of these notes. 
+- More on SQL constraints: https://www.scaler.com/topics/sql/constraints-in-sql/ + + + + +--- +Types of SQL Commands +--- + +> Based on the kind of work a SQL query does, SQL commands are categorised into the following types: + +- DDL(Data Definition Language): DDL is used to define or change the structure of tables and other objects in a database, for example with CREATE, ALTER and DROP. In MySQL these commands are auto-commit in nature: all the changes are saved immediately and cannot be rolled back. + +- DML(Data Manipulation Language): Once the tables are created using DDL commands, the data inside those tables is changed using DML commands such as INSERT, UPDATE and DELETE. The advantage of DML commands is that if wrong changes are made inside a transaction, they can be rolled back as long as the transaction has not been committed. + +- DQL(Data Query Language): Data query language consists of only one command upon which data selection in SQL relies. The SELECT command in combination with other SQL clauses is used to retrieve and fetch data from databases/tables based on certain conditions applied by the user. + +- DCL(Data Control Language): DCL commands, as the name suggests, control who can access what in a database. DCL includes commands such as GRANT and REVOKE which mainly deal with the rights, permissions, and other controls of the database system. + +- TCL(Transaction Control Language): Transaction Control Language, as the name suggests, manages transactions in a database. Its commands, such as COMMIT and ROLLBACK, are used to commit or roll back the changes in the database. + + +- Based on the above, the following are examples of SQL commands. + +> ![Screenshot 2024-02-08 at 11.06.07 AM](https://hackmd.io/_uploads/BJaUu1Gsp.png) + +- For more detailed analysis please refer to Scaler Topics' article: https://www.scaler.com/topics/dbms/sql-commands/ + + + +--- +Composite Keys +--- + + +Now, let's get into composite keys. 
+ +A composite key is a key with more than one column. Any key with multiple columns (a collection of columns) is a composite key. + +> Note: Super, Candidate and Primary keys can be of both type - either a single key or a composite key. + +--- +Foreign Keys +--- + +### Foreign Keys + +Now let's get to the last topic of the day. Which is foreign keys. Let's say we have a table called batches which stores information about batches at Scaler. It has columns (id, name). We would want to know for every student, which batch do they belong to. How can we do that? + +| batch_id | batch_name | +|----------|------------| +| 1 | Batch A | +| 2 | Batch B | +| 3 | Batch C | + + +| student_id | first_name | last_name | +|------------|------------|-----------| +| 1 | John | Doe | +| 2 | Jane | Doe | +| 3 | Jim | Brown | +| 4 | Jenny | Smith | +| 5 | Jack | Johnson | + + +Correct, We can add batchId column in students table. But how do we know which batch a student belongs to? How do we ensure that the batchId we are storing in the students table is a valid batchId? What if someone puts the value in batchID column as 4 but there is no batch with id 4 in batches table. We can set such kind of constraints using foreign keys. **A foreign key is a column in a table that references a column in another table.** It has nothing to do with primary, candidate, super keys. It can be any column in 1 table that refers to any column in other table. In our case, batchId is a foreign key in the students table that references the id column in the batches table. This ensures that the batchId we are storing in the students table is a valid batchId. If we try to insert any value in the batchID column of students table that isn't present in id column of batches table, it will fail. Another example: + +Let's say we have `years` table as: +`| id | year | number_of_days |` + +and we have a table students as: +`| id | name | year |` + +Is `year` column in students table a foreign key? 
+ + + +The correct answer is yes. It is a foreign key that references the id column in years table. Again, foreign key has nothing to do with primary key, candidate key etc. It is just any column on one side that references another column on other side. Though often it doesn't make sense to have that and you just keep primary key of the other table as the foreign key. If not a primary key, it should be a column with unique constraint. Else, there will be ambiguities. + +Okay, now let's think of what can go wrong with foreign keys? + + + +Correct, let's say we have students and batches tables as follows: + +| batch_id | batch_name | +|----------|------------| +| 1 | Batch A | +| 2 | Batch B | +| 3 | Batch C | + + +| student_id | first_name | last_name | batch_id | +|------------|------------|-----------|----------| +| 1 | John | Doe | 1 | +| 2 | Jane | Doe | 1 | +| 3 | Jim | Brown | 2 | +| 4 | Jenny | Smith | 3 | +| 5 | Jack | Johnson | 2 | + +Now let's say we delete the row with batch_id 2 from batches table. What will happen? Yes, the students Jim and Jack will be orphaned. They will be in the students table but there will be no batch with id 2. This is called orphaning. This is one of the problems with foreign keys. Another problem is that if we update the batch_id of a batch in batches table, it will not be updated in students table. Eg: If we update the batch_id of Batch A from 1 to 4, the students John and Jane will still have batch_id as 1. This is called inconsistency. + +To fix for these, MySQL allows you to set ON DELETE and ON UPDATE constraints when creating a foreign key. You can specify what should happen in case an update or a delete happens in the other table. What do you think are different possibilities of what we can do if a delete happens? + + + +You can set 4 values for ON DELETE and ON UPDATE. They are: +1. CASCADE: If the referenced data is deleted or updated, all rows containing that foreign key are also deleted or updated. +2. 
SET NULL: If the referenced data is deleted or updated, the foreign key in all rows containing that foreign key is set to NULL. This requires that the foreign key column is not declared as NOT NULL. +3. NO ACTION: If the referenced data is deleted or updated while rows in the child table still reference it, MySQL rejects the delete or update operation on the parent table. This is the default action. +4. SET DEFAULT: If the referenced data is deleted or updated, the foreign key in all the referencing rows is set to its default value. Note that although the MySQL parser recognizes this option, the InnoDB engine rejects table definitions that use ON DELETE SET DEFAULT or ON UPDATE SET DEFAULT, so in practice it cannot be used with InnoDB tables. + +--- +Practical example - Foreign Keys +--- + +Now let's see how to create a table with a foreign key. Let's say we want to create a table called students with columns (student_id, first_name, last_name, batch_id). We want batch_id to be a foreign key that references the batch_id column in the batches table. We want that if a batch is deleted, all students in that batch should also be deleted. 
We can do that as follows: + +```sql +-- Creating 'batches' table +CREATE TABLE batches ( + batch_id INT PRIMARY KEY, + batch_name VARCHAR(50) NOT NULL +); + +-- Inserting dummy data into 'batches' table +INSERT INTO batches(batch_id, batch_name) VALUES +(1, 'Batch A'), +(2, 'Batch B'), +(3, 'Batch C'); + +-- Creating 'students' table with ON DELETE and ON UPDATE constraints +CREATE TABLE students ( + student_id INT AUTO_INCREMENT PRIMARY KEY, + first_name VARCHAR(50) NOT NULL, + last_name VARCHAR(50) NOT NULL, + batch_id INT, + FOREIGN KEY (batch_id) REFERENCES batches(batch_id) ON DELETE CASCADE ON UPDATE CASCADE +); + +-- Inserting dummy data into 'students' table +INSERT INTO students(first_name, last_name, batch_id) VALUES +('John', 'Doe', 1), +('Jane', 'Doe', 1), +('Jim', 'Brown', 2), +('Jenny', 'Smith', 3), +('Jack', 'Johnson', 2); +``` + +Now, let's try to delete a batch and see what happens: + +```sql +DELETE FROM batches WHERE batch_id = 1; +``` +Answer: It will delete the row from the `batches` table where the `batch_id` is 1. Since the `batch_id` column in the `batches` table is defined as the primary key, which uniquely identifies each row, this query will delete the specific batch with the ID of 1. + +Additionally, due to the foreign key constraint defined in the `students` table with the `ON DELETE CASCADE` option, all associated rows in the `students` table with the matching `batch_id` will also be deleted. In this case, the students John Doe and Jane Doe (who belong to Batch A) will also be deleted. + +Now, let's see what happens if we update a batch: + +```sql +UPDATE batches SET batch_id = 4 WHERE batch_id = 2; +``` + +Answer: It will update the `batch_id` to 4 in the `batches` table where the value was 2. Since `batch_id` is the primary key of the `batches` table, this update will modify the specific row with the ID of 2. 
+ +Since the `batch_id` column is referenced as a foreign key in the `students` table, updating the `batch_id` in the `batches` table will also update the corresponding value in the `students` table due to the `ON UPDATE CASCADE` option specified in the foreign key constraint. Therefore, any students associated with Batch B (which had the original batch_id of 2) will now be associated with Batch 4. + +We can also add foreign keys to a table after the table has been created by using the ALTER command. Let's look at the syntax: + +```sql +ALTER TABLE table_name +ADD FOREIGN KEY (column_name) +REFERENCES other_table(column_in_other_table); + +``` +--- +Data Types in SQL +--- + +What are **Data Types in SQL**? + +A data type is a property that describes the sort of data that an object can store, such as **integer data**, **character data**, **monetary data**, **date and time data**, **binary strings**, and so on. + +> MySQL String Data Types +> ![Screenshot 2024-02-08 at 1.12.05 PM](https://hackmd.io/_uploads/HkMlIZMiT.png) + +> ![Screenshot 2024-02-08 at 1.13.13 PM](https://hackmd.io/_uploads/r1OX8-Gj6.png) + + +--- +> MySQL Numeric Data Types +>![Screenshot 2024-02-08 at 1.14.51 PM](https://hackmd.io/_uploads/BkuFL-zi6.png) + +>![Screenshot 2024-02-08 at 1.18.59 PM](https://hackmd.io/_uploads/H1AdDZGiT.png) + + + + + +Detailed article on data types in SQL available at Scaler topics: https://www.scaler.com/topics/sql/sql-data-types/ + + +Video link MySQL Data Types: https://drive.google.com/file/d/1GHeBM4nEB-CCZ3SMbRJxhjIwrZ_2TAZx/view + + +--- +Solution to Quizzes: +--- + +> -- +Quiz1: Option A (Yes) +Quiz2: Option B (No) +Quiz3: Option B {Email} +Quiz4: Option D (Both B and C) +Quiz5: Option C {LastName, Department} +Quiz5: Option D {Student_Id} +-- diff --git a/Non-DSA Notes/SQL Notes/03 Notes CRUD 1.md b/Non-DSA Notes/SQL Notes/03 Notes CRUD 1.md new file mode 100644 index 0000000..ea5ec80 --- /dev/null +++ b/Non-DSA Notes/SQL Notes/03 Notes CRUD 1.md @@ -0,0 
+1,508 @@ + +## Agenda + +- What is CRUD? +- Sakila Database Walkthrough +- CRUD + - Create + - Read + - Selecting Distinct Values + - Select statement to print a constant value + - Operations on Columns + - Inserting Data from Another Table + - WHERE Clause + - AND, OR, NOT + - IN Operator + +> Remaining topics will be covered in next lecture. + +--- +What is CRUD +--- + + + +Today we are going to start the journey of learning MySQL queries by learning about CRUD Operations. Let's say there is a table in which we are storing information about students. What all can we do in that table or its entries? + + +Primarily, on any entity stored in a table, there are 4 operations possible: + +1. Create (or inserting a new entry) +2. Read (fetching some entries) +3. Update (updating information about an entry already stored) +4. Delete (deleting an entry) + +Today we are going to discuss about these operations in detail. Understand that read queries can get a lot more complex, involving aggregate functions, subqueries etc. + +We will be starting with learning about Create, then go to Read, then Update and finally Delete. So let's get started. For this class as well as most of the classes ahead, we will be using Sakila database, which is an official sample database provided by MySQL. + + + + +--- +Sakila Database Overview +--- + + + + +Let us give you all a brief idea about what Sakila database represents so that it is easy to relate to the conversations that we shall have around this over the coming weeks. Sakila database represents a digital video rental store, assume an old movie rental store before Netflix etc. came. It's designed with functionality that would allow for all the operations of such a business, including transactions renting films, managing inventory, and storing customer and staff information. Example: it has tables regarding films, actors, customers, staff, stores, payments etc. You will get more familiar with this in the coming notes, don't worry! 
+ + +> Note: Please download these following in the same order as mentioned here: +> +**MYSQL Community Server Download Link:** https://dev.mysql.com/downloads/mysql/ +**MYSQL workbench Download Link:** https://dev.mysql.com/downloads/workbench/ +**Sakila Download Link:** https://dev.mysql.com/doc/index-other.html +**How to add Sakila Database:** https://drive.google.com/file/d/1eiHtEwGr6r0qWlVpjzYefgP-rPG6DSbv/view?usp=sharing +**Overall Doc containing steps for MYSQL setup:** https://drive.google.com/file/d/1gJ2W4HFY6YxYMX1xtjOyKOefW93WoO-y/view + +--- +Create new entries using Insert +--- + + +Now let's start with the first set of operation for the day: The Create Operation. As the name suggests, this operation is used to create new entries in a table. Let's say we want to add a new film to the database. How do we do that? + +`INSERT` statement in MySQL is used to insert new entries in a table. Let's see how we can use it to insert a new film in the `film` table of Sakila database. + +```sql +INSERT INTO film (title, description, release_year, language_id, rental_duration, rental_rate, length, replacement_cost, rating, special_features) +VALUES ('The Dark Knight', 'Batman fights the Joker', 2008, 1, 3, 4.99, 152, 19.99, 'PG-13', 'Trailers'), + ('The Dark Knight Rises', 'Batman fights Bane', 2012, 1, 3, 4.99, 165, 19.99, 'PG-13', 'Trailers'), + ('The Dark Knight Returns', 'Batman fights Superman', 2016, 1, 3, 4.99, 152, 19.99, 'PG-13', 'Trailers'); +``` +> Note: MySQL queries are not case sensitive. + +Let's dive through the syntax of the query. First we have the `INSERT INTO` clause, which is used to specify the table in which we want to insert the new entry. Then we have the column names in the brackets, which are the columns in which we want to insert the values. Then we have the `VALUES` clause, which is used to specify the values that we want to insert in the columns. 
The values are specified in the same order as the columns are specified in the `INSERT INTO` clause. So the first value in the `VALUES` clause will be inserted in the first column specified in the `INSERT INTO` clause, and so on. + +--- +Create - About column names in INSERT query +--- + +A few things to note here: + +The column names are optional. If you don't specify the column names, then the values will be inserted in the columns in the order in which they were defined at the time of creating the table. Example: in the above query, if we don't specify the column names, then the values will be inserted in the order `film_id`, `title`, `description`, `release_year`, `language_id`, `original_language_id`, `rental_duration`, `rental_rate`, `length`, `replacement_cost`, `rating`, `special_features`, `last_update`. So the value `The Dark Knight` will be inserted in the `film_id` column, `Batman fights the Joker` will be inserted in the `title` column and so on. + - This is not a good practice, as it makes the query prone to errors. So always specify the column names. + - This makes writing queries tedious, as while writing query you have to keep a track of what column was where. And even a small miss can lead to a big error. + - If you don't specify column names, then you have to specify values for all the columns, including `film_id`, `original_language_id` and `last_update`, which we may want to keep `NULL`. + +Anyways, an example of a query without column names is as follows: + +```sql +INSERT INTO film +VALUES (default, 'The Dark Knight', 'Batman fights the Joker', 2008, 1, NULL, 3, 4.99, 152, 19.99, 'PG-13', 'Trailers', default); +``` + +NULL is used to specify that the value of that column should be `NULL`, and `default` is used to specify that the value of that column should be the default value specified for that column. Example: `film_id` is an auto-increment column, so we don't need to specify its value. 
So we can specify `default` for that column, which will insert the next auto-increment value in that column. + +So that's pretty much all there is to Create operations. There is 1 more thing about insert, which is how to insert data from one table to another, but we will talk about that after talking about read. + + + +Before we start with read operations, here are 2 small quiz questions for you. + +--- +Quiz 1 +--- + + +What is the correct syntax to insert a new record into a MySQL table? + +### Choices + +- [ ] INSERT INTO table_name VALUES (value1, value2, value3,...); +- [ ] INSERT INTO table_name (value1, value2, value3,...); +- [ ] INSERT VALUES (value1, value2, value3,...) INTO table_name; +- [ ] INSERT (value1, value2, value3,...) INTO table_name; + +--- +Quiz 2 +--- + +How do you insert a new record into a specific column (e.g., 'column1') in a table (e.g., 'table1')? + +### Choices + +- [ ] INSERT INTO table1 column1 VALUES (value1); +- [ ] INSERT INTO table1 (column1) VALUES (value1); +- [ ] INSERT VALUES (value1) INTO table1 (column1); +- [ ] INSERT (column1) VALUES (value1) INTO table1; + +--- +Read +--- + + +Now let's get to the most interesting, and maybe also the most important, part of today's session: the Read operation. The `SELECT` statement is used to read data from a table. The `SELECT` command is similar to print statements in other languages. Let's see how we can use it to read data via different queries on the `film` table of the Sakila database (try writing this query once by yourself). A basic select query is as follows: + +```sql +SELECT * FROM film; +``` + +However, using the above query isn't considered a very good idea. `SELECT *` has its own downsides, such as `Unnecessary I/O`, `Increased Network Traffic`, `Dependency on Order of Columns on ResultSet`, and `More Application Memory`. 
+ +*More on why using `SELECT *` isn't good:* https://dzone.com/articles/why-you-should-not-use-select-in-sql-query-1 + + +Now try the following query and guess the output before running it in Workbench: + +```sql +SELECT 'Hello world!'; +``` + + +Coming back to the `SELECT * FROM film` query: there, we are selecting all the columns from the `film` table. The `*` is used to select all the columns. That query will give you the value of each column in each row of the film table. If we want to select only specific columns, then we can specify the column names instead of `*`. Example: + +```sql +SELECT title, description, release_year FROM film; +``` + +Here we are selecting only the `title`, `description` and `release_year` columns from the `film` table. Note that the column names are separated by commas. Also, the column names are case-insensitive, so `title` and `TITLE` are the same. For example, the following query would also have given the same result: + +```sql +SELECT TITLE, DESCRIPTION, RELEASE_YEAR FROM film; +``` + +Furthermore, if we want to show the `title` column as 'film_name' and `film_id` as 'id', then we can use the `as` keyword. This keyword is used to rename a column or table with an alias. The alias is temporary and lasts only for the duration of that particular query. For example: + +```sql +SELECT title as film_name, film_id as id +FROM film; +``` + +--- +Selecting Distinct Values +--- + +Now, let's learn some nuances around the `SELECT` statement. + + + +Let's say we want to select all the distinct values of the `rating` column from the `film` table. How do we do that? We can use the `DISTINCT` keyword to select distinct values. Example: + +```sql +SELECT DISTINCT rating FROM film; +``` + +This query will give you all the distinct values of the `rating` column from the `film` table. Note that the `DISTINCT` keyword, like all other keywords in MySQL, is case-insensitive, so `DISTINCT` and `distinct` are the same. + +We can also use the `DISTINCT` keyword with multiple columns. 
Example: + +```sql +SELECT DISTINCT rating, release_year FROM film; +``` + +This query will give you all the distinct combinations of values of the `rating` and `release_year` columns from the `film` table. Try writing this query once by yourself. + +> Note: The DISTINCT keyword must come before all the column names, as it finds the unique values for the collection of column values. For example, if there are 2 column names, then it will find **distinct pairs** among the corresponding values from both the columns. + +Example: + +```sql + -- Invalid: the DISTINCT keyword may only be used before the first column in the query +SELECT rating, DISTINCT release_year FROM film; +``` + +- The following picture shows the error that occurs when we don't use `DISTINCT` as the first keyword after `SELECT`. +- The above query asks for all ratings but only distinct release years. That doesn't make sense, since the number of ratings and the number of distinct years can differ, so the query is rejected with an error. +- Here is an article from Scaler topics to read more: https://www.scaler.com/topics/distinct-in-sql/ + + +> ![Screenshot 2024-02-08 at 10.15.03 AM](https://hackmd.io/_uploads/HJXn3Abs6.png) + + + +--- +**Pseudo Code:** +--- + +Let's talk about how this works. A lot of SQL queries can be easily understood by relating them to basic for loops, etc. Throughout this module, we will try to explain complex queries by providing corresponding pseudo code, as if we were attempting the same thing in a programming language. As all of you have already solved many DSA problems, this should be much easier and more fun for you to learn. + +So, let's try to understand the `SELECT DISTINCT rating, release_year FROM film` query with pseudo code. 
The pseudo code for the above query would be as follows: + +```python +answer = [] + +for each row in film: + answer.append(row) + +filtered_answer = [] + +for each row in answer: + filtered_answer.append(row['rating'], row['release_year']) + +unique_answer = set(filtered_answer) + +return unique_answer +``` + +So, what you see is that DISTINCT keyword on multiple column gives you for all of the rows in the table, the distinct value of pair of these columns. + +--- +Select statement to print a constant value +--- + +In one of the above queries we have already seen that we can print constant values as well using `Select` command. Now let's see it's further uses. + +Let's say we want to print a constant value in the output. Eg: The first program that almost every programmer writes: "Hello World". How do we do that? We can use the `SELECT` statement to print a constant value. Example: + +```sql +SELECT 'Hello World'; +``` + +That's it. No from, nothing. Just the value. You can also combine it with other columns. Example: + +```sql +SELECT title, 'Hello World' FROM film; +``` + +--- +Operations on Columns +--- + +Let's say we want to select the `title` and `length` columns from the `film` table. If you see, the value of length is currently in minutes, but we want to select the length in hours instead of minutes. How do we do that? We can use the `SELECT` statement to perform operations on columns. Example: + +```sql +SELECT title, length/60 FROM film; +``` + +Later in the course we will learn about Built-In functions in SQL as well. You can use those functions as well to perform operations on columns. Example: + +```sql +SELECT title, ROUND(length/60) FROM film; +``` + +ROUND function is used to round off a number to the nearest integer. So the above query will give you the title of the film, and the length of the film in hours, rounded off to the nearest integer. 
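
As a small extension of the `ROUND` example, here is a sketch that keeps one decimal place and gives the computed column a name. The alias `length_hours` is only an illustrative name and is not defined in Sakila. `ROUND` accepts an optional second argument that gives the number of decimal places.

```sql
-- Film length converted from minutes to hours, rounded to 1 decimal place.
-- length_hours is a hypothetical alias chosen for readability.
SELECT title,
       ROUND(length / 60, 1) AS length_hours
FROM film;
```

With this variant, a 90-minute film shows up as 1.5 hours instead of being rounded to a whole number.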
+ +--- +Inserting Data from Another Table +--- + + +By the way, SELECT can also be used to insert data in a table. Let's say we want to insert all the films from the `film` table into the `film_copy` table. We can combine the `SELECT` and `INSERT INTO` statements to do that. Example: + +```sql +INSERT INTO film_copy (title, description, release_year, language_id, rental_duration, rental_rate, length, replacement_cost, rating, special_features) +SELECT title, description, release_year, language_id, rental_duration, rental_rate, length, replacement_cost, rating, special_features +FROM film; +``` + +Here we are using the `SELECT` statement to select all the columns from the `film` table, and then using the `INSERT INTO` statement to insert the selected data into the `film_copy` table. Note that the column names in the `INSERT INTO` clause and the `SELECT` clause are the same, and the values are inserted in the same order as the columns are specified in the `INSERT INTO` clause. So, the first value in the `SELECT` clause will be inserted in the first column specified in the `INSERT INTO` clause, and so on. + + + +Okay, let us verify how well you have learnt till now with a few quiz questions. + +--- +Quiz 3 +--- + +What does the DISTINCT keyword do in a SELECT statement? + +### Choices + +- [ ] It counts the number of unique records in a column. +- [ ] It finds the sum of all records in a column. +- [ ] It eliminates duplicate records in the output. +- [ ] It sorts the records in ascending order. + +--- +Quiz 4 +--- + +If you want to retrieve all records from a 'customers' table, which statement would you use? + +### Choices + +- [ ] SELECT * FROM customers; +- [ ] SELECT ALL FROM customers; +- [ ] RETRIEVE * FROM customers; +- [ ] GET * FROM customers; + +--- +Quiz 5 +--- + +What is the result of the following SQL query: `SELECT DISTINCT column1 FROM table1;`? + +### Choices + +- [ ] It displays all values of column1, including duplicates. 
+- [ ] It displays unique non-null values of column1. +- [ ] It counts the total number of unique values in column1. +- [ ] It sorts all values in column1. + +--- +WHERE clause +--- + +Till now, we have been doing basic read operations. A SELECT query with only a FROM clause is rarely sufficient. Rarely do we want to return all rows. Often we need some kind of filtering logic for the rows that should be returned. Let's learn how to do that. + +Let's use the Sakila database to understand this. Say we want to select all the films from the `film` table which have a rating of `PG-13`. How do we do that? We can use the `WHERE` clause to filter rows based on a condition. Example: + +```sql +SELECT * FROM film WHERE rating = 'PG-13'; +``` + +Here we are using the `WHERE` clause to filter rows based on the condition that the value of the `rating` column should be `PG-13`. Note that the `WHERE` clause is always used after the `FROM` clause. In terms of pseudocode, you can think of the where clause as working as follows: + +```python +answer = [] + +for each row in film: + if row.matches(conditions in where clause): # new line from above + answer.append(row) + +filtered_answer = [] + +for each row in answer: + filtered_answer.append((row['rating'], row['release_year'])) + +unique_answer = set(filtered_answer) # assuming we also had DISTINCT + +return unique_answer +``` + +If you see, the where clause can be considered analogous to `if` in a programming language. With `if` also, there are many other operators that are used, right? Can you name the operators we often use with `if` in programming languages? + + + +--- +AND, OR, NOT +--- + + +We use things like `and` , `or`, `!` in programming languages to combine multiple conditions. Similarly, we can use `AND`, `OR`, `NOT` operators in SQL as well. Example: We want to get all the films from the `film` table which have a rating of `PG-13` and a release year of `2006`. We can use the `AND` operator to combine multiple conditions. 
+ +```sql +SELECT * FROM film WHERE rating = 'PG-13' AND release_year = 2006; +``` + +Similarly, we can use the `OR` operator to combine multiple conditions. Example: We want to get all the films from the `film` table which have a rating of `PG-13` or a release year of `2006`. + +```sql +SELECT * FROM film WHERE rating = 'PG-13' OR release_year = 2006; +``` + +Similarly, we can use the `NOT` operator to negate a condition. Example: We want to get all the films from the `film` table which do not have a rating of `PG-13`. + +```sql +SELECT * FROM film WHERE NOT rating = 'PG-13'; +``` + +A piece of advice on using these operators: If you are using multiple operators, it is always a good idea to use parentheses to make your query more readable. Otherwise, it can be difficult to understand the order in which the operators will be evaluated. Example: + +```sql +SELECT * FROM film WHERE rating = 'PG-13' OR release_year = 2006 AND rental_rate = 0.99; +``` + +Here, a reader cannot easily tell whether the `AND` operator or the `OR` operator will be evaluated first. MySQL evaluates `AND` before `OR`, but it is better not to rely on the reader remembering that. To make it clear, we can use parentheses. Example: + +```sql +SELECT * FROM film WHERE rating = 'PG-13' OR (release_year = 2006 AND rental_rate = 0.99); +``` + +Till now, we have used only `=` for doing comparisons. Like traditional programming languages, MySQL also supports other comparison operators like `>`, `<`, `>=`, `<=`, `!=` etc. Just one special case: `!=` can also be written as `<>` in MySQL. Example: + +```sql +SELECT * FROM film WHERE rating <> 'PG-13'; +``` + +--- +IN Operator +--- + + +With comparison operators, we can only compare a column with a single value. What if we want to compare a column with multiple values? For example, we want to get all the films from the `film` table which have a rating of `PG-13` or `R`. One way to do that is to combine multiple conditions using `OR`. 
A better way will be to use the `IN` operator to compare a column with multiple values. Example: + +```sql +SELECT * FROM film WHERE rating IN ('PG-13', 'R'); +``` + +Okay, now let's say we want to get those films that have ratings anything other than the above 2. Any guesses how we may do that? + + +Correct! We had earlier discussed about `NOT`. You can also use `NOT` before `IN` to negate the condition. Example: + +```sql +SELECT * FROM film WHERE rating NOT IN ('PG-13', 'R'); +``` + +Think of IN to be like any other operator, additionally, it allows comparison with multiple values. + + +--- +ORDER BY Clause +--- + + + +Now let's discuss another important clause. ORDER BY clause allows to return values in a sorted order. Example: + +```sql +SELECT * FROM film ORDER BY title; +``` + +The above query will return all the rows from the `film` table in ascending order of the `title` column. If you want to return the rows in descending order, you can use the `DESC` keyword. Example: + +```sql +SELECT * FROM film ORDER BY title DESC; +``` + +You can also sort by multiple columns. Example: + +```sql +SELECT * FROM film ORDER BY title, release_year; +``` + +The above query will return all the rows from the `film` table in ascending order of the `title` column and then in ascending order of the `release_year` column. Consider the second column as tie breaker. If 2 rows have same value of title, release year will be used to break tie between them. Example: + +```sql +SELECT * FROM film ORDER BY title DESC, release_year DESC; +``` + +Above query will return all the rows from the `film` table in descending order of the `title` column and if tie on `title`, in descending order of the `release_year` column. + +By the way, you can ORDER BY on a column which is not present in the SELECT clause. Example: + +```sql +SELECT title FROM film ORDER BY release_year; +``` + +Let's also build the analogy of this with a pseudocode. 
+ +```python +answer = [] + +for each row in film: + if row.matches(conditions in where clause): # new line from above + answer.append(row) + +answer.sort(column_names in order by clause) + +filtered_answer = [] + +for each row in answer: + filtered_answer.append((row['rating'], row['release_year'])) + +return filtered_answer +``` + +If you see, the `ORDER BY` clause is applied after the `WHERE` clause. So, first the rows are filtered based on the `WHERE` clause and then they are sorted based on the `ORDER BY` clause. Only after that are the columns that have to be printed taken out. That's why you can sort based on columns that are not even in the `SELECT` clause. + +> We will discuss ORDER BY once again in the CRUD 2 notes. + + + +--- +Solution to Quizzes: +--- + +> -- +Quiz1: Option A (INSERT INTO table_name VALUES (value1, value2, value3,…);) +Quiz2: Option B (INSERT INTO table1 (column1) VALUES (value1);) +Quiz3: Option C (It eliminates duplicate records in the output.) +Quiz4: Option A (SELECT * FROM customers;) +Quiz5: Option B (It displays unique non-null values of column1.) +-- \ No newline at end of file diff --git a/Non-DSA Notes/SQL Notes/04 Notes CRUD - 2.md b/Non-DSA Notes/SQL Notes/04 Notes CRUD - 2.md new file mode 100644 index 0000000..f014553 --- /dev/null +++ b/Non-DSA Notes/SQL Notes/04 Notes CRUD - 2.md @@ -0,0 +1,503 @@ + +--- +Agenda +--- + + +- CRUD + - Read + - BETWEEN Operator + - LIKE Operator + - IS NULL Operator + - ORDER BY Clause revisited + - LIMIT Clause + - Update + - Delete + - Delete vs Truncate vs Drop + - Truncate + - Drop + +--- +BETWEEN Operator +--- + + +Now, we are going to start the discussion about another important keyword in SQL, `BETWEEN`. + +Let's say we want to get all the films from the `film` table which have a release year >= `2005` and <= `2010`. We can do this by ANDing 2 conditions. We can also use the `BETWEEN` operator to do that. 
Example: + +```sql +SELECT * FROM film WHERE release_year BETWEEN 2005 AND 2010; +``` + +The BETWEEN operator is inclusive of the values specified. So, the above query will return all the films which have a release year >= `2005` and <= `2010`. That is something to be mindful of. + +The BETWEEN operator also works for strings. Let's assume that there is a country table with a "name" column of type varchar. If we execute this query: + +```sql +Select * from country where name between 'a' and 'b'; +``` + +We will get this result: + +```sql +Argentina +. +. +. +Argelia. + +-- The above query will give us all country names that sort between 'a' and 'b', that is, names starting with A/a. +-- The upper bound is the string 'b' itself. Ex: 'Bolivia' will not be included since it has more letters than just b and so sorts after 'b'. +-- Therefore the above query gives all countries from a up to b, regardless of case under MySQL's default case-insensitive collation. +``` +BETWEEN works with other data types as well, such as dates. Let's say there is an orders table and we want all orders between dates '2023-07-01' AND '2024-01-01'. + +```sql +SELECT * FROM Orders +WHERE OrderDate BETWEEN '2023-07-01' AND '2024-01-01'; +``` + +> Try the above query with your own variations. + +--- +LIKE Operator +--- + + +The LIKE operator is one of the most important and frequently used operators in SQL. Whenever there is a column storing strings, there comes a requirement to do some kind of pattern matching. For example, assume Scaler's database where we have a `batches` table with a column called `name`. Let's say we want to get the list of `Academy` batches, and the rule is that an Academy batch shall have `Academy` somewhere within its name: at the start, at the end, or anywhere in between. How do we find those? We can use the `LIKE` operator for this purpose. + +Let's talk about how the `LIKE` operator works. The `LIKE` operator works with the help of 2 wildcards in our queries, `%` and `_`. 
The `%` wildcard matches any number of characters (>= 0 occurrences of any set of characters). The `_` wildcard matches exactly one character (any character). Example: + +1. LIKE 'cat%' will match "cat", "caterpillar", "category", etc. but not "wildcat" or "dog". +2. LIKE '%cat' will match "cat", "wildcat", "domesticcat", etc. but not "cattle" or "dog". +3. LIKE '%cat%' will match "cat", "wildcat", "cattle", "domesticcat", "caterpillar", "category", etc. but not "dog" or "bat". +4. LIKE '_at' will match "cat", "bat", "hat", etc. but not "wildcat" or "domesticcat". +5. LIKE 'c_t' will match "cat", "cot", "cut", etc. but not "chat" or "domesticcat". +6. LIKE 'c%t' will match "cat", "chart", "connect", "cult", etc. but not "wildcat", "domesticcat", "caterpillar", "category". + + + + **Example:** +```sql +SELECT * FROM batches WHERE name LIKE '%Academy%'; +``` + +Similarly, let's say in our Sakila database, we want to get all the films which have `LOVE` in their title. We can use the `LIKE` operator. Example: + +```sql +SELECT * FROM film WHERE title LIKE '%LOVE%'; + +-- These pattern strings are case insensitive as well. +-- Hence below query will give same results as above. + +SELECT * FROM film WHERE title LIKE '%LovE%'; +``` + +> Conclusion + +Some of the key points to remember are: + +- A significant tool for pattern-based data searching is the LIKE operator in MySQL. +The underscore (_) wildcard character is used to match a single character, whereas the percentage (%) wildcard character is used to match any number of characters (zero or more) in a string. +- To verify if you have understood the LIKE operator, let us have few quizzes. +- These pattern strings will be considered as case insensitive as well. +- Extra Resource for Like operator: https://www.scaler.com/topics/like-in-mysql/ + + +--- +Quiz 1 +--- + + +If you want to find all customers from a 'Customers' table whose names end with 'son', which SQL query would you use? 
+ +### Choices + +- [ ] SELECT * FROM Customers WHERE Name LIKE 'son%' +- [ ] SELECT * FROM Customers WHERE Name LIKE '%son' +- [ ] SELECT * FROM Customers WHERE Name LIKE 'son' +- [ ] SELECT * FROM Customers WHERE Name LIKE 'son' + + +--- +Quiz 2 +--- + + +In a 'Books' table, you want to select all books whose titles contain the word 'moon'. Which of the following queries should you use? + +### Choices + +- [ ] SELECT * FROM Books WHERE Title LIKE 'moon%' +- [ ] SELECT * FROM Books WHERE Title LIKE '%moon' +- [ ] SELECT * FROM Books WHERE Title LIKE '%moon%' +- [ ] SELECT * FROM Books WHERE Title LIKE 'moon_' + +--- +Quiz 3 +--- + +Suppose you have an 'Orders' table and you want to find all orders whose 'OrderNumber' has '123' at the exact middle. Assume 'OrderNumber' is a five-character string. What query should you use? + +### Choices + +- [ ] SELECT * FROM Orders WHERE OrderNumber LIKE '%123%' +- [ ] SELECT * FROM Orders WHERE OrderNumber LIKE '123%' +- [ ] SELECT * FROM Orders WHERE OrderNumber LIKE '\_123_' +- [ ] SELECT * FROM Orders WHERE OrderNumber LIKE '%123' + +--- +IS NULL Operator +--- + + +Now, we are almost at the end of the discussion about different operators. Do you all remember how we store empties, that is, no value for a particular column for a particular row? We store it as `NULL`/`None`. Interestingly, working with NULLs is a bit tricky. We cannot use the `=` operator to compare a column with `NULL`. + +> An empty box and an empty brain aren't the same thing. Similarly, an empty number and an empty string are considered different objects. + +--- + + +![Screenshot 2024-02-07 at 12.22.01 PM](https://hackmd.io/_uploads/HJB-FsgjT.jpg) +> Pic credits: anonymous + +--- + +**Example:** + +```sql +SELECT * FROM film WHERE description = NULL; +``` + +The above query will not return any rows. Why? Because `NULL` is not equal to `NULL`. In fact, `NULL` is not equal to anything. Nor is it not equal to anything. It is just `NULL`. 
+ +Example: + +```sql +SELECT NULL = NULL; +``` + +The above query will return `NULL`. Similarly, `3 = NULL` , `3 <> NULL` , `NULL <> NULL` will also return `NULL`. So, how do we compare a column with `NULL`? We use the `IS NULL` operator. Example: + +```sql +SELECT * FROM film WHERE description IS NULL; +``` + +Similarly, we can use the `IS NOT NULL` operator to find all the rows where a particular column is not `NULL`. Example: + +```sql +SELECT * FROM film WHERE description IS NOT NULL; +``` + +In many assignments, you will find that you have to use the `IS NULL` and `IS NOT NULL` operators. Without them you will miss out on rows that had NULL values in them and get the wrong answer. Example: +Find customers with id other than 2. If you use only the `!=` operator, you will miss out on the customer with id `NULL`. + +```sql +SELECT * FROM customers WHERE id != 2; +``` + +The above query will not return the customer with id `NULL`, because `NULL != 2` evaluates to `NULL` rather than true. So, you will get the wrong answer. Instead, you should also check for `NULL` explicitly with the `IS NULL` operator. Example: + +```sql +SELECT * FROM customers WHERE id IS NULL OR id != 2; +``` +--- +ORDER BY clause continued: +--- + + +Now let's discuss another important clause. The ORDER BY clause allows us to return rows in a sorted order. Example: + +```sql +SELECT * FROM film ORDER BY title; +``` + +The above query will return all the rows from the `film` table in ascending order of the `title` column. If you want to return the rows in descending order, you can use the `DESC` keyword. Example: + +```sql +SELECT * FROM film ORDER BY title DESC; +``` + +You can also sort by multiple columns. Example: + +```sql +SELECT * FROM film ORDER BY title, release_year; +``` + +The above query will return all the rows from the `film` table in ascending order of the `title` column and then in ascending order of the `release_year` column. Consider the second column as a tie breaker. If 2 rows have the same value of title, release year will be used to break the tie between them. 
Example: + +```sql +SELECT * FROM film ORDER BY title DESC, release_year DESC; +``` + +Above query will return all the rows from the `film` table in descending order of the `title` column and if tie on `title`, in descending order of the `release_year` column. + +By the way, you can ORDER BY on a column which is not present in the SELECT clause. Example: + +```sql +SELECT title FROM film ORDER BY release_year; +``` + +Let's also build the analogy of this with a pseudocode. + +```python +answer = [] + +for each row in film: + if row.matches(conditions in where clause) # new line from above + answer.append(row) + +answer.sort(column_names in order by clause) + +filtered_answer = [] + +for each row in answer: + filtered_answer.append(row['rating'], row['release_year']) + +return filtered_answer +``` + +If you see, the `ORDER BY` clause is applied after the `WHERE` clause. So, first the rows are filtered based on the `WHERE` clause and then they are sorted based on the `ORDER BY` clause. And only after that are the columns that have to be printed taken out. And that's why you can sort based on columns not even in the `SELECT` clause. + +--- +ORDER BY Clause with DISTINCT keyword +--- + +When employing the DISTINCT keyword in an SQL query, the ORDER BY clause is limited to sorting by columns explicitly specified in the SELECT clause. This restriction stems from the nature of DISTINCT, which is designed to eliminate duplicate records based on the selected columns. + +Consider the scenario where you attempt to order the results by a column not included in the SELECT clause, as demonstrated in this example: + +```sql +SELECT DISTINCT title FROM film ORDER BY release_year; +``` + +The SQL engine would generate an error in this case. The reason behind this restriction lies in the potential ambiguity introduced when sorting by a column not present in the SELECT clause. 
+ +When you use DISTINCT, the database engine identifies unique values in the specified columns and returns a distinct set of records. However, when you attempt to order these distinct records by a column that wasn't part of the selection, ambiguity arises. + +Take the example query: + +```sql +SELECT DISTINCT title FROM film ORDER BY release_year; +``` + +Here, the result set will include distinct titles from the film table, but the sorting order is unclear. Multiple films may share the same title but have different release years. Without explicitly stating which release year to consider for sorting, the database engine encounters ambiguity. + +By limiting the ORDER BY clause to columns present in the SELECT clause, you provide a clear directive on how the results should be sorted. In the corrected query: + +```sql +SELECT DISTINCT title FROM film ORDER BY title; +``` + +You instruct the database engine to sort the distinct titles alphabetically by the title column, avoiding any confusion or ambiguity in the sorting process. This ensures that the results are not only distinct but also ordered in a meaningful and unambiguous manner. + +--- +LIMIT Clause +--- + +The LIMIT clause allows us to limit the number of rows returned by a query. Example: + +```sql +SELECT * FROM film LIMIT 10; +``` + +The above query will return only 10 rows from the `film` table. If you want to return 10 rows starting from the 11th row, you can use the `OFFSET` keyword. Example: + +```sql +SELECT * FROM film LIMIT 10 OFFSET 10; +``` + +The above query will return 10 rows starting from the 11th row from the `film` table. Some databases, such as PostgreSQL, also allow the `OFFSET` keyword without the `LIMIT` keyword; MySQL does not, and requires a `LIMIT` whenever `OFFSET` is used. Example (PostgreSQL): + +```sql +SELECT * FROM film OFFSET 10; +``` + +The above query will return all the rows starting from the 11th row from the `film` table. + +The LIMIT clause is applied at the end, just before printing the results. 
Taking the example of pseudocode, it works as follows: + +```python +answer = [] + +for each row in film: + if row.matches(conditions in where clause) # new line from above + answer.append(row) + +answer.sort(column_names in order by clause) + +filtered_answer = [] + +for each row in answer: + filtered_answer.append(row['rating'], row['release_year']) + +return filtered_answer[start_of_limit: end_of_limit] +``` + +Thus, if your query contains an ORDER BY clause, then the LIMIT clause will be applied after the ORDER BY clause. Example: + +```sql +SELECT * FROM film ORDER BY title LIMIT 10; +``` + +The above query will return the first 10 rows from the `film` table in ascending order of the `title` column. + +--- +Update +--- + + +Now let's move on to the U of CRUD. Update and Delete are thankfully much simpler, so don't worry, we will be able to breeze through them over the coming 20 mins. As the name suggests, this is used to update rows in a table. The general syntax is as follows: + +```sql +UPDATE table_name SET column_name = value WHERE conditions; +``` + +Example: + +```sql +UPDATE film SET release_year = 2006 WHERE id = 1; +``` + +The above query will update the `release_year` column of the row with `id` 1 in the `film` table to 2006. You can also update multiple columns at once. Example: + +```sql +UPDATE film SET release_year = 2006, rating = 'PG' WHERE id = 1; +``` + +Let's talk about how update works. It works as follows: + +```python +for each row in film: + if row.matches(conditions in where clause) + row['release_year'] = 2006 + row['rating'] = 'PG' +``` + +So basically an update query iterates through all the rows in the table and updates the rows that match the conditions in the where clause. So, if you have a table with 1000 rows and you run an update query without a where clause, then all the 1000 rows will be updated. 
Example: + +```sql +UPDATE film SET release_year = 2006; +-- MySQL Workbench enables safe update mode (sql_safe_updates) by default, which blocks this kind of operation. +``` + +The above query will result in all the rows of the table having release_year as 2006, which is not desired. So, be careful while running update queries. + +--- +Delete +--- + + +Finally, we are at the end of CRUD. Let's talk about Delete operations. The general syntax is as follows: + +```sql +DELETE FROM table_name WHERE conditions; +``` + +Example: + +```sql +DELETE FROM film WHERE id = 1; +``` + +The above query will delete the row with `id` 1 from the `film` table. + +Beware: if you don't specify a where clause, then all the rows from the table will be deleted. Example: + +```sql +DELETE FROM film; +-- MySQL Workbench enables safe update mode (sql_safe_updates) by default, which blocks this kind of operation. +``` + +Let's talk about how delete works as well in terms of code. + +```python +for each row in film: + if row.matches(conditions in where clause) + delete row +``` + + + +--- +Delete vs Truncate vs Drop +--- + + +There are two more commands which are used to delete rows from a table. They are `TRUNCATE` and `DROP`. Let's discuss them one by one. + +#### Truncate + +The command looks as follows: + +```sql +TRUNCATE film; +``` + +The above query will delete all the rows from the `film` table. The TRUNCATE command internally works by removing the complete table and then recreating it. So, it is much faster than DELETE. But it has a disadvantage. It cannot be rolled back, meaning you can't get back your data. We will learn more about rollbacks in the class on Transactions. But at a high level, this is because the complete table is deleted as an intermediate step, so no log is maintained of which rows were deleted, and thus it is not easy to revert. So, if you run a TRUNCATE query, then you cannot undo it. + +>Note: It also resets the primary key ID. 
For example, if the highest ID in the table before truncating was 10, then the next row inserted after truncating will have an ID of 1. + +#### Drop + +The command looks as follows: + +Example: + +```sql +DROP TABLE film; +``` + +The above query will delete the `film` table. The difference between `DELETE` and `DROP` is that `DELETE` is used to delete rows from a table and `DROP` is used to delete the entire table. So, if you run a `DROP` query, then the entire table will be deleted. All the rows and the table structure will be deleted. So, be careful while running a `DROP` query. Nothing will be left of the table after running a `DROP` query. You will have to recreate the table from scratch. + +Note that, +DELETE: +1. Removes specified rows one-by-one from the table based on a condition (it may delete all rows if no condition is present in the query, but it keeps the table structure intact). +2. It is slower than TRUNCATE since we delete the rows one by one. +3. Doesn't reset the key. It means if there is an auto_increment key such as student_id in the students table and `last student_id value is 1005` and we deleted this entry using the query: +```sql +DELETE FROM students WHERE student_id = 1005; +``` +- Now, if we insert one more entry/row in the students table then student_id for this row will be 1006. Hence it continues with the same sequence without resetting the value. + +4. It can be rolled back. This means that if we have deleted a value then we can get it back again. + +TRUNCATE: +1. Removes the complete table and then recreates it with the same schema (columns). +2. Faster than DELETE. Truncate doesn't delete values one by one; rather, it deletes the whole table at once by de-referencing it and then creates another table with the same schema, hence Truncate is faster. +3. Resets the key. 
It means if there is an auto_increment key such as student_id in the students table and `last student_id value is 1005` and we truncated this whole table, then in the new table the fresh entry/row will start with student_id = 1. +4. It cannot be rolled back because the complete table is deleted as an intermediate step, meaning we can't get the same table back. + +DROP: +1. Removes the complete table and the table structure as well. +2. It cannot be rolled back, meaning that we can't get back our table or database. + +--- +> **Diagram for reference:** +--- + +![IMG_70F0B6582911-1](https://hackmd.io/_uploads/BJEqJngiT.jpg) + + + + +--- +Extra Reading materials +--- + +Learn more about Delete/Truncate/Drop using our Scaler Topics article: https://www.scaler.com/topics/difference-between-delete-drop-and-truncate/ + +SQL functions: https://docs.google.com/document/d/1IFGuCvFv8CIcq_4FTIBusuARa81Oak4_snK1qJF8C54/edit#heading=h.gjdgxs + + +--- +Solution to Quizzes: +--- + +> -- +Quiz1: Option B (SELECT * FROM Customers WHERE Name LIKE '%son') +Quiz2: Option C (SELECT * FROM Books WHERE Title LIKE '%moon%') +Quiz3: Option C (SELECT * FROM Orders WHERE OrderNumber LIKE '\_123_') +-- \ No newline at end of file diff --git a/Non-DSA Notes/SQL Notes/05 Notes Joins - 1.md b/Non-DSA Notes/SQL Notes/05 Notes Joins - 1.md new file mode 100644 index 0000000..3dce83a --- /dev/null +++ b/Non-DSA Notes/SQL Notes/05 Notes Joins - 1.md @@ -0,0 +1,380 @@ + +--- +Agenda +--- + + + +- Joins +- Self Join + - SQL query as pseudocode +- Joining Multiple Tables + +--- +Joins +--- + + + +Today we are going to up the complexity of SQL Read queries we are going to write while still using the same foundational concepts we had learnt in the previous class on CRUD. Till now, whenever we had written an SQL query, the query found data from how many tables? + + +Correct, every SQL query we had written till now was only finding data from 1 table. 
Most of the queries we had written in the previous class were on the `film` table where we applied multiple filters etc. But do you think being able to query data from a single table is enough? Let's take a scenario of Scaler. Let's say we have 2 tables as follows in the Scaler's database: + +`batches` + +| batch_id | batch_name | +|----------|------------| +| 1 | Batch A | +| 2 | Batch B | +| 3 | Batch C | + +`students` + +| student_id | first_name | last_name | batch_id | +|------------|------------|-----------|----------| +| 1 | John | Doe | 1 | +| 2 | Jane | Doe | 1 | +| 3 | Jim | Brown | 2 | +| 4 | Jenny | Smith | 3 | +| 5 | Jack | Johnson | 2 | + +Suppose, someone asks you to print the name of every student, along with the name of their batch. The output should be something like: + +| student_name | batch_name | +|--------------|------------| +| John | Batch A | +| Jane | Batch A | +| Jim | Batch B | +| Jenny | Batch C | +| Jack | Batch B | + + + +Will you be able to get all of this data by querying over a single table? No. The `student_name` is there in the students table, while the `batch_name` is in the batches table! We somehow need a way to combine the data from both the tables. This is where joins come in. What does the word `join` mean to you? + + + +Correct! Joins, as the name suggests, are a way to combine data from multiple tables. For example, if we want to combine the data from the `students` and `batches` table, we can use joins for that. Think of joins as a way to stitch rows of 2 tables together, based on the condition you specify. Example: In our case, we would want to stitch a row of students table with a row of batches table based on what? Imagine that every row of `students` we try to match with every row of `batches`. Based on what condition to be true between those will we stitch them? + + + + +Correct, we would want to stitch a row of students table with a row of batches table based on the `batch_id` column. 
This is what we call a `join condition`. A join condition is a condition that must be true between the rows of 2 tables for them to be stitched together. + +Let's try to understand this with a Venn diagram: + + +> Venn Diagram: + +![Inner_joins_Venn](https://hackmd.io/_uploads/BJAxzIXsT.png) +> Source: Unknown + + + +Let's see how we can write a join query for our example. +```sql +SELECT students.first_name, batches.batch_name +FROM students +JOIN batches +ON students.batch_id = batches.batch_id; +``` + +Let's break down this query. The first line is the same as what we have been writing till now. We are selecting the `first_name` column from the `students` table and the `batch_name` column from the `batches` table. The next line is where the magic happens. We are using the `JOIN` keyword to tell SQL that we want to join the `students` table with the `batches` table. The next line is the join condition. We are saying that we want to join the rows of `students` table with the rows of `batches` table where the `batch_id` column of `students` table is equal to the `batch_id` column of `batches` table. This is how we write a join query. + + + +Let's take an example of this on the Sakila database. Let's say for every film, we want to print its name and the language. How can we do that? + +```sql +SELECT film.title, language.name +FROM film +JOIN language +ON film.language_id = language.language_id; +``` + +Now, sometimes typing name of tables in the query can become difficult. For example, in the above query, we have to type `film` and `language` multiple times. To make this easier, we can give aliases to the tables. For example, we can give the alias `f` to the `film` table and `l` to the `language` table. We can then use these aliases in our query. 
Let's see how we can do that: + +```sql +SELECT f.title, l.name +FROM film f +JOIN language l +ON f.language_id = l.language_id; + +-- These aliases are even more helpful in self joins +``` +**This above join is also known as Inner Join. We will talk more about Inner and Outer joins in next topic's notes.** + +**If you want to know more about this topic you may visit:** https://scaler.com/topics/inner-join-in-sql/ + + +--- +Visual Description using one more table example: +--- + +We will use the example of the “Students” table and the “Batches” table again. + +> Students Table: + +> ![Screenshot 2024-02-16 at 1.28.59 PM](https://hackmd.io/_uploads/rJJIIqnsp.png) + + +> Batches Table: +> ![Screenshot 2024-02-16 at 1.38.29 PM](https://hackmd.io/_uploads/ryDG_9hja.png) + + + +Let's use the SQL query again: + +```sql +SELECT students.first_name, batches.batch_name +FROM students +JOIN batches +ON students.batch_id = batches.batch_id; +``` + +For this query, each value in the **students** table's batch_id column is compared with each value in the **batches** table's batch_id column, as described in the following pseudocode. + +In pseudocode, it shall look like: + +```python3 +ans = [] + +for row1 in students: + for row2 in batches: + if row1.batch_id == row2.batch_id: + ans.append((row1, row2)) # stitch the two matching rows together + +for student, batch in ans: + print(student.first_name, batch.batch_name) +``` + +Now, the final table will look like the following one, where the light blue columns belong to the Students table and the magenta columns belong to the Batches table: + +> Resultant Table: + +> ![Screenshot 2024-02-16 at 2.01.22 PM](https://hackmd.io/_uploads/rkHdaqhiT.png) + + +Now from this table we can print any columns using the table names or their aliases. +For example, if we want to print the student's name and the batch name then we may write the following inside the select command: +```sql +SELECT students.first_name, batches.batch_name +``` + +> Activity: Try `Select *` for the above query. 
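
To try the join without a MySQL server, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes the `students` and `batches` rows shown in the tables earlier in these notes, runs the same join query, and then repeats the work with the nested loop from the pseudocode so the two results can be compared. The variable names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE batches (batch_id INTEGER PRIMARY KEY, batch_name TEXT);
CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name TEXT,
    batch_id INTEGER
);
INSERT INTO batches VALUES (1, 'Batch A'), (2, 'Batch B'), (3, 'Batch C');
INSERT INTO students VALUES
    (1, 'John', 'Doe', 1),
    (2, 'Jane', 'Doe', 1),
    (3, 'Jim', 'Brown', 2),
    (4, 'Jenny', 'Smith', 3),
    (5, 'Jack', 'Johnson', 2);
""")

# The join query from the notes; ORDER BY is added only to make the output stable.
rows = conn.execute("""
    SELECT students.first_name, batches.batch_name
    FROM students
    JOIN batches ON students.batch_id = batches.batch_id
    ORDER BY students.student_id
""").fetchall()

# The same result built by hand: compare every student with every batch.
students = conn.execute(
    "SELECT first_name, batch_id FROM students ORDER BY student_id"
).fetchall()
batches = conn.execute("SELECT batch_id, batch_name FROM batches").fetchall()
nested = [
    (first_name, batch_name)
    for first_name, student_batch_id in students
    for batch_id, batch_name in batches
    if student_batch_id == batch_id  # the join condition
]

for first_name, batch_name in rows:
    print(first_name, batch_name)
print(nested == rows)  # True
```

The nested loop is only a mental model; a real engine is free to use a faster strategy as long as the result is the same.
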
+ + + +--- +Self Join +--- + + +Let's say at Scaler, for every student we assign a Buddy. For this we have a `students` table, which has the following columns/fields: + +`id | name | buddy_id` + +This `buddy_id` will be an id of what? + +> NOTE: Give hints to get someone to say `student` + +Correct. Now, let's say we have to print for every student, their name and their buddy's name. How will we do that? Here 2 rows of which tables would we want to stitch together to get this data? + +Correct, an SQL query for the same shall look like: + +```sql +SELECT s1.name, s2.name +FROM students s1 +JOIN students s2 +ON s1.buddy_id = s2.id; +``` + +This is an example of a SELF join. A self join is a join where we are joining a table with itself. In the above query, we are joining the `students` table with itself. In a self join, aliasing tables is very important. If we don't alias the tables, then SQL will not know which copy of the table a column such as `name` refers to (because both copies have the same name, as they are the same table). Please refer to the following picture. + +> Note: Do remember that in a self join too, only the rows matching the given condition will be present in the output/resultant table. + +--- +> Venn Diagram: + +![sql_self_join](https://hackmd.io/_uploads/BJFKVUXo6.png) + +> Source: Unknown +--- + +Please try this above query once by yourself. + + + +Consider the following infographics to understand the above query: +--- + +In this table, each student is assigned a 'Buddy', now we have to find buddies of every student. + +> ![Screenshot 2024-02-16 at 2.03.11 PM](https://hackmd.io/_uploads/ByS1C92o6.png) + + + +To find each student’s buddy, we used a self-join to stitch together two rows of our table. Let's see how this works in practice. 
+ +```sql +SELECT s1.name, s2.name +FROM students s1 +JOIN students s2 +ON s1.buddy_id = s2.id; +``` + +After combining the above table with itself we will get the following output: + +> ![Screenshot 2024-02-16 at 4.38.31 PM](https://hackmd.io/_uploads/Hk3HfThiT.png) + + +Now that we have the final table, let's print s1.name and s2.name, i.e., the name of each student and their buddy, which is the final answer: + +> ![Screenshot 2024-02-16 at 4.39.27 PM](https://hackmd.io/_uploads/SyJFza2ja.png) + + + + +--- +SQL query as pseudocode (Self Join) +--- + + +As we have been doing since CRUD queries, let's also see how Joins can be represented in terms of pseudocode. + +Let's take this query: + +```sql +SELECT s1.name, s2.name +FROM students s1 +JOIN students s2 +ON s1.buddy_id = s2.id; +``` + +In pseudocode, it shall look like: + +```python3 +ans = [] + +for row1 in students: + for row2 in students: + if row1.buddy_id == row2.id: + ans.append((row1, row2)) # row1 is the student, row2 is the buddy + +for s1, s2 in ans: + print(s1.name, s2.name) +``` + +**Additional resources for self joins:** https://www.scaler.com/topics/sql/self-join-in-sql/ + +--- +Joining Multiple Tables +--- + + +Till now, we had only joined 2 tables. But what if we want to join more than 2 tables? Let's say we want to print the name of every film, along with the name of the language and the name of the original language. How can we do that? If you have to add 3 numbers, how do you do that? +Correct! We add 2 numbers, then add the 3rd number to their sum. + +To get the name of the language, we would first want to combine the `film` and `language` tables over the `language_id` column, which will also return a table (let's call it an intermediate table for now). Then, we would want to combine this resultant table with the language table again over the `original_language_id` column. This is how we can do that: 
This is how we can do that: + +--- +> ![joining_multiple_tables](https://hackmd.io/_uploads/rJRbZL7jT.png) +> Source: Unknown + +--- + + +```sql +SELECT f.title, l1.name, l2.name +FROM film f +JOIN language l1 +ON f.language_id = l1.language_id +JOIN language l2 +ON f.original_language_id = l2.language_id; +``` + +Let's see how this might work in terms of pseudocode: + +```python3 +ans = [] + +for row1 in film: + for row2 in language: + if row1.language_id == row2.id: + ans.add(row1 + row2) + +for row in ans: + for row3 in language: + if row.language_id == row3.language_id: + ans.add(row + row3) + +for row in ans: + print(row.name, row.language_name, row.original_language_name) +``` + +> Activity: Please try the above query once by yourself. + + +Let's see how does the above query looks in execution: + +`Film` +> ![Screenshot 2024-02-16 at 4.43.44 PM](https://hackmd.io/_uploads/BkBFma2sp.png) + + +`Language` +> ![Screenshot 2024-02-16 at 4.45.41 PM](https://hackmd.io/_uploads/SkVg4p2sp.png) + + + +Expected output: Name of every film, along with the name of the language and the name of the original language. + +`Output` +> ![Screenshot 2024-02-16 at 4.48.26 PM](https://hackmd.io/_uploads/Hyq9ETnj6.png) + + +To get the name of the language, we would first want to combine film and language table over the language_id column: + +> ![Screenshot 2024-02-16 at 4.50.56 PM](https://hackmd.io/_uploads/r1JVr62oa.png) + + +Then, we would want to combine the result of that with the language table again over the original_language_id column. + +> ![Screenshot 2024-02-16 at 4.53.38 PM](https://hackmd.io/_uploads/H17ASa2oT.png) + + +Now we can easily print the highlighted tables as output: +`Final Output:` +> ![Screenshot 2024-02-16 at 4.54.38 PM](https://hackmd.io/_uploads/Hyo-UTns6.png) + + + +--- +Order of execution: +--- + +**Order of Execution** of a SQL query: +- **FROM** - The database gets the data from tables in FROM . 
+- **JOIN** - Depending on the type of JOIN used in the query and conditions specified for joining the tables in the ON clause, the database engine matches rows from the virtual table created in the FROM clause. +- **WHERE** - After the JOIN operation, the data is filtered based on the conditions specified in the WHERE clause. Rows that do not meet the criteria are excluded. +- **GROUP BY** - If the query includes a GROUP BY clause, the rows are grouped based on the specified columns and aggregate functions are applied to the groups created. +- **HAVING** - The HAVING clause filters the groups of rows based on the specified conditions +- **SELECT** - After grouping and filtering is done, the SELECT statement determines which columns to include in the final result set. +- **ORDER BY** - It allows you to sort the result set based on one or more columns, either in ascending or descending order. +- **OFFSET** - The specified number of rows are skipped from the beginning of the result set. +- **LIMIT** - After skipping the rows, the LIMIT clause is applied to restrict the number of rows returned. + +> **Note: The type of joins discussed here are also known as Inner Joins.** + + + +--- +Conclusion: +--- +- Inner join in SQL selects all the rows from two or more tables with matching column values. +- Inner join can be considered as finding the intersection of two sets/Tables. 
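
The logical order listed above can be imitated with plain Python lists. The sketch below is a simplification under stated assumptions: `run_query` is a hypothetical helper (not a database API), the rows are made up, and the JOIN, GROUP BY and HAVING steps are left out. Like the earlier pseudocode in these notes, it sorts the full rows before projecting, which is what allows sorting by a column that is not selected.

```python
# Hypothetical rows; a real engine may reorder the work internally as long as
# the result matches this logical order.
films = [
    {"title": "CARGO", "release_year": 2006, "rating": "PG"},
    {"title": "ALPHA", "release_year": 2004, "rating": "G"},
    {"title": "BRAVO", "release_year": 2006, "rating": "PG"},
    {"title": "DELTA", "release_year": 2005, "rating": "R"},
]


def run_query(rows, where, order_by, select, offset=0, limit=None):
    """Illustrative helper: apply the clauses in their logical order."""
    # FROM + WHERE: start from the table rows and keep only the matching ones.
    filtered = [row for row in rows if where(row)]
    # ORDER BY: sort the full rows, so unselected columns can still be used.
    ordered = sorted(filtered, key=lambda row: row[order_by])
    # SELECT: keep only the requested columns.
    projected = [tuple(row[col] for col in select) for row in ordered]
    # OFFSET, then LIMIT: skip rows first, then cap how many are returned.
    end = None if limit is None else offset + limit
    return projected[offset:end]


result = run_query(
    films,
    where=lambda row: row["release_year"] >= 2005,
    order_by="title",
    select=("title",),
    offset=1,
    limit=2,
)
print(result)  # [('CARGO',), ('DELTA',)]
```

Three films pass the filter (BRAVO, CARGO, DELTA after sorting); the offset skips BRAVO and the limit keeps the next two.
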
+ +**CMU notes for Joins (Too advanced):** https://15445.courses.cs.cmu.edu/fall2022/slides/11-joins.pdf + +**Anshuman's Notes:** +https://docs.google.com/document/d/1TIFDVQ1Ok9ZJWTxMyJuvG5-KVwS_8_DeOnuqcDqzvbY/edit#heading=h.2s8eyo1 \ No newline at end of file diff --git a/Non-DSA Notes/SQL Notes/06 Notes Joins - 2.md b/Non-DSA Notes/SQL Notes/06 Notes Joins - 2.md new file mode 100644 index 0000000..df69936 --- /dev/null +++ b/Non-DSA Notes/SQL Notes/06 Notes Joins - 2.md @@ -0,0 +1,561 @@ + +--- +Agenda +--- + +- Compound Joins +- Types of Joins +- Cross Join +- USING +- NATURAL +- IMPLICIT JOIN +- Join with WHERE vs ON +- UNION + + + +--- +Compound Joins +--- + + +Till now, whenever we did a join, we joined based on only 1 condition. Just as we can combine multiple conditions in a where clause, we can have multiple conditions in Joins as well. + +Let's see an example. For every film, name all the films that were released in the range of 2 years before or after that film and whose rental rate was more than the rate of that movie. + +```sql +SELECT f1.title, f2.title +FROM film f1 +JOIN film f2 +ON (f2.release_year BETWEEN f1.release_year - 2 AND f1.release_year + 2) AND f2.rental_rate > f1.rental_rate; +``` + +> Note: +> 1. Join does not need to happen on equality of columns always. +> 2. Join can also have multiple conditions. + +A Compound Join is one where Join has multiple conditions on different columns. + +--- +Types of Joins +--- + + +While we have pretty much discussed everything that is mostly important to know about joins, there are a few nitty gritties that we should know about. + +Let's take the join query we had written a bit earlier: + +```sql +SELECT s1.name, s2.name +FROM students s1 +JOIN students s2 +ON s1.buddy_id = s2.id; +``` + +Let's say there is a student that does not have a buddy, i.e., their `buddy_id` is null. What will happen in this case? Will the student be printed? + + +If you remember what we discussed about CRUD, is NULL equal to anything? Nope. 
Thus, the row will never match with anything and not get printed. The join that we discussed above is also called `inner join as discussed in Joins 1`. You could have also written that as: + +```sql +SELECT s1.name, s2.name +FROM students s1 +INNER JOIN students s2 +ON s1.buddy_id = s2.id +``` + +The keyword INNER is optional. By default a join is INNER join. + +As you see, an INNER JOIN doesn't include a row that didn't match the condition for any combination. + +Opposite of INNER JOIN is OUTER JOIN. Outer Join will include all rows, even if they don't match the condition. There are 3 types of outer joins: +- Left Join +- Right Join +- Full Join + +As the names convey, left join will include all rows from the left table, right join will include all rows from the right table and full join will include all rows from both the tables. + +Let's take an example to understand these well: + +Assume we have 2 tables: students and batches with following data: + + +`batches` + +| batch_id | batch_name | +|----------|------------| +| 1 | Batch A | +| 2 | Batch B | +| 3 | Batch C | + +`students` + +| student_id | first_name | last_name | batch_id | +|------------|------------|-----------|----------| +| 1 | John | Doe | 1 | +| 2 | Jane | Doe | 1 | +| 3 | Jim | Brown | null | +| 4 | Jenny | Smith | null | +| 5 | Jack | Johnson | 2 | + +Now let's write queries to do each of these joins: + +```sql +SELECT * +FROM students s +LEFT JOIN batches b +ON s.batch_id = b.batch_id; +``` + +```sql +SELECT * +FROM students s +RIGHT JOIN batches b +ON s.batch_id = b.batch_id; +``` + +```sql +SELECT * +FROM students s +FULL OUTER JOIN batches b +ON s.batch_id = b.batch_id; +``` + + +Now let's use different types of joins and tell me which row do you think will not be a part of the join. 
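
To see which rows each join keeps, here is a small sketch using Python's built-in `sqlite3` module with the `students` and `batches` data from the tables above (the `last_name` column is dropped for brevity). Older SQLite versions do not support `RIGHT JOIN`, so the right join is emulated here by swapping the two tables in a `LEFT JOIN`; the `names` helper is illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE batches (batch_id INTEGER, batch_name TEXT);
CREATE TABLE students (student_id INTEGER, first_name TEXT, batch_id INTEGER);
INSERT INTO batches VALUES (1, 'Batch A'), (2, 'Batch B'), (3, 'Batch C');
INSERT INTO students VALUES
    (1, 'John', 1), (2, 'Jane', 1), (3, 'Jim', NULL),
    (4, 'Jenny', NULL), (5, 'Jack', 2);
""")


def names(join_clause):
    """Illustrative helper: run the join and return (first_name, batch_name) pairs."""
    query = (
        "SELECT s.first_name, b.batch_name "
        f"FROM {join_clause} "
        "ORDER BY s.student_id, b.batch_id"
    )
    return conn.execute(query).fetchall()


# INNER JOIN: students with a NULL batch_id never satisfy the condition.
inner = names("students s JOIN batches b ON s.batch_id = b.batch_id")

# LEFT JOIN: every student is kept; unmatched ones get NULL (None) for the batch.
left = names("students s LEFT JOIN batches b ON s.batch_id = b.batch_id")

# RIGHT JOIN emulated by swapping the tables: every batch is kept instead.
right = names("batches b LEFT JOIN students s ON s.batch_id = b.batch_id")

print(len(inner), len(left), len(right))  # 3 5 4
print(left)
print(right)
```

Jim and Jenny appear only in the left join, while Batch C (which has no students) appears only in the emulated right join.
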
+ +`Now let's try to understand each of Outer Joins in depth.` + +--- +Left Join +--- + +As the name conveys, `left join` will include all rows from the `left` table, and include rows from the `right table` which match the join condition. If a row of the left table has no match on the right side, the right table's columns for that row are filled with `NULL`. + +> Venn Diagram + +![Screenshot 2024-02-10 at 12.51.25 PM](https://hackmd.io/_uploads/r1sb4s4iT.png) +--- + +`General Syntax:` + +```sql +SELECT column_name(s) +FROM table1 LEFT JOIN table2 +ON table1.column_name = table2.column_name; + +-- It's same as: + +SELECT column_name(s) +FROM table1 LEFT OUTER JOIN table2 +ON table1.column_name = table2.column_name; +``` + +`Example` + +Let’s consider two tables of a supermarket set-up. The first table named Customers gives us information about different customers, i.e., their customer id, name, and phone number. Here, CustID is the primary key that uniquely identifies each row. The second table, named Shopping_Details gives us information about items bought by customers, i.e., item id, customer id (referencing the customer that bought the item), item name, and quantity. + +`Problem Statement` + +Write a query to display all customers irrespective of items bought or not. Display the name of the customer, and the item bought. If nothing is bought, display NULL. + +`Query:` + +```sql +SELECT Customers.Name, Shopping_Details.Item_Name +FROM Customers LEFT JOIN Shopping_Details +ON Customers.CustID = Shopping_Details.CustID; +``` + +`Infographics:` + +> ![Screenshot 2024-02-10 at 12.57.00 PM](https://hackmd.io/_uploads/SJAISjNo6.png) + + +--- +Right Join +--- + +As the name conveys, `right join` will include all rows from the `right` table, and include rows from the `left table` which match the join condition. If a row of the right table has no match on the left side, the left table's columns for that row are filled with `NULL`. 
+ +> Venn Diagram + +![Screenshot 2024-02-10 at 1.03.45 PM](https://hackmd.io/_uploads/ByulDi4op.png) + +`General Syntax:` + +```sql +SELECT column_name(s) +FROM table1 RIGHT JOIN table2 +ON table1.column_name = table2.column_name; + +-- It's same as: + +SELECT column_name(s) +FROM table1 RIGHT OUTER JOIN table2 +ON table1.column_name = table2.column_name; +``` + +`Example` + +Let’s consider two tables of a supermarket set-up. The first table named Customers gives us information about different customers, i.e., their customer id, name, and phone number. Here, CustID is the primary key that uniquely identifies each row. The second table, named Shopping_Details gives us information about items bought by customers, i.e., item id, customer id (referencing the customer that bought the item), item name, and quantity. + +`Problem Statement` + +Write a query to get all the items bought by customers, even if the customer does not exist in the Customer database. Display customer name and item name. If a customer doesn’t exist, display NULL. + +`Query:` + +```sql +SELECT Customers.Name, Shopping_Details.Item_Name +FROM Customers RIGHT JOIN Shopping_Details +ON Customers.CustID = Shopping_Details.CustID; +``` + +`Infographics:` +![Screenshot 2024-02-10 at 1.08.33 PM](https://hackmd.io/_uploads/SyeNuoEia.png) + + +--- +Full Outer Join +--- + +As the name conveys, `Full join` will include all rows from the left table as well as the right table. If a row has no match on the other side, the other table's columns for that row are filled with `NULL`. + +> Venn Diagram + +![Screenshot 2024-02-10 at 1.12.15 PM](https://hackmd.io/_uploads/SJ3JFj4op.png) + + + +`General Syntax:` + +```sql +SELECT column_name(s) +FROM table1 FULL OUTER JOIN table2 +ON table1.column_name = table2.column_name; +``` + +`Example` + +Let’s consider two tables of a supermarket set-up. 
The first table named Customers gives us information about different customers, i.e., their customer id, name, and phone number. Here, CustID is the primary key that uniquely identifies each row. The second table, named Shopping_Details gives us information about items bought by customers, i.e., item id, customer id (referencing the customer that bought the item), item name, and quantity. + +`Problem Statement` + +Write a query to provide data for all customers and items ever bought from the store. Display the name of the customer and the item name. If either data does not exist, display NULL. + +`Query:` + +```sql +SELECT Customers.Name, Shopping_Details.Item_Name +FROM Customers FULL OUTER JOIN Shopping_Details +ON Customers.CustID = Shopping_Details.CustID; +``` + +`Infographics:` +![Screenshot 2024-02-10 at 1.14.59 PM](https://hackmd.io/_uploads/BkX5YsEia.png) + +--- +When to Use What? +--- +SQL is an essential skill for people looking for Data Engineering, Data Science, and Software Engineering Roles. Joins in SQL is one of the advanced SQL concepts and is often asked in interviews. These questions do not directly state what SQL join to use. Hence, we need to use a four-step analysis before we start forming our SQL query. + +1. Identification: Identify tables relating to the problem statement. We also need to identify relations between these tables, the order in which they are connected, and primary and foreign keys. +- Example: Let’s say we have Tables A and B. Table A and Table B share a relation of Employee Details – Department Details. Table A has three fields – ID, Name, and DeptID. Table B has two fields – DeptID and DeptName. Table A has a primary key ID, and Table B’s primary key is DeptID. Table A and Table B are connected with the foreign key in Table A, i.e., Table B’s primary key, DeptID. + +2. Observe: Observe which join will be most suitable for the scenario. 
This means it should be able to retrieve all the required columns and have the least number of columns that need to be eliminated by the condition. +- Example: If all values of Table A are required irrespective of the condition depending on Table C, we can use a left outer join on A and C. + +3. Deconstruction: Now that we have all requirements to form our query, firstly, we need to break it into sub-parts. This helps us form the query quicker and make our understanding of the database structure quicker. Here, we also form the conditions on the correctly identified relationships. + +- Example: You need to present data from Table A and Table B. But Table A’s foreign key is Table C’s primary key which is Table B’s foreign key. Hence breaking down the query into results from Table B and C (let’s say Temp) and then common results between its Temp and Table A will give us the correct solution. + +4. Compilation: Finally, we combine all the parts and form our final query. We can use query optimization techniques like heuristic optimization, resulting in quicker responses. + +> **Please refer to this link for more practice:** https://www.scaler.com/topics/sql/joins-in-sql/ + +--- +Quiz 1 +--- + + +Which of the following rows will NOT be a part of the result set in a `RIGHT` JOIN of the students table on the batches table on batch_id? + +### Choices + +- [ ] [1, John, Doe, 1] +- [ ] [3, Jim, Brown, null] +- [ ] [5, Jack, Johnson, 2] +- [ ] None of the above + +--- +Quiz 2 +--- + +If we perform a RIGHT JOIN of the students table on the batches table on batch_id, which row from the students table will NOT be included in the result set? + +### Choices + +- [ ] [1, John, Doe, 1] +- [ ] [3, Jim, Brown, null] +- [ ] [5, Jack, Johnson, 2] +- [ ] None of the above + +--- +Quiz 3 +--- + + +For an INNER JOIN of the students table on the batches table on batch_id, which of the following rows will NOT be included in the resulting set? 
+ +### Choices + +- [ ] [1, John, Doe, 1] +- [ ] [3, Jim, Brown, null] +- [ ] [5, Jack, Johnson, 2] +- [ ] None of the above + + +--- +Quiz 4 +--- + +Which row will NOT appear in the resulting set when we perform a FULL OUTER JOIN of the students table on the batches table on batch_id? + +### Choices + +- [ ] [1, John, Doe, 1] +- [ ] [3, Jim, Brown, null] +- [ ] [5, Jack, Johnson, 2] +- [ ] None of the above + + +--- +CROSS JOIN +--- + +There is one more type of join that we haven't discussed yet. It is called cross join. Cross join is a special type of join that doesn't have any condition. It just combines every row of the first table with every row of the second table. Let's see an example: + +```sql +SELECT * +FROM students s +CROSS JOIN batches b; +``` + +Now you may wonder why might someone need this join? For example, in a clothing store's database, one table might have a list of colors, and another table might have a list of sizes. A cross join can generate all possible combinations of color and size. + +`colors:` +![Jan24_Joins2](https://hackmd.io/_uploads/BJIdijEsp.jpg) + +`Sizes:` +![IMG_5BDAF86319F4-1](https://hackmd.io/_uploads/S10sii4oa.jpg) + +`Query:` + +```sql= +SELECT * +FROM COLORS +CROSS JOIN SIZES; +``` + +`RESULTANT TABLE:` +![IMG_238F3C40AC00-1](https://hackmd.io/_uploads/H1Uf3jNj6.jpg) + + +Cross join produces a table where every row of one table is joined with all rows of the other table. So, the resulting table has `N*M` rows given that the two tables have N and M rows. + + + + + +That's pretty much all different kind of joins that exist. There are a few more syntactic sugars that we can use to write joins. Let's see them: + + +**Now, let's understand some syntactical sugars and tiny topics:** + + +--- +USING +--- + +Let's say we want to join 2 tables on a column that has the same name in both the tables. For example, in the students and batches table, we want to join on the column `batch_id`. 
We can write the join as: + +```sql +SELECT * +FROM students s +JOIN batches b +ON s.batch_id = b.batch_id; +``` + +But there is a shorter way to write this. We can write this as: + +```sql +SELECT * +FROM students s +JOIN batches b +USING (batch_id); + +-- Here the above tables will be joined based on equality of batch_id +``` +> Note: Using is a syntactical sugar used to write queries with ease. + +--- +NATURAL JOIN +--- + +Many times it happens that when you are joining 2 tables, they are mostly on the columns with same name. If we want to join 2 tables on all the columns that have the same name, we can use NATURAL JOIN. For example, if we want to join students and batches table on all the columns that have the same name on both sides, we can write: + +```sql +SELECT * +FROM students s +NATURAL JOIN batches b; + +-- In above tables we have only batch_id as common column in both of the tables with same name. +``` + +--- +IMPLICIT JOIN +--- + +There is one more way to write joins. It is called implicit join. In this, we don't use the JOIN keyword. Instead, we just write the table names and the condition. For example, if we want to write the join query that we wrote earlier as implicit join, we can write: + +```sql +SELECT * +FROM students s, batches b; + +-- Above query will work as cross joins behind the scenes. +``` +> **Note: Behind the scenes, this is same as a cross join.** + + +--- +Join with WHERE vs ON +--- + +Let's take an example to discuss this. If we consider a simple query: +```sql +SELECT * +FROM A +JOIN B +ON A.id = B.id; +``` +In pseudocode, it will look like: + +```python3 +ans = [] + +for row1 in A: + for row2 in B: + if (ON condition matches): + ans.add(row1 + row2) + +for row in ans: + print(row.id, row.id) +``` +Here, the size of intermediary table (`ans`) will be less than `n*m` because some rows are filtered. 
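The nested-loop pseudocode above can be turned into a small runnable sketch. The rows in `A` and `B` are made-up sample data and `join_on` is a hypothetical helper name; a real database engine is free to execute a join quite differently, so treat this only as a mental model.

```python
# Made-up sample rows; each row is a dict with an "id" column.
A = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 3, "a": "z"}]
B = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}, {"id": 4, "b": "r"}]


def join_on(left, right):
    """Pair rows with nested loops, keeping a pair only when the ON condition holds."""
    ans = []
    for row1 in left:
        for row2 in right:
            if row1["id"] == row2["id"]:  # the ON condition
                ans.append((row1, row2))
    return ans


on_rows = join_on(A, B)
print(len(on_rows), "pairs kept out of", len(A) * len(B), "possible pairs")
for row1, row2 in on_rows:
    print(row1["id"], row2["id"])
```

With this sample data only the pairs with ids 2 and 3 survive, so `ans` holds 2 entries rather than all 9 combinations.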
+ +We can also write the above query in this way: + +```sql +SELECT * +FROM A, B +WHERE A.id = B.id; +``` +The above query is nothing but a CROSS JOIN behind the scenes which can be written as: + +```sql +SELECT * +FROM A +CROSS JOIN B +WHERE A.id = B.id; +``` +Here, the intermediary table `A CROSS JOIN B` is formed before going to WHERE condition. + +In pseudocode, it will look like: + +```python3 +ans = [] + +for row1 in A: + for row2 in B: + ans.add(row1 + row2) + +for row in ans: + if (WHERE condition matches): + print(row.id, row.id) +``` + +The size of `ans` is always `n*m` because table has cross join of A and B. The filtering (WHERE condition) happens after the table is formed. + +From this example, we can see that: +1. The size of the intermediary table (`ans`) is always greater or equal when using WHERE compared to using the ON condition. Therefore, joining with ON uses less internal space. +2. The number of iterations on `ans` is higher when using WHERE compared to using ON. Therefore, joining with ON is more time efficient. + +In conclusion, +1. The ON condition is applied during the creation of the intermediary table, resulting in lower memory usage and better performance. +2. The WHERE condition is applied during the final printing stage, requiring additional memory and resulting in slower performance. +3. Unless you want to create all possible pairs, avoid using CROSS JOINS. + +--- +UNION +--- + +Sometimes, we want to print the combination of results of multiple queries. Let's take an example of the following tables: + +`students` +| id | name | +|----|------| + +`employees` +| id | name | +|----|------| + +`investors` +| id | name | +|----|------| + + +You are asked to print the names of everyone associated with Scaler. So, in the result we will have one column with all the names. + +We can't have 3 SELECT name queries because it will not produce this singular column. We basically need SUM of such 3 queries. 
Join is used to stitch or combine rows, here we need to add the rows of one query after the other to create final result. + +UNION allows you to combine the output of multiple queries one after the other. + +```sql +SELECT name FROM students +UNION +SELECT name FROM employees +UNION +SELECT name FROM investors; +``` +Now, as the output is added one after the other, there is a constraint: Each of these individual queries should output the same number of columns. + +Note that an ORDER BY written inside one of the individual queries does not order the combined result, because each of these queries is executed independently. To sort the final output, put a single ORDER BY at the very end, after the last SELECT; it then applies to the whole combined result. + +UNION outputs distinct values of the combined result. It stores the output of individual queries in a set and then outputs those values in final result. Hence, we get distinct values. But if we want to keep all the values, we can use UNION ALL. It stores the output of individual queries in a list and gives the output, so we get all the duplicate values. + +--- +Difference Between JOIN and UNION in SQL: +--- + +![Screenshot 2024-02-10 at 1.31.51 PM](https://hackmd.io/_uploads/BJmt6o4jp.png) + +> Conclusion +1. The SQL JOIN is used to combine two or more tables. +2. The SQL UNION is used to combine two or more SELECT statements. +3. The SQL JOIN can be used when two tables have a common column. +4. The SQL UNION can be used when the columns along with their attributes are the same. + +That's all about Union and Joins! See you next time. Thanks.
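To see the difference between UNION and UNION ALL described above, here is a minimal sketch using Python's built-in `sqlite3` module instead of MySQL. The three tables follow the notes, but the rows and the `names` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students  (id INTEGER, name TEXT);
    CREATE TABLE employees (id INTEGER, name TEXT);
    CREATE TABLE investors (id INTEGER, name TEXT);
    -- Made-up rows: 'Ravi' and 'Asha' each appear in two tables.
    INSERT INTO students  VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO employees VALUES (1, 'Ravi'), (2, 'Meera');
    INSERT INTO investors VALUES (1, 'Asha');
""")

QUERY = """
    SELECT name FROM students
    {op}
    SELECT name FROM employees
    {op}
    SELECT name FROM investors
"""


def names(op):
    """Run the three SELECTs combined with `op` and return the names, sorted in Python."""
    rows = conn.execute(QUERY.format(op=op)).fetchall()
    return sorted(name for (name,) in rows)


union_names = names("UNION")          # duplicates removed
union_all_names = names("UNION ALL")  # duplicates kept
print(union_names)
print(union_all_names)
```

UNION returns each name once, while UNION ALL returns one row per source row, so the repeated names show up twice.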
+ + +--- +Solution to Quizzes: +--- + +> -- +Quiz1: Option D (None of the above) +Quiz2: Option B [3, Jim, Brown, null] +Quiz3: Option B [3, Jim, Brown, null] +Quiz4: Option D (None of the above) +-- \ No newline at end of file diff --git a/Non-DSA Notes/SQL Notes/07 Notes Aggregate Queries.md b/Non-DSA Notes/SQL Notes/07 Notes Aggregate Queries.md new file mode 100644 index 0000000..44a9820 --- /dev/null +++ b/Non-DSA Notes/SQL Notes/07 Notes Aggregate Queries.md @@ -0,0 +1,294 @@ + +--- +Agenda +--- + + +- Aggregate Queries + - Aggregate Functions + - COUNT + - \* (asterisk) +- Other aggregate functions +- GROUP BY Clause +- HAVING Clause + +--- +Aggregate Queries +--- + + +Hello Everyone, till now whatever SQL queries we had written worked over each row of the table one by one, filtered some rows, and returned the rows. +Eg: We have been answering questions like: +- Find the students who ... +- Find the batches who ... +- Find the name of every student. + + +But now we will be answering questions like: +- What is the average salary of all the employees? +- What was the count of movies released in each year? +- What is the maximum salary of all the employees? + + + +In above questions, we are not interested in the individual rows, but we are interested to get some data by combining/aggregating multiple rows. For example, to find the answer of first, you will have to get the rows for all of the employees, go through their salary column, average that and print. + +How to do this is what we are going to learn now. +> Activity: Search meaning of aggregate on Google. + +--- +Aggregate Functions +--- + + +SQL provides us with some functions which can be used to aggregate data. These functions are called aggregate functions. Imagine a set of column. With the values of that column across all rows, what all operations would you may want to do? + + + +Correct, to allow for exactly all of these operations, SQL provides us with aggregate functions. 
Aggregate functions will always output 1 value. Let's go through some of these functions one by one and see how they work. + +--- +COUNT +--- + +Count function takes the values from a particular column and returns the number of values in that set. Umm, but don't you think it will be exactly same as the number of rows in the table? Nope. Not true. Aggregate functions only take `not null values` into account. So, if there are any `null values` in the column, they will not be counted. + +Example: Let's take a students table with data like follows: + +>STUDENTS + +| id | name | age | batch_id | +|----|------|-----|----------| +| 1 | A | 20 | 1 | +| 2 | B | 21 | 1 | +| 3 | C | 22 | null | +| 4 | D | 23 | 2 | + +If you will try to run `COUNT` and give it the values in batch_id column, it will return 3. `Because there are 3 not null values in the column`. This is different from number of rows in the students table. + +Let's see how do you use this operation in SQL. + +```sql +SELECT COUNT(batch_id) FROM students; +``` + +To understand how aggregate functions work via a pseudocode, let's see how SQL query optimizer may execute them. + +```python +table = [] + +count = 0 + +for row in table: + if row[batch_id]: + count += 1 + +print(count) +``` + +Few things to note here: +While printing, do we have access to the values of row? Nope. We only have access to the count variable. So, we can only print the count. Extrapolating this point, when you use aggregate functions, you can only print the result of the aggregate function. You cannot print the values of the rows. + +Eg: + +```sql +SELECT COUNT(batch_id), batch_id FROM students; +``` + +This will be an invalid query. Because, you are trying to print the values of `batch_id` column as well as the count of `batch_id` column. But, you can only print the count of `batch_id` column. + +--- +### * (asterisk) to count number of rows in the table +--- +What if we want to count the number of rows in the table? 
We can do that by passing a `*` to the count function. + +\* as we know from earlier, refers to all the columns in the table. So, count(*) will count the number of rows in the table. You may think what if there is a `Null` value in a row. Yes, there can be one/more `Null` values in a row but the whole row can't be `Null` as per rule of MySQL. + +```sql +SELECT COUNT(*) FROM students; +``` + +The above query will print the number of rows in the table. + +--- +Other aggregate functions +--- + + +We can use multiple aggregate functions in the same query as well. For example: + +```sql +SELECT COUNT(batch_id), AVG(age) FROM students; +``` + +Some aggregate functions are as follows. + +1. MAX: Gives Maximum value +2. MIN: Gives minimum value + +Note that, values in the column must be comparable for MAX and MIN. + +3. AVG: Gives average of non NULL values from the column. +For example: AVG(1, 2, 3, NULL) will be 2. +4. SUM: Gives sum, ignoring the null values from the column. + +`Max, Min` +```sql +SELECT MAX(age), MIN(age) +FROM students; + +-- SOLUTION: + +-- MAX MIN +-- 23 20 +``` + +`Sum` +```sql +SELECT SUM(batch_id) +FROM students; + +-- SOLUTION: + +-- SUM +-- 4 +``` + +> Very Important: We can't nest aggregate functions inside one another. Doing so gives an error. + +Many times we have seen learners nesting `Aggregates` during Mock interviews. Please note this for future reference. +Example: + +```sql= +SELECT SUM(COUNT(batch_id)) +FROM STUDENTS; + +-- This above query will not work. +``` + +However, DISTINCT can be used inside an aggregate function, as DISTINCT is not an aggregate function. + +`Example:` + +```sql= +SELECT SUM(DISTINCT(batch_id)) +FROM STUDENTS; + +-- Here we have only two batch_id which are DISTINCT 1, 2 +-- Therefore we will do sum of (1, 2) i.e = 3.
+``` + +> Learn more about aggregates here: https://www.scaler.com/topics/sql/aggregate-function-in-sql/ + +--- +GROUP BY clause +--- + + +Till now we combined multiple values into a single values by doing some operation on all of them. What if, we want to get the final values in multiple sets? That is, we want to get the set of values as our result in which each value is derived from a group of values from the column. + +The way Group By clause works is it allows us to break the table into multiple groups so as to be used by the aggregate function. + +For example: `GROUP BY batch_id` will bring all rows with same `batch_id` together in one group + +> Note: Also, GROUP BY always works before aggregate functions. Group By is used to apply aggregate function within groups (collection of rows). The result comes out to be a set of values where each value is derived from its corresponding group. + +Let's take an example. + + +| id | name | age | batch_id | +|----|------|-----|----------| +| 1 | A | 20 | 1 | +| 2 | B | 21 | 3 | +| 3 | C | 22 | 1 | +| 4 | D | 23 | 2 | +| 5 | E | 23 | 1 | +| 6 | F | 25 | 2 | +| 7 | G | 22 | 3 | +| 8 | H | 21 | 2 | +| 9 | I | 20 | 1 | + +```sql +SELECT COUNT(*), batch_id FROM students GROUP BY batch_id; +``` + +The result of above query will be: +| COUNT(\*) | batch_id | +|-----------|----------| +| 4 | 1 | +| 3 | 2 | +| 2 | 3 | + +Explanation: The query breaks the table into 3 groups each having rows with `batch_id` as 1, 2, 3 respectively. There are 4 rows with `batch_id = 1`, 3 rows with `batch_id = 2` and 2 rows with `batch_id = 3`. + +Note that, we can only use the columns in SELECT which are present in Group By because only those columns will have same value across all rows in a group. + +Now let's try to understand this using Infographics: + +Here's our student table. Notice how the data is mixed and not organised by batch. Normally, this data is just a list. But what if we want to organise these students by their batch? 
+ +`Students:` + +![Screenshot 2024-02-12 at 2.43.12 PM](https://hackmd.io/_uploads/SJHHZvDop.png) + +Now, let's see what might happen internally when we apply the 'Group By' clause to this table, organising it by batch names. + +![Screenshot 2024-02-12 at 2.44.16 PM](https://hackmd.io/_uploads/r1i_WPvjT.png) + +After grouping, notice how all students from each batch are now grouped together, making it easier to analyse the data where we can apply aggregates as well. + +`Final group of students:` +![Screenshot 2024-02-12 at 2.46.47 PM](https://hackmd.io/_uploads/ByHMfvDsa.png) + +Now we can apply queries like: + +```sql +SELECT count(*), batch_name +FROM students +GROUP BY batch_name; + +-- It will give output as count of students for each batch. +-- Also we can print data which is common to a group in our case batch_name. +-- Groups don't have student's name as common. So, there will be an error if you use name in select. +``` + + +--- +HAVING Clause +--- + + +HAVING clause is used to filter groups. Let's take a question to understand the need of HAVING clause: + +There are 2 tables: Students(id, name, age, batch_id) and Batches(id, name). Print the batch names that have more than 100 students along with count of the students in each batch. + +```sql +SELECT COUNT(S.id), B.name +FROM Students S +JOIN Batches B ON S.batch_id = B.id +GROUP BY B.name +HAVING COUNT(S.id) > 100; + +-- Using `WHERE` here instead of `HAVING` gives an error. +``` + +Here, `GROUP BY B.name` groups the results by the `B.name` column (batch name). It ensures that the count is calculated for each distinct batch name. +`HAVING COUNT(S.id) > 100` condition filters the grouped results based on the count of `S.id` (number of students). It retains only the groups where the count is greater than 100. + +The sequence in which query executes is: +- Firstly, join of the two tables is done. +- Then the joined result is divided into groups based on `B.name`.
+- In the third step, result is filtered using the condition in HAVING clause. +- Lastly, it is printed through SELECT. + +FROM -> WHERE -> GROUP BY -> HAVING -> SELECT + +`WHERE is not built to handle aggregates`. We cannot use WHERE after GROUP BY because the WHERE clause works on rows, and as soon as GROUP BY forms a result, the rows are converted into groups. So, no individual conditions or actions can be performed on rows after GROUP BY. + +> **Note: WHERE is not built to handle aggregates, as 'WHERE' works with rows, not groups** + +* Differences between Having and Where: + ![Screenshot 2024-02-14 at 1.26.50 PM](https://hackmd.io/_uploads/r13_Gl5o6.png) + + +That's all for this class. If there are any doubts feel free to ask now. Thanks! \ No newline at end of file diff --git a/Non-DSA Notes/SQL Notes/08 Notes Subqueries and Views.md b/Non-DSA Notes/SQL Notes/08 Notes Subqueries and Views.md new file mode 100644 index 0000000..d862ced --- /dev/null +++ b/Non-DSA Notes/SQL Notes/08 Notes Subqueries and Views.md @@ -0,0 +1,609 @@ + +--- +Agenda +--- + + +- Subqueries +- Subqueries and IN clause +- Subqueries in FROM clause +- ALL and ANY +- Correlated subqueries +- EXISTS +- Subqueries in WHERE clause +- Views + + +--- +Subqueries +--- + + +Subqueries are a very intuitive way of writing SQL queries. They break a problem into smaller problems and combine their results to get the complete answer. Let's take a few examples. + +1. Given a `students` table, find all the students who have psp greater than the maximum psp of student of batch 2. + + `students` + + | id | name | psp | batch_id | + | -- | ---- | --- | -------- | + + + Algorithm: + + ```python + Find maximum psp of batch 2, store it in x. + ans = [] + + for all students(S): + if S.psp > x: + ans.add(S) + + return ans + ``` + + The query will be: + + ```sql + SELECT * + FROM students + WHERE psp > (SELECT max(psp) + FROM students + WHERE batch_id = 2 + ); + ``` + +2.
Find all students whose psp is greater than psp of student with id = 18 + Algorithm: + ```python + Find the psp of student with id = 18, store it in x. + Find all students with psp > x and give the answer. + ``` + Subqueries will be: + ```sql + SELECT psp + FROM students + WHERE id = 18; + ``` + + ```sql + SELECT * + FROM students + WHERE psp > x; + ``` + + The final query will be: + ```sql + SELECT * + FROM students + WHERE psp > (SELECT psp + FROM students + WHERE id = 18); + ``` + +Subqueries should always be enclosed in parentheses. The above query is readable and intuitive because of subqueries. + +In reality, we prefer to +- break problems into parts. +- solve smaller problems and use their result to solve bigger problems. + +Most of the times, problems that we solve via subqueries can also be solved via some other smart trick. But subqueries make our queries easier to understand and create. + + +--- +Tradeoff of subqueries +--- + +The tradeoff is bad performance. For example, consider the following: + +```sql +SELECT * +FROM students +WHERE psp > 18; +``` + +Without subquery: + +```python +students = [] +ans = [] + +for S in students: +    if S.psp > 18: +        ans.add(S) + +return ans +``` + +With subquery (the query from example 2 above, where the inner query is re-run for every outer row): +```python +ans = [] + +for S in students: +    sub = [] +    for O in students: +        if O.id == 18: +            sub.add(O) +    if S.psp > sub[0].psp: +        ans.add(S) + +return ans +``` + +Executed this way, the above query takes O(N^2), because the subquery gets executed for every row. Hence, it leads to bad performance, although SQL optimizers help with performance improvement. + +Example: +Consider the table `film`. Find all the years where the average of the `rental_rate` of films of that year was greater than the global average of `rental_rate` (average of all films). + +Algorithm: +1. Find global average. + +```sql +SELECT avg(rental_rate) +FROM film; +``` + +2. Find average of every year. +```sql +SELECT release_year, avg(rental_rate) +FROM film +GROUP BY release_year; +``` + +3. Get filtered groups.
+```sql +SELECT release_year, avg(rental_rate) +FROM film +GROUP BY release_year +HAVING avg(rental_rate) > ( + SELECT avg(rental_rate) + FROM film +); +``` + +The subqueries we wrote till now gave a single value as output. But a query can give 4 types of outputs: + +| Number of rows | Number of columns | Output | +| -------------- | ----------------- | ------------- | +| 1 | 1 | Single value | +| 1 | m | Single row | +| m | 1 | Single column | +| m | m | Table | + +We were using >, <, =, <>, >=, <= operations because there was a single value. But what if, it is not just a single value? Let's take an example. + +--- +Subqueries and IN clause +--- + + +Let's say there is a table called `users`. Find the names of students that are also the names of a TA. +`users` + +| id | name | is_student | is_TA | +| -- | ---- | ---------- | ----- | + +This means, if there is a Naman who is a student and also a Naman who is a TA, show Naman in the output. It does not have to be the same Naman, just the name should be same. + +If we had two different tables, `students` and `tas`, the query would have been like this: + +```sql +SELECT DISTINCT S.name +FROM students S +JOIN tas T +ON S.name = T.name; +``` + +But here we have just one table. So, consider the following query: + +```sql +SELECT DISTINCT S.name +FROM users S +JOIN users T +ON S.is_student = true + AND T.is_TA = true + AND S.name = T.name; +``` +How many of you think this query using JOIN is complex? Let's try to solve it using subqueries. + +Algorithm: + +```python +Get Name of all TA, store it in ans[]. + +Get students whose name is in ans. 
+``` + +Subqueries will be: +```sql +SELECT DISTINCT name +FROM users U +WHERE U.is_TA = true; +``` + +```sql +SELECT DISTINCT name +FROM users U +WHERE U.is_student = true + AND U.name IN (/*result of first subquery*/); +``` + +Combining both of these: +```sql +SELECT DISTINCT name +FROM users U +WHERE U.is_student = true + AND U.name IN ( + SELECT DISTINCT name + FROM users U + WHERE U.is_TA = true + ); +``` + +--- +Subqueries in FROM clause +--- + + +Now, we saw in last example that we got multiple values as output. Could we have comparison condition on these values? Let's look into it. + +Find all of the students whose psp is greater than the smallest psp of every batch. + +Algorithm: +```python +Find minimum psp of every batch, store it in x[]. + +Find maximum psp from x[], store in y. + +Print the details where psp is greater than y. +``` + +Subqueries will be: + +```sql +SELECT min(psp) AS min_psp +FROM students +GROUP BY batch_id; +``` + +```sql +SELECT max(min_psp) +FROM x; +``` + +```sql +SELECT * FROM students +WHERE psp > y; +``` + +Combining the above subqueries: + +```sql +SELECT * +FROM students +WHERE psp > ( + SELECT max(min_psp) + FROM ( + SELECT min(psp) AS min_psp + FROM students + GROUP BY batch_id + ) minpsps +); +``` + +Whenever you have a subquery in FROM clause, it is required to give it a name, hence `minpsps`. The inner column is also given an alias (`min_psp`) so that the outer query can refer to it by name. + +We can have subquery in FROM clause as well. This subquery's output is considered like a table in itself upon which you can write any other read queries. + +> Note: You should name a subquery in FROM clause. + + +--- +ALL and ANY +--- + + +Now let us say we want to find if a value x is greater than all the values in a set, we use ALL clause in such cases. + +To understand this, consider `psp > ALL (10, 20, 30, 45)`. ALL compares left hand side with every value of the right hand side. If all of them return `True`, ALL will return `True`.
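As a rough mental model of the rule above, `psp > ALL (...)` behaves like Python's built-in `all()` applied to one comparison per value. This is plain Python rather than SQL, it ignores how SQL treats NULL, and `greater_than_all` is a hypothetical helper name.

```python
def greater_than_all(value, candidates):
    """Mimic `value > ALL (candidates)`: every single comparison must be true."""
    return all(value > c for c in candidates)


passes = greater_than_all(50, [10, 20, 30, 45])
fails = greater_than_all(40, [10, 20, 30, 45])  # 40 > 45 is false
print(passes, fails)
```

A psp of 50 beats every value in the set, while 40 fails against 45, so only the first check holds.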
+ + +```sql +SELECT * +FROM students +WHERE psp > ALL ( + SELECT min(psp) + FROM students + GROUP BY batch_id +); +``` + +Similar to how AND has a pair with OR, ALL has a pair with ANY. +ANY compares the left hand side with every value on the right hand side. If any of them returns `True`, ANY returns `True`. + +--- +Quiz 1 +--- + +What is output of `x = ANY(a, b, c)` same as + +### Choices +- [ ] = +- [ ] ANY +- [ ] IN +- [ ] ALL + + +--- +Correlated subqueries +--- + + +Let's take an example first. Find all students whose psp is greater than average `psp` of their batch. + +Algorithm: +Query 1: Get students whose `psp` > x (say). +Query 2: x stores average `psp` of student's batch. + +Based upon which student we are considering, query 2 will give varying answers. Are the two subqueries independent of each other? + + + +No, these two are correlated. Correlated subqueries are queries where the subquery uses a variable from the parent query. + +```sql +SELECT * +FROM students +WHERE psp > x; +``` + +Here, the value of `x` (avg psp of batch) is dependent upon which student we are calculating it for as each student can have different batches. + +Let's see a different set of subqueries: + +```sql +SELECT avg(psp) +FROM students +WHERE batch_id = n; +``` + +```sql +SELECT * +FROM students +WHERE psp > y; +``` + +By putting the first query in place of `y`, will the final query work? No. Because there is no `n`. Assume you had the value of `n`, then? Put `S.batch_id` instead of `n`. + +```sql +SELECT * +FROM students S +WHERE psp > ( + SELECT avg(psp) + FROM students + WHERE batch_id = S.batch_id +); +``` + +Here, this subquery is using a variable `S.batch_id` from the parent query. For every row from `students` table, we will be able to use the value of `batch_id` in the subquery. + + +--- +EXISTS +--- + + +There is another clause we can use. Let's say we want to find all students who are also TA given the two tables. 
Here `st_id` can be `NULL` if the TA is not a student. + +`students` + +| id | name | psp | +| -- | ---- | --- | + +`tas` + +| id | name | st_id | +| -- | ---- | ----- | + +Let's make the subquery: + +```sql +SELECT st_id +FROM tas +WHERE st_id IS NOT NULL; +``` +Final query will use the above subquery: + +```sql +SELECT * +FROM students +WHERE id IN ( + SELECT st_id + FROM tas + WHERE st_id IS NOT NULL +); +``` + +Now see how we can use EXISTS for the above query. + +```sql +SELECT * +FROM students S +WHERE EXISTS ( + SELECT st_id + FROM tas + WHERE tas.st_id = S.id +); +``` + +What EXISTS does is, for every row of `students` it will run the subquery. If the subquery returns any number of rows greater than zero, it returns `True`. In this query, finding `tas.st_id = S.id` is faster because of indexes. And as soon as MySQL finds one such row, EXISTS will return `True`. Whereas, in the previous query had to go through all of the rows to get the answer of subquery. So, this query is faster than the previous one. + + +**Example 2** + +Find all the student names that have taken a mentor sesion. + +`students` + +| id | name | +| -- | ---- | + +`mentor_sessions` + +| s_id | stud_id | mentor_id | +| ---- | ------- | --------- | + +Consider this query: + +```sql +SELECT * +FROM students +WHERE id IN ( + SELECT stud_id + FROM mentor_sessions +); +``` + +The subquery will give a huge number of rows. Whereas, we just need to find if there is even a single row in `mentor_sessions` with some `stud_id`. + +```sql +SELECT * +FROM students S +WHERE EXISTS ( + SELECT * + FROM mentor_sessions + WHERE stud_id = S.id +); +``` + +In this way, MySQL allows EXISTS to make faster queries. + +--- +Views: +--- + +Imagine in sakillaDB, I frequently have queries of the following type: + - Given an actor, give me the name of all films they have acted in. + - Given a film, give me the name of all actors who have acted in it. 
+ +Getting the above requires a join across 3 tables, `film`, `film_actor` and `actor`. + +Why is that an issue? + - Writing these queries time after time is cumbersome. In fact, imagine queries that are even more complex - requiring joins across a lot of tables with complex conditions. Writing those every time with 100% accuracy is difficult and time-consuming. + - Not every team would understand the schema really well to pull data with ease. And understanding the entire schema for a large, complicated system would be hard and would slow down teams. + +So, what's the solution? +Databases allow for creation of views. Think of views as an alias which when referred is replaced by the query you store with the view. + +So, we can create a view with a query like the following: + +```sql +CREATE OR REPLACE VIEW actor_film_name AS + +SELECT + concat(a.first_name, ' ', a.last_name) AS actor_name, + f.title AS film_name +FROM actor a + JOIN film_actor fa + ON fa.actor_id = a.actor_id + JOIN film f + ON f.film_id = fa.film_id; +``` + + +**Note that a view is not a table.** It runs the query on the go, and hence data redundancy is not a problem. + +### Operating with views + +Once a view is created, you can use it in queries like a table. Note that in background the view is replaced by the query itself with view name as alias. +Let's see with an example. + +```sql +SELECT film_name FROM +actor_film_name WHERE actor_name = "JOE SWANK"; +``` + +OR + +```sql +SELECT actor_name FROM +actor_film_name WHERE film_name = "AGENT TRUMAN"; +``` + +As you can see, with views it's super simple to write queries that I write frequently, and there are fewer chances to make an error. +Note, however, that `actor_film_name` above is not a separate table but more of an alias.
+ +An easy way to understand this is to assume that every occurrence of `actor_film_name` is replaced by + +```sql +(SELECT + concat(a.first_name, ' ', a.last_name) AS actor_name, + f.title AS film_name +FROM actor a + JOIN film_actor fa + ON fa.actor_id = a.actor_id + JOIN film f + ON f.film_id = fa.film_id) AS actor_film_name +``` + +**Caveat:** Certain DBMS natively support materialised views. Materialised views are views with a difference that the views also store results of the query. This means there is redundancy and can lead to inconsistency / performance concerns with too many views. But it helps drastically improve the performance of queries using views. MySQL for example does not support materialised views. Materialised views are tricky and should not be created unless absolutely necessary for +performance. + +#### How to best leverage views + +Imagine there is an enterprise team at Scaler which helps with placements of the students. +Should they learn about the entire Scaler schema? Not really. They are only concerned with student details, their resume, Module wise PSP, Module wise Mock Interview clearance, companies details and student status in the companies where they have applied. + +In such a case, can we create views which get all of the information into 1 or 2 views? If we can, then they only need to understand those 2 views and can work with that. + +#### More operations on views + +**How to get all views in the database:** + +```sql +SHOW FULL TABLES WHERE table_type = 'VIEW'; +``` + +**Dropping a view** + +```sql +DROP VIEW actor_film_name; +``` + +**Updating a view** + +```sql +ALTER VIEW actor_film_name AS + + SELECT + concat(a.first_name, ' ', a.last_name) AS actor_name, + f.title AS film_name + FROM actor a + JOIN film_actor fa + ON fa.actor_id = a.actor_id + JOIN film f + ON f.film_id = fa.film_id; +``` + +**Note:** It is not recommended to run updates on views to change the data in the underlying tables. The best practice is to use views only for reading information.
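The idea that a view is a stored query rather than a stored table can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: it uses SQLite rather than MySQL (so plain `CREATE VIEW` and the `||` concatenation operator), and the three tiny tables hold made-up rows, not the real sakila data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);

    -- Made-up sample rows for illustration only.
    INSERT INTO actor VALUES (1, 'JOE', 'SWANK'), (2, 'KIRSTEN', 'PALTROW');
    INSERT INTO film VALUES (10, 'AGENT TRUMAN'), (11, 'ACADEMY DINOSAUR');
    INSERT INTO film_actor VALUES (1, 10), (2, 10), (2, 11);

    -- SQLite concatenates strings with ||.
    CREATE VIEW actor_film_name AS
    SELECT a.first_name || ' ' || a.last_name AS actor_name,
           f.title AS film_name
    FROM actor a
    JOIN film_actor fa ON fa.actor_id = a.actor_id
    JOIN film f ON f.film_id = fa.film_id;
""")

FILMS_OF_ACTOR = (
    "SELECT film_name FROM actor_film_name "
    "WHERE actor_name = ? ORDER BY film_name"
)

films_before = conn.execute(FILMS_OF_ACTOR, ("JOE SWANK",)).fetchall()

# The view stores no rows of its own, so a change in the base table
# shows up the next time the view is queried.
conn.execute("INSERT INTO film_actor VALUES (1, 11)")
films_after = conn.execute(FILMS_OF_ACTOR, ("JOE SWANK",)).fetchall()

print(films_before)
print(films_after)
```

The second query sees the newly linked film without the view being recreated, which is the behaviour the notes describe as the view running its query on the go.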
+ +**See the original create statement for a view** + +```sql +SHOW CREATE TABLE actor_film_name +``` + + + + + + + + + + +That is all for today, thanks! + +--- +Solution to Quizzes: +--- + +> -- +Quiz1: Option C (IN) +-- \ No newline at end of file diff --git a/Non-DSA Notes/SQL Notes/09 Notes Indexing.md b/Non-DSA Notes/SQL Notes/09 Notes Indexing.md new file mode 100644 index 0000000..04db83c --- /dev/null +++ b/Non-DSA Notes/SQL Notes/09 Notes Indexing.md @@ -0,0 +1,402 @@ + +--- +Agenda +--- + + +- Introduction to Indexing +- How Indexes Work +- Indexes and Range Queries + - Data structures used for indexing +- Cons of Indexes +- Indexes on Multiple Columns +- Indexing on Strings +- How to create index + + +--- +Introduction to Indexing +--- + + +Hello Everyone + +Till now, we had been discussing majorly about how to write SQL queries to fetch data we want to fetch. While discussing those queries, we also often wrote pseudocode talking about how at a higher level that query might work behind the scenes. + +Let us go back to that pseudocode. What do you think are some of the problems you see a user of DB will face if the DB really worked exactly how the pseudocode mentioned it worked? + + +Correct! In the pseudocode we had, for loops iterated over each row of the database to retrieve the desired rows. This resulted in a minimum time complexity of O(N) for every query. When joins or other operations are involved, the complexity further increases. + +Adding to this, in which hardware medium is the data stored in a database? + +Yes. A database stores its data in disk. Now, one of the biggest problems with disk is that accessing data from disk is very slow. Much slower than accessing data from RAM. For reference, read https://gist.github.com/jboner/2841832 Reading data from disk is 80x slower than reading from RAM! Let's talk about how data is fetched from the disk. We all know that on disk DB stores data of each row one after other. 
When data is fetched from disk, the OS fetches it in the form of blocks. That means it reads not just the location that you want to read, but also locations nearby. + +> Inside the database, data is organised in fixed-size blocks, usually called pages. (Shards are a different concept: partitions of a table spread across machines.) + +![Screenshot 2024-02-12 at 4.22.35 PM](https://hackmd.io/_uploads/rkQiOdvj6.png) + + +First the OS fetches data from disk to memory, then the CPU reads from memory. Now imagine a table with 100 M rows where you have to first get the data for each row from disk into RAM, then read it. It will be very slow. Imagine you have a query like: + +```sql +select * from students where id = 100; +``` + + +To execute the above, you will have to go through literally each row on the disk, and access even the **blocks/pages** where this row doesn't exist. Don't you think this is a massive issue and can lead to performance problems? + +To understand this better, let's take an example of a book. Imagine a big book covering a lot of topics. Now, if you want to find a particular topic in the book, what will you do? Will you start reading the book from the first page? No, right? You will go to the index of the book, find the page number of the topic you want to read, and then go to that page. This is exactly what indexing is. Just as the index of a book helps you go to the correct page of the book fast, the index of a database helps it go to the correct block of the disk fast. + +Now this is a very important line. Many people say that an index sorts a table. Nope. It has nothing to do with sorting. We will go over this a bit later in today's class. The major problem that indexes solve is reducing the number of disk block accesses. By preventing wasteful disk block accesses, indexes are able to increase the performance of queries. 
+ +How is performance related to indexing: + +`Performance of SQL queries with powerful hardware and good optimizations but without indexing:` +![Performance_without_Indexing](https://hackmd.io/_uploads/SktMttvsa.png) + +--- + +`Performance of SQL queries with indexing:` +![Performance_with_Indexing](https://hackmd.io/_uploads/BkDuttPsp.png) + +>Pic credit: Efficient MySQL Performance: By Daniel Nichter + +**MySQL leverages hardware**, **optimizations**, and **indexes** to achieve performance when accessing data. Hardware is an obvious leverage because MySQL runs on hardware: the **faster the hardware, the better the performance**. `Less obvious and perhaps more surprising is that hardware provides the least leverage`. Will explain why in a moment. Optimizations refer to the numerous techniques, algorithms, and data structures that enable MySQL to utilize hardware efficiently. **Optimizations** `bring the power of hardware into focus`. And focus is the difference between a light bulb and a laser. Consequently, optimizations provide more leverage than hardware. If databases were small, hardware and optimizations would be sufficient. But increasing data size deleverages the benefits of hardware and optimizations. **Without indexes, performance is severely limited.** + + + +> **Note: Indexes provide the most and the best leverage. They are required for any nontrivial amount of data.** + +--- +How Indexes Work +--- + +While we have talked about the problem that indexes help solve, let's talk about how indexes work behind the scenes to optimize queries. Let's try to build indexes ourselves. Let's imagine a huge table with 100s of millions of rows spread across 100s of **disk blocks (also known as pages).** We have a query like: + +```sql +select * from students where id = 100; +``` + +We want to somehow avoid going to any disk block that is definitely not going to have the student with id 100. 
We need something that can help me directly know that hey, **the row with id 100** is present in **this block**. Are you familiar with a data structure that can be stored in memory and can quickly provide the block information for each ID? Key value pairs? + + + +Correct. A map or a hashtable, whatever you call it can help us. If we maintain a **hashmap** where **key** is the id of the student and **value** is the **disk block** where the **row containing that ID is present**, is it going to solve our problem? Yes! That will help. Now, we can directly go to the block where the row is present and fetch the row. **This is exactly how indexes work.** They use some other data structure, which we will come to later. + +Here we had queries on id. An important thing about `id` is that id is? + + +Yes. ID is unique. Will the same approach work if the column on which we are querying may have duplicates? Like multiple rows with same value of that column? Let's see. Let's imagine we have an SQL query as follows: + +```sql +select * from students where name = 'Rahul'; +``` + +How will you modify your map to be able to accomodate multiple rows with name 'Rahul'? + +> NOTE: We can maintain a `list` for each key. + +What if we modify our map a bit. Now our **keys will be String (name)** and **values will be a list of blocks** that contain that name. Now, for a query, we will first go to the list of blocks for that name, and then go to each block and fetch the rows. **This way as well, have we avoided fetching the blocks from the disk that were useless?** Yes! Again, this will ensure our performance speeds up. + +--- +Indexes and Range Queries +--- + +So, is this all that is there about indexes? Are they that simple? Well, no. Are the SQL queries you write always like `x = y`? What other kind of queries you often have to do in DB? + +> Range queries? + +If a HashMap is how an index works, do you think it will be able to take care of range queries? 
Let's say we have a query like: + +```sql +select * from students where psp between 40.1 and 90.1; +``` + +Will you be able to use hashmap to get the blocks that contain these students? Nope. A hashmap allows you to get a value in O(1) but if you have to check all the values in the range, you will have to check them 1 by 1 and potentially it will take O(N) time. + +Even if we run a loop for a certian range then also we will not be able to get all the numbers since there can be many decimal numbers as well. Ex: 40.3, 40.32, 40.35. + +`Since, data is not sorted in case of hashmaps hence we have to exclusively check for every number in range query which in turn makes overall performance slow.` + +What are other issues? +- For finding k-th minimum element and supporting updates (Hashmap would be like bruteforce because it does not keep elements sorted, so we need something like Balanced binary search tree) +- Data structure for finding minimum element in any range of an array with updates (Once again hashmap is just too slow for this, we need something like segment tree) + + + + +So how will we solve it. Is there any other type of Map you know? Something that allow you to iterate over the values in a sorted way? + +> NOTE: Hint: TreeMap? + +Correct. There is another type of Map called TreeMap. + + +--- +TreeMap +--- + +Let's in brief talk about the working of a TreeMap. For more detailed discussion, revise your DSA classes. A TreeMap uses a Balanced Binary Search Tree (often AVL Tree or Red Black Tree) to store data. Here, each node contains data and the pointers to left and right node. + +> NOTE: Link for TreeMap: https://www.scaler.com/topics/treemap-in-java/ + +Now, how will a TreeMap help us in our case? A TreeMap allows us to get the node we are trying to query in O(log N). From there, we can move to the next biggest value in O(log N). 
**Thus, queries on range can also be solved.** + +**Internal Working of TreeMap in Java:** + +TreeMap internally uses a Red-Black tree, a **self-balancing binary search tree** containing an extra bit for the color (either red or black). The sole purpose of the colors is to make sure at every step of insertion and removal that the tree remains balanced. + +In case of TreeMap: +- **Each Node** contains a **Key-Value** pair storing the key and its **corresponding Memory Address** as the **value**, along with references to the left node and the right node. +- Each Node has at most 2 children. + + +`Diagram:` +![IMG_C87D6B6B0337-1](https://hackmd.io/_uploads/S1s1WKDjp.jpg) + + +> Activity: Can we further reduce the cost of a lookup in this tree? What is the complexity of a TreeMap lookup? O(height), which is O(log N) for a balanced tree. Can we reduce the cost further by reducing the height of the tree? We will discuss it further in B+Trees. + + +--- +B and B+ Trees +--- + + +Databases also use a tree-like data structure to store indexes. But they don't use a TreeMap. They use a B Tree or a B+ Tree. Here, each node can have multiple children. This further reduces the height of the tree, which reduces the number of nodes (and therefore disk blocks) visited per lookup, making queries faster. + +**Properties of B-tree** + +Following are some of the properties of a B-tree of order m (each node has at most m children) in DBMS: + +- A non-leaf node's number of keys is one less than the number of its children. +- The number of keys in the root ranges from one to a maximum of (m-1). Therefore, a non-leaf root has a minimum of two and a maximum of m children. +- Every non-leaf node other than the root holds between ⌈m/2⌉-1 and m-1 keys. Thus, it has between ⌈m/2⌉ and m children. +- The level of each leaf node is the same. + +**Need of B-tree** + +- For having optimized searching we cannot increase a tree's height. Therefore, we want the tree to be as short as possible in height. +- Use of B-tree in DBMS, which has more branches and hence shorter height, is the solution to this problem. 
Access time decreases as branching grows and depth shrinks. +- Hence, use of B-tree is needed for storing data as searching and accessing time is decreased. +- The cost of accessing the disk is high when searching tables. Therefore, minimising disk access is our goal. +- So to decrease time and cost, we use B-tree for storing data as it makes the index fast. + +**How Database B-Tree Indexing Works:** + +- When B-tree is used for database indexing, it becomes a little more complex because it has both a key and a value. The value serves as a reference to the particular data record. A payload is the collective term for the key and value. +- To index data by a particular key and value, the database first constructs a unique random index or a primary key for each of the supplied records. The keys and record byte streams are then all stored on a B+ tree. The random index that is generated is used for indexing of the data. +- So this indexing helps to decrease the searching time of data. In a B+ tree, all the data is stored on the leaf nodes, so to access a particular entry the database can make use of binary search within the nodes, as the data is stored in sorted order. +- If indexing is not used, the database reads each and every record to locate the requested record, which increases the time and cost of searching, so B-tree indexing is very efficient. + +**How Searching Happens in Indexed Database?:** + +The database does a search in the B-tree for a given key and returns the index in O(log(n)) time. The record is then obtained by running a second B+tree search in O(log(n)) time using the discovered index. So overall, the approximate time taken for searching a record in a B-tree indexed database is O(log(n)). + +**Examples of B-Tree:** + +Suppose there are some numbers that need to be stored in a database. If we store them in a B-tree in DBMS, they will be stored in sorted order so that the searching time can be logarithmic. 
+ +Let's take a look at an example: + +`B+Tree` +![Screenshot 2024-02-12 at 5.10.35 PM](https://hackmd.io/_uploads/HkYCQtwo6.png) + +The above data is stored in sorted order according to the values. If we want to search for the node containing the value 48, the following steps will be applied: + +- First, the parent node with key 100 is checked. As 48 is less than 100, the left child node of 100 is checked. +- In the left child, there are 3 keys, so it will check from the leftmost key, as the data is stored in sorted order. +- The leftmost key is 48, which matches the element being searched for, so that is how we find the element we wanted. + +> Learn more about B+Trees here: https://www.scaler.com/topics/data-structures/b-tree-in-data-structure/ + + +--- +Cons of Indexes +--- + + +While we have seen how indexes help make the read queries faster, like everything in engineering, they also have their cons. Let's try to think of those. What are the 4 types of operations we can do on a database? Out of these, which operations may require us to do extra work because of indexes? + + +Yes, whenever we insert, update or delete data, we may also have to update the corresponding nodes in the index. This will require us to do extra work and thus slow down those operations. + +Also, do you think we can store the index only in memory? Well, technically yes, but memory is volatile. If something goes wrong, we may have to recreate the complete index again. Thus, often a copy of the index is also stored on disk. This also requires extra space. There are two big problems that can arise with the use of an index: +1. Writes will be slower +2. Extra storage + + +Thus, it is recommended to use an index if and only if you see the need for it. Don't create indexes prematurely. + +--- +Indexes on Multiple Columns +--- + +How do you decide on which columns to create an index? Let's revisit how an index works. 
If I create an index on the `id` column, the tree map used for storing data will allow for faster retrieval based on that column. However, a query like: + +```sql +select * from students where psp = 90.1; +``` + +will not be faster with this index. The index on `id` has no relevance to the `psp` column, and the query will perform just as slowly as before. Therefore, we need to create an index on the column that we are querying. + +We can also create index on 2 columns. Imagine a students table with columns like: + +`id | name | email | batch_id | psp |` + +We are writing a query like this: + +```sql +select * from students where name = 'Naman'; +``` + +Let's say we create an index on (id, name). + +When create index on these 2 columns, it is indexed according to id first and then if there is a tie it, will be indexed on name. So, there can be a name with different ids and we will not be able to filter it, as name is just a tie breaker here. The index is being created on first column i.e `id` and then the second column acts as a `tie breaker`. + +Thus, if we create an index on ``(id, name)``, it will actually not help us on the filter of name column. + +![Screenshot 2024-02-12 at 5.22.58 PM](https://hackmd.io/_uploads/rJynLtvip.png) + + +> Read more here: https://www.scaler.com/topics/postgresql/multi-column-index-in-postgresql/ + + +--- +Indexing on Strings +--- + + +Now let's think of a scenario. How often do we need to use a query like this: + +```sql +SELECT * FROM user WHERE email = 'abc@scaler.com'; +``` + +But this query is very slow, so we will definitely create an index on the email column. So, the map that is created behind the scenes using indexing will have email mapped to the corresponding block in the memory. + +Now, instead of creating index on whole email, we can create an index for the first part of the email (text before @) and have list of blocks (for more than one email having same first part) mapped to it. Hence, the space is saved. 
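
The space-saving idea above can be sketched with a plain dictionary. This is an illustration only, not how MySQL stores a prefix index: `row_block` is hypothetical data mapping each email to the disk block holding its row, and the "prefix" is taken to be the text before `@`, as in the paragraph above.

```python
from collections import defaultdict

# Hypothetical data: the disk block in which each user's row is stored.
row_block = {
    "abc@scaler.com": 3,
    "abc@gmail.com": 7,
    "naman@scaler.com": 3,
    "rahul@yahoo.com": 12,
}

# Index only the part before '@'. Several emails can share a prefix,
# so every key maps to a set of candidate blocks.
prefix_index = defaultdict(set)
for email, block in row_block.items():
    prefix_index[email.split("@")[0]].add(block)


def candidate_blocks(email):
    """Blocks that must be read and then checked against the full email."""
    return sorted(prefix_index.get(email.split("@")[0], set()))


print(candidate_blocks("abc@scaler.com"))
print(candidate_blocks("nobody@scaler.com"))
```

The index keys are shorter than the full emails, at the cost of occasionally reading a block that only shares the prefix (block 7 for `abc@scaler.com` here).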
+ + + +Typically, with string columns, index is created on prefix of the column instead of the whole column. It gives enough increase in performance. + +> Note: Explain this with example. + +Consider the query: +```sql +SELECT * FROM user +WHERE address LIKE '%ambala%'; +``` + +We can see that indexing will not help in such queries for pattern matching. In such cases, we use Full-Text Index. +> More on Full-Text Indexing: https://www.scaler.com/topics/mysql-fulltext-search/ + +--- +How to create index +--- + + +Let's look at the syntax using `film` table: +```sql +CREATE INDEX idx_film_title_release +ON film(title, release_year); +``` + + +Good practices for creating index: +1. Prefix the index name by 'idx' +2. Format for index name - idx\\\\\\... + + +Now, let's use the index in a query: +```sql +EXPLAIN ANALYZE SELECT * FROM film +WHERE title = 'Shawshank Redemption'; +``` + +`Output:` + +```sql +-> Index lookup on film using idx_title (title='Shawshank Redemption') +(cost=0.35 rows=1) (actual time=0.0383..0.0383 rows=0 loops=1) +``` + + +`Without Indexing:` + +```sql +drop index idx_title on film; + +EXPLAIN ANALYZE SELECT * FROM film +WHERE title = 'Shawshank Redemption'; +``` +`Output:` + +```sql +-> Table scan on film (cost=103 rows=1000) +(actual time=0.0773..1.01 rows=1005 loops=1)) +``` + +We can clearly see `using indexing` the numeber of rows our query have to search to find answer is very less i.e just **1 row**. Meanwhile `without Indexing` the number of rows we have to iterate is `1000` which is very expensive. + + +If you look at the log of this query, "Index lookup on film using idx_film_title_release" is printed. If we remove the index and run the above query again, we can see that the time in executing the query is different. In case where indexing is not used, it takes more time to execute and more rows are searched to find the title. 
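
The same before/after comparison can be reproduced locally. The sketch below is an assumption-laden stand-in for the MySQL session above: it uses Python's built-in `sqlite3`, whose `EXPLAIN QUERY PLAN` output is worded differently from MySQL's `EXPLAIN ANALYZE`, and it fills a `film` table with 1000 made-up rows (`FILM 0` to `FILM 999`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT, release_year INTEGER)"
)
# Hypothetical sample data, only there to give the planner a table to work on.
conn.executemany(
    "INSERT INTO film (title, release_year) VALUES (?, ?)",
    [(f"FILM {i}", 2000 + i % 20) for i in range(1000)],
)

QUERY = "SELECT * FROM film WHERE title = 'FILM 42'"


def plan(sql):
    """Return SQLite's query plan; the last column of each row is the description."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_without_index = plan(QUERY)

conn.execute("CREATE INDEX idx_film_title_release ON film(title, release_year)")
plan_with_index = plan(QUERY)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

Without the index the plan is a scan of the whole table; once `idx_film_title_release` exists, the plan names the index, because `title` is its leading column.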
+ +> More on how to create indexes: https://www.scaler.com/topics/how-to-create-index-in-sql/ + +--- +Clustered and Non-clustered Index? +--- + +Nowadays, we use databases to store records, and to fetch records much more efficiently, we use indexes. The index is a unique key made up of one or more columns. There are two types of indexes: + +1. Clustered Index +2. Non-Clustered Index + +**Clustered Index:** + +A clustered index is a special type of index that rearranges the physical order of the table rows based on the index key values. This means that the data in the table is stored in the same order as the index. Clustered indexes can only be created on one column. + + +**Non-Clustered Index:** + +A non-clustered index is a type of index that does not physically reorder the table data. Instead, it creates a separate structure that contains the index key values and pointers to the corresponding rows in the table. + + +**Characteristics of the Clustered Index:** + +- **Advantages**: Faster queries for returning data in a specific order. +- **Disadvantages**: Only one clustered index can be created per table. +- **When to use**: When you need to return the data in a specific order regularly. + +The characteristics of a clustered index are as follows: + +- Default Indexing Methodology +- Can use a single or more than one column for indexing. +- Indexes are stored in the same table as actual records. + + +**Characteristics of the Non-Clustered Index:** + +- **Advantages**: Multiple non-clustered indexes can be created per table. +- **Disadvantages**: Queries that use non-clustered indexes may be slower than queries that use clustered indexes, especially for queries that need to return a large amount of data. +- **When to use**: When you need to return data in a specific order, but you don't need to do this regularly. + +The characteristics of the non-clustered index are as follows: + +- Table data is stored in the form of key-value pairs. 
+- Tables or Views can have indexes created for them. +- It provides secondary access to records. +- A non-clustered key-value pair and a row identifier are stored in each index row of the non-clustered index. + +**Differences between Clustered and Non-clustered Index:** + +![Screenshot 2024-02-12 at 5.31.41 PM](https://hackmd.io/_uploads/SJD2_twip.png) + +> Learn more about it here: https://www.scaler.com/topics/clustered-and-non-clustered-index/ + + + +That's all for today. Thanks! \ No newline at end of file diff --git a/Non-DSA Notes/SQL Notes/10 Notes_ Transactions.md b/Non-DSA Notes/SQL Notes/10 Notes_ Transactions.md new file mode 100644 index 0000000..ec9e6bd --- /dev/null +++ b/Non-DSA Notes/SQL Notes/10 Notes_ Transactions.md @@ -0,0 +1,653 @@ + +--- +Agenda +--- + + +- What are transactions? +- Properties of transactions? + - Atomicity + - Consistency + - Isolation + - Durability +- How do transactions work? + - Read + - Write +- Commits and Rollbacks + + +We will try to cover most of these topics here and the remaining in the next part of Transactions. + +It is going to be a bit challenging, advanced, but very interesting topic that is asked very frequently in interviews, such as ACID, lost updates, dirty and phantom reads, etc. + +So let's start. + +--- +What are transactions? +--- + +Till now, we have written a query. We executed it, and it worked. +Sometimes, instead of writing one simple query, we might need multiple queries to execute together. + +For example - **Transfering Money in a Bank** + +![](https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/040/974/original/upload_845348751cdfa3f32d5bf8eef6b5d727.png?1690721321) + +Now, to transfer the money from A to B, what all changes are needed in the database of the back? + +**[Activity:]** +Where do you think the information regarding the balance etc., of a customer is stored in the bank? in which table? 
+ +--> Probably a table called `account` +with a schema such as: + +`accounts` + + +| id | name | balance |created_at |branch |... | +| -------- | -------- | -------- |-------- |-------- |-------- | +| 1 | A | 1000 | - |-|-| +| 2 | B | 5000 | - |-|-| + +Where - represents some value (not relevant to our example). + +Now, let's see what all things need to happen in this `accounts` table for the transaction to complete. + +**[Time to think:]** +Can the first step be "reduce this much money" from A? +**No** +because what if A doesn't even have that much money to transfer? + +The steps will be as follows: +1. Get the balance of A. (DB call) +3. Check if the balance >= 500 (no DB call) +4. Reduce the balance of A by 500. (DB call) +5. Increase the balance of B by 500. (DB call) + +Now, to do this, we will probably have a function as follows in our application code: + +``` +transfer_money(from, to, amount) { + // multiple SQL queries with conditionals +} +``` + +Will that be enough? +Let's think about what could go wrong using such a function call. + +**[Let us think:]** +Do you think, at one time, only one person would be calling the `transfer_money()` function? + +--> **No** + +There might be a situation where `transfer_money() may be getting executed by multiple people at the same time`: +* A --> B (read --> as "transfering money to") +* C --> D +* E --> B +* D --> A + +Let's understand what can go wrong in such a situation with an example. + +Let's say two people (A and C) are transferring money to the same person (say, B) at the same time. + +- A --> B +- C --> B + +Such a situation can happen in the case of **Amazon** where multiple people are sending payment to the same merchant at the same time. + +Now before continuing with our example, let's first ask ourselves - is updating the balance of B a single operation? + +`B += 500` --> `B = B + 500` + +Which behind the scenes converts to: +- read the current value of B in a temporary variable `temp`. 
+- Increase the value of `temp` by 500. +- Write the value in `temp` back to `B`. + +Even though it looked to us like 1 update statement, behind the scenes it was multiple operations (read and write). + +Let's write it as pseudocode (Note to instructor - write non DB statements differently than DB statements) - +``` +transfer_money(A, B, amount) { + read A -> x + if (x >= amount) { + write A <- x - amount + read B -> x + x = x + amount + write B <- x + } +} +``` + +To be clear here, we can't update the value of B directly because any calculation happens in the CPU and before updating, the CPU must first get the value from memory/disk. + +![](https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/040/975/original/upload_384919579d89f8132e76835036024441.png?1690721345) + +### What could go wrong - 1 + +Now back to the topic. What could go wrong in our example? + +- A --> B +- C --> B + +Initially: + +- A = ₹1000 +- B = ₹5000 +- C = ₹15000 + +**[Time to think:]** +Two people are performing the same operations at the same time, but at the end, can two things be written to the same memory location at the same time? + +**No**, one will happen just before the other. + +Let's run through our function line by line assuming that A --> B (represented in orange) transfer is running one step ahead of C --> B (represented in blue). + + + +![](https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/040/976/original/upload_ebdc7a13a3fd6a9b91a8450bb794f6fd.png?1690721396) + +> Note: Here, we are running these queries concurrently/in parallel, step by step. + +**[Time to think:]** + +- Before transactions, what was the total money in the bank? + - ₹21000 +- After both transactions are done, what's the total money in the bank? + - ₹20500 + +**What happened here?** + +Money went missing. 
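
One way to see where the ₹500 goes is to replay the two transfers by hand. The sketch below is not a real database: `db` is a plain dictionary, the balance check is left out because both senders have enough money, and the step order is one possible interleaving in which A --> B stays a step ahead of C --> B. Each transfer gets its own copy of the local variable `x` (`x_ab`, `x_cb`), which are names made up for this illustration.

```python
def run_interleaved():
    # Opening balances from the example.
    db = {"A": 1000, "B": 5000, "C": 15000}
    amount = 500

    x_ab = db["A"]            # A --> B: read A
    x_cb = db["C"]            # C --> B: read C
    db["A"] = x_ab - amount   # A --> B: write A = 500
    db["C"] = x_cb - amount   # C --> B: write C = 14500
    x_ab = db["B"]            # A --> B: read B (5000)
    x_cb = db["B"]            # C --> B: read B (still 5000, the stale value)
    db["B"] = x_ab + amount   # A --> B: write B = 5500
    db["B"] = x_cb + amount   # C --> B: write B = 5500 again, overwriting
    return db


balances = run_interleaved()
total = sum(balances.values())
print(balances, total)
```

Both senders were debited, but B was credited only once, so the total drops from ₹21000 to ₹20500.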
+ +**The A --> B transaction took the balance in A from 1000 to 500 and the balance in B from 5000 to 5500, but before this transaction ended, the C --> B transaction read the value of B as 5000 instead of 5500 and so also wrote 5500 instead of 6000.** + +So, do you see, when one particular operation requires us to do multiple things, it can actually lead to wrong answers unless we take care. + +### What could go wrong - 2 + +What else could go wrong? + +Let's say the DB machine goes down after the `write A <- x - amount` statement. +What will happen? + +**Money is gone from one person, but the other person never received it.** + +### What is a transaction? + +We saw the following two problems that can happen when executing multiple queries as part of a single operation. + +1. Inconsistent/ Illogical State +2. Complete operation may not execute. + +These are the problems a **transaction** tries to solve. + +**Transaction:** A set of database operations logically grouped together to perform a task. + +Example - Transfer Money + +![](https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/040/977/original/upload_12577f22d79867c787bcb65ed8cd16ac.png?1690721432) + +What do I mean by **"Logically Grouped Together"**? + +It means they are grouped together to achieve a certain outcome, like transferring money. + +When you have to perform such a task, you say, "I am starting a transaction", after that you perform the queries and then you "finish the transaction". + +![](https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/040/978/original/upload_d452c583f3db25ea62ac7318f682387d.png?1690721455) + +**Basically, a transaction is a way to tell the DB that all these following operations are part of doing the same thing.** + + +Now that we know what a transaction is, we will discuss the properties a transaction guarantees us so that things that can go wrong don't actually go wrong. + + +--- +ACID Properties +--- + +Now, let's discuss what the expectations from a transaction are. 
+ +Most of the time, when you are reading a book, they will talk about the "properties" of a transaction but we have used the word "expectations" for a reason. + +**[Time to think:]** +How many of you have heard about ACID properties? + +**[Time to think:]** +How many of you think ACID properties **must** always be there in a transaction? + +**So here's the thing - ACID properties may or may not be there in a transaction. It depends on our use case.** + +**ACID properties** are 4 expectations that an engineer might have when they create a transaction or execute a group of SQL queries together. + +They are: + +- **A**tomicity +- **C**onsistency +- **I**solation +- **D**urability + +Now let's discuss these expectations one by one. + +--- +ACID Properties - Atomicity +--- + +## Atomicity + +**[Time to think:]** +- The word atomicity is formed out of which word? + **Atom** + which means the smallest **indivisible** unit (at least for a long time before the discovery of sub-atomic particles) + + Now how does it relate to Databases? + + For us, atomicity does not mean the smallest unit, it means the indivisible unit. + +**The property of atomicity means the transaction should appear atomic or indivisible to an end user.** + +**Analogy** + +Take yourself back to your childhood. When you were a kid, did "transferring money" mean checking balances, reducing from one account and adding to another? + +No. + +It was a simple give-and-take for us. + +This is how a transaction should appear to an end user. + +- **To an outsider, it should feel that either nothing has happened or everything has happened.** +- This means that a transaction should never end in an intermediate state. + +> Activity: Go back to the `transfer_money()` function example and observe the opening and closing curly braces. They mark the group of steps that should behave as one indivisible unit. Atomicity is the guarantee that either all of the steps inside take effect or none of them do. 
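
To make "everything or nothing" concrete, here is a minimal sketch of the transfer using Python's built-in `sqlite3` module as a stand-in for the bank's database. The `accounts` table mirrors the one used earlier; the `crash_midway` flag is a made-up parameter that simulates a failure between the two writes, and the sketch relies on `sqlite3`'s default behaviour of opening a transaction before the first `UPDATE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", [(1, "A", 1000), (2, "B", 5000)])
conn.commit()


def transfer_money(sender, receiver, amount, crash_midway=False):
    """Move money as one transaction; returns True only if it was committed."""
    try:
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (sender,)
        ).fetchone()
        if balance < amount:
            raise ValueError("insufficient balance")
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, sender)
        )
        if crash_midway:
            # Hypothetical failure after the debit but before the credit.
            raise RuntimeError("simulated failure between the two writes")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, receiver)
        )
        conn.commit()      # both writes become visible together
        return True
    except Exception:
        conn.rollback()    # undo the partial work, as if nothing had happened
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


failed = transfer_money("A", "B", 500, crash_midway=True)
after_failure = balances()
succeeded = transfer_money("A", "B", 500)
after_success = balances()
print(failed, after_failure)
print(succeeded, after_success)
```

After the simulated failure the debit from A is rolled back, so an outside observer only ever sees the balances before the transfer or after the complete transfer.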
+ +**Usecase** + +The property prevents cases where money gets deducted from one account but didn't credit to another account. + +**[Time to think:]** + +- Can you think of a usecase where atomicity may not be required? + - **Google Profile Picture Update**: It may happen that after updating the profile picture, it is updated in Gmail but not in youtube and may take up to 48 hours to update accross different services. + +--- +ACID Properties - Consistency +--- + +Consistency means: +- Correctness +- Exactness +- Accuracy +- Logical correctness. + +What do we mean by this? + +**[Time to think:]** +- In our original example - was atomicity handled for both accounts A and C? + - Yes. it was. + - Money transfer was completed for both of them. The transfer was not left in between. + +Now, consistency comes into play when you start a work, and you complete it **but** you don't do it in the correct or accurate way. + +**[Time to think:]** + +- In our original example - `transfer_money`, both A and C completed, but was the outcome correct? + - No. + - The money was somehow lost. + - It leads the DB to an inaccurate state. + +In the bank example, consistency is important, but is it always the case? + +Let's take the example of **Hotstar**. +Let's say we a `live_streaming` table. + + +| stream_id| count | +| -------- | -------- | +| 1 | 160000000 | +| ... | ... | + +Let's say a particular stream with 1.6 crore people watching. +Now, 2 people at the exact same time start watching the stream too. + +So for both of them, how would the SQL query look like? + +1. Get the current count of viewers. +2. Update count. + +![](https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/040/979/original/upload_e82db6201060cb4f973db19e297d95d5.png?1690721487) + +Both update the count to 160000001 but is that a critical situation? +- Will the world go gaga over this? + +It doesn't matter whether it's 160000001 or 160000002 if it is improving the performance. 
+ +**So, consistency is something that might be expected but not always. It depends on the use-case.** + +--- +ACID Properties - Isolation +--- + + +Now, let's discuss the third one. That is isolation. + +**Isolation means one transaction shouldn't affect another transaction running at the same time on the same DB in the wrong way.** + +**[Time to think:]** +- In the `transfer_money` example, if there was just one transaction happening, would there have been any problem? + - No. + - Things went wrong because another transaction started happening on the same person. + - The two operations were not completely separate. + +So what does the word Isolated mean? +It means **separate**. That is, not aware of each other or completely apart from each other. + +In our example, the transactions were interfering with each other. + +The inconsistency that happened was actually because isolation was not present. + +We need to understand that two transactions can happen at the same time but it is important to keep them isolated and not let them interfere with each other. + +In the Hotstar example, if there was no isolation, would that have been okay? - Yes. + +There are multiple levels of isolation that we will be discussing later. + + +--- +ACID Properties - Durability +--- + +Now, let's discuss the last one. Durability. + +**Durability means that once a transaction is completed, we want its work to stay persistent.** + +It should not happen that you make a money transfer today and it is completed, but tomorrow that transfer is no longer present in the history logs. + +*This means that whenever a transaction is complete, whatever the result was should be stored on a disk.* + +- Saving on a disk takes time. +- Not every DB writes to the disk by default. + +It depends on our use-case and how we have configured the DB. + +There are DBs that give us fast writes. How? By just updating in-memory instead of the disk. + +- Would they have durability? No.
+- Will they be fast for writes? Yes. + +**[Time to think:]** +- Can a bank use such a DB that provides fast write by avoiding writing on disk? + - No. + - It is important for a bank to ensure the changes are persistent on a disk. + +"Everything in engineering comes with a tradeoff and we should not assume that a DB provides something without properly checking and understanding how it works." + +Now, durability has different levels: +- What if disk gets corrupted after saving? +- Then we might need to store on multiple disks. +- What if there is a flood in the data center? +- Then we might need to store in multiple data centers +- and so on... + +**Theoretically, we can't have 100% durability.** +How much durability we should have depends on the use-case. + + +--- +ACID Properties - Summarized +--- + +Let's summarize the ACID properties: +- Atomicity - Everything happens or nothing happens. +- Consistency - Whatever happens should happen in the correct way. +- Isolation - One transaction should not interfere with another transaction, consequently giving a wrong outcome. +- Durability - Once the outcome has been achieved, the outcome should persist. + + + + +--- +Commits and Rollbacks +--- + +Now, we will discuss the practical part. How to actually use the transactions in the DB. + +Let's say you have a students table: + + +| id | name | psp | batch | +| -------- | -------- | -------- |-------- | +| 1 | Naman | 70 |2 | +| ... | ... | ... |... | + +If I write a SQL query: + +``` +UPDATE students +SET psp = 80 +WHERE id = 1; + +SELECT * FROM students; +``` + +**[Time to think:]** +- What are you expecting the psp of id 1 to be? + - 80. + - We are saying 80 assuming that the query got written to the DB. + +When I write a SQL query, it automatically gets updated to the DB. Why is that? + +That happens because of something called **auto commit**. 
+ +**Whenever you write a simple SQL query, it starts a transaction, executes the query and then saves the changes to the DB automatically.** + +We only have to start a transaction explicitly when it spans multiple queries. + +Here "saves the changes" means "commit". +Committing is similar to a promise or a guarantee. + +**Commit is a keyword/clause/statement to persist the results of a SQL query.** + +**[Time to think:]** +- Which property of ACID is taken care of by `commit`? + - Durability. + +By default, in MySQL and most of the SQL DBs, `autocommit` is set to true. But we can turn it off based on the use-case. + +**Let's see an actual example in the MySQL visualizer**. + +- Create a file `transact.sql`. +- Add the first command and execute it: + ``` + set autocommit = 0; + ``` +- Now, let's open a table. + ``` + SELECT * FROM film + WHERE film_id = 10; + ``` +- Which film do you see? + - "Aladdin Calendar" +- Now let's update it: + ``` + UPDATE film + SET title = "Rahul" + WHERE film_id = 10; + ``` + +**[Time to think:]** +- Now, if we open a new session and try to execute the following statement: + ``` + SELECT * FROM film + WHERE film_id = 10; + ``` + What will we see? + - We will still see "Aladdin Calendar" and not "Rahul" because we haven't committed it. + +Other people or different sessions won't be able to see the changes we made until we commit the changes manually. So how do we do that? + +- Write and execute the following query: +``` +commit; +``` + +Now if we try to check again in a different session, it will show the updated value of "Rahul" in the title attribute. + +> Activity: Run the above example again with `autocommit` set to 1. + + +**[Time to think:]** +- Why is commit needed separately, and why can't we always have autocommit? + - Because it depends on the use-case. + - There may be a situation where we make a few changes but then realize that we don't want to commit them.
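
The two-session experiment above can also be tried without a MySQL server. The following is a rough sketch using Python's built-in `sqlite3` with two connections to a temporary file; SQLite is only a stand-in here, and the tiny `film` table is recreated for the demo rather than taken from the real sample database.

```python
import os
import sqlite3
import tempfile

# Two connections to the same database file play the role of two sessions.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
session_1 = sqlite3.connect(db_path)
session_2 = sqlite3.connect(db_path)

session_1.execute("CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT)")
session_1.execute("INSERT INTO film VALUES (10, 'Aladdin Calendar')")
session_1.commit()


def title_seen_by(session):
    rows = session.execute("SELECT title FROM film WHERE film_id = 10").fetchall()
    return rows[0][0]


# Python's sqlite3 opens a transaction before the UPDATE and keeps it open
# until commit() is called, similar to working with autocommit turned off.
session_1.execute("UPDATE film SET title = 'Rahul' WHERE film_id = 10")

before_commit = (title_seen_by(session_1), title_seen_by(session_2))
session_1.commit()
after_commit = (title_seen_by(session_1), title_seen_by(session_2))

print(before_commit)
print(after_commit)
```

Before the commit only session 1 sees "Rahul"; after the commit both sessions do.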
+ +**For example** - You were doing the money transfer from A to B, and after deducting the money from A you realized that the transfer may be spam or suspicious. In this case, we will have to undo or revert the changes done till that point. + +- Now how can you do that? + - By using another keyword/clause/statement called **Rollback**. + +Just as commit allows us to save the changes, rollback allows us to revert the changes. + +**Rollback allows us to revert the changes made since the last commit**. + +Let's see an example: + +- Run the following statements: + ``` + set autocommit = 0; + SELECT * FROM film + WHERE film_id = 10; + ``` +- Currently, the title is "Rahul". Let's change it to "Rahul2". + ``` + UPDATE film + SET title = "Rahul2" + WHERE film_id = 10; + ``` +- Now if we read in the same session the title will be "Rahul2". But if we roll back instead of committing, it will go back to "Rahul". + ``` + rollback; + ``` + + + +Summarize: +- Commit: like a marriage | persist the changes to DB. +- Rollback: like a break up | undo the changes done since the last commit. + + +--- +Transaction Isolation Level +--- + +Did you notice that when we were making changes in session 1, those changes were not yet visible in session 2? + +![](https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/040/980/original/upload_f8c71607d1ee9e321dc710db49b855b1.png?1690721525) + +**[Time to think:]** +- Was it only because the changes were not yet committed? + +That was part of the reason but there are other reasons as well. + +**This is Isolation** + +By default, MySQL keeps both of the sessions isolated. +- One session sees a snapshot of the table that the other doesn't. + +MySQL supports 4 levels of isolation: +1. Read Uncommitted (RU). +2. Read Committed (RC). +3. Repeatable Read (RR). +4. Serializable (S). + +Here, we will discuss the first one and the remaining three in the next notes. + +The isolation levels are written in order of increasing strictness.
+ +![](https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/040/981/original/upload_4baae922df529ebe6f82441a70ab4bcc.png?1690721564) + +Note that when we talk about isolation, we are talking about the restrictions on reading the data and not updating it. + +--- +Transaction Isolation Level - Read Uncommitted +--- + + +- It allows a transaction to read even uncommitted data from another transaction. +- Reads the latest data (committed or uncommitted). + +Assume there are 3 different sessions: + +![](https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/040/982/original/upload_c9658410d81a8322cc4f10b1466b111d.png?1690721584) + +- Isolation level only talks about how your session will work. +- The isolation level of other transactions doesn't matter to you. + +Now to understand this, let us see the example. +- To read the current level, execute the following statement: + ``` + SHOW variables LIKE "transaction_isolation"; + ``` + +- It will output the following: + | variable_name | value | + | -------- | -------- | + | transaction_isolation | REPEATABLE-READ | +- Let's change it in session 2: + ``` + SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; + ``` +- Now, when we try to read it, it will be "READ-UNCOMMITTED". (For Session 1 it will remain RR). + +Now, let's see another example. + +- Run the following statements in session 1: + ``` + set autocommit = 0; + SELECT * FROM film + WHERE film_id = 10; + ``` + - The title is "Rahul2" +- Let's change it to "Rahul3" + ``` + UPDATE film + SET title = "Rahul3" + WHERE film_id = 10; + ``` + +Now, earlier when we were trying to read this in session 2, it was not showing the updated title. +**But** now if we execute the following in session 2: +``` +SELECT * FROM film +WHERE film_id = 10; +``` +We will see "Rahul3". +**Because it is the latest uncommitted value.** + +**Pros** +- Fast + +**Cons** +- Read uncommitted may lead to inconsistency. + +Let's understand the con with an example.
+ + +![](https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/040/983/original/upload_612d7c72cfd7baa6eb4e9d454eff4506.png?1690721613) + +In the above example, we ran into a consistency problem because at line 8 in transaction 2, it read data which was not committed and was later rolled back. + +### Dirty Read + +This kind of problem is called a dirty read. + +**Dirty Read**: When a transaction ends up reading data written by another transaction that has not been committed yet (and may later be rolled back). + + + + + + + + + diff --git a/Non-DSA Notes/SQL Notes/11 Notes Transaction 2.md b/Non-DSA Notes/SQL Notes/11 Notes Transaction 2.md new file mode 100644 index 0000000..aa66538 --- /dev/null +++ b/Non-DSA Notes/SQL Notes/11 Notes Transaction 2.md @@ -0,0 +1,409 @@ + +--- +Agenda +--- + +- Read Uncommitted +- Read Committed +- Repeatable Reads +- Serializable +- Deadlocks + + +> Quick Recap: + +## Transaction Isolation Level - Read Uncommitted + +- It allows a transaction to read even uncommitted data from another transaction. +- Reads the latest data (committed or uncommitted). + +Assume there are 3 different sessions: + +![](https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/040/982/original/upload_c9658410d81a8322cc4f10b1466b111d.png?1690721584) + +- Isolation level only talks about how your session will work. +- The isolation level of other transactions doesn't matter to you. + +Now to understand this, let us see the example. +- To read the current level, execute the following statement: + ``` + SHOW variables LIKE "transaction_isolation"; + ``` + +- It will output the following: + | variable_name | value | + | -------- | -------- | + | transaction_isolation | REPEATABLE-READ | +- Let's change it in session 2: + ``` + SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; + ``` +- Now, when we try to read it, it will be "READ-UNCOMMITTED". (For Session 1 it will remain RR). + +Now, let's see another example.
+ +- Run the following statements in session 1: + ``` + set autocommit = 0; + SELECT * FROM film + WHERE film_id = 10; + ``` + - The title is "Rahul2" +- Let's change it to "Rahul3" + ``` + UPDATE film + SET title = "Rahul3" + WHERE film_id = 10; + ``` + +Now, earlier when we were trying to read this in session 2, it was not showing the updated title. +**But** now if we execute the following in session 2: +``` +SELECT * FROM film +WHERE film_id = 10; +``` +We will see "Rahul3". +**Because it is the latest uncommitted value.** + +**Pros** +- Fast + +**Cons** +- Read uncommitted may lead to inconsistency. + +Let's understand the con with an example. + +![](https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/040/983/original/upload_612d7c72cfd7baa6eb4e9d454eff4506.png?1690721613) + +In the above example, we ran into a consistency problem because at line 8 in transaction 2, it read the data which was not committed and later rolled back. + +### Dirty Read + +This kind of problem is called a dirty read. + +**Dirty Read**: When a transaction ends up reading data which may not be committed. + + +--- +Read Committed +--- + + +We discussed about dirty read. +- Dirty Read: Reading some data that is not confirmed yet. +- A data is confirmed once it is committed. + +Now, how do you ensure only committed data is read? +By using **Read Committed** isolation level. + +**Read Committed:** Your transaction will only read the latest committed data. 
+ +Let's understand with an example: + +Start session 1 and execute the following queries: +``` +SET autocommit = 0; +SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; +SELECT * FROM film +WHERE film_id = 10; -- This will return "Rahul3" +UPDATE film +SET title = "Rahul4" +WHERE film_id = 10; + +-- Changes not committed yet +``` + +Now **start session 2** and execute the following queries: + +``` +SET autocommit = 0; +SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED; +SELECT * FROM film +WHERE film_id = 10; +``` + +Now, what do you think will be the output? + +It will be "Rahul3". +Why not "Rahul4"? + +Because what matters is the isolation level of the session that is reading, not the isolation level of the session that updated the value. + +Now what if we execute the following query in session 2: + +``` +UPDATE film +SET title = "Rahul20" +WHERE film_id = 10; +``` + +> Note: the above query will keep loading. + +Now do you see that some problem is happening? Why is it waiting? + +**This is because of something called "locking".** + +As soon as we commit the first transaction, the query in session 2 will execute. + +There are two types of locks: +- Shared lock (taken for reading) +- Exclusive lock (taken for writing) + + + +| Someone has | Someone else can | +| -------- | -------- | +| shared lock | read the same row (and take a shared lock too) but can't write it (explain with example) | +| exclusive lock | read the same row with a plain `SELECT` but can't lock or write it (explain with example) | + +**If one transaction has written on a row, no other transaction will be allowed to write on that row until the first one commits or rolls back.** + +> Note: We will discuss locks again in a later part of the notes. + + +**[Time to think:]** +Now do you understand why performance starts getting affected when you do RC instead of RU? + +Because when something is being read or written in RC, a lot of locking and concurrency control starts happening.
+ +**Summarizing the read committed isolation level:** +- A transaction will read always the latest committed value. +- It will never read an uncommitted value. +- That means Dirty read will not happen. + + +**[Time to think:]** +Is this the best solution? Will there be no problems in this? + +Let's discuss some problems with read committed. + +### Problems with read committed. +Let's understand the problem of read committed through an example: + +Suppose you have a *users* table: + + +| id | name | psp | email-sent | +| -------- | -------- | --------|------- | +|1 | N | 60 | false | +|2 | A| 80 | false | +|3 | M | 70 | false | +|4 | AB | 45 | false | + +Suppose for all students having psp less than 80, I want to send an email to them. + +**[Time to think:]** +How the flow of above task will look like ?? + +A structured flow of above task will look like this: + +1. Get list of all users having psp<80 +` list = SELECT * FROM users WHERE psp<80` +2. Send email to the obtained list of users (Let's assume this operation took 5 minutes) +3. Update the *"email-sent"* column for users to whom the mail has been sent. +``` + UPDATE users + SET email_sent = true + WHERE psp < 80; +``` + +**[Time to think:]** +What can probably go wrong here ? +What if someone's psp value get changed in between this process? 
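
To make the question concrete, the three steps can be replayed with a change slipped in between step 2 and step 3. This is a sketch with Python's built-in `sqlite3`; the "other session" is simulated on the same connection, and the new psp value of 79 is an assumption chosen for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, psp INTEGER, email_sent INTEGER)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, 0)",
    [(1, "N", 60), (2, "A", 80), (3, "M", 70), (4, "AB", 45)],
)
conn.commit()

# Step 1: get the list of users having psp < 80.
recipients = [
    row[0] for row in conn.execute("SELECT id FROM users WHERE psp < 80 ORDER BY id")
]

# Step 2: "send" the emails. While that is going on, someone else commits
# a change to user 2's psp (simulated here on the same connection).
conn.execute("UPDATE users SET psp = 79 WHERE id = 2")
conn.commit()

# Step 3: mark the users as emailed by repeating the same condition.
conn.execute("UPDATE users SET email_sent = 1 WHERE psp < 80")
conn.commit()

marked = [
    row[0]
    for row in conn.execute("SELECT id FROM users WHERE email_sent = 1 ORDER BY id")
]
print("emailed:", recipients)
print("marked :", marked)
```

User 2 ends up marked as emailed although no email was ever sent to them.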
+ +Let's clear this up by discussing a specific user **(User 2)** from the above example: + +**Step 1:** Getting the list of users having psp<80 *(User with id = 2 will definitely **not** be a part of this list)* +**Step 2:** Sending email to the users in the obtained list *(Since User 2 is not in the list, they will not receive the email.)* +**Before Step 3:** *Let's assume that for some reason the psp value of user 2 got changed to 79. Now, the psp value of User 2 is less than 80.* +**Step 3:** While updating the "email_sent" column through the below query: +``` +UPDATE users +SET email_sent = true +WHERE psp < 80; +``` + +it will incorrectly record that User-2 has also received the mail since the user's current PSP is less than 80. However, here we encounter an issue because in reality, User-2 didn't receive the mail. **This is called a non-repeatable read, which occurs when a transaction reads the same row twice and gets a different value each time.** + +* **What should be the ideal case here?** +Within a transaction, if I read the same row again it must have the same value. + +***The main problem here is:*** +In the read-committed isolation level, the transaction will always read the latest committed value, even if the latest commit is done after the transaction has started. + + + +**[Time to think:]** +How can we solve this problem of non-repeatable read? + +By using the **Repeatable Reads** isolation level. + +--- +Repeatable Reads +--- + +**How does Repeatable Reads work?** + +In the repeatable reads isolation level, whenever a transaction has to read a row: + +* For the very first time, it reads the latest committed value of the current row. +* After that, until the transaction completes, it keeps on reading the same value it read the first time.
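
The two rules above can be written down as a toy model. This is not how MySQL implements it internally; `ToyRow` and `ToyTransaction` are made-up names, and the model only captures the rule "remember what you read the first time".

```python
_NOT_READ_YET = object()  # marker meaning "this transaction has not read the row"


class ToyRow:
    """A single row that only stores its latest committed value."""

    def __init__(self, value):
        self.committed = value


class ToyTransaction:
    """A reader whose behaviour depends on its own isolation level."""

    def __init__(self, row, level):
        self.row = row
        self.level = level  # "READ COMMITTED" or "REPEATABLE READ"
        self.first_read = _NOT_READ_YET

    def read(self):
        if self.level == "READ COMMITTED":
            # Always return the latest committed value.
            return self.row.committed
        # REPEATABLE READ: remember the first value and keep returning it.
        if self.first_read is _NOT_READ_YET:
            self.first_read = self.row.committed
        return self.first_read


row = ToyRow("ALI FOREVER")
rc = ToyTransaction(row, "READ COMMITTED")
rr = ToyTransaction(row, "REPEATABLE READ")

first = (rc.read(), rr.read())
row.committed = "Deepak"  # another session updates the row and commits
second = (rc.read(), rr.read())

print(first)
print(second)
```

The read-committed reader picks up "Deepak" on its second read, while the repeatable-read reader keeps seeing "ALI FOREVER" until its transaction ends.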
+ + +Let's understand this with an example: + +Start session 1 and execute the following queries: +``` +SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ; +BEGIN; +SELECT * FROM film WHERE film_id = 13; -- This will return the film row with title ALI FOREVER +``` + +Now, start session 2 and execute the following queries: + +``` +SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; +START TRANSACTION; +UPDATE film SET title = "Deepak" WHERE film_id = 13; +COMMIT; -- The behaviour described below is the same with or without this +``` + +**[Time to think:]** +Now, if you try to run the query from the first session again, what will be the output this time? + +The output will still be the same: `ALI FOREVER`, because in session-1 `ISOLATION LEVEL REPEATABLE READ` is set, and for further reads it keeps on reading the same value it read the first time until this session's transaction completes. + +**NOTE:** After committing in transaction 1, if you try to read the value again from session-1, it will show the title as "Deepak". + +**Summary:** + +* Dirty Read: This problem was solved by Read Committed. +* Non-Repeatable Read: This problem was solved by Repeatable Read. + +**[Time to think:]** +Is repeatable read the best? +**Ques:** Repeatable read keeps a snapshot of which rows? +Repeatable read doesn't only keep a snapshot of the row that you read but also takes snapshots of a set of other nearby rows.
+ +Let's get this point with an example: + +**Session-1** +``` +SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; +START TRANSACTION; +SELECT * FROM film WHERE film_id = 12; -- Initial title: Naman +SELECT * FROM film WHERE film_id = 13; -- Initial title: Deepak +``` + +Now in **Session-2** + + +``` +SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ; +BEGIN; +SELECT * FROM film WHERE film_id = 12; -- Initial title: Naman +-- Till this moment the title of the film at film_id = 13 is still Deepak, and a snapshot with Deepak is taken +``` + +Let's change film Deepak into Aniket in **session-1** + +``` +SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; +START TRANSACTION; +SELECT * FROM film WHERE film_id = 12; -- Initial title: Naman +SELECT * FROM film WHERE film_id = 13; -- Initial title: Deepak + +UPDATE film SET title = "Aniket" WHERE film_id = 13; +COMMIT; +``` + +Again read the title at film_id = 13 in **session-2** + +``` +SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ; +BEGIN; +SELECT * FROM film WHERE film_id = 12; +SELECT * FROM film WHERE film_id = 13; -- title is still "Deepak" +``` + +Even though `film_id = 13` is read for the first time only after the update, the latest committed value of this row is not returned, because when `SELECT * FROM film WHERE film_id = 12;` was first executed, snapshots of a few more rows were already taken. + +> **NOTE:** Instead of doing these changes for film_id = 13 if we had done this for some other film_id we might have got updated value as output. + +**The new problem:** +During this process, if someone adds a new row, its snapshot was obviously not taken before. This can create a problem known as a **phantom read.** + + + +**Phantom Read:** A row magically appears that you have never read earlier.
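
The "row that appears magically" can be reproduced in a small experiment. This is a sketch with Python's built-in `sqlite3`, not MySQL: the reader here runs each `SELECT` on its own, so every read sees the latest committed data, and the `users` rows (including the new user 5) are made up for the demo.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "phantom.db")
reader = sqlite3.connect(db_path)
writer = sqlite3.connect(db_path)

writer.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, psp INTEGER)")
writer.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "N", 60), (2, "A", 80), (3, "M", 70), (4, "AB", 45)],
)
writer.commit()


def ids_below_80(conn):
    rows = conn.execute("SELECT id FROM users WHERE psp < 80 ORDER BY id").fetchall()
    return [row[0] for row in rows]


first_read = ids_below_80(reader)

# Another session inserts a row that matches the same condition and commits.
writer.execute("INSERT INTO users VALUES (5, 'P', 50)")
writer.commit()

second_read = ids_below_80(reader)

print(first_read)
print(second_read)
```

The same query returns one more row the second time. No existing row changed; a new one simply showed up in the result set.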
+ +A phantom read occurs when a transaction reads a set of rows that match a certain condition, but another transaction adds or removes rows that meet that condition before the first transaction is completed. As a result, when the first transaction reads the data again, it sees a different set of rows, as if "phantoms" appeared or disappeared. + +**How to solve the Phantom Read problem?** +By using the Serializable isolation level. + +--- +Serializable +--- + +This technique ensures that multiple transactions or processes are executed in a specific order, one after the other, even if they are requested simultaneously. + + +> Activity: Let's understand the difference between the types of isolation with the **"Show Tickets" and "Book Tickets"** functionalities on the BookMyShow website by showing a similar booking in two different tabs. +Show Ticket: In the "Show Tickets" functionality, both users can select the same seat. +Book Ticket: In the "Book Tickets" functionality, only one person can proceed to book one ticket. + +**[Time to think:]** +What will you read, or do you even read from any row? Will it depend on your isolation level or others' isolation level? + +It depends on your isolation level. + +In the serializable level, you are only allowed to read rows that are not locked. In the other isolation levels, you can read any row without checking whether it is locked; there, the question was which version of the row you would read. + +### Example of Isolation Level in Bank +1. Set the isolation level to serializable in two different sessions (in a bank transfer between 2 users, it doesn't make proper sense if both do not have the same isolation level) +2. Perform the following query in **session-1** +``` +SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE; +START TRANSACTION; +SELECT * FROM film WHERE film_id in(12,13) FOR UPDATE; +``` +3.
Perform the following query in **session-2** +``` +SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE; +START TRANSACTION; +SELECT * FROM film WHERE film_id in(13,14) FOR UPDATE; +``` +> Note: The above query will keep loading, as a lock is already taken on row 13, so even a read is not allowed now +> If FOR UPDATE wasn't used in session-1, then it should be allowed to read in session-2 + +Thus, the serializable level enforces the concept of one person at a time, avoiding concurrency and maintaining consistency with complete isolation. + + + +--- +Deadlock +--- + +In a transaction, when we write or we read for update, do we take a lock on rows? +Yes. + +But this lock can lead to a deadlock condition. This problem happens when two people are waiting on resources held by each other and neither is able to make progress. + +**Example:** +1. Suppose there are two users, T1 and T2, each requiring access to resources A and B. +![deadlock](https://hackmd.io/_uploads/SkvFLXsia.png) +2. User T1 has acquired a lock on resource A and is trying to obtain a lock on resource B to accomplish their action. +3. At the same time, User T2 has acquired a lock on resource B and is trying to obtain a lock on resource A to accomplish their task. +4. Both are waiting for each other to release their occupied lock. +5. T1 is waiting for T2 to complete and T2 is waiting for T1 to complete and no one is making progress. This condition is known as deadlock. + + +**How we handle Deadlock in SQL:** The database automatically rolls back one of the transactions.
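
One way to keep T1 and T2 from waiting on each other forever is to make every transaction take its locks in the same fixed order. Below is a small sketch of that idea with Python threads and plain `threading.Lock` objects standing in for row locks; the account names, amounts and the alphabetical ordering rule are assumptions made for the illustration.

```python
import threading

locks = {"A": threading.Lock(), "B": threading.Lock()}
balances = {"A": 100000, "B": 100000}


def transfer(src, dst, amount):
    # Always lock in alphabetical order, whatever the direction of the
    # transfer, so two transfers can never hold one lock each and wait.
    first, second = sorted((src, dst))
    with locks[first]:
        with locks[second]:
            balances[src] -= amount
            balances[dst] += amount


def run_many(src, dst, amount, times):
    for _ in range(times):
        transfer(src, dst, amount)


# T1 moves money from A to B while T2 moves money from B to A.
t1 = threading.Thread(target=run_many, args=("A", "B", 1, 1000))
t2 = threading.Thread(target=run_many, args=("B", "A", 2, 1000))
t1.start()
t2.start()
t1.join()
t2.join()

print(balances)
```

If `transfer` locked `src` first and `dst` second instead, T1 could hold A while T2 holds B, which is exactly the situation in the example above.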
+ +**Way to avoid Deadlock:** take locks in defined ordered way + + + +> **More on deadlocks:** [We'll discuss more about deadlock in further classes of next module.](https://learn.microsoft.com/en-us/sql/relational-databases/sql-server-deadlocks-guide?view=sql-server-ver16) +> \ No newline at end of file diff --git a/Non-DSA Notes/SQL Notes/12 Notes Schema Design 1.md b/Non-DSA Notes/SQL Notes/12 Notes Schema Design 1.md new file mode 100644 index 0000000..87c6fb2 --- /dev/null +++ b/Non-DSA Notes/SQL Notes/12 Notes Schema Design 1.md @@ -0,0 +1,335 @@ + +--- +Agenda +--- + + +- What is Schema Design +- How to approach Schema Design +- Cardinality + - How to find cardinality in relations + - How to represent different cardinalities +- Sparse Relations +- Nuances when representing relations + +--- +What is Schema Design +--- + +Let's understand what Schema is. Schema refers to the structure of the database. Broadly speaking, schema gives information about the following: +- Structure of a database +- Tables in a database +- Columns in a table +- Primary Key +- Foreign Key +- Index +- Pictorial representation of how the DB is structured. + +In general, 'Design' refers to the pictorial reference for solving how should something be formed considering the constraints. Prototyping, blueprinting or a plan, or structuring how something should exist is called Design. + +Before any table or database is created for a software, a design document is formed consisting: +- Schema +- Class Diagram +- Architectural Diagram + + +--- +How to approach Schema Design +--- + + +Let's learn about this using a familiar example. You are asked to build a software for Scaler which can handle some base requirements. + +The requirements are as follows: +1. Scaler will have multiple batches. +2. For each batch, we need to store the name, start month and current instructor. +3. Each batch of Scaler will have multiple students. +4. Each batch has multiple classes. +5. 
For each class, store the name, date and time, instructor of the class. +6. For every student, we store their name, graduation year, University name, email, phone number. +7. Every student has a buddy, who is also a student. +8. A student may move from one batch to another. +9. For each batch a student moves to, the date of starting is stored. +10. Every student has a mentor. +11. For every mentor, we store their name and current company name. +12. Store information about all mentor sessions (time, duration, student, mentor, student rating, mentor rating). +13. For every batch, store if it is an Academy-batch or a DSML-batch. + +Representation of schema doesn't matter. What matters is that you have all the tables needed to satisfy the requirements. Considering the above requirements, how will you design a schema? Let's see the steps involved in creating the schema design. + +Steps: +1. **Create the tables:** For this we need to identify the tables needed. To identify the tables, + - Find all the nouns that are present in requirements. + - For each noun, ask if you need to store data about that entity in your DB. + - If yes, create the table; otherwise, move ahead. + + Here, such nouns are batches, instructors (if we just need to store instructor name then it will be a column in batches table. But if we need to store information about instructor then we need to make a separate table), students, classes, mentor, mentor session. + + Note a good convention about names: +Name of a table should be plural, because it is storing multiple values. Eg. 'mentor_sessions'. Name of a column should be singular and in snake_case. + +2. **Add primary key (id) and all the attributes** about that entity in all the tables created above. + + Expectation with the primary key is that: + - It should rarely change. Because indexing is done on PK and the data on disk is sorted according to PK. Hence, these are updated with every change in primary key.
+ - It should ideally be a datatype which is easy to sort and has a smaller size. Have a separate integer/big integer column called 'id' as a primary key. For example, see Twitter's algorithm ([Snowflake](https://blog.twitter.com/engineering/en_us/a/2010/announcing-snowflake)) to generate the key (id) for every tweet. + - A good convention to name keys is `<entity_name>_id`. For example, 'batch_id'. + + Now, for writing attributes of each table, just see which attributes are of that entity itself. For `batches`, columns will be `name`, `start_month`. `current_instructor` will not be a column as we don't just want to store the name of current instructor but their details as well. So, it is not just one attribute, there will be a relation between `batches` and `instructors` table for this. So we will get these tables: + +`batches` +| batch_id | name | start_month | +|----------|------|-------------| + +`instructors` +| instructor_id | name | email | avg_rating | +|---------------|------|-------|------------| + +`students` +| student_id | name | email | phone_number | grad_year | univ_name | +|------------|------|-------|--------------|-----------|-----------| + +`classes` +| class_id | name | schedule_time | +|----------|------|---------------| + +`mentors` +| mentor_id | name | company_name | +|-----------|------|--------------| + +`mentor_sessions` +| mentor_session_id | time | duration | student_rating | mentor_rating | +|-------------------|------|----------|----------------|---------------| + +3. **Representing relations:** For understanding this step, we need to look into cardinality. + + +--- +Cardinality +--- + + +When two entities are related to each other, there is a question: how many of one are related to how many of the other. + +For example, for two tables students and batches, cardinality represents how many students are related to how many batches and vice versa. + +- 1:1 cardinality means 1 student belongs to only 1 batch and 1 batch has only 1 student.
+- 1:m cardinality means 1 student can belong to multiple batches and 1 batch has only 1 student. +- m:1 cardinality means 1 student belongs to only 1 batch and 1 batch can have multiple students. +- m:m cardinality means multiple students can belong to multiple batches, and vice versa. + +In cardinality, `1` means an entity can be associated to 1 instance at max, [0, 1]. `m` means an entity can be associated with zero or more instances, [0, 1, 2, ... inf]. + +--- +Quiz 1 +--- + +What is the cardinality between two users on a social networking site like facebook? + +### Choices + +- [ ] 1:1 +- [ ] 1:m +- [ ] m:1 +- [ ] m:m + + +**Now, What do you think the cardinality be of a boy and a girl?** + +--- +How to find cardinality in relations +--- + + +Between two entites, there can be different relations. For each relation, there are different cardinalities. + +For finding cardinality, identify the two entities you want to find cardinality for. Identify the desired relation between them. + +A trick to find cardinality: +**Example 1:** Let's say we want to find the cardinality between two users on a social networking site. Put them alongside each other and mention the relation between them. + +| user1 | --- friend --- | user2 | +| ----- | -------------- | ----- | +| 1 | --> | m | +| m | <-- | 1 | + + + +From left to right, 1 user can have how many friends? 1 -> m +From right to left, 1 user can be a friend of how many users? m <- 1 + +If there is `m` on any side, put `m` in final cardinality. Therefore, here the cardinality will be m:m. This trick is based on the concept that, at max how many associations an entity can have. + +**Example 2:** What is the cardinality between ticket and seat in apps like bookMyShow? 
+ +| ticket | --- books --- | seat | +| ------ | ------------- | ---- | +| 1 | --> | m | +| 1 | <-- | 1 | + +In one ticket, we can book multiple seats, therefore, 1 -> m +One seat can be booked in only 1 ticket, therefore, 1 <- 1 + +So, the final cardinality between ticket and seat is 1:m. + +**Example 3:** Consider a monogamous community. What is the cardinality between husband and wife? + +| husband | --- married to --- | wife | +| ------- | ------------------ | ---- | +| 1 | --> | 1 | +| 1 | <-- | 1 | + +In a monogamous community, 1 man is married to 1 woman and vice versa. Hence, the cardinality is 1:1. + +**Example 4:** What is the cardinality between class and current instructor at Scaler? + +| class | --- assigned to --- | instructor | +| ----- | ------------------- | ---------- | +| 1 | --> | 1 | +| m | <-- | 1 | + +One class can have 1 instructor, therefore, 1 -> 1 +One instructor can teach multiple classes, therefore, m <- 1 + +So, the final cardinality between class and instructor is m:1. + + +--- +Quiz 2 +--- + +In a university system, each student can attend various courses during their academic tenure. Simultaneously, courses can be taken by different students every semester. What's the relationship between `Student` and `Course` in terms of cardinality? + +### Choices + +- [ ] One-to-One +- [ ] One-to-Many +- [ ] Many-to-One +- [ ] Many-to-Many + +--- +Quiz 3 +--- + + +Considering an e-commerce platform, when a customer places an order, it may contain several products. Many people order popular products. Can you identify the cardinality of the `Order` to `Product` relationship? + +### Choices + +- [ ] One-to-One +- [ ] One-to-Many +- [ ] Many-to-One +- [ ] Many-to-Many + +--- +Quiz 4 +--- + + +In a banking application, a customer might have several accounts (like savings, checking, etc.) but each of these accounts can only be owned by a single customer. What type of cardinality does the `Customer` to `Account` relationship exhibit? 
+ +### Choices + +- [ ] One-to-One +- [ ] One-to-Many +- [ ] Many-to-One +- [ ] Many-to-Many + +--- +Quiz 5 +--- + +In an educational institution, a student opts for a major subject. This subject might be the choice of several students, but a student cannot major in more than one subject. How would you describe the cardinality between `Student` and `Major`? + +### Choices + +- [ ] One-to-One +- [ ] One-to-Many +- [ ] Many-to-One +- [ ] Many-to-Many + + +--- +How to represent different cardinalities +--- + + +When we have a 1:1 cardinality, the `id` column of any one relation can be used as an attribute in the other relation. It is not recommended to include both `id` columns in each other's relations, because that may cause update anomalies in future transactions. + +For 1:m and m:1 cardinalities, the `id` column of the `1` side relation is included as an attribute in the `m` side relation. + +For m:m cardinalities, create a new table called a **mapping table** or **lookup table** which stores the ids of both tables according to their associations. + +For example, the tables `orders` and `products` from the previous quiz have m:m cardinality. So, we will create a new table `orders_products` to accommodate the relation between order ids and product ids. + +`orders_products` +| order_id | product_id | +| -------- | ---------- | +| 1 | 1 | +| 1 | 2 | +| 1 | 3 | +| 2 | 2 | +| 2 | 4 | +| 3 | 1 | +| 3 | 5 | +| 4 | 5 | + + +--- +Sparse relations +--- + +The way we were storing 1:m and m:1 cardinalities can waste storage space when the relation is sparse. A sparse relation is one where a lot of entities are not a part of the relation. + +> A sparse relation refers to a type of relationship or association between tables where there are few matching or related records between the tables. Many records in one table may have no matching records in the related table, resulting in gaps or sparsity in the relationship.
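
To make the sparsity concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `scholarships` table, the `scholarship_id` column and the sample rows are hypothetical and exist only for illustration; they are not part of the Scaler schema. The sketch stores an optional m:1 relation inline as a nullable column, then counts how many cells stay empty compared with how many rows a separate table of id pairs would need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical '1' side of an optional m:1 relation.
    CREATE TABLE scholarships (scholarship_id INTEGER PRIMARY KEY, name TEXT);

    -- Relation stored inline: every student row carries the column,
    -- even when the student has no scholarship (NULL).
    CREATE TABLE students (
        student_id     INTEGER PRIMARY KEY,
        name           TEXT,
        scholarship_id INTEGER REFERENCES scholarships(scholarship_id)
    );

    -- Relation stored separately: one row per existing association only.
    CREATE TABLE student_scholarships (
        student_id     INTEGER PRIMARY KEY REFERENCES students(student_id),
        scholarship_id INTEGER NOT NULL REFERENCES scholarships(scholarship_id)
    );
""")

conn.execute("INSERT INTO scholarships VALUES (1, 'merit')")
# Ten students, only the first one is part of the relation.
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(i, f"s{i}", 1 if i == 1 else None) for i in range(1, 11)],
)
conn.execute(
    "INSERT INTO student_scholarships "
    "SELECT student_id, scholarship_id FROM students "
    "WHERE scholarship_id IS NOT NULL"
)

empty_cells = conn.execute(
    "SELECT COUNT(*) FROM students WHERE scholarship_id IS NULL"
).fetchone()[0]
mapping_rows = conn.execute(
    "SELECT COUNT(*) FROM student_scholarships"
).fetchone()[0]
print(empty_cells, mapping_rows)  # 9 1
```

With this sample data the inline column is empty for 9 of the 10 students, while the separate table needs a single row. Whether that saving is worth the extra join depends on how sparse the real relation is.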
+ +To solve this, we can create a new table with `id` columns of both tables as done previously. +Pros: Saved memory +Cons: Need more join operations, may affect performance. + +Sparse relations can also happen in 1:1 cardinality. The solution for saving storage space whenever a sparse relation is present is to create a mapping table. + +--- +Nuances when representing relations +--- + + +Moving forward, there can be cases where we need to store information about the relationship itself. If one of the two tables is storing information about the relationship, it dilutes the purpose of that table. + +So, in LLD classes, you will learn about SRP (Single Responsibility Principle). The responsibility of everything must be defined. You should always have a separate mapping table for the information about the relationship. + +Coming back to the Scaler example, refer to the tables we created previously: + +We did not represent `current_instructor` before. So, what would be the cardinality between `batches` and `current_instructor`? 1 batch can only have 1 `current_instructor`. But an instructor can be teaching multiple batches at a time (morning batch, evening batch, etc). Hence, this is an m:1 cardinality. So we include the id of `current_instructor` in the `batches` table. + +`batches` +| batch_id | name | start_month | curr_inst_id | +|----------|------|-------------|--------------| + +Similarly, for `batches` and `students`, 1 student can be in 1 batch at a moment, but a batch can have multiple students. So, this is m:1 cardinality, and `batch_id` will be included in the `students` table. + +`students` +| student_id | name | email | phone_number | grad_year | univ_name | batch_id | +|------------|------|-------|--------------|-----------|-----------|----------| + +For `batches` and `classes`, 1 batch can be in multiple classes and 1 class can have multiple batches. Hence, m:m cardinality.
+ +`batch_classes` + +| batch_id | class_id | +| -------- | -------- | + + +--- +Solution to Quizzes: +--- + +> -- +Quiz1: Option D (m:m) +Quiz2: Option D (Many-to-Many) +Quiz3: Option D (Many-to-Many) +Quiz4: Option B (One-to-Many) +Quiz5: Option C (Many-to-One) +-- + +Will see you all again, thanks! diff --git a/Non-DSA Notes/SQL Notes/13 Notes Schema Design 2.md b/Non-DSA Notes/SQL Notes/13 Notes Schema Design 2.md new file mode 100644 index 0000000..929a3a9 --- /dev/null +++ b/Non-DSA Notes/SQL Notes/13 Notes Schema Design 2.md @@ -0,0 +1,405 @@ + +--- +Agenda +--- + + +- Nuances when representing relations +- Scaler Schema Design - continued +- Deciding Primary Keys of a mapping table +- Representing Foreign keys and indexes +- Case Study - Schema design of Netflix + + + + + +--- +Scaler Schema Design - continued +--- + + +For reference from previous class, Scaler Schema Design: + +The requirements are as follows: +1. Scaler will have multiple batches. +2. For each batch, we need to store the name, start month and current instructor. +3. Each batch of Scaler will have multiple students. +4. Each batch has multiple classes. +5. For each class, store the name, date and time, instructor of the class. +6. For every student, we store their name, graduation year, University name, email, phone number. +7. Every student has a buddy, who is also a student. +8. A student may move from one batch to another. +9. For each batch a student moves to, the date of starting is stored. +10. Every student has a mentor. +11. For every mentor, we store their name and current company name. +12. Store information about all mentor sessions (time, duration, student, mentor, student rating, mentor rating). +13. For every batch, store if it is an Academy-batch or a DSML-batch. 
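
As a rough sketch, the tables carried over from the previous class can be written as DDL using Python's built-in `sqlite3` module. The column types, the composite primary key on `batch_classes` and the sample rows are assumptions made for illustration; the notes do not fix them. The m:1 relations put the id of the `1` side on the `m` side, and the m:m relation gets a mapping table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE instructors (
        instructor_id INTEGER PRIMARY KEY,
        name TEXT, email TEXT, avg_rating REAL
    );
    -- m:1 (batches -> instructors): the id of the '1' side lives on the 'm' side.
    CREATE TABLE batches (
        batch_id INTEGER PRIMARY KEY,
        name TEXT, start_month TEXT,
        curr_inst_id INTEGER REFERENCES instructors(instructor_id)
    );
    -- m:1 (students -> batches).
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        name TEXT, email TEXT, phone_number TEXT,
        grad_year INTEGER, univ_name TEXT,
        batch_id INTEGER REFERENCES batches(batch_id)
    );
    CREATE TABLE classes (
        class_id INTEGER PRIMARY KEY,
        name TEXT, schedule_time TEXT
    );
    -- m:m (batches <-> classes): a mapping table holding both ids.
    CREATE TABLE batch_classes (
        batch_id INTEGER NOT NULL REFERENCES batches(batch_id),
        class_id INTEGER NOT NULL REFERENCES classes(class_id),
        PRIMARY KEY (batch_id, class_id)  -- assumed key choice
    );
""")

conn.execute("INSERT INTO instructors (instructor_id, name) VALUES (1, 'i1')")
conn.execute("INSERT INTO batches (batch_id, name, curr_inst_id) VALUES (1, 'b1', 1)")
conn.execute("INSERT INTO students (student_id, name, batch_id) VALUES (1, 's1', 1)")


def insert_student_in_missing_batch():
    """Return True if the database rejects a student whose batch does not exist."""
    try:
        conn.execute(
            "INSERT INTO students (student_id, name, batch_id) VALUES (2, 's2', 99)"
        )
    except sqlite3.IntegrityError:
        return True
    return False


rejected = insert_student_in_missing_batch()
print(rejected)
```

Declaring the references as foreign keys lets the database itself refuse a `students` row that points at a batch which is not there, instead of leaving that check to application code.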
+ +### Tables + +`batches` + +| batch_id | name | start_month | curr_inst_id | +|----------|------|-------------|--------------| + +`students` + +| student_id | name | email | phone_number | grad_year | univ_name | batch_id | +|------------|------|-------|--------------|-----------|-----------|----------| + +`batch_classes` + +| batch_id | class_id | +| -------- | -------- | + +Now, let's continue from here. What is the cardinality between `class` and `instructor`? As this is m:1 cardinality, `instructor_id` will be included in `classes`. + +`classes` + +| class_id | name | schedule_time | instructor_id | +|----------|------|---------------| ------------- | + +Every student has a buddy. Here, the cardinality of the buddy relation between a student and another student is m:1. + +| student | --- buddy --- | student | +| ------- | ------------- | ------- | +| 1 | --> | 1 | +| m | <-- | 1 | + +So, the `students` table will have one more column called `buddy_id`. + +`students` + +| student_id | name | email | phone_number | grad_year | univ_name | batch_id | buddy_id | +|------------|------|-------|--------------|-----------|-----------|----------| -------- | + +When a student is moved from one batch to another, the date of the move is an attribute of the relation between `students` and `batches`. So, we will create a new table like this: + +`student_batches` + +| student_id | batch_id | move_date | +|------------|----------|-----------| + +As we have included `batch_id` here, we could remove it from the `students` table, but that would decrease performance because every lookup of a student's current batch would have to query this new table as well. So, for ease, we will keep `batch_id` in `students` too. + +Every student has a mentor, and the cardinality between student and mentor is m:1. So, the `students` table will have `mentor_id`.
+ +`students` + +| student_id | name | email | phone_number | grad_year | univ_name | batch_id | buddy_id | mentor_id | +|------------|------|-------|--------------|-----------|-----------|----------| -------- | --------- | + +Now, for mentor sessions we will add `student_id` and `mentor_id` to the `mentor_sessions` table. + +`mentor_sessions` + +| mentor_session_id | time | duration | student_rating | mentor_rating | student_id | mentor_id | +|-------------------|------|----------|----------------|---------------| ---------- | --------- | + +Now, for the batch type, it can be DSML or Academy. Here, the batch type is an enum (an enum represents one of a given fixed set of values). + +Eg: +``` +enum Gender{ + male, + female +}; +``` + +So, we will have a `batch_types` table. + +`batch_types` + +| id | value | +| -- | ----- | + +Cardinality between `batches` and `batch_types` will be m:1. In the `batches` table we will have `batch_type_id`. + +`batches` + +| batch_id | name | start_month | curr_inst_id | batch_type_id | +|----------|------|-------------|--------------| ------------- | + +This was a brilliant example of how to create a Schema Design. + +--- +How to represent enum +--- + + +1. **Using strings** + + `batches` + + | batch_id | name | type | + | -------- | ---- | ------- | + | 1 | b1 | DSML | + | 2 | b2 | Academy | + | 3 | b3 | Academy | + | 4 | b4 | DSML | + + + **Pros:** + - Readability. + - No joins are required. + + **Cons:** + - The problem in storing enums this way is that it will take a lot of space. + - It will have slow string comparison. + + + +2. **Using integers** + Here, 0 means DSML type batch and 1 means Academy type batch. + + + `batches` + + | batch_id | name | type_id | + | -------- | ---- | ------- | + | 1 | b1 | 0 | + | 2 | b2 | 1 | + | 3 | b3 | 1 | + | 4 | b4 | 0 | + + + **Pros:** + - Less space + - Faster to search + + **Cons:** + - No readability. + - We cannot add or delete values (enums) in between as it will cause discrepancies.
+ - Also, what a particular value represents is not in the database. +3. **Lookup table** + It will have id and value columns where each type is stored as a separate row. The `type_id` of `batches` will refer to the `id` column of `batch_types`. All the above cons are solved with this method. + + `batch_types` + + | id | value | + | -- | ---------- | + | 1 | Academy | + | 2 | DSML | + | 3 | Neovarsity | + | 4 | SST | + +So, the best way to represent enums is to use a lookup table. + + +--- +Deciding Primary Keys of a mapping table +--- + +### Example from previous discussion: + +For `student_batches` the primary key will be (student_id, batch_id). + +`student_batches` + +| student_id | batch_id | move_date | +|------------|----------|-----------| + +**OR** + +`student_batches` + +| id | student_id | batch_id | move_date | +| -- |------------|----------|-----------| + +If we have our table like this, the primary key will be `id`. +**The size of the index will be smaller here.** + +Now, can there be a possibility that we might want to join a mapping table with another mapping table? +**Answer:** Yes! + + +### Example 2 +1. Scaler has exams. +2. For each batch a student joins, they will have to take exams of that batch. +3. Each exam is associated with a batch. + +`exams` + +| id | name | start_date | end_date | +| -- | ---- | ---------- | ---------- | + +Between batch and exam, we will have to create a mapping table: one batch can have multiple exams, and one exam can be present for multiple batches. + +`exam_batches` + +| exam_id | batch_id | +| ------- | -------- | + +Similarly we also have a table called `student_batches`. + +`student_batches` + +| student_id | batch_id | date | +|------------|----------|------| + +To figure out which student went through which exams, we will need to join `student_batches` with `exam_batches`. Basically, we are forming a relation between two mapping tables. + +### Example 3 + +1.
One student can belong to multiple batches. +2. Every batch has exams. +3. The same exam may happen for different batches on different dates. +4. If a student moves to another batch, they may have to take some exams again. + +`student_batches` + +| student_id | batch_id | date | +|------------|----------|------| + +Cardinality between batches and exams is m:m. So, we will have a `batch_exams` table. Date is also an attribute of this relation. + +`batch_exams` + +| batch_id | exam_id | date | +| -------- | ------- | ---- | + +Between students and exams also the cardinality is m:m. But if we have (student_id, exam_id) as primary key of the new `student_exams` table, it will not allow one student to take a particular exam twice. So, we will have to add `batch_id` to the PK as well. The below `student_batch_exams` will be our new table. + +`student_batch_exams` + +| student_id | batch_id | exam_id | marks | +| ---------- | -------- | ------- | ----- | + +Hence, we can see that sometimes a mapping may also have a relation with another entity. In these cases, not having a separate primary key can cause problems. + +**Advantages of a separate key:** +If a relation is being mapped to another entity or relation, it saves space. + +**Advantages of NO separate key:** +Queries on the first column will become faster because the table will be sorted by that column. A mapping table is often used for relationships and thus will require joins. Having no separate key makes things faster. + + +--- +Representing Foreign keys and indexes +--- + + +Along with Schema Design questions, use cases are also mentioned. These use cases govern what indexes will be there. For example, we need a function to find all the classes of a batch. For this, we will simply have an index on `batch_id`. + +`batch_classes` + +| batch_id | class_id | +| -------- | -------- | + +Let's say that the learners often search for a mentor by name. This is a use case. On which column of which table will you create an index for this?
You have to create an index on the `name` column of the `mentors` table. + +`mentors` +| mentor_id | name | company_name | +|-----------|------|--------------| + +Now, foreign keys are mentioned while creating the Schema, during the third step (representing relationships). You will mention after creating the attributes that this `column_A` of `table_A` will have a foreign key referring to the `column_B` of `table_B`. + +After drawing the complete Schema, mention the indexes. + +This was all about Schema Design! + +--- +Case Study - Schema design of Netflix +--- + +Following is the link to Netflix's requirements: +[Netflix Schema Design](https://docs.google.com/document/d/1xQbcv-smnV_JY6NUb4gz2owwPaQMWdoWty6PZyFEsq8/edit?usp=sharing) + +**Problem Statement** +Design a Database Schema for a system like Netflix with the following Use Cases. +**Use Cases** +1. Netflix has users. +2. Every user has an email and a password. +3. Users can create profiles to have separate independent environments. +4. Each profile has a name and a type. Type can be KID or ADULT. +5. There are multiple videos on Netflix. +6. For each video, there will be a title, description and a cast. +7. A cast is a list of actors who were a part of the video. For each actor we need to know their name and list of videos they were a part of. +8. For every video, for any profile who watched that video, we need to know the status (COMPLETED/ IN PROGRESS). +9. For every profile for whom a video is in progress, we want to know their last watch timestamp. + +Let's approach this problem as one should in an interview. + +1. Finding all the nouns to create tables. + +* `users` +* `profiles` +* `profile_type` (lookup table) +* `videos` +* `actors` (cast is nothing but a mapping between videos and actors) +* `watch_status_type` (enum, it is an attribute of the relation between profile and videos) + +2. Finding attributes of particular entities.
+ + `users` + | id | email | password | + | -- | ----- | -------- | + + `profiles` + | id | name | + | -- | ---- | + + `profile_type` + | id | value | + | -- | ----- | + + `videos` + | id | name | description | + | -- | ---- | ----------- | + + `actors` + | id | name | + | -- | ---- | + + `watch_status_type` + | id | value | + | -- | ----- | + +3. Representing relationships. + + Now, there are no relationships in the first and second use cases. Moving forward, what is the cardinality between `users` and `profiles`? One user can have multiple profiles but one profile is associated with one user. Therefore, it is 1:m, and the id of the user will be in the `profiles` table. + + `profiles` + | id | name | user_id | + | -- | ---- | ------- | + + What is the cardinality between `profiles` and `profile_type`? It is m:1, so `profiles` will have another column `profile_type_id`. + + `profiles` + | id | name | user_id | profile_type_id | + | -- | ---- | ------- | --------------- | + + What is the cardinality between `videos` and `actors`? One video can have multiple actors and one actor could be in multiple videos. So, it is m:m. + + `video_actors` + | video_id | actor_id | + | -------- | -------- | + + Status is information about the relation between `videos` and `profiles`. Hence, a new table is created. The last watch timestamp is also an attribute of this relation. + + `video_profiles` + | video_id | profile_id | watch_status_type_id | watched_till | + | -------- | ---------- | -------------------- | ------------ | + +Time for some follow-up questions. + +--- +Quiz 1 +--- + +What should be the primary key of `video_profiles`? + +### Choices + +- [ ] (video_id, profile_id) +- [ ] (profile_id, video_id) +- [ ] (id) a new column + +--- +Quiz explanation +--- + +In this question, (profile_id, video_id) is the best option because as soon as we open Netflix and a particular profile, it shows us the videos we are currently watching.
So, to make this query faster, the primary key should have `profile_id` first. + +This is all about the Schema Design!
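
The quiz answer can be sketched with Python's built-in `sqlite3` module. Only the table and column names come from the notes; the column types, the sample rows, the helper `in_progress` and the reading of `watched_till` as seconds into the video are assumptions for illustration. The composite primary key lists `profile_id` first, and the query asks for the videos a given profile is still watching.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE watch_status_type (
        id    INTEGER PRIMARY KEY,
        value TEXT NOT NULL
    );
    CREATE TABLE video_profiles (
        profile_id           INTEGER NOT NULL,
        video_id             INTEGER NOT NULL,
        watch_status_type_id INTEGER NOT NULL REFERENCES watch_status_type(id),
        watched_till         INTEGER,  -- assumed: seconds into the video
        PRIMARY KEY (profile_id, video_id)  -- profile_id first, as in the quiz answer
    );
    INSERT INTO watch_status_type VALUES (1, 'COMPLETED'), (2, 'IN PROGRESS');
    INSERT INTO video_profiles VALUES
        (10, 100, 2, 1200),
        (10, 101, 1, NULL),
        (11, 100, 2, 300);
""")


def in_progress(profile_id):
    """Return (video_id, watched_till) for videos the profile has not finished."""
    return conn.execute(
        """
        SELECT vp.video_id, vp.watched_till
        FROM video_profiles AS vp
        JOIN watch_status_type AS ws ON ws.id = vp.watch_status_type_id
        WHERE vp.profile_id = ? AND ws.value = 'IN PROGRESS'
        ORDER BY vp.video_id
        """,
        (profile_id,),
    ).fetchall()


print(in_progress(10))  # [(100, 1200)]
```

Because the lookup filters on `profile_id`, a key that starts with that column lets the database go straight to one profile's rows. With `video_id` first, the same rows would be scattered across the key order.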