Lecture_Notes/imperfect_notes/pragy/intro week - HLD 101.md
2020-03-19 12:51:33 +05:30


Intro Week - 3 - HLD 101

What is HLD?

  1. We're used to working with code on a single machine
    • you give some input, you get some output
  2. how do you scale to 1,000 or a million or 100 million users?
    • what should your architecture be?
  3. This is what HLD (High-Level Design) is about
  4. You need to know HLD and LLD to progress in your career

How do you access Google.com?

  1. Most large systems have
    • business logic at some location
    • many many clients using that system
    • via some interface
      • usually a website
    • so, the first layer is the client
      • on mobile / browser / iPad
  2. something like Google.com
  3. when you type Google.com, first thing that happens?
    • DNS Lookup/resolution
      • Every device on the internet has an IP. Including your system
        • 0-255 . 0-255 . 0-255 . 0-255
        • 192.168.0.1
      • Similarly, Google.com also has an IP.
      • DNS maps domain names to IPs
        • server which takes in domain name and gives you back IP
        • Google DNS
  4. Browser gets the IP address and tries to connect
  5. Request goes to Google's servers.
  6. Some Magic happens and a response is generated
  7. Response is returned back to you
  8. Now, HLD deals with everything that happens in this Magic box

DELICIO.US

  1. 2005/6
  2. Bookmarking website
    • Chrome wasn't there yet
    • Firefox and IE were all the rage
  3. If you store bookmarks on one machine
    • how do you access them on another machine?
  4. Delicio.us stores all your bookmarks.
    • if you change machines
    • just login to delicio.us
    • all your bookmarks are there

Humble Beginnings

  1. delicio.us started with a single laptop
  2. Let us walk through how it evolves over time
  3. Started getting popular
    • DB requirement started growing
    • 40 GB storage wasn't enough
    • already at 35GB. Filling at 5 GB / month
    • how do I fix this?
  4. Buy a better laptop
    • one with 100GB storage
  5. This is Vertical Scaling
    • This is how servers would scale in the 90s
    • IBM
  6. Delicio.us got even more popular
    • 100 GB will last for only 2 more months
    • no better laptop in the market
    • how to fix it now?
  7. Buy more laptops
  8. DNS issue
    • one domain mapped to only 1 IP
      • usually. Google has multiple IPs
      • you need to be huge to get multiple IPs
  9. How to distribute users over different laptops, if I have just 1 IP?
  10. Load Balancer
    • because entire website cannot be on one single server. What if I restart?
    • So, multiple machines, and the load balancer distributes the requests
    • simply forwards requests and gives back results
    • could be different communication protocol - socket vs http vs protobuf
    • Google has multiple IPs
      • 1 LB with multiple backups
      • switching time is seconds to microseconds
  11. Note that going from just 1 laptop to 2 requires a LB
    • because the client just has 1 IP and doesn't care what you're doing internally
  12. Usually, the LB just reroutes and doesn't do any heavy computation
  13. This is called Horizontal Scaling.

Load Balancer

  1. [diagram: load balancer in front of multiple machines]
  2. How will you design the LB?
    • Round robin - go one by one to each machine
    • No need for consistent hashing here
  3. What if M2 goes down?
    • or the network breaks
  4. Keep an is-alive mapping
    • Health Check
    • Poll/Ping each server. Must respond in 100 ms
    • Or the server sends a heartbeat
  5. What if LB goes down?
    • Backup servers
    • Managed via Zookeeper
  6. Most LBs allow configuring Heartbeat vs Polling, frequency, ...
  7. What if server is not down, but slow?
    • need rate management
    • measure avg response time
      • assuming requests are similar
    • do some sort of weighted Round Robin
      • weights could be preassigned
      • or based on Avg Response Time
      • or a hybrid approach
    • will need to invalidate stale Avg Response Times: sliding window / TTL
      • because machines can start failing slowly
  8. Of course, LB has to be non-blocking
    • so that no request slows down the other requests
    • most requests are IO bound
    • servers are also multicore. So you can process multiple CPU bound requests too!
  9. DDOS attacks are usually handled at the LB stage
    • rate limiting by IP/userid
    • whitelisting and blacklisting
  10. this is a stateless LB. Doesn't matter which server the request goes to
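
The round-robin-with-health-check idea above can be sketched as follows. `RoundRobinLB`, `mark_down`, and the server names are made up for illustration; a real LB would run the health checks on a timer rather than rely on manual marking.

```python
# Toy stateless load balancer: round-robin over servers, skipping any
# that failed their health check.
from itertools import cycle

class RoundRobinLB:
    def __init__(self, servers):
        self.servers = list(servers)
        self.alive = {s: True for s in self.servers}  # is-alive mapping
        self._cycle = cycle(self.servers)

    def mark_down(self, server):   # health check failed / no heartbeat
        self.alive[server] = False

    def mark_up(self, server):     # server came back
        self.alive[server] = True

    def pick(self):
        # skip dead servers; give up after one full rotation
        for _ in range(len(self.servers)):
            s = next(self._cycle)
            if self.alive[s]:
                return s
        raise RuntimeError("no healthy servers")

lb = RoundRobinLB(["m1", "m2", "m3"])
lb.mark_down("m2")
print([lb.pick() for _ in range(4)])  # ['m1', 'm3', 'm1', 'm3'] -- m2 is skipped
```

Weighted round robin would replace `cycle` with a scheme that picks faster servers more often, using the (sliding-window) average response time as the weight.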

Stateful LB

  1. Delicio.us could not store all data on one laptop

    • had to split
  2. If U1's data is on L1, U1's request should go to L1 and not other servers

  3. How to do this?

    • store {userid -> serverid}
      • Too big if we've 8B users
      • 8 bytes + 4 bytes + hashmap overhead ≈ 100 bytes per entry
      • 100 bytes * 8B users = 800 GB
      • Can't fit in RAM
    • userid % N
      • don't need to store anything in memory
    • or some hash function
  4. But what if server goes down? Up?

    • almost every userid will be moved
      • userid % N changes for most users when N changes
    • so, a lot of data transmission
  5. Hash function should be able to handle such things

  6. Consistent Hashing

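A quick way to see the problem with `userid % N`: count how many users keep their server when the server count goes from 4 to 5 (the counts here are just for illustration).

```python
# How many of 10,000 users stay on the same server when we go from
# N=4 to N=5 servers under the userid % N scheme?
users = range(10_000)
same = sum(1 for u in users if u % 4 == u % 5)
print(f"{same / 10_000:.0%} of users stay put")  # 20% of users stay put
```

So 80% of the users (and their data) would have to move for a single added server. Consistent hashing exists to shrink this to roughly 1/N of the users.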

Consistent Hashing

Step 1

  • userid space
  • serverid space
  • hash space
  • H(uid or sid) -> [0, 10^{18}]
  • Hash both servers and users
  • User gets assigned to the closest server
  • Since the hash is almost random, a user is equally likely to be assigned to any server

Step 2

  • What happens if server dies?
  • Say s2 dies
  • All users of s2 get assigned to s3
    • they should've been distributed equally
  • What if I add server?
    • add s5
  • only the load of s4 is reduced. Load at s1, s2, s3 is still the same
  • Why hash, and not place servers equidistantly? Because then adding a server means shifting every server's position, moving every user

Step 3

  • create more markers for each server
    • not more copies of server
    • just markers
  • helps distribute load more equally
  • b61475e4.png
  • basically, use multiple hash functions H1 .. H100
  • Now, if a server dies
    • its users will be almost equally distributed amongst the other servers
  • If a server is added
    • each server gives up some of its users to the new server

Algorithm

  • Store (hash, server) in sorted order
  • binary search for user's hash
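
The algorithm above (markers per server, sorted `(hash, server)` list, binary search for the user's hash) can be sketched as follows. `md5` and 100 markers per server are arbitrary choices for the sketch, not the only options.

```python
# Consistent hash ring: each server gets many markers on the ring; a
# user is served by the first marker clockwise of the user's hash.
import bisect
import hashlib

def h(key: str) -> int:
    # Hash into a large integer space (the "hash space" of the notes)
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, servers, markers=100):
        # Store (hash, server) pairs in sorted order
        self.ring = sorted(
            (h(f"{s}#{i}"), s) for s in servers for i in range(markers)
        )
        self.hashes = [hv for hv, _ in self.ring]

    def server_for(self, userid: str) -> str:
        # Binary search for the user's hash; wrap around at the end
        idx = bisect.bisect(self.hashes, h(userid)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["s1", "s2", "s3", "s4"])
print(ring.server_for("user42"))  # one of s1..s4, stable across runs
```

Removing a server only deletes its markers, so only its users move; everyone else's clockwise-next marker is unchanged.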

  • Application layer
    • application layer hosts the business logic
    • authentication
    • mostly, all machines in the application layer are running identical code
  • Storage Layer
    • need to maintain context
    • stateful

  • Load Balancer gets the request

    • via some protocol - Socket / HTTP
    • how do you prevent the LB from becoming a single point of failure?
      • have multiple load balancers?
        • an IP can be allocated to only one machine, not a bunch of machines
      • Have the domain map to multiple IPs
      • the browser fetches all the IPs and connects to one that is up
    • the LB is needed so that we can
      • ensure that each machine gets almost equal load
      • add/remove machines when we want
      • have a single point of contact for the outside world
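The "domain maps to multiple IPs" point can be seen with the standard library's `getaddrinfo`, which returns every address for a host. "localhost" is used so the sketch runs offline; google.com would work the same way.

```python
# A hostname can resolve to several addresses; the client can try each
# until one connects.
import socket

infos = socket.getaddrinfo("localhost", 80, type=socket.SOCK_STREAM)
ips = sorted({info[4][0] for info in infos})
print(ips)  # typically ['127.0.0.1', '::1']
```

This is why a dead load balancer with a backup on another IP is survivable: clients simply connect to one of the other advertised addresses.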
  • Internet vs Intranet

  • Vertical scaling - getting a bigger machine

    • limited by the biggest commercially available machines
    • compression can only go so far
  • registering IPs with ICANN is a slow process

  • Horizontal scaling - get lots of smaller machines

    • need to be able to load balance and distribute data