High Level Design

 HLD Basics & Consistent Hashing

  • Technologies keep changing. What matters is building the intuition for which systems fit best and how the architecture works.
  • LLD is about how to structure code
  • HLD is about how to structure components and machines, and what each of those machines does.
  • LLD & HLD together make up system design
  • del.icio.us(built during 2005/06) -> acquired by Yahoo at later point
  • ICANN -> non profit central authority that manages domain names
  • GoDaddy or domains.google.com -> resellers who sell on behalf of ICANN
    • to get domain name to ip mappings
  • ISPs (Airtel, Jio, Google, Cloudflare (CDN), ...) maintain DNS servers with a copy from ICANN for faster lookups
  • DNS propagation takes 24 hours to sync across the world
  • ISP
    • One can connect to internet only via ISP(Internet service provider)
    • Every isp has range of ip addresses allocated to them. 
    • The ISP picks one available IP address in the range and allocates it to your system
    • One has option to ask for static IP address by paying additional amount
  •  del.icio.us
    • start with one machine 
    • When more than one machine is present, a load balancer is required to route traffic during downtimes/patching and to distribute load.
    • Is the load balancer a single point of failure?
      • Register multiple load balancer IPs in DNS and return all of them. The browser can talk to whichever IP responds.
      • Keep a standby load balancer that becomes active when the primary load balancer dies.
    • Ways to find whether a machine is live or not
      • Heart beat(servers will tell balancer that they are alive)
      • Health check(load balancer will ping the servers to check the status)
    • Zookeeper(distributed system) keeps track of status of all machines and notify when they are down
    • 1 Million (10^6) bytes is ~1 MB
    • Storing the data in every server is not scalable due to storage capacity being limited
    • Scaling
      • Vertical scaling
        • better hardware
      • Horizontal scaling
        • Add more machines & split data among the machines(Sharding)
        • How to avoid load skew & how does the load balancer know which machine to route to
        • Distributing the users/load to servers should be extremely lightweight (cheap to compute).
        • When new machines are added/removed, it should still work with out being too expensive
          • UserID % machines -> fails when machines are added/ removed
          • Range of users to machine -> fails when machines are added/ removed
          • Approach that works is Consistent hashing
        • Consistent Hashing(Standard)
          • Take a huge range, say [0, 10^18] (fits in a 64-bit integer)
          • Come up with 2 hash functions, one for machines & the other for users. The range for both should be the same.
            • A hash function Hash(machine_num) generates a number in [0, 10^18]. Each machine is placed on the ring at its hash value.
            • Another hash function Hash(user) also generates a number in [0, 10^18]. The machine with the smallest hash value greater than hash(user) (wrapping around the ring) serves the request.
          • Drawback: when any machine dies, the load on the next machine increases, which can subsequently die and cause a cascading failure of all servers.
        • Modified Consistent Hashing
          • Generate multiple hashes for machines (say 3 diff hash functions)
          • This will distribute load evenly across other servers when machine is added/removed/dies
        • How is data copied while avoiding downtime
          • Happens in 2 stages
          • For the case where a new machine is added
            • First copy the data, then add the machine
            • Then copy the delta that accumulated during the copy above
            • The delta is copied after the new machine is added (eventual consistency)
        • Example of hash function
          • Given a number, generate a large string:
            m = MD5(userId);
            convert the string m to an integer & take mod 10^18
        • Why consider 10^18 -> a huge range reduces collisions
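A minimal sketch of the ring described above, assuming a Java TreeMap as the ring and the MD5-mod-10^18 hash from the example; multiple virtual nodes per machine implement the "modified" variant. Class and method names are illustrative.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hashing ring: machines and users hash into [0, 10^18);
// a user is served by the first machine clockwise from its hash.
public class ConsistentHashRing {
    private static final BigInteger RANGE = BigInteger.TEN.pow(18);
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();
    private final int virtualNodes; // "modified" consistent hashing: several points per machine

    public ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    // MD5(key) -> big integer -> mod 10^18, as in the example above
    static BigInteger hash(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest).mod(RANGE);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public void addMachine(String machine) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(machine + "#" + i), machine); // several hashes per machine
        }
    }

    public void removeMachine(String machine) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(machine + "#" + i));
        }
    }

    // First machine with hash >= hash(user); wrap around to the start if none.
    public String machineFor(String userId) {
        BigInteger h = hash(userId);
        SortedMap<BigInteger, String> tail = ring.tailMap(h);
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(3);
        ring.addMachine("m1"); ring.addMachine("m2"); ring.addMachine("m3");
        System.out.println(ring.machineFor("user42"));
        ring.removeMachine("m2"); // only keys near m2's points move, not everything
        System.out.println(ring.machineFor("user42"));
    }
}
```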
        • Usecase of Sharding
          • Store large amount of data across machines
        • Usecase: Consistent hashing is used wherever sharding is used.
          • NoSQL DBs: Cassandra, HBase, DynamoDB, Elasticsearch
      • Q&A
        • Reverse Proxy/ Load Balancer
          • When a request comes in, the work is outsourced to a different machine that the client has no direct exposure to; the proxy gets the response and relays it back.
  • Having data & code on same machine is good or bad model
    • Both on same machine: Tight coupling
      • A code deployment makes the data unavailable too
      • Fewer resources for code as the DB consumes RAM & CPU
    • Different machines
      • network latency
      • Appln servers will be stateless
      • Autoscaling feature can be utilized as they are stateless
      • Zookeeper is required to notify apps servers about db machines that are added/removed.
  • Min layers in any appln system
    • Load balancer
    • apps servers
    • DB servers
  • Work of apps server
    • Fetch bookmarks from DB
    • Fetch Images/multi-media
  • Caching is storing things close to you to fetch fast.
    • Browser storage(Caching starts at browser)
      • cookies
      • session storage
      • java script files
      • images
      • How browser works with html files
        • First the load balancer returns only the HTML file, which is text and has links to the CDN
        • ISP has integrations with CDN. ISP DNS will give preference to nearest IP address
        • Redirection based on request origin (fb.com -> to facebook.com, google.com -> google.co.in)
    • CDN(2nd level caching)
      • Akamai
      • Cloudflare
      • Cloudfront
      • fastly
      • They will reroute to nearest data center
    • Local Caching
      • Cache in server itself
    • Global Caching
  • Cache - Challenges/Limitations
    • Limited in size
    • Not actual source of truth(can cause data inconsistency)
      • Cache Invalidation time
        • TTL
      • Write-through system
        • All writes to the DB go through the cache. It updates the DB, then itself, & returns success.
        • If the cache doesn't have an entry, it first gets it from the DB and then updates
        • Writes are slower and reads are faster
        • Useful for read-heavy systems
      • Cache Invalidation types
        • Write-back cache
          • Update the cache & return success
          • Sync with the DB after returning success
          • Gives high throughput
          • Good for applications like Google Analytics
        • Write-around cache
          • Writes go to the DB directly; the cache is refreshed (after TTL expiry) on a later read
          • Good for applications like DNS entries (see the write-policy sketch after this list)
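A small sketch contrasting the write-through and write-back paths above, with in-memory maps standing in for the real cache and DB; all class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy contrast of write-through vs write-back; HashMaps stand in for a real cache and DB.
public class WritePolicies {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> db = new HashMap<>();
    private final Set<String> dirty = new HashSet<>(); // keys updated in cache but not yet in DB

    // Write-through: update DB and cache, return success only after both are done.
    public void writeThrough(String key, String value) {
        db.put(key, value);     // slower write path...
        cache.put(key, value);  // ...but reads right after the write are fast and consistent
    }

    // Write-back: update only the cache and return; the DB is synced later.
    public void writeBack(String key, String value) {
        cache.put(key, value);
        dirty.add(key);         // remember to flush; data is lost if the cache dies before flushing
    }

    // Periodic/async flush of dirty entries to the DB (eventual consistency).
    public void flush() {
        for (String key : dirty) db.put(key, cache.get(key));
        dirty.clear();
    }

    public String read(String key) {
        String v = cache.get(key);
        if (v == null) {                  // cache miss: fall back to DB and populate the cache
            v = db.get(key);
            if (v != null) cache.put(key, v);
        }
        return v;
    }
}
```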
      • Cache Eviction
        • LRU -> Songs
        • MRU-> one time use cases -> covid certificate
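For the LRU policy above, Java's LinkedHashMap already supports access ordering; a minimal sketch (capacity and names are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache built on LinkedHashMap's access-order mode: the least recently
// accessed entry is evicted once capacity is exceeded.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true reorders entries on get()
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the eldest (least recently used) entry
    }

    public static void main(String[] args) {
        LruCache<String, String> songs = new LruCache<>(2);
        songs.put("s1", "song 1");
        songs.put("s2", "song 2");
        songs.get("s1");            // s1 becomes the most recently used
        songs.put("s3", "song 3");  // evicts s2
        System.out.println(songs.keySet()); // [s1, s3]
    }
}
```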
      • Cache hit/miss
      • Local caching of data on app servers (how to invalidate/update when files change on the file storage), e.g., Facebook news feed
HLD: Caching Continued
  • Caching for huge files
    • TTL : File can become invalid very quickly OR invalidation will be too late 
    • Use DB to maintain meta data & go thru it for local cached data
    • https://gist.github.com/jboner/2841832

    • Redis
      • single threaded, key value store
      • https://try.redis.io/
      • https://redis.io/docs/data-types/sets/
      • https://redis.io/docs/data-types/sorted-sets/
    • why global cache(Redis)
      • Queried often
      • Derived information(pre computing)

    • Global Cache
      • This reduces cache misses; with only local caches, every app server misses separately
    • Facebook newsfeed (case study)
    • ~3 billion total users (~2 billion monthly active, ~500 million daily active); 80% of users only read, 20% interact by liking & commenting, 1% contribute posts (~5 million posts per day)
    • 100 bytes per post -> 5 million contributing users * 2 posts per day * 100 bytes = 1 GB per day
      • Users, User_Friends, Posts
      • Profile Page
        • Posts made by respective user
      • NewsFeed
        • Posts made by the user's friends
        • How to optimize news feed generation
          • Maintain recent posts (last 30 days) for all users in a single table
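One way to keep "recent posts per user" precomputed is a Redis sorted set scored by post timestamp. A rough sketch assuming the Jedis client and a local Redis; the key naming and the 1000-entry trim are assumptions of this sketch.

```java
import redis.clients.jedis.Jedis;

// Sketch: keep each user's recent posts in a Redis sorted set, scored by timestamp,
// so the newsfeed is a precomputed read instead of a query over the Posts table.
public class RecentPostsCache {
    private final Jedis jedis = new Jedis("localhost", 6379);

    // Called on the write path whenever a user creates a post.
    public void addPost(String userId, String postId, long timestampMillis) {
        String key = "recent_posts:" + userId;            // assumed key naming
        jedis.zadd(key, timestampMillis, postId);
        jedis.zremrangeByRank(key, 0, -1001);             // keep only the newest 1000 entries
    }

    // Read path: newest posts first, straight from the cache.
    public Iterable<String> recentPosts(String userId, int count) {
        return jedis.zrevrange("recent_posts:" + userId, 0, count - 1);
    }
}
```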

    • Redis - also keeps a copy on the hard disk
    • Memcached - stores data only in RAM
    • Hazelcast and Aerospike are other stores similar to Redis
CAP Theorem & Master-Slave
  • https://www.linkedin.com/in/prakhar3agrwal/
  • https://www.youtube.com/user/prakhar3agrwal
  • CAP (Consistency, Availability, Partition tolerance)
    • In case of a network partition (two nodes cannot talk to each other, but each can still talk to clients), we have to choose between consistency & availability.
  • PACELC
    • It is an extension of the CAP theorem
    • In case of network partition, one have to choose between availability & consistency
    • If not network partition, then one have to choose between Latency & Consistency
  • Master-slave
    • Shard is collection of machines
    • replicas/slaves -> copies of master
    • High availability, low latency, no consistency guarantee (not even eventual)
      • Master takes write & returns status
      • Async, best try to sync data to slave
    • High availability, Eventual consistency, Low latency
      • Master write, one slave writes & returns status
      • All slaves sync eventually
    • Highly consistent (strong consistency)
      • Master and all slaves write & return success
      • This can impact latency & availability
    • Gossip, RAFT, Paxos
      • One slave transferring the data to other slaves is Gossip
  • Zookeeper
    • It is cluster of odd number of nodes
    • Keeps track of who is master & who is slave
    • Ephemeral node - the node/file on which the current master holds a lock in ZooKeeper
    • If the master node fails to send heartbeats, ZooKeeper releases the lock
    • Slaves then try to acquire the lock; one of them gets it and becomes the master (see the sketch below)
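A bare-bones sketch of that ephemeral-node pattern with the plain ZooKeeper client: whoever creates /master first is the leader, and the node disappears automatically when that session's heartbeats stop. The path, address and timeout are illustrative.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Ephemeral-node based leader election: the /master znode exists only while the
// winner's session is alive, so a dead master automatically releases the "lock".
public class MasterElection {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, event -> { });
        String myAddress = "10.0.0.5:9000"; // illustrative address of this node

        try {
            zk.create("/master", myAddress.getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("I am the master");
        } catch (KeeperException.NodeExistsException e) {
            // Someone else holds the ephemeral node; read it and watch for it going away.
            byte[] data = zk.getData("/master",
                    watched -> System.out.println("master changed, retry election"), null);
            System.out.println("Current master: " + new String(data));
        }
    }
}
```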
SQL vs NoSQL
  • E-commerce - a fixed SQL schema doesn't work well; the data is not SQL friendly
  • The power of SQL queries (joins) is reduced when data is spread across machines
  • SQL DBs don't have good out-of-the-box sharding
  • how to find sharding key
    • identify regular queries or reports
    • find the common key across queries
    • common key will be probable candidate for sharding key
    • sharding key for 
      • uber/ola systems will be location id
      • Irctc - ticket id
    • Slack
      • get all groups for a user, org
      • get recent messages for the groups
      • send message(group id, sender, text, message_id, ..)
      • get group metadata for a group
    • YouTube
      • getVideos(channel)
      • getNewsFeed(user, location..)
      • getVideosTrending
      • Search(topic)
      • topic()
      • playVideo()
      • Datamodel
        • user, usertag, userhistory, videosdb (tags)
    • NoSQL DB schemas
      • Key-value pairs (similar to a hash map), e.g., Redis
      • Document DB (JSON), e.g., MongoDB
        • Good for random & fragmented schemas with filtering requirement. Within filtering you don't have too many entries.
      • Column family (SQL-like tables, e.g., Cassandra, HBase)
        • A sharding key will be defined
        • A column family is like a table in SQL that groups related columns together
        • For an email DB, profile is one column family, threads another, emails another, and so on
        • Entries within a column family are by default ordered descending (most recent first)
        • This is useful when matching entries are huge and accessing recent ones is quick eg: email
  • Design Data base with following requirements
    • All info doesn't fit on single machine(has shard key)
    • No manual load balance of machines. Only set of machines(cluster) are given.
    • when load goes up, add more machines to cluster
    • DB should never lose data
    • Good to have(consistent vs availability)
  • Solution
    • A cluster & an initial set of machines
    • Sharding key
Case Study 2 (TypeAhead)
  • Design problems are open ended. If one follows a random framework, the discussion can go in any direction for hours without anything concrete. Hence, it is important to put a structure to what is going to be discussed.
  • One possible structure is to breakdown into 4 parts
    • Requirement Gathering: Make sure you understand what you are building and what are minimum set of things required for it(mvp)
    • Estimate Scale
      • Read Heavy/Write Heavy
      • Sharding required OR not
      • How much Replication required
      • QPS
    • Design Goals
      • Consistency vs Availability /latency
    • APIs(how external world call me) + System Design
      • Storage requirements
      • apps servers role

  • Type Ahead
    • MVP
      • Given few letters show me 5 suggestions
      • Given letters should be prefix
      • Most frequently searched terms
      • No personalization
      • min 3 letters
      • levenshtein distance not in scope
    • Estimation of Scale
      • Read: entering few chars and waiting for response
      • Write: Search operation as it updates frequency in system
      • This is both read & write heavy system.
      • #number of searches in a day on Google is around 10 Billion
      • #number of new terms per day : 25%  & 75% existing terms
      • Each search is say 100 chars; new terms will be 2.5 Billion * 100 bytes = 250 GB per day
      • Space for 10 years = 250 GB * 365 * 10 ≈ 900 TB
    • Design Goals
      • High Availability as we care about pattern and NOT about exact frequency
      • Latency should be extremely low
    • API
      • getSuggestions with search prefix
      • updateFrequency(search phrase)
      • Note: every trie node stores the top 5 suggestions for its prefix as a list, and the frequency of the word (if any) that ends at that node (see the trie sketch after this list)
      • How to reduce writes
        • Sampling
          • e.g., only count a search when rand() % 1000 == 0 (sample 1 in 1000), then add it to the hashmap
        • Threshold driven
          • Buffer (term, frequency) counts in a hashmap and flush to the trie once a threshold is crossed
      • Sharding
        • By the first 3 chars -> 26*26*26 = 17,576 possible prefixes, mapped to machines via consistent hashing
      • Popularity is combination of (frequency, recency)
      • Time Decay
        • Time decay factor
          • Every day, divide every frequency by the time-decay factor so recent searches weigh more
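A compact sketch of the trie noted above: each node caches its top suggestions so getSuggestions(prefix) is just a walk down the trie plus a constant-time read. The top-5 maintenance here is simplified (a capped map refreshed on every update) and all names are illustrative.

```java
import java.util.*;

// Sketch of the type-ahead trie: every node caches the top 5 completions under it.
public class TypeAheadTrie {
    private static final int TOP_K = 5;

    private static class Node {
        Map<Character, Node> children = new HashMap<>();
        Map<String, Long> top = new HashMap<>(); // term -> frequency, capped at TOP_K
    }

    private final Node root = new Node();

    // Write path: bump the term's frequency and refresh the cached top-5 along the path.
    public void updateFrequency(String term, long newFrequency) {
        Node node = root;
        updateTop(node, term, newFrequency);
        for (char c : term.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
            updateTop(node, term, newFrequency);
        }
    }

    private void updateTop(Node node, String term, long frequency) {
        node.top.put(term, frequency);
        if (node.top.size() > TOP_K) {
            // evict the currently lowest-frequency candidate
            String min = Collections.min(node.top.entrySet(), Map.Entry.comparingByValue()).getKey();
            node.top.remove(min);
        }
    }

    // Read path: walk to the prefix node and return its cached suggestions, best first.
    public List<String> getSuggestions(String prefix) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return List.of();
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(node.top.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries) result.add(e.getKey());
        return result;
    }

    public static void main(String[] args) {
        TypeAheadTrie trie = new TypeAheadTrie();
        trie.updateFrequency("cricket", 900);
        trie.updateFrequency("crime", 400);
        trie.updateFrequency("croissant", 120);
        System.out.println(trie.getSuggestions("cri")); // [cricket, crime]
    }
}
```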
No SQL
  • Replication level: #machines to which each piece of data is replicated
  • Advantages of replication
    • Higher traffic can be supported
    • A slave can take the master's place in case the master goes down (failure handling)
  • How to create your own nosql db
    • How to create particular replication level
    • How does db increase/decrease #of shards automatically
  • Qn: Imagine you are creating a DB
    • It has huge amount of data in TB/PB
  • Sharding is required for Huge amount of data & Replication required to handle high number of requests.
  • How will NOSql DB scale
    • Multiple shards -> each shard with one master & multiple slaves
      • E.g., BigTable by Google (one of the first NoSQL DBs)
    • Multi-master setup
      • Cassandra (Facebook)
      • DynamoDB(Amazon)
  • Approach 1: Master-slave with consistent hashing
    • BigTable uses this master-slave architecture
    • Say we have 15 machines
    • Based on load, I divide my data into multiple shards using consistent hashing approach
      • Requests will come to manager like zookeeper.
      • Based on sharding key, manager decides which shard the request go to
      • Each shard has master + 2 read replicas
      • Create 4 shards and it will leave 3 spare machines
      • if #spare machines is <= #machines required for a shard, add them as extra replicas
      • If a machine dies from a shard with machines> replication count, then do nothing
      • When to add new shard
        • DB space
        • CPU utilization
        • #of queries
        • As a thumb rule - when ~66.67% load is reached
      • How a new shard is introduced
        • 2 stages
          • Staging stage
            • new machines are created
            • Data starts getting copied from the earlier shards
            • The part of the data that should now belong to the new shard (per the consistent hashing algorithm) is copied from the read replicas of the other shards
            • new read/write queries for the new requests are being served from earlier shards
          • Live Stage
            • Consistent systems
              • Downtime for users on new shard
              • Data is copied and then made live
            • High availability (non-consistent) systems
              • No downtime required; the new shard starts serving requests immediately while the remaining data is copied to it asynchronously from the prior shards
  • Approach2: Multi-Master architecture
    • Amazon Dynamo & Facebook Cassandra
    • Usecase: Too many writes
    • Assume the machines are placed on a ring as per the consistent hashing mechanism
    • When a user request comes, store the data on the next n distinct machines to the right (n = replication factor)
    • Cassandra has tunable consistency: for a given replication level, the client controls R and W
      • It allows user to set 2 properties
        • Number of machines to read from before sending results
        • Number of machines to write to before sending success
      • R + W > replication level -> consistent system (e.g., replication level 3, R = 2, W = 2)
      • R + W <= replication level -> available (but possibly stale) system
HLD: Multi Master
  • Storage in NoSQL
    • Read & write should be fast

  • Computation across machines/shards
  • 2 level of tables
    • Lookup Table -> key, start block, end block  -> in harddisk
    • Index table -> sortedset, startaddress, #of entries, entrysize -> In RAM
    • Actual Data
    • LSM Tree - Log-Structured Merge tree
      • To reduce the search space on the hard disk
  • Distributed Computation
    • Map Reduce
      • Map
      • Shuffling
      • Reduce
  • Machine 1: A has friends B, C, D; F has friend C
  • Machine 2: B has friends A, C, E; E has friend B
  • Machine 3: C has friends A, B, F; D has friends F, G
  • Task: return the list of mutual friends for each pair (see the sketch below)
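A single-process sketch of the map -> shuffle -> reduce flow for the mutual-friends task above: map emits (sorted pair -> friend list), shuffle groups by pair, reduce intersects the two lists. It only simulates what a real framework would distribute across machines.

```java
import java.util.*;

// Single-process simulation of map -> shuffle -> reduce for mutual friends.
public class MutualFriends {
    public static void main(String[] args) {
        // Input spread across "machines" in the notes above; flattened here for the simulation.
        Map<String, List<String>> friends = new HashMap<>();
        friends.put("A", List.of("B", "C", "D"));
        friends.put("F", List.of("C"));
        friends.put("B", List.of("A", "C", "E"));
        friends.put("E", List.of("B"));
        friends.put("C", List.of("A", "B", "F"));
        friends.put("D", List.of("F", "G"));

        // Map + shuffle: for every (person, friend) edge emit key "min-max" -> person's friend list.
        Map<String, List<List<String>>> shuffled = new HashMap<>();
        for (Map.Entry<String, List<String>> e : friends.entrySet()) {
            for (String friend : e.getValue()) {
                String key = e.getKey().compareTo(friend) < 0
                        ? e.getKey() + "-" + friend : friend + "-" + e.getKey();
                shuffled.computeIfAbsent(key, k -> new ArrayList<>()).add(e.getValue());
            }
        }

        // Reduce: a pair with lists from both sides has mutual friends = intersection of the two lists.
        for (Map.Entry<String, List<List<String>>> e : shuffled.entrySet()) {
            if (e.getValue().size() < 2) continue; // one-sided edge, no data from the other person
            Set<String> mutual = new HashSet<>(e.getValue().get(0));
            mutual.retainAll(e.getValue().get(1));
            System.out.println(e.getKey() + " -> " + mutual);
        }
    }
}
```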
Case Study3: Messaging
  • MVP
  • Estimation of Scale
    • Sharding
    • Read vs Write
  • Design Goals
  • API 
  • Architecture
  • Design of a Messaging Application
    • WhatsApp (1-1, no server-side storage)
    • Facebook Messenger/Telegram (with storage)
    • Slack/Microsoft Teams/Discord (group chats with storage)
  • MVP Requirements
    • Ability of someone to chat 
    • Authentication
    • Recent chat history
    • Online Indicator
    • Sent/Delivered/Read indicator
    • Only text vs multi-media messages
    • Group messaging
    • Up to 10K chars per text message
    • Notifications
    • store messages
  • Estimation of scale
    • read heavy/write heavy
      • A messaging app has roughly the same ratio of reads and writes
      • Large number of writes - Cassandra, HBase
      • Large number of reads - SQL
      • Reads & writes are almost equal - use a DB optimized for writes, with some optimizations for reads
      • message (sender, recipient, time, id, text)
      • avg size of text can be 250 unicode chars say. 
      • 50Billion messages per day on whats app
      • 50B * 1KB = 50 TB per day - hence sharding is required
      • in 5 years = 50TB * 365* 5 = ~90PB
    • will I need to shard
      • #users
      • message length
      • #requests per sec
    • https://www.imdb.com/title/tt2575988/
    • Messages per sec ≈ 50B / 86,400 ≈ 580K per sec
    • Assuming peak traffic is ~3x the average, peak QPS ≈ 1.7 million messages per sec
    • Design goals
      • low latency(time per task)
      • high throughput(#tasks per sec)
      • Facebook : consistency, availability
        whatsapp: highly available & eventually consistent
        slack, MS  teams: highly available & eventually consistent
    •  API Design
      • sendMessage(sender, receiver, text, idempotencyKey, ...)
        • Idempotency key - not a db key, it is just to prevent duplication and redundancy.
        • If server receives message of same idempotency key twice, it just rejects the 2nd one
      • getRecentConversations(userId, limit, offset)
      • getMessages(senderId, receiverId, limit, offset)
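A tiny sketch of the idempotency-key check described above, using an in-memory set as a stand-in for whatever store the servers actually share (Redis with a TTL in practice); names are illustrative.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Duplicate suppression for sendMessage: the same idempotency key is accepted only once.
public class IdempotentSender {
    // In practice this set would live in a shared store (e.g., Redis) with a TTL.
    private final Set<String> seenKeys = ConcurrentHashMap.newKeySet();

    public boolean sendMessage(String sender, String receiver, String text, String idempotencyKey) {
        if (!seenKeys.add(idempotencyKey)) {
            return false; // second delivery of the same request: reject it
        }
        // ... persist the message and fan it out ...
        return true;
    }
}
```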
    • Architecture
      • Sender -> Server -> Receiver (client-server architecture, not peer-to-peer)
      • Storage -> messages should be sharded
      • What should be the param on which sharding should happen
        • By conversation id
          • This requires read of all shards to getRecentConversation of the user
        • By UserId
          • sendMessage is written to 2 shards in case the sender & receiver are on different shards
          • getMessages - can be read from any one shard
          • getRecentConversations - one shard
        • How to handle send when sharding by UserId
          • The message should be stored in 2 shards
          • Save to the recipient's shard and send success; sync to the sender's shard after sending success.
HLD: Zookeeper + Kafka
  • Term of the day: Reliability
    • If your system is able to perform correctly in the event of adversity
  • Problem: LinkedIn (formed in 2003)
    • Design goals
      • Need a message broker system which can support high throughput
  • Why do we need Kafka
  • What is apache Kafka
  • Cluster, Broker, Core of Kafka
  • Producer & Consumer
  • Demo
  • Producers & Consumers
  • Assignment(Lamda & SQS)
  • Demo
    • Start Zookeeper
    • Start Kafka
    • Create kafka topic "Kafka demo" with replicationFactor 1 & Partitions 1
      • This creates a directory with name "Kafka demo" & messages are saved in .log file
    • Create Producer for topic  "Kafka demo" 
    • Create Consumer for topic  "Kafka demo" 
  • Simple queuing service (SQS): 
    • One producer & one type of consumer, where a message is deleted once any consumer consumes it
  • Partitions
HLD: Case Study 3(Messaging) 2
  •  Sharding Keys evaluated
    • userId
      • is better
      • We have to write to 2 shards: write to the receiver's shard first and update the sender's shard asynchronously
    • ConversationId
      • has draw backs
  • Architecture
    • User sends/receive a message
      • Load Balancer
        • Many appln servers
          • connected to a data store sharded by userId
    • Type of DB
      • Reads and writes are equal. Hence, we choose one db which is read optimized and another which is write optimized. Keep both of them in sync as per requirement.
        • Use NoSQL (HBase) for faster writes. In HBase, a write is just an append to the transaction/commit log. (MongoDB updates the disk with values as well, so it will not be as fast as HBase.) The second update goes to Redis for faster reads.
        • Reads will be best with In Memory Cache(Redis)
          • 3 types of Cache systems
            • Write Thru(Write to Cache and disk. Return success only when both return true). This is appropriate strategy as READ happens immediately after Write.
            • Write-around (write to the DB directly; the cache is populated on a later read)
            • Write back(Write to cache and return success. Sync with disk at later stage). This strategy cannot be used in this case.
      • how to notify to Users about message
        • In HTTP, server cannot initiate a connection to client. Server only responds to client created connection.
        • 3 ways
          • Polling: the client repeatedly (say every 10 secs) asks the server if there are new messages
            • Resource intensive
            • Too many requests at server
            • Waste lot of resources
            • Not real time
          • Long Polling
          • Web Sockets
            • As soon as a user's device comes online it opens a connection with the server. The connection remains open as long as the user is connected to the internet. The server can use this connection to send notifications and messages, and the client can use it for anything.
            • HW: 
              • Learn how websockets work 
              • how to create web socket connection.
              • chat appln using socket.io https://socket.io/get-started/chat
            • WebSocket is a protocol that works over HTTP & TCP where a connection is established between client & server and persisted, allowing client & server to talk to each other whenever needed.
        • Usecase of Kafka
          • Every server subscribes to Kafka for messages of the users that have established WebSocket connections with it
          • Whenever a new message is sent, after storing the message in Redis and HBase, an event is sent to Kafka.
      • Notification Service
        • What notification to send
        • Who to send the notification to
        • Where to send the notification
        • Notify() sends the message to queue
        • Workers take the message from the queue and invoke the email service, Firebase Cloud Messaging, or Apple Push Notification service.
      • Online Indicator
        • Web socket connection is present?
        • app is open (whats app uses it)
        • app is open and person is interacting(MS teams)
        • Consistency vs availability - approximate consistency is fine; no need for 100% consistency
      • Heart beat mechanism
        • As soon as a user opens the app, it sends a heartbeat every 10 secs to the server telling the server that the user is online
        • Redis cache to store userId, lastReadOn
        • Write back cache and sync data to sql db for persistence
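A rough sketch of the heartbeat-backed online indicator above, assuming the Jedis client and a 30-second liveness window; the key naming and the window size are assumptions.

```java
import redis.clients.jedis.Jedis;

// Online indicator: each heartbeat refreshes a short-lived Redis key; a user is
// considered "online" while that key still exists.
public class OnlineIndicator {
    private static final int LIVENESS_SECONDS = 30; // assumed window: ~3 missed heartbeats
    private final Jedis jedis = new Jedis("localhost", 6379);

    // Called every ~10 seconds by the client while the app is open.
    public void heartbeat(String userId) {
        jedis.setex("online:" + userId, LIVENESS_SECONDS, String.valueOf(System.currentTimeMillis()));
    }

    public boolean isOnline(String userId) {
        return jedis.exists("online:" + userId);
    }
}
```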
    • Complete Design
      • Users 
        • Load balancer
          • application servers
            • Message service
              • Redis write thru cache(as we need high consistency)
              • HBase
            • Online indicator service
              • Redis write back cache
              • SQL(using batch processing)
            • Queue
              • to send message & status to correct appln server
  • Great vs Avg software engineer
    • Logical thinking or Problem solving ability. Given a scenario, what is right kind of system to use. Knowing the possibilities of architecture to solve.(design principles)
    • Pick up things quickly
    • Ability to understand a system and why it is designed the way it is
  • Master - slave architecture
    • When write comes, it should write to Master first
    • When the master is dead, a new master should be elected, and every other server should be informed
  •  ZooKeeper
    • Track health of master
    • Inform all apps servers/slaves of who the master is
    • When data is stored in ZooKeeper, it is saved in a directory-like structure
    • A file in ZooKeeper is called a znode (ZooKeeper node).
    • Every znode is one of 2 types: ephemeral or persistent
    • The master node writes its IP to an ephemeral znode and sends a heartbeat every 5 secs
    • All app servers & slaves subscribe to ZooKeeper node
    • Whenever the info in a znode changes, ZooKeeper informs all subscribers about the change
    • Flow
      • User request
        • load balancer
          • apps servers
            • read master db node details for first time
              • send write request to the master node
    • zookeeper is collection of machines, usually Odd number of machines
    • Quorum is a majority: floor(x/2) + 1 or more machines
    • Read and Write to zookeeper happens from Quorum number of machines
    • Assignment: https://docs.google.com/document/d/1P72fx_a6VwKfyZbZBrulJhLqRFFsCmqeZOTat2cCdRo/edit?usp=sharing
  • Kafka
    • Topic is a category within Queue
    • A persistent queue which is broken down into topics
      • Kafka is reported to be ~2x faster than RabbitMQ
      • Kafka
        • For event streaming
      • Persistent queue is different from DB. It has Event retention period which is 1 week by default
      • FIFO/only append
      • Immutable messages/events
      • We can specify partitions for a Topic
        • This helps avoid being constrained by one machine
        • Parallelize consumption
      • Machines inside Kafka are known as Brokers/Servers
      • Consumer Groups
        • Specify which consumer will read from which partition
        • Each consumer maintains offset at partition level
        • Consumers commit their offsets to the __consumer_offsets topic so they can resume from where they left off if they come back after going down

  • Demo
    • Kafka messages/logs
    • Kafka fault tolerance
    • Produce messages programmatically
    • Start zookeeper server
    • Start Kafka server
      • bin/kafka-server-start.sh config/server-2.properties
    • Create Topic 
      • bin/kafka-topics.sh --create --topic myTopic13 --zookeeper localhost:2181 --replication-factor 1 --partitions 1
      • cd myTopic13-0
      • ls -> shows index & log files
    • Create Producer
      • bin/kafka-console-producer.sh --broker-list localhost:9092 --topic myTopic13
        • msg1
        • msg2...
    • Create Consumer
      • bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic myTopic13 --from-beginning
    • Take Aways
      • Persists messages on HD in log file
      • Immutable messages
  • Demo2
    • start 3 kafka servers
      • bin/kafka-server-start.sh config/server-1.properties
      • bin/kafka-server-start.sh config/server-2.properties
      • bin/kafka-server-start.sh config/server-3.properties
    • bin/kafka-topics.sh --describe --topic myTopic13 --zookeeper localhost:2181
    • Learnings
      • Kafka is fault tolerant
      • decouple nature of producer, consumer, servers
      • Leader/follower mechanism handled via ZooKeeper
  • Demo3
    • linger.ms in Kafka - e.g., 100 ms
    • batch.size - e.g., 1000 bytes
    • The producer sends a batch as soon as either condition is met (see the producer sketch below)
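A minimal producer sketch showing the linger.ms / batch.size settings from Demo 3; the broker address and topic name are the ones used in the demos above, and the key/value payloads are illustrative.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

// Minimal Kafka producer: messages are batched, and a batch is sent when either
// batch.size bytes accumulate or linger.ms elapses, whichever happens first.
public class DemoProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.LINGER_MS_CONFIG, 100);   // wait up to 100 ms to fill a batch
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 1000); // or send once ~1000 bytes are buffered

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                producer.send(new ProducerRecord<>("myTopic13", "key-" + i, "msg" + i));
            }
        } // close() flushes any remaining batched messages
    }
}
```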

  • https://docs.google.com/document/d/1P72fx_a6VwKfyZbZBrulJhLqRFFsCmqeZOTat2cCdRo/edit?usp=sharing
LLD: Remaining Content 1
  • Machine Coding of Splitwise(Expense sharing appln)
  • https://github.com/Naman-Bhalla/splitwiseJune22/tree/master/src/main/java/com/scaler/splitwisejune22/models
  • doDutch/Paytm/google pay/HDFC
  •  Class Diagram
    • Nouns or Visualization
      • Users, groups, Expense
    •  BaseModel
      • id
    • User extends BaseModel
      • name
      • phone number
      • password
    • Expense extends BaseModel
      • amount
      • desc
      • createdBy
      • creationDate
      • List<UserExpense> paidBy
      • List<UserExpense> owedBy
    • UserExpense extends BaseModel
      • UserId
      • ExpenseId
      • Amount
    • ExpensePaidBy
      • ExpenseId
      • UserExpenseId
      • UserId
    • ExpenseOwedBy
      • ExpenseId
      • UserExpenseId
      • UserId
    • Group 
      • Name
      • List<Users>
      • createdBy
      • Desc
      • List<Expenses>
    • https://github.com/Naman-Bhalla/splitwiseJune22/tree/master/src/main/java/com/scaler/splitwisejune22/models
    • https://github.com/scaleracademy/splitwise/blob/dec2022/docs/requirements.md
HLD: Case Study 4(Elastic Search)
  • Sentence to words
  • reverse index
  • It is for full text search -> ES is the most efficient option
  • running, ran -> reduced to the same root word (run)
  • Apache Lucene -> given a document:
    • removes/filters
      • is, the, ...
      • articles - a, an, the
      • stop words
    • Lemmatization
      • reduce a word to its root form
    • Stemming
      • strip suffixes like -ing to reduce to the root word
    • Lucene is a single library running on a single machine
      • not fault tolerant & single point of failure
    • Elastic search is layer above Lucene
    • Google search is far more advanced than Elasticsearch
    • Elastic Search
      • Machines are called Nodes
      • Index (a collection of docs)
    • how write works
      • When a document is sent, ES hashes it to find the shard to write to; from the shard it finds the machine
      • Writes to that machine/shard (reverse index write)
      • Sync to the other shard copies happens asynchronously
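A toy version of the reverse index Lucene builds: filter stop words, apply a crude stem, and keep a map from token to document ids. The stop-word list and stemming here are deliberately naive stand-ins for real analyzers.

```java
import java.util.*;

// Toy inverted (reverse) index: token -> set of document ids that contain it.
public class InvertedIndex {
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "is", "of");
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Very crude stemming stand-in for what Lucene's analyzers do properly.
    private static String stem(String word) {
        if (word.endsWith("ing")) return word.substring(0, word.length() - 3);
        if (word.endsWith("s") && word.length() > 3) return word.substring(0, word.length() - 1);
        return word;
    }

    public void addDocument(int docId, String text) {
        for (String raw : text.toLowerCase().split("\\W+")) {
            if (raw.isEmpty() || STOP_WORDS.contains(raw)) continue;
            index.computeIfAbsent(stem(raw), k -> new HashSet<>()).add(docId);
        }
    }

    public Set<Integer> search(String term) {
        return index.getOrDefault(stem(term.toLowerCase()), Set.of());
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.addDocument(1, "The runner is running in the park");
        idx.addDocument(2, "Parks of the city");
        System.out.println(idx.search("running")); // doc 1 (same stem at index and query time)
        System.out.println(idx.search("park"));    // docs 1 and 2 ("parks" stems to "park")
    }
}
```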
  • Build Large File Storage
    •  Functional Requirements
      • read, write huge files - GBs to TBs files
      • generate id
      • Streaming of files
      • access control/permissions
      • Files are immutable
    • Non-Functional requirements
      • Fault tolerant + Durable
      • High throughput
      • High Availability
      • low response time
    • Possible Approaches
      • Split file into chunks
      • Split into too many small chunks(1kb) 
        • Pros
          • Parallelize more
          • failure rate will be less
        • Cons
          • But every chunk has the overhead of opening a connection, requesting, the OS identifying the file and transferring. This can lead to massive overhead
          • Meta data of chunk will be huge
      • Split into big chunks(1Gb) 
    • Solution Approach
      • LFS will be a collection of machines
      • An LFS client runs on the client machine. It splits the file into chunks (e.g., 128 MB each, per the estimate below)
      • On LFS, have some Central storage(Master machines which has directory of chunks)
        • It has file -> chunk mapping & Chunk -> machine mapping
      • LFS client requests master to store file by giving name, # chunks
      • Master creates entry for the file and decides chunk to machine mapping
      • Client then connects to machines & writes to the machine
      • Pros
        • Parallelizing the approach/write
        • reducing the cost of upload failure
      • How to avoid single point of failure for master
        • Have replica
        • have zookeeper to keep watch on who the master is
      • How to handle cases where machines go down
        • Have replica for machines as well
        • Chunks have replicas
        • Master will have reference to all copies of chunks in different nodes
      • How to avoid failure of all machines having copies of a file
        • Maintain file copies on machines in different racks (same data center)
        • OR maintain in different data center
      • How to store chunk copies in diff machines
        • consistent hashing
        • round robin
        • random
      • How READ works
        • Reads can spread across primary and replicas
        • All nodes/machines cache chunks with an LRU policy to perform well for hot documents
      • What we did to reduce latency
        • break the file into chunks and process parallely
        • cache in individual nodes
      • How many entries will be created in Master for 1PB of data in LFS
        • 1 PB ≈ 10^9 MB; 10^9 MB / 128 MB ≈ 10 million entries of (chunkId, m1, m2, m3)
        • ~16 B per entry * 10 million entries ≈ 160 MB -> sharding is not required.
      • In HDFS(Hadoop distributed File System) -> Master is called NameNode & machines storing chunks is DataNode
      • When do we need collection of masters
        • when there are 10Million reads per sec
      • HDFS is not very good on latency but good on throughput
    • Use case
      • When client uploads the photos to Facebook
      • Facebook servers store it in LFS and also on the CDN
  • Google Maps
    • Given a random location by latitude and longitude, what are the nearest few restaurants around it.
      • If we go by latitude and longitude directly, it becomes a 2D query and will not perform well
      • Instead divide entire world into grids(to avoid 2D query and make 1D queries)
      • Quad tree
        • root grid with list<allrestaurants>
        • If the list size is > 100, break the grid into 4 identical quadrants (same shape, smaller size)
        • the grids will be children of root
        • Repeat the process with the child nodes until the restaurant list size<100
      • Add a restaurant
        • start with root and navigate to leaf
        • insert new node for the new restaurant
        • If children size becomes more than 100, then split the node into 4 different nodes and distribute children accordingly
      • Delete a restaurant
        • start with root and navigate to leaf
        • delete the restaurant
      • given a location how to find nearest 50 restaurants
        • Get the grid corresponding to the location along with its grid coordinates
        • Get the mid points of the grid lines.
        • Add a small value to the grid coordinate in the direction you want, to find the neighboring grid in that direction.
        • There are 8 neighboring grids to move to from any grid (9 cells including the current one)
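A trimmed-down quad tree for the restaurant example above: leaves hold up to 100 points and split into four equal quadrants when they overflow. Only insert and leaf lookup are sketched; names and the capacity follow the notes, everything else is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal quad tree: leaves hold up to CAPACITY restaurants and split into 4 equal
// quadrants when they overflow.
public class QuadTree {
    static final int CAPACITY = 100;

    static class Restaurant {
        final String id; final double lat, lon;
        Restaurant(String id, double lat, double lon) { this.id = id; this.lat = lat; this.lon = lon; }
    }

    // Grid boundaries of this node (a lat/lon rectangle).
    private final double minLat, minLon, maxLat, maxLon;
    private List<Restaurant> points = new ArrayList<>(); // non-null only for leaves
    private QuadTree[] children;                          // 4 children once split

    public QuadTree(double minLat, double minLon, double maxLat, double maxLon) {
        this.minLat = minLat; this.minLon = minLon; this.maxLat = maxLat; this.maxLon = maxLon;
    }

    public void insert(Restaurant r) {
        if (children != null) {           // internal node: push down to the right quadrant
            child(r.lat, r.lon).insert(r);
            return;
        }
        points.add(r);
        if (points.size() > CAPACITY) split();
    }

    // Navigate from the root to the leaf grid containing (lat, lon).
    public List<Restaurant> findLeaf(double lat, double lon) {
        QuadTree node = this;
        while (node.children != null) node = node.child(lat, lon);
        return node.points;
    }

    private void split() {
        double midLat = (minLat + maxLat) / 2, midLon = (minLon + maxLon) / 2;
        children = new QuadTree[] {
            new QuadTree(minLat, minLon, midLat, midLon),
            new QuadTree(minLat, midLon, midLat, maxLon),
            new QuadTree(midLat, minLon, maxLat, midLon),
            new QuadTree(midLat, midLon, maxLat, maxLon)
        };
        for (Restaurant r : points) child(r.lat, r.lon).insert(r);
        points = null; // this node is now internal
    }

    private QuadTree child(double lat, double lon) {
        double midLat = (minLat + maxLat) / 2, midLon = (minLon + maxLon) / 2;
        int idx = (lat < midLat ? 0 : 2) + (lon < midLon ? 0 : 1);
        return children[idx];
    }
}
```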
HLD: Case Study 5 (Design Uber)
  • Requirements
    • Intra-City transportation
    • Find nearest cabs (moving/Stationary)
    • Track cab position in real time
    • Start trip - path from source to destination
    • Live tracking
  • MVP(items to focus in this session)
    • Find nearest cabs (moving/Stationary)
    • Track cab position in real time
  • Shard by City(to reduce size of problem)
    • Say Bangalore (100K cabs)
      • Cab drivers will install Uber Driver App
      • Cab will have states - active, inactive, occupied, 
    • Driver App
      • Sends heartbeats regularly
      • Status of driver location is updated in memory Map
      • Update Quad tree 
      • maintain location & grid boundary in driver app
      • Driver app to inform when grid is changed
    • Say 100K drivers in Bangalore & 100K active users use app at any point of time. How to design such system.
      • Storage
        • InMemory storage of Cabid to currentLocation
          • 100K * (8 + 8 + 8) bytes ≈ 2.4 MB (a single machine is enough)
        • If the cabs send heart beat every 5 secs, then we get 20K requests per sec that update in memory map(redis can support 100K requests per sec) - (single redis machine is enough)
        • DB
          • Cab metadata -> MySQL can be used, as the data is not modified frequently
            • 100K * 10KB ~ 1GB 
          • Users data (10Million users) -> MySQL can be used as data is not modified frequently
            • 10M * 10Kb = 10GB
          • Quad Tree of 100K cabs
            • cabid, lat, long = 24B *100K = 2.4MB
          • When does Quad tree get accessed
            • Grid changes for a driver - say 5% of heartbeats (20K/sec * 5%) -> 1K/sec
            • when cab status changes
              • 1 cab takes around 8 to 15 rides
              • 4/5 times due to driver inactivity
              • so, it will be 35 state changes per day
              • 100K cabs * 35 / (8 hrs * 60 * 60) ≈ 120 per sec
            • find nearest cabs/request cabs
              • 100k/Second requests
            • How to scale reads
              • 100 replicas -> ~1K requests/sec per replica, which is manageable
            • How to improve perf further
              • Identify hot spots & cache grid & cabs in a different storage
          • Steps
            • Given location (x,y) find grid in quad tree
            • From grid, get list<Cabs>
            • For Each Cab
              • Identify the machine that has a WebSocket connection to the driver's app and send a notification to the driver
              • If no response, continue with next Cab
              • If Cab accepts
                • Get Driver details from mysql db and return to User
              • Uber uses hexagonal grids instead of rectangles grids
          • When cab is occupied
            • Start sending location at faster frequency. say every 3secs
            • Driver app will plot the animation of navigation

        • How to go about it in an interview
          • Requirements
          • Estimation of scale
          • Design Goals
          • API design
          • Final Design
  • Bloom Filter
    • A bloom filter is a probabilistic data structure that is based on hashing. It is extremely space efficient and is typically used to add elements to a set and test if an element is in a set. Though, the elements themselves are not added to a set. Instead a hash of the elements is added to the set.
    • Problem being solved
      • Data for the NoSQL DB is kept in RAM, with older data in sorted sets on disk.
      • For a search, the system first checks RAM; if not found, it searches all the sorted sets on disk
      • Searching for an element that is not present takes the most time.
      • We want a filter that can say a key doesn't exist without searching the DB
        • It should be lightweight, without storing the keys themselves
        • It can be approximate
      • Solution
        • Bit array of size N
        • k number of Hash functions which returns values in range of 1..N
        • For a given key, compute each hash function's value and set that bit in the bit array
        • When searching for a key, check whether all the bits at the hash positions are 1
        • If not 1 then key is not present
        • If present, key may or may not be present
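A small Bloom filter sketch matching the description above: k bit positions per key, set on add, all-must-be-set on lookup. Deriving the k positions via double hashing is a common implementation choice, not something the notes prescribe.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.BitSet;

// Bloom filter: k bit positions per key; answers "definitely absent" or "possibly present".
public class BloomFilter {
    private final BitSet bits;
    private final int size;
    private final int numHashes;

    public BloomFilter(int size, int numHashes) {
        this.size = size;
        this.numHashes = numHashes;
        this.bits = new BitSet(size);
    }

    // Derive k positions from two base hashes (double hashing).
    private int[] positions(String key) {
        byte[] d = md5(key);
        int h1 = ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16) | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
        int h2 = ((d[4] & 0xff) << 24) | ((d[5] & 0xff) << 16) | ((d[6] & 0xff) << 8) | (d[7] & 0xff);
        int[] pos = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            pos[i] = Math.floorMod(h1 + i * h2, size);
        }
        return pos;
    }

    public void add(String key) {
        for (int p : positions(key)) bits.set(p);
    }

    // false => the key was definitely never added; true => the key may have been added.
    public boolean mightContain(String key) {
        for (int p : positions(key)) {
            if (!bits.get(p)) return false;
        }
        return true;
    }

    private static byte[] md5(String key) {
        try {
            return MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        BloomFilter filter = new BloomFilter(1 << 20, 3);
        filter.add("user:42");
        System.out.println(filter.mightContain("user:42"));  // true
        System.out.println(filter.mightContain("user:999")); // almost certainly false
    }
}
```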
    • https://docs.google.com/document/d/1qVUOgfGeFeB8Xh8Zay-NbCjgDUjPVlYaN7_y8S2BTtE/edit
  • Unique ID Generator
    • UUID: 
      • Generate an MD5-style hash which is a function of timestamp, machine id, IP address and so on
      • It is an alphanumeric id
    • For numeric IDs whose values should follow creation order
      • Single machine generates IDs
        • single point of failure
      • Assign ID ranges to each pod
        • IDs will not be in order of creation time across pods
    • How can all shards independently generate id which are Numeric, Unique & incremental
    • <timestamp><machineid><sequence>
    • Requests at the same timestamp should get unique ids, and ids for requests at a later timestamp should also be unique and greater than those that came earlier
    • Epoch secs : Number of secs elapsed after 1970 
    • Epoch ms: Number of ms elapsed after 1970
    • #ms in a year: 3* 10^10
  • Considerations
    • Clocks on all machines are in sync to the millisecond
    • NTP(Network time protocol) ensure clocks on all machines are in sync
  • Twitter
    • 1 bit for sign + 41 bits for timestamp + 5 data center + 5 machine + 12 sequence id
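A simplified sketch of the <timestamp><machine id><sequence> scheme with the Twitter-style bit layout above; the custom epoch and the spin-wait when a millisecond's sequence is exhausted are assumptions of this sketch.

```java
// Simplified Snowflake-style ID: 41 bits of milliseconds since a custom epoch,
// 10 bits of machine id (datacenter + worker), 12 bits of per-millisecond sequence.
public class IdGenerator {
    private static final long CUSTOM_EPOCH = 1577836800000L; // 2020-01-01, an assumed epoch
    private final long machineId;        // 0..1023 (5 bits datacenter + 5 bits machine)
    private long lastTimestamp = -1L;
    private long sequence = 0L;          // 0..4095 within one millisecond

    public IdGenerator(long machineId) {
        if (machineId < 0 || machineId > 1023) throw new IllegalArgumentException("machineId out of range");
        this.machineId = machineId;
    }

    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF;       // wrap at 4096 ids per millisecond
            if (sequence == 0) {                      // exhausted this millisecond: spin to the next
                while (now <= lastTimestamp) now = System.currentTimeMillis();
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        return ((now - CUSTOM_EPOCH) << 22) | (machineId << 12) | sequence;
    }

    public static void main(String[] args) {
        IdGenerator gen = new IdGenerator(7);
        System.out.println(gen.nextId());
        System.out.println(gen.nextId()); // strictly increasing, unique per machine
    }
}
```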

  • Rate limiter service to avoid DoS (denial of service) attacks. How and where to build a rate limiter. E.g., only allow max 10 users to register in an hour from one IP
    • App design
      • client (adding limiter here is not reliable)
        • load balancer
          • apps servers
            • DB
    • Implement a rate limiter API: limit 100 requests per API per user per second
      • Brute Force
        • For every user+api -> keep a queue/array of request timestamps for the last one second
        • When a new request comes, evict entries older than one second; if the queue is then below the limit, accept the new request, otherwise reject it
        • Drawbacks
          • memory heavy
          • too many in memory operations when the requests are in millions
      • Say rule is 50 request per user per api per 10 secs
        • Maintain count for current 10sec bucket
        • when Y requests come, validate using following formula
          • curr_count + prev_count * (10 - y)/10 is within the threshold, where y = seconds elapsed in the current bucket
          • This is not 100% accurate; it is an approximation
        • This is called the sliding window counter approach (see the sketch after this list)
      • How to make it work for Distributed systems
        • Redis where all api gateways talk to it
        • maintain user+api+count queues
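A sketch of the sliding-window counter formula above (50 requests per user+api per 10-second window). In the distributed setup these counters would live in Redis rather than a local map; all names and limits are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Sliding-window counter: estimate the request count in the last WINDOW seconds as
// currentCount + previousCount * (fraction of the previous window still in the sliding window).
public class SlidingWindowRateLimiter {
    private static final long WINDOW_SECONDS = 10;
    private static final long LIMIT = 50;

    private static class Counters { long windowStart; long current; long previous; }
    private final Map<String, Counters> perKey = new HashMap<>(); // key = user + ":" + api

    public synchronized boolean allow(String user, String api, long nowSeconds) {
        Counters c = perKey.computeIfAbsent(user + ":" + api, k -> new Counters());
        long windowStart = (nowSeconds / WINDOW_SECONDS) * WINDOW_SECONDS;
        if (windowStart != c.windowStart) {
            // roll the windows forward; if more than one window passed, the previous count is 0
            c.previous = (windowStart - c.windowStart == WINDOW_SECONDS) ? c.current : 0;
            c.current = 0;
            c.windowStart = windowStart;
        }
        double elapsed = nowSeconds - windowStart;                       // "y" in the formula above
        double estimate = c.current + c.previous * (WINDOW_SECONDS - elapsed) / WINDOW_SECONDS;
        if (estimate >= LIMIT) return false;                             // reject: over the approximate limit
        c.current++;
        return true;
    }
}
```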
    • Time Series DB(used for monitoring)
      • Data Dog
      • ODS in facebook
      • It stores key, timestamp & value (e.g., the time taken by the server for a request)
      • Graph is plotted using above info
    • Implement/Design a timeseries db 
      • Input given is key, timestamp & value 
        • user.visit, at time T, with value 1
        • user.bookmark.latency, at T2 with value 15
      • Query
        • Input: a key, a timestamp range (t1 - t2), a time chunk (e.g., 5 mins) and an aggregator function
        • Output: aggregated values per chunk to plot the graph (see the sketch below)
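A tiny sketch of that aggregation query: bucket one key's (timestamp, value) points into fixed time chunks between t1 and t2 and reduce each bucket with an aggregator (sum here; avg/max would plug in the same way). All names are illustrative.

```java
import java.util.*;

// Minimal time-series aggregation: bucket (timestamp, value) points into fixed-size
// chunks and reduce each bucket with an aggregator (sum in this sketch).
public class TimeSeriesQuery {
    record Point(long timestamp, double value) { }

    // Returns bucketStart -> aggregated value for points of one key in [t1, t2).
    public static SortedMap<Long, Double> aggregate(List<Point> points, long t1, long t2, long chunkMillis) {
        SortedMap<Long, Double> buckets = new TreeMap<>();
        for (Point p : points) {
            if (p.timestamp() < t1 || p.timestamp() >= t2) continue;
            long bucketStart = t1 + ((p.timestamp() - t1) / chunkMillis) * chunkMillis;
            buckets.merge(bucketStart, p.value(), Double::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Point> visits = List.of(
            new Point(1_000, 1), new Point(2_000, 1), new Point(320_000, 1));
        // 5-minute chunks over [0, 10 minutes): two buckets with counts 2 and 1.
        System.out.println(aggregate(visits, 0, 600_000, 300_000));
    }
}
```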
HLD: Case Study 6 (Design Hotstar); OTT -> Over The Top
  •  Functional Requirements
    • View Videos/movies/tv shows
    • view live content like News/Sports
    • Search
    • Reviews, details of movies, recommendations
    • Type of users
      • Viewers(sport fan, movie buffs)
      • Content providers(upload videos)
    • Playback management (rewind, forward)
    • Mobile, web apps (Android, iOS)
    • Profile, user accounts
    • Login, Authentication, Authorization
    • subscriptions/payments
    • MVP
    • Functional
      • Watching experience
      • live streaming
        • Sequence of data  (no download required)
        • Streaming is a technique where, if we cannot process a large amount of data at once, we process it in smaller chunks
    • Non-Functional
      • performant
        • deliver content quickly/buffering
        • scalability
        • Highly available/reliable
        • low/medium latency for live streaming
          • time between live happening & video should differ in secs
        • High performance
          • initial load & NO buffering while watching
      • Security
      • Usability
      • Privacy
  • Back of envelop estimation
    • content providers should be able to upload
    • viewers should be able to view
    • number of videos - 10K
    • duration - 1hr on avg
    • other factors in size
      • resolution - number of pixels
      • bit rate- kilo bits/sec
      • frame rate
      • encoding
      • for larger devices like tv
        • 1080p / full HD, ~5 Mbps -> 1 hr of such video will be ~4 GB
        • for desktops and laptops -> 720p, ~3 Mbps -> ~2 GB
        • mobile devices - 480p, ~2 Mbps -> ~1 GB
      • why not store in single format and serve all devices
        • bandwidth wastage and processing overhead may result in latency
      • 10K * 7GB -> 70 TB(+ 2 back ups) - 210TB
      • stored in BLOB/object storage/cloud storage
    • Data base to maintain Users, Subscription Plans, Devices users are using, Data about user views, 
      • User(id, username, password, email, age,) -> say 1KB per user-> 1Million will take 1GB storage
      • Devices(ip, type, os)
      • Content(actors, description..)
      • ViewerHistory(1000 videos/user, start time, end time, start date, end date
      • 5GB could be db size -> RDBMS can be used
        • This is read heavy system
        • Cache is required(20% data can cached based on 80-20 rule)
        • Master-slave architecture so that reads can go across machines
    • Live Streaming
      • The event is recorded with a video camera & microphone, which capture in a raw format and send it to the servers over the internet.
      • Load balancer
        • Device to distribute traffic to avoid overload of few servers
        • Avoids inconsistent performance, since load is distributed equally across servers
        • horizontal scalability
      • What happens on server side
        • Encode & Compression: Convert from the normal format to H264 format. 
        • Segmentation: When video content is larger, then break into chunks. eg: break 1GB into 250 chunks of size 4MB
        • Streaming server: The segmented data will be uploaded to Streaming server and maintain sequence of chunks
        • Streaming server will feed the content to Edge Server/CDN servers.
          • CDN- caching, low latency
        • App/Client - web app/mobile app that has the video player, various clients
        • The video player on the client takes care of decoding
        • Sequencing/synchronizing the chunks
          • the client knows how to sequence them.
        • Hashing generates a checksum for each chunk on the server side. The client verifies the checksum and re-requests the chunk if it doesn't match
    •  API Design
      • /login & /logout
      • /search with parameter keyword
      • /recommendations
      • /content/{id}
      • /content/
      • Adaptive bitrate streaming - Method of delivery of  audio/video content over the internet where quality of stream is adjusted based on viewer device & internet connection so that users will get best experience.
    • When is security important
      • Uploading 
        • At client end - authentication and authorization works
        • Upload in transit - using encryption (SSL/TLS)

    • RTMP - low latency & high quality video transmission
  • OTT vs video sharing platforms
    • video sharing platform - YouTube
LLD: Remaining Content 2 
  • Coding all models of splitwise and connect to DB
  • Handling inputs from commandline
  • How to settle every user in min time
  • https://docs.google.com/document/d/1JE2zuEWXF3ex5cuEde6rqyMYo338xhGF7oVvQMgW8ls/edit?usp=sharing
  • Client -> Dto -> Controller -> Service -> Repository -> Model
  • CommandRegistry
  • CommandLineRunner
  • run() method
  • How to simplify expenses
    • Calculate how much extra or less someone paid
    • ideal soln
      • any one who paid extra should only receive
      • any one who owes should only pay
    • Divide the people into 2 sets
      • people who paid extra (should receive)
      • people who paid less (should pay)
    • Pick the person who is owed the most and the person who owes the most
      • Create a transaction between the two with amount min(a, b)
      • Reduce both balances by that amount
      • Continue the process until all balances are settled
    • How many txns are needed
      • for N people, at max N-1 txns are needed
    • Data structures to use: max/min heaps (see the sketch below)
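A sketch of the greedy settle-up with heaps described above: repeatedly match the largest creditor with the largest debtor for min(a, b). It assumes the net balances already sum to zero; names are illustrative.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.PriorityQueue;

// Greedy settle-up: the largest creditor is paid by the largest debtor with min(a, b),
// producing at most N-1 transactions for N people.
public class SettleUp {
    public static void settle(Map<String, Double> netBalance) { // +ve = should receive, -ve = should pay
        PriorityQueue<SimpleEntry<String, Double>> creditors =
                new PriorityQueue<>((a, b) -> Double.compare(b.getValue(), a.getValue()));
        PriorityQueue<SimpleEntry<String, Double>> debtors =
                new PriorityQueue<>((a, b) -> Double.compare(b.getValue(), a.getValue()));

        for (Map.Entry<String, Double> e : netBalance.entrySet()) {
            if (e.getValue() > 0) creditors.add(new SimpleEntry<>(e.getKey(), e.getValue()));
            else if (e.getValue() < 0) debtors.add(new SimpleEntry<>(e.getKey(), -e.getValue()));
        }

        while (!creditors.isEmpty() && !debtors.isEmpty()) {
            SimpleEntry<String, Double> credit = creditors.poll();
            SimpleEntry<String, Double> debit = debtors.poll();
            double amount = Math.min(credit.getValue(), debit.getValue());
            System.out.println(debit.getKey() + " pays " + credit.getKey() + " " + amount);

            // push back whoever still has a remaining balance
            if (credit.getValue() - amount > 1e-9)
                creditors.add(new SimpleEntry<>(credit.getKey(), credit.getValue() - amount));
            if (debit.getValue() - amount > 1e-9)
                debtors.add(new SimpleEntry<>(debit.getKey(), debit.getValue() - amount));
        }
    }

    public static void main(String[] args) {
        // A paid 300 extra, B owes 100, C owes 200 -> two transactions settle everyone
        settle(Map.of("A", 300.0, "B", -100.0, "C", -200.0));
    }
}
```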
    • https://github.com/Naman-Bhalla/splitwiseJune22
    • https://drive.google.com/drive/folders/1Ml376ah2XZdTdablZVmf0zlvh5miGymr
  • HW: go thru recording of split wise 2 & 3 in dashboard
HLD : Case study 7 (Design IRCTC)
  • MVP features(User journey)
    • get trains for source, destination, train
    • get schedule(trainId, date)
    • Auth Registration
    • checkAvailability(trainId, date, class, src, dest)
    • book ticket(trainId, class, date, #seats, addln info passenger, source, destination)
    • Payment Gateway
    • Notification
    • Non-functional
      • concurrency
      • high throughput
      • high availability
  • Estimate Scale
    • #Users(total 10M & per day 100K)
    • # booking(seat) per day - 
    • #trains - 10K
    • #seats per train - 1K
    • #seat bookings per day - 10M
    • space per row - 150Bytes
    • rows in table -10M per day
    • how far ahead can we book - 60days -> 600M entries
      • 600M entries * 150 B ≈ 90 GB (can fit on one machine)
    • Historical data - move it to archive storage, which is low cost & plentiful
      • Not frequently accessed
      • Read only - no modifications

  • Design Goals
  • API Design
  • Archive Storage
    • completely separate db
    • get past bookings of user - sql db
    • Analytics - Columnar db
    • audit - tape storage
  • Design Booking Service
    • This is both read and write heavy(Checking availability & writing once booking is confirmed)
    • Highly consistent system cannot depend on eventual consistent system
    • High throughput
      • peak bookings in irctc in one min is 26K and avg is 7k
    • 26K/min - a single machine cannot handle this, so
      • shard data by train_id
  • Replication
    • Prevents data loss
    • Increases read throughput
  • Sharding
    • makes data smaller(per shard)
    • increases read/write throughput
HLD: Microservices
  • Design Amazon / an e-commerce appln
    • Functional Requirements
      • Catalog -> list of items organization is selling
      • shopping cart
      • Order mgmt
      • Payment
      • User Mgmt
      • Notification
      • Inventory Mgmt -> how many items are available/stock availability
      • Analytics
      • Search
      • Recommendations
    • Non-Functional Requirements
      • Scalability
      • Availability
      • Reliability
      • Security
      • Performance/Throughput
    • Communication between services
      • Sync
      • Async
      • Using message broker/event driven architecture/message driven
      • Architecture styles
        • Monolith
        • layered architecture/component architecture
        • Sync communication
        • Async communication

      •  Micro services
        • communicate with REST/SOAP/gRPC/JSON-RPC
        • Independently developed & deployed
        • Order management requires less scalability/traffic whereas Search requires more scalability
        • can be scaled by customer traffic
        • easily scalable
        • scale up and down easily with less time
        • identify the bottleneck and scale it up
        • Web servers and appln servers scale more easily than databases
        • Traditionally message brokers ran on expensive hardware as a single appln
        • Apache Kafka is a distributed message queue used as a message bus
    • Design Micro services architecture
      • Microservices are split based on the following principles
        • Single responsibility
        • Loose Coupling(Maintainable code)
        • Too many microservices will impact performance 
        • High cohesion(all the related things are together)
        • Autonomous
        • stateless
        • Smart endpoints & dumb pipes
          • pipes are integration logic and endpoints are business logic
          • Business logic is within microservices. Integration logic is not within microservice
          • Use Distributed cache instead of appln server cache(sticky cache)
        • Drawbacks in microservice architectures
          • Authentication is implemented in all microservices
          • How to avoid DDoS attacks (distributed denial of service)
            • Use rate limiting
            • but then rate limiting needs to be implemented in every microservice
          • An API Gateway is the solution. It is an application component which provides routing to microservices and authentication/authorization. It is a kind of Facade for all the APIs
          • Facade design pattern - The design pattern hides complexity of a system and provides an interface to the client using which the client can access the system. This comes under structural design pattern.
          • API Gateway functionality
            • routing
            • authentication/authorization
                • It calls the security provider, and reroutes to the security provider if the request is not authenticated
                • Once one service has authenticated the user, it can share a JWT token with the other microservices so that authentication need not be repeated again and again.
            • rate limiting
            • Caching
            • Protocol transformation
            • Monitoring
            • logging
            • Aggregation of services
        • Components required for building microservices
          • Json-p /JsonB for json processing
          • logging and monitoring
          • API gateway to route to respective server load balancer
          • Loadbalancer to route to service instance with less load
          • Service Mesh -> It is service registration and service discovery
            • provides load balancing(alternative to load balancer)
            • Service Registration: registers the IP addresses of newly created service instances
            • Service Discovery: other services can look up those IP addresses (service discovery)
            • It maintains a list of service rules: which services can access which services
HLD: Popular Interview Questions 2
  •  Memory Management & Algorithm
    • Data structure and algorithms
      • garbage collection is a dsa problem in general
    • System design
      • Several choices with pros and cons
      • Important for large scale systems
  • Stack Memory
    • LIFO data structure
    • All variables in the function/method are stored in stack (pushed)
    • Primitive data types(small and fixed size)
    • Each thread has its own stack
    • no need for memory management
    • Object variables are stored and referred to Heap for actual values
    • Stored in RAM
  • Heap Memory
    • Complex data types likes lists, maps, sets(objects)
    • Shared across threads
    • Actual objects are stored
    • Stored in RAM
    • How to free up Heap
      • Manual
        • malloc to allocate memory in languages like C.
        • free() to free the memory
        • Pros
          • Good control & high performance
          • Predictability - predictable performance
        • Cons
          • Memory leak - memory not freed up by the programmer
          • Dangling pointers - a pointer pointing to a location that is already freed up
          • Double free - freeing the same memory twice
      • Automatic
        • Pros
          • No manual free, no dangling pointers, no double free
        • Cons
          • Memory leaks can still be there
          • automatic memory mgmt has some overhead
          • lower predictability
      • When to use Manual vs Automatic
        • Automatic will be default choice. It is simple and less work
        • Mission-critical systems that require high predictability (game programming, low-level database programming, low-level systems programming where every CPU cycle counts) require manual memory management
          • Pacemakers
          • Mars rover / lander
      • RUST
      • Garbage collector in Automatic memory management
        • Functional Requirements
          • Find objects that are no longer used
            • Least recently used
              • Mark & Sweep strategy: Scan entire memory and check which parts are being used
                • Variables in stack are live & can be considered as roots
                • Use graph traversal from roots and find all reachable nodes
                • Pause all jobs when GC is running
                • Start at current stack + meta space + static variables
                • Mark phase: Traverse and mark any reachable objects
                • Sweep: scan entire heap and free space which is not marked
                • Cons
                  • For memory-intensive applns it will be slow and can take a few/many secs

          • free the objects that are no longer used only
        • Non-Functional requirements
          • Min overhead and high performance
          • It should not pause the application
          • Highly predictable
        • Generational garbage collection
          • Make the garbage collector pauses short
          • It divides the heap into Young & old generations. Young is around 20% of space.
          • New objects are allocated space in new generation.
          • It relies on the generational hypothesis: if an object survives a few garbage collections, it tends to live for a long time
          • When the young generation fills up, garbage collection kicks in. It marks only objects in the young generation and ignores the old generation region
          • Any young object that survives is moved to the old generation
          • This algo is faster(100ms) as it need to perform mark and sweep only on 20% of memory
          • After many iterations OLD space will also gets filled up. In such case, major GC will run on whole memory.
        • JAVA 
          • Java uses a generational garbage collector
          • It further divides young space into 3 parts
            • Young 
              • Eden, S0, S1 (Survivor 0 & Survivor 1)
            • Old
            • Permanent
          • When Eden is full, survivors are moved to S0
          • When Eden is full, GC will run on Eden+S0 and survivors moved to S1
          • When Eden is full, GC will run on Eden+S1 and survivors moved to S0
          • the process will continue
          • One of S0 or S1 is always empty by moving survivors to other space. Helps in compaction
          • If S0 or S1 gets filled, or an object survives a fixed number of GC cycles, the survivors are moved to the old generation
          •  Major GC run only when OLD generation is filled.
        • Reference counting(Python)
          • Every object stores a count of references (in-degree) pointing to it
          • When the refcount goes to 0 -> garbage collected
          • If there is circular dependency, then refcount never goes to 0
          • Use Mark and sweep periodically.
          • Pros
            • reactive- happens along with appln
            • no need to pause
          • Cons
            • Insufficient - we have to apply mark and sweep at periodic intervals
            • There can still be pauses when a variable or object holds a large tree of memory, due to cascading frees
            • High overhead
              • CPU - every object creation/assignment requires the ref count to be updated; the update must be atomic and a context switch can't happen in between
              • Memory - addln memory to maintain reference value
        • tricolor - Truly concurrent

    • Why is memory fragmentation bad
      • Sequential access is not possible
      • Caching happens in blocks. So, when data is stored together, it gets fetched in a minimal number of reads
  • Highlevel design
    • Traffic analysis - QPS (avg & peak)
    • Data storage 
    • CPU load = QPS * CPU time per query
  • CPU bound vs I/O bound
    • CPU speed is measured in clock cycles per second, i.e., gigahertz (GHz)
    • latest intel processor speed is 5.2Ghz and has 24 cores
    • Speed of light is 3 * 10^8 meter/sec
    • Light takes ~3 nanoseconds to travel one meter.
    • In those ~3 ns, each CPU core can do ~15 floating point operations.
    • https://gist.github.com/jboner/2841832
    • For processor with 24 cores, each core has following
      • ALU -> transistors (clock speed)
      • Registers -> memory made up of flip flops(clock speed)
      • L1 cache ~5 times slower than ALU/registers
      • L2 cache ~14 times slower than L1
      • L3 cache -> shared across all cores -> slower than L2
    • Primary memory -> ~500 times slower than cpu
    • Secondary memory 
    • Web requests are I/O bound requests
    • DSA problems, Image processing are CPU bound tasks
  • Network Load
    • Nowadays networks are even faster than CPUs, thanks to fiber technology
    • At every junction (to interface with a digital device) the light has to be converted to electrical signals. This is the primary bottleneck.
    • Network can operate at 10 pb/sec
  • Webservice requests
    • They create one internal thread for every request
    • How many requests can OS handle
    • Concurrent systems - async/await
      • If requests go up to 100,000 then the network/disk becomes the bottleneck
    • For a 24-core system, ~96 threads in parallel can scale
    • If threads are the bottleneck, get rid of threads and move to concurrent systems with async/await
    • Async/await programming avoids thread context switches
  • Stack vs heap
    • stack is per thread and heap is shared
    • Variables/call stack in stack memory. Objects(dynamically allocated memory) are stored in Heaps
    • It's clear when to free stack memory; freeing heap memory is not as straightforward (difficult)
  • tricolor - Truly concurrent
    • Completely reactive with out pauses. Reactive means, garbage collector reacts to any changes to memory allocation. 
    • Garbage collector keeps running as a separate thread
    • At the start, everything in the stack is marked Black, & the heap is White
    • Black - is live/reachable. 
      Grey - reachable from roots, still need processing
      white - no references(to be garbage collected)
    • Black is guaranteed not to have any references to white.
      Grey may or may not have references to white; it needs further processing
    • Start from the Black objects, perform BFS and mark all reachable children as Grey
    • By the end of a cycle, everything that is reachable is black and everything not reachable is white
    • When new reference from Black to White happens - Target is marked as Grey
    • When reference is deleted, target is marked in White
    • Pros
      • No pauses
      • perf is predictable
    • Cons
      • Throughput is lower
      • CPU overhead is high
  • Java
    • Tri color mark sweep & Generational GC
  • https://v8.dev/blog/concurrent-marking
