High Level Designs

Steps for System Design
  • Define MVP -> Requirement gathering
  • Estimate Scale w.r.t. #users, #traffic
  • Design Goals (CAP theorem)
  • API Design
  • System Design

Rule of thumb: 1 billion bytes is ~1 GB (10^9 bytes).
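The powers-of-ten shorthand used throughout these estimates can be sanity-checked in a few lines (a sketch using decimal SI units, not binary GiB/TiB):

```python
# Decimal (SI) byte units, as used in back-of-envelope estimates.
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}

def to_unit(n_bytes, unit):
    """Express a raw byte count in the given SI unit."""
    return n_bytes / UNITS[unit]

print(to_unit(10**9, "GB"))   # 1 billion bytes -> 1.0 GB
```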

 Databases

  • HBase - columnar DB, optimized for a high volume of writes
  • MongoDB - good for read-heavy systems. Has tunable consistency, so it can work for high write volumes as well. Favors consistency and partition tolerance: MongoDB sacrifices availability and can have a single point of failure
  • DynamoDB, MySQL - read heavy
  • Cassandra - gives up consistency (favors availability and partition tolerance)
  • Redis - global cache

Queues

  • Kafka/RabbitMQ - persistent queues
  • RabbitMQ pushes messages to consumers; Kafka only stores them, and subscribers have to PULL
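The push-vs-pull distinction above can be sketched with a toy contrast (not the real Kafka/RabbitMQ client APIs, which have their own protocols and libraries):

```python
import queue

# Toy contrast: a push-style broker delivers each message to subscribers
# itself, while a pull-style broker only stores messages and consumers
# poll at their own pace (as Kafka consumers do).

class PushBroker:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, msg):
        # Broker pushes: every subscriber's callback is invoked immediately.
        for cb in self.subscribers:
            cb(msg)

class PullBroker:
    def __init__(self):
        self.log = queue.Queue()

    def publish(self, msg):
        # Broker only persists the message; nobody is notified.
        self.log.put(msg)

    def poll(self):
        # Consumer pulls when it is ready; returns None if nothing is queued.
        return None if self.log.empty() else self.log.get_nowait()

received = []
push = PushBroker()
push.subscribe(received.append)
push.publish("hello")            # delivered to `received` immediately

pull = PullBroker()
pull.publish("hello")
msg = pull.poll()                # consumer fetches "hello" when it asks
```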

 Messaging App

  • Facebook: 20-30 million messages per day
  • How do you scale a product so it keeps working as users grow from 100 to 100 million?

Messenger

  • MVP
    • Ability to send and receive messages
    • Message History
    • Login with Facebook accounts
    • Contacts/Friends
    • Messages have a max length (250 chars); text messages only
    • Notifications
    • UTF-8 Support
  • Estimation of Scale
    • Facebook users - ~3 billion accounts
    • 10 Billion messages/day
      • Storage requirement -> ~250 bytes per message, which includes
        • text, sender, recipient, timestamp, messageId
        • 10 billion * 250 bytes ~ 2.5 TB/day
        • 2.5 TB/day * 365 * 10 yr ~ 10 petabytes
        • Hence we cannot store the complete data on one system; it must be split across machines. Splitting data across multiple machines is called Sharding
      • Sharding Required
      • Both READ and Write Heavy
  • Design Goals
    • Latency should be low
    • Consistency vs Availability(Cap Theorem)
  • API Design
    • sendMessage(userId, recipient, text, msgId)
    • getConversations(userId, lastTimeStamp, maxCount)
    • getMessages(userId, conversationId, #msgs, offset)
  • System Design
    • Client(mobile app/browser)
      • Loadbalancer(on facebook)
        • application servers
        • Sharding based on conversationId
          • Can't get the most recent conversations easily
        • userId-based sharding
          • The app server keeps a cache along with the application logic - a write-through cache to achieve a highly consistent system (sticky sessions)
          • When message is sent from A to B
            • Message goes to A's cache, followed by A's DB
            • Message goes to B's cache, followed by B's DB; it is also added to queues for notification
          • Trade-off: when an app server goes down, it may take 2-3 seconds for a new app server to be assigned and pick up the details
    • Write Request from A to B
      • load balancer 
        • webserver
          • B cache/app server -> B DB
          • A cache/app server -> A DB
          • Add to persistent queue(Kafka/RabbitMQ)
      • Push Notifications
        • Message is pushed to the machines subscribed to the queues
          • Machines maintain connection details in a Device DB (user, deviceId, machine)
            • Message gets delivered as a notification
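The storage math in the estimation above can be double-checked with a quick script (assuming ~250 bytes/message and 10 billion messages/day, as estimated):

```python
MSG_BYTES = 250              # text + sender + recipient + timestamp + messageId
MSGS_PER_DAY = 10 * 10**9    # 10 billion messages/day

bytes_per_day = MSGS_PER_DAY * MSG_BYTES       # 2.5 * 10**12 bytes
bytes_per_decade = bytes_per_day * 365 * 10    # ~9.1 * 10**15 bytes

print(bytes_per_day / 10**12)     # 2.5 TB per day
print(bytes_per_decade / 10**15)  # 9.125 PB over 10 years, i.e. ~10 PB
```

Far beyond a single machine's storage, which is what forces sharding.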

Books

  • https://www.goodreads.com/book/show/23463279-designing-data-intensive-applications

System Types

  • Read Heavy
    • Youtube, Instagram, Netflix
  • Write Heavy
    • Logs, Rating
  • Read/Write Heavy
    • Messaging apps (e.g., Messenger)

SLACK

  • 600K organizations use Slack
  • 5 Step framework
    • Gather Requirements/MVP(P1, P2, P3)
    • Estimate Scale                        //What is the problem
      • Storage(GB/TB/PB)
      • Read > Write (QPS)
    • API Design                          //how to do it
    • Design goals                        //design to remove bottlenecks in 2 & 3
      • Latency
      • CAP theorem
    • System Design/High level design
      • Remove all bottlenecks in steps 2, 3, 4 (implementation details to address them)
  •  MVP/Requirements
    • Send/Receive messages
    • Group & 1-1 messages
    • Fetch all messages for a conversation
    • Adding multimedia
  • Estimate Scale
    • Daily Active Users
    • similarweb.com
    • competitive analysis
    • 10M users sending 100 messages/day, each message 1KB in size
      • Storage for a year = 10M * 100 * 1KB * 365 = 10^9 KB/day * 365 = 1 TB/day * 365 = 365 TB
    • QPS
      • If Slack has 10M Daily active users and each users reads about 1000 msgs per day, what will be the number of read queries / sec?
        • (10M * 1000) / (3600*24)  ~ 120K
    • Inferences:
      • One server cannot store all messages
      • One server cannot handle so many reads per sec
  • API Design
    • GetConversations(userId, authToken, #conv, offset)
    • SendMessage(userId, authToken,  conversationId, timestamp, messageBody, attachment(blob_url), msg_id)
    • FetchMsgs(authToken, convId, #msgs, offset)
  • Design Goals
    • Latency: low
    • CAP: CP system (when the network partitions, only two options remain: consistency or availability). CAP is a concept of distributed systems in general, not just of databases.
  • System Design
    • Client -> DNS Server -> ip address of load balancer ->Application Servers(as1, as2...)
    • Application servers are deployed in private network and cannot be accessed directly
    • Sharding options
      • By MessageId -> requires visiting all shards to get the messages for a single conversation, resulting in high latency
      • By ConversationId -> requires visiting multiple data nodes/shards to get the messages for a single user, resulting in high latency
      • By UserId -> cannot be used for group conversations
      • By OrgId -> 
        • TCS: 300K * 100 msgs * 1KB = 30 GB/day ~ 10 TB/year
      • OrgId+Month
        • check the architecture of DRUID
        • Read TPS
          • Read Heavy 
          • Low latency
          • Replication - keep copies of the data to serve these requirements
          • 300K * 1000 reads/day / 86,400 sec ~ 3500 reads/sec
            • with 1 replica we can halve the load to ~1750 reads/sec
          • How does replication happen?
            • Data nodes talk to each other using a gossip protocol
          • Caching further improves the response time
      • client -> load balancer -> application server (group) -> global cache -> database cluster -> DB node
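One way to realize the OrgId+Month option above is a composite shard key; a hypothetical sketch (shard count and names are illustrative, not Slack's actual scheme):

```python
import hashlib

# Hypothetical sketch of OrgId+Month sharding: a composite key maps each
# (orgId, month) bucket to one shard, so a large org's message history
# spreads across shards month by month instead of hot-spotting one node.
NUM_SHARDS = 16

def shard_for(org_id: str, year_month: str) -> int:
    """Deterministically map (orgId, month) to a shard number."""
    key = f"{org_id}:{year_month}".encode()
    return int(hashlib.sha256(key).hexdigest(), 16) % NUM_SHARDS

# Same org+month always routes to the same shard; different months may
# land on different shards.
s_jan = shard_for("acme", "2024-01")
s_feb = shard_for("acme", "2024-02")
```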

Social Media Apps

  • Signup -> add friends -> post updates (text, images, comments on posts, likes on images)
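The signup -> friends -> posts flow above suggests a minimal data model; a hypothetical sketch (all class and field names are illustrative, not a fixed schema):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

@dataclass
class User:
    user_id: int
    friends: Set[int] = field(default_factory=set)    # friend userIds

@dataclass
class Post:
    post_id: int
    author_id: int
    text: str
    image_url: Optional[str] = None                   # optional attached image
    likes: Set[int] = field(default_factory=set)      # userIds who liked
    comments: List[Tuple[int, str]] = field(default_factory=list)

alice, bob = User(1), User(2)
alice.friends.add(bob.user_id)                        # add friend

post = Post(post_id=10, author_id=alice.user_id, text="hello world")
post.likes.add(bob.user_id)                           # like
post.comments.append((bob.user_id, "nice post!"))     # comment
```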
