Scaling Concepts
Concept: Vertical Scaling
What is vertical scaling?
Vertical scaling, or scaling up, is the process of increasing the capacity of a single server or resource by adding more power to it. We call this scaling up / vertical scaling because we are adding more power to the same server.
The main dimensions increased during this process are: virtual CPU (vCPU), RAM/memory, block storage, and network bandwidth.
Process we followed:
- Backup Data: First we make sure the existing data will not be lost. In our case there was nothing extra to do, because all the data lives in block storage and the same block storage can be attached to the new virtual machine.
- Plan Downtime Window: Decide when the resize will happen and notify customers in advance.
- Change Instance Type: Pick an instance type that provides more vCPU, memory, and network bandwidth (see the sketch after this list).
- Restart Instance: Stop the instance, change the instance type, and start the server again.
- Run Tests: Run the automation tests and check the server metrics again.
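As an illustration, resizing an instance along these lines could look like the following sketch, assuming an AWS EC2 instance managed with boto3 (the instance ID and target instance type are placeholders, not values from our setup):

```python
import boto3

ec2 = boto3.client("ec2")
INSTANCE_ID = "i-0123456789abcdef0"  # placeholder instance ID
NEW_TYPE = "m5.2xlarge"              # placeholder larger instance type

# Stop the instance before changing its type (this is where the downtime window starts).
ec2.stop_instances(InstanceIds=[INSTANCE_ID])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[INSTANCE_ID])

# Change the instance type while the instance is stopped.
ec2.modify_instance_attribute(InstanceId=INSTANCE_ID, InstanceType={"Value": NEW_TYPE})

# Start the instance again; the attached block storage is reused as-is, so no data is lost.
ec2.start_instances(InstanceIds=[INSTANCE_ID])
ec2.get_waiter("instance_running").wait(InstanceIds=[INSTANCE_ID])
```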
Let's understand the situation
So now, after scaling up, we can process more requests because the number of vCPUs has increased.
- All the metrics in the dashboard are under their threshold limits.
Now let's understand the current issue which we have observed.
The first thing we learned from the vertical scaling process is that it requires downtime, and the second is that if the server is down we cannot process any requests, which means it is a single point of failure.
Analyzing the past 3-month window, only one month had a huge request spike. That means we do not always need the full server capacity, and most of the resources sit idle.
The major drawback of vertical scaling is cost inefficiency: as resources increase, cost increases at an exponential rate.
Hence we need a solution which overcomes these limitations.
Hence here comes the next system design concept named Horizontal Scaling.
Concept: Horizontal Scaling
What is horizontal scaling?
Horizontal scaling, or scaling out, is a method of increasing the capacity or performance of a system by adding more hardware or resources to distribute the load. We call this scaling out / horizontal scaling because we are adding more servers in parallel to the existing servers.
Process we followed:
- We simply start by creating some general purpose instances and deploy the application, along with its database, to all the servers.
- We then add their public IPs to the DNS system (see the sketch after this list).
- Now, as there is more than one server, we need a logging and monitoring stack on every server:
  - Centralized logging: a Fluent Bit agent is installed on all the servers to push logs to CloudWatch.
  - Centralized metrics collection: Prometheus and node exporters are added on all the servers to push metrics to a blob store like S3.
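As a rough sketch of the DNS step, publishing the servers' public IPs could look like this if the zone were hosted on AWS Route 53 (the hosted zone ID, domain, and IP addresses are made up):

```python
import boto3

route53 = boto3.client("route53")

HOSTED_ZONE_ID = "Z0123456789EXAMPLE"          # placeholder hosted zone ID
DOMAIN = "app.example.com"                     # placeholder domain
SERVER_IPS = ["203.0.113.10", "203.0.113.11"]  # placeholder public IPs of the app servers

# Publish all application server IPs under one A record; DNS resolvers rotate
# between the returned addresses (round-robin style distribution).
route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": DOMAIN,
                    "Type": "A",
                    "TTL": 60,
                    "ResourceRecords": [{"Value": ip} for ip in SERVER_IPS],
                },
            }
        ]
    },
)
```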
Issues from vertical scaling which are addressed by the horizontal scaling method:
- Downtime
- Single point of failure
- Provides elasticity
- Cost effectiveness
Let's understand the situation
So now we have multiple application servers to handle incoming requests in a distributed manner.
Whenever we need to scale, we just add a new application server and register it in the DNS management dashboard. DNS then routes traffic to the application servers in turn, in a round robin fashion.
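Round robin here simply means requests are handed to the servers in turn; a tiny illustration of the rotation (the server addresses are made up):

```python
from itertools import cycle

# Hypothetical pool of application server addresses returned by DNS.
servers = ["10.0.1.11", "10.0.1.12", "10.0.1.13"]
rotation = cycle(servers)

# Each incoming request is handed to the next server in turn.
for request_id in range(6):
    print(f"request {request_id} -> {next(rotation)}")
```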
Now let's understand the current issue which we have observed.
- The first and most important issue is that every time we create a new application server we need to list its IP in the DNS dashboard, and it takes some time for DNS changes to propagate.
- The second issue is that DNS does not route traffic based on the health of a server.
- We have no flexibility to route traffic to an application server based on custom logic; it simply routes traffic in a round robin fashion.
Hence we need some layer/system which is responsible for balancing the load between servers and removing the limitations discussed above.
Hence here comes the next system design concept named Load Balancer.
Concept: Load Balancer
A load balancer is a networking device or software application designed to distribute incoming network traffic across multiple servers or resources. The primary purpose of a load balancer is to ensure that no single server bears too much traffic load.
Process we followed for using a load balancer in our architecture:
- Health Check Endpoint: each application server runs a reverse proxy (nginx), a backend, and a frontend. In the reverse-proxy/nginx config we add a /health-check API which the load balancer can use to determine the availability of a server (a minimal sketch follows this section).
- Register Instances: register all the instances among which the load balancer will route traffic.
- Configure Load Balancing Algorithm: there are some common algorithms used to select which application server receives the traffic.
- SSL Certificate: in this step we add the SSL certificate and use the HTTPS protocol to encrypt requests.
After adding the load balancer we can make all application servers private, make the load balancer the public-facing entry point, add the IP of the load balancer to DNS, and remove everything else from the DNS dashboard.
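The /health-check endpoint itself can be very small. Here is a minimal sketch assuming a Flask backend (the framework and port are assumptions; the original only fixes the endpoint path):

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health-check")
def health_check():
    # Return 200 while this instance can serve traffic; the load balancer marks
    # the server unhealthy once this endpoint stops answering with 200.
    return jsonify(status="ok"), 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)  # placeholder port
```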
Let's understand the situation
So far everything works well: we have a load balancer which helps us manage the load across different application servers. If we have to scale, we simply spin up a new application server, run the script to install the application, and register the server so the load balancer starts considering this new machine as well.
Now let's understand the current issue which we have observed.
As each application server contains its own database, a user's request can be fulfilled by any server.
That means if a request from user A is fulfilled by server A, all the data related to that user is stored on server A. But the load balancer can route the next request to any server, so if a request from user A is now fulfilled by server B, it does not return any of the data that belongs to user A.
Hence we need to separate the database from the application servers, and all application servers should use the same database for consistent data.
Hence here comes the next system design concept named Stateless Web Tier.
Stateless Web Tier
From the above issue we can learn that the web server (the server hosting the application's frontend and backend) should not store any data/state, and hence the web server should be stateless.
Three-tier architecture
From the above three-tier system we can see that the web tier and the data tier are separate, which promotes scalability without any data inconsistency.
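One common way to keep the web tier stateless is to move per-user state such as sessions into a shared store that every web server can reach. A rough sketch assuming Redis as that store (the hostname and key layout are made up):

```python
import json
import redis

# Shared session store that lives outside the web servers (placeholder host).
session_store = redis.Redis(host="session-redis.internal", port=6379, decode_responses=True)

def save_session(session_id: str, data: dict, ttl_seconds: int = 3600) -> None:
    # Any web server can write the session; none of them keeps it in local memory.
    session_store.setex(f"session:{session_id}", ttl_seconds, json.dumps(data))

def load_session(session_id: str) -> dict | None:
    # Any web server can read the same session, so requests from user A can be
    # served by server A or server B interchangeably.
    raw = session_store.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```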
Let's understand the situation
So far everything works well: we have separated the application and database layers.
Now let's understand the current issue which we have observed.
- One thing we have observed is that with an increase in traffic we can easily scale by adding more application servers to the pool.
- But what about the database? We are still using a single database to fulfil all the database queries.
Hence we need to scale the database layer as well.
Hence here comes the next system design concept named Database Scaling.
Database Scaling
The current design faces limitations with a single database, lacking failover support, and struggling to meet database requests from all application servers. To address these challenges, various techniques can be employed to scale the database.
Common Scaling Technique: Database Replication
One of the most widely adopted techniques for scaling databases is Database Replication. This approach involves the creation of a master node (write replica) and multiple slave nodes (read replicas).
- Master Node (Write Replica):
  - This serves as the primary database instance where write operations occur.
  - It is responsible for handling data modifications and updates.
- Slave Nodes (Read Replicas):
  - Multiple read replicas are created to distribute the read workload.
  - These replicas mirror the data from the master node.
  - They can be strategically placed to serve specific application servers or geographical regions.
Failure Scenarios:
- Master Node Failure: if the master node goes down:
  - A failover mechanism should be in place to promote one of the read replicas to become the new master.
  - The application servers can then redirect write requests to the new master node.
  - This ensures minimal downtime and maintains write functionality.
- Slave Node Failure: if a read replica (slave node) goes down:
  - Read traffic can be redirected to other available replicas to ensure continued service.
  - Data redundancy in multiple replicas mitigates the impact of a single node failure on read operations.
  - Regular monitoring and automated recovery processes can be implemented to replace or repair the failed replica.
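At the application side, replication usually shows up as simple read/write splitting. A rough sketch assuming PostgreSQL with psycopg2 (the DSNs are placeholders, and in practice a driver, proxy, or ORM often handles this routing):

```python
import random
from contextlib import closing

import psycopg2

# Placeholder DSNs: one master for writes, several read replicas for reads.
MASTER_DSN = "host=db-master dbname=shop user=app"
REPLICA_DSNS = [
    "host=db-replica-1 dbname=shop user=app",
    "host=db-replica-2 dbname=shop user=app",
]

def run_write(query: str, params: tuple = ()) -> None:
    # All data modifications go to the master (write replica).
    with closing(psycopg2.connect(MASTER_DSN)) as conn:
        with conn, conn.cursor() as cur:  # `with conn` commits the transaction
            cur.execute(query, params)

def run_read(query: str, params: tuple = ()) -> list:
    # Read-only queries are spread across the read replicas; a failed replica
    # can simply be dropped from REPLICA_DSNS until it is repaired or replaced.
    with closing(psycopg2.connect(random.choice(REPLICA_DSNS))) as conn:
        with conn.cursor() as cur:
            cur.execute(query, params)
            return cur.fetchall()
```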
Let's understand the situation
So far everything works well: we have successfully scaled both the database and application layers.
Now let's understand the current issue which we have observed: every read still goes all the way to the database, even for data that is requested frequently. Adding a cache in front of the database gives us:
- Performance improvement: reduced latency and better response times, since frequently accessed data is served directly from the cache.
- Scalability: read-heavy operations are offloaded to the cache, which helps handle increased user traffic and data requests and improves overall system scalability.
Hence we need to add a cache tier to our system to reduce unnecessary network calls.
Hence here comes the next system design concept named Caching.
Concept: Cache Tier
A cache is like a quick-access storage that keeps important or often-used information in memory. This helps make future requests for that information faster. When a new web page loads, it usually needs data from a database. If we keep asking the database over and over, it can slow things down. But, with a cache, we can avoid this problem.
Now, think of the cache tier as a layer which reduces the burden on the main database and we can scale this independently.
For caching, we have added 2 main components:
- CDN (Content Delivery Network): A CDN is a distributed network of servers that helps deliver content, like images and scripts, closer to users. It reduces latency and speeds up content delivery by storing copies of data in various locations.
- Cache Node: The Cache Node is a dedicated component responsible for storing frequently accessed data in memory. It acts as a quick-access storage, ensuring faster response times by serving data directly from the cache rather than fetching it from the original source, such as a database.
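The usual way the application talks to the cache node is the cache-aside pattern: check the cache first, and on a miss read from the database and populate the cache. A minimal sketch assuming Redis as the cache (the hostname, key names, and the database call are placeholders):

```python
import json
import redis

cache = redis.Redis(host="cache-node.internal", port=6379, decode_responses=True)

def fetch_product_from_database(product_id: str) -> dict:
    # Stand-in for a real database query.
    return {"id": product_id, "name": "example product"}

def get_product(product_id: str, ttl_seconds: int = 300) -> dict:
    key = f"product:{product_id}"

    # 1. Try the cache first.
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    # 2. On a miss, read from the database and populate the cache with a TTL.
    product = fetch_product_from_database(product_id)
    cache.setex(key, ttl_seconds, json.dumps(product))
    return product
```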
Let's understand the situation
So far everything works well: we have successfully scaled both the database and application layers.
Additionally, because our data is cached, fewer network calls are made, lowering costs and improving read performance.
Now let's understand the current issue which we have observed.
- Currently both our web tier and our database tier are scaled according to the increasing demand on the application.
- But there is still only one cache node to fulfil all the read requests.
Hence we need to distribute the data in the cache node among multiple shards.
Hence here comes the next system design concept named Distributed Caching.
Concept: Distributed Cache and Consistent Hashing
Currently we have a large amount of data sitting in a single cache node. To scale, we will use a sharding process: we create multiple shards and distribute the original data across them. To decide which data should go to which shard, we use a technique like consistent hashing (see the image below on consistent hashing).
Process we followed
- First we create a configuration list which contains information about all the server hosts where each shard lives.
- Second we install the cache client on every web server; it is used to communicate with the cache servers.
- Now, using consistent hashing, we determine the shard from which to read or write the data (a minimal sketch appears after the note below).
Note: a cache is a key-value store, like a hash map, so to write or read data we need some key.
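A minimal consistent-hashing sketch is shown below; the shard names and the number of virtual nodes are made up, and in practice the cache client library usually implements this for you:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps keys to cache shards so that adding or removing a shard
    only remaps a small fraction of the keys."""

    def __init__(self, shards, virtual_nodes=100):
        self._ring = []  # sorted list of (hash, shard) points on the ring
        for shard in shards:
            for i in range(virtual_nodes):
                bisect.insort(self._ring, (self._hash(f"{shard}#{i}"), shard))

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def get_shard(self, key: str) -> str:
        # Walk clockwise from the key's position to the first shard point on the ring.
        index = bisect.bisect(self._ring, (self._hash(key),)) % len(self._ring)
        return self._ring[index][1]

# Hypothetical shard hosts taken from the configuration list.
ring = ConsistentHashRing(["cache-shard-1", "cache-shard-2", "cache-shard-3"])
print(ring.get_shard("user:42"))  # every web server resolves the same shard for this key
```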
Let's understand the situation
So far everything works well: we have successfully scaled the database, application, and cache layers according to the increase in demand.
Now let's understand the current issue which we have observed.
- Currently, from our monitoring dashboard, we have observed that some APIs face timeouts and their latency is also high.
- Below are the APIs which cause the high latency issue:
  - Sales Report API: this API is responsible for generating a report showing all the sales-related data with detailed information.
  - Image Scaling of Product: this API is responsible for reducing the size of images for appropriate performance. A user can upload multiple images for a product, and our API scales all the images and stores them in the blob store.
- The main reason behind the high latency is that there is a lot of work happening behind these APIs.
Hence we need to convert long-running APIs from synchronous to asynchronous.
Hence here comes the next system design concept named Message Queue.
Concept: Message Queue
A message queue is like a to-do list. Each task you need to do is a message, and the queue is where you keep these messages in order. When you finish one task, you take the next message from the queue and work on that.
Producer is responsible to put message/task in the queue and Consumer is responsible to pick the messages and process.
Process we followed:
- First we integrated a message broker / message queue into our system and run it with our application start-up.
- Adding the message queue flow in the reports API:
  - When a merchant hits this API to request a report, we create a new entry in the database with report_id, merchant_id, status=PENDING and put all this information in the message_queue.
- Adding the message queue flow in the image upload API:
  - When a merchant hits this API to upload images, we create a new entry in the database with merchant_id, image_name, image_binary_data, status=PENDING and put all this information in the message_queue.
- Next we set up the consumer, which also runs with our application start-up; the consumer keeps an eye on / subscribes to the queue.
- The consumer picks a message / task metadata from the queue, processes it, and once done updates the entry in the database with status=DONE (a minimal sketch follows this list).
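A minimal producer/consumer sketch of the report flow is shown below; it uses an in-process queue and a dict as stand-ins for the real message broker and database, so only the shape of the flow is illustrative:

```python
import queue
import threading
import time

report_queue = queue.Queue()  # stand-in for the real message broker
report_status = {}            # stand-in for the database table holding report rows

def request_report(report_id: str, merchant_id: str) -> None:
    # Producer: the API handler records the task as PENDING, enqueues it,
    # and returns immediately instead of generating the report inline.
    report_status[report_id] = "PENDING"
    report_queue.put({"report_id": report_id, "merchant_id": merchant_id})

def report_worker() -> None:
    # Consumer: subscribes to the queue, does the slow work, marks the task DONE.
    while True:
        task = report_queue.get()
        time.sleep(1)  # placeholder for the actual report generation
        report_status[task["report_id"]] = "DONE"
        report_queue.task_done()

threading.Thread(target=report_worker, daemon=True).start()

request_report("rep-1", "merchant-7")  # returns right away
report_queue.join()                    # only for this demo: wait until the worker finishes
print(report_status)                   # {'rep-1': 'DONE'}
```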
Final Thoughts
- In conclusion, system design is an iterative process where there's often no absolute right or wrong.
- The effectiveness of a design solution depends on various factors, including the specific requirements, constraints, and the dynamic nature of the system.
- It's important to note that 'it depends' is a common refrain in system design. Solutions are not one-size-fits-all, and the optimal design choice can vary based on the context, goals, and the unique characteristics of the problem at hand. Remember, the journey of system design is a learning process, and each decision contributes to a deeper understanding of the system's intricacies.
- Throughout the design journey, it's crucial to measure trade-offs carefully, considering factors such as performance, scalability, maintainability, and cost.