Scaling Questions Pt: 2

Janu Sung
Jun 10, 2021

This post provides a series of questions on scaling an application, with each question building on the one before it. This set builds off Pt: 1, so take a look at those questions before you try these.

This section will look at what happens when we scale globally.

Let’s begin.

Now we have plans to expand internationally. It turns out users in Europe and the USA are complaining about slow loading. Why could that be? What would your suggested fix be?

Reason:

The users abroad are too far away from the server they need to connect to, so each request has to go through several network hops before it can establish a connection and retrieve data.

Solution:

Provide caching on the client side and on the server side to speed up our requests, and change some of our deployment strategies.

So now we build all the static assets, put them in AWS S3, and place a CDN in front of S3 → allowing the CDN (CloudFront) to act as the edge
(by ‘edge’ we mean that whichever CDN server is closest to the user will be used to serve that user’s request).

client → CDN → AWS s3 ← database

So if the user is in the USA, whichever CDN server is closest to the user will serve the request for that user. The static assets like the HTML, CSS and JavaScript will be cached on the CDN. Subsequent requests will be served from the CDN again → they do not have to reach our server since the assets are already stored on the CDN.
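One common way to tell a CDN how long it may keep an asset is the Cache-Control response header sent by the origin. The following is only a minimal sketch: the helper name, the extension list and the max-age values are assumptions for illustration, and it assumes static filenames are fingerprinted so they can be cached for a long time.

```python
import os

# Hypothetical list of extensions we treat as static assets.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".svg", ".woff2"}


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control header value for a requested path (illustrative)."""
    ext = os.path.splitext(path)[1].lower()
    if ext in STATIC_EXTENSIONS:
        # Assumption: filenames change when content changes, so a year is safe.
        return "public, max-age=31536000, immutable"
    if ext == ".html":
        # Short lifetime so a new deployment shows up within a few minutes.
        return "public, max-age=300"
    # Everything else (e.g. API responses) is treated as dynamic content.
    return "no-store"
```

With headers like these, the CDN can answer repeat requests for CSS and JS on its own, while dynamic responses still go to the server.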

Caching for the dynamic content:

Place Redis in front of the database, so before going to the db we query Redis and check if we already have the data the client is looking for. If we do, we return it from Redis. If not, we go to the db and cache the results in Redis for some time in case another call comes in for that data → allowing us to not have to query the db again → which would be expensive.
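The lookup flow described above is often called cache-aside. Here is a rough sketch of it in Python. To keep it runnable without a Redis server, it uses a small in-memory stand-in (`FakeRedis`, a made-up class) that mimics the `get` and `setex` calls of a Redis client; `get_user`, the key format and the 60 second TTL are likewise assumptions.

```python
import time


class FakeRedis:
    """In-memory stand-in with get/setex, only for illustration."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry has expired, behave as if it was never cached.
            del self._store[key]
            return None
        return value

    def setex(self, key, ttl_seconds, value):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


def get_user(user_id, cache, query_db, ttl_seconds=60):
    """Return a user, checking the cache first and the db only on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                      # cache hit → db is not touched
    value = query_db(user_id)              # cache miss → expensive db query
    cache.setex(key, ttl_seconds, value)   # keep it around for later calls
    return value
```

Calling `get_user` twice with the same id should hit the db only once, as long as the second call arrives before the TTL runs out.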

*Redis is an in-memory data structure store, used as a distributed, in-memory key–value database, cache and message broker, with optional durability. Redis supports different kinds of abstract data structures, such as strings, lists, maps, sets, sorted sets, HyperLogLogs, bitmaps, streams, and spatial indices.

So as the user opens the webpage → the client will access the CDN to load all the static assets → HTML, CSS, JS → then it will access the server to interact with the application → any dynamic content will be first queried in Redis (middleman) → if not available it will go to the DB.

After some time we notice our storage is increasing abnormally. It turns out that the logs are taking up most of the storage on the server and causing issues in processing → slow server. How would you resolve this?

Implement log rotation: logs are split by date (or by another criterion such as file size) and deleted after some time.

This keeps the logs from growing without bound and filling up the disk.
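As one possible illustration, Python’s standard library ships a handler that rotates a log file on a schedule and deletes old files. The sketch below assumes daily rotation and a 14 day retention; the function name, logger name and retention period are placeholders, not values from the setup described here.

```python
import logging
from logging.handlers import TimedRotatingFileHandler


def build_rotating_logger(log_path: str, keep_days: int = 14) -> logging.Logger:
    """Create a logger whose file is rotated daily (illustrative defaults)."""
    logger = logging.getLogger("app")
    logger.setLevel(logging.INFO)
    # Start a new file at midnight and keep at most `keep_days` old files;
    # older files are deleted by the handler when it rotates.
    handler = TimedRotatingFileHandler(
        log_path, when="midnight", backupCount=keep_days
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

On a Linux server the same idea is commonly handled outside the application by the logrotate utility.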

After some time we notice that the memory and CPU consumption is increasing. Since we have a single server with both the database and the application running on it → both are competing for resources and our application is not scaling well. How would you resolve this issue?

Separate the database from the application by housing them on different servers.

This will allow us to scale the application and the database independently without any resource competition, give each one more CPU and memory, and serve more users.

Users love the app → we keep getting more traction.

Now we have a greater load to handle. In addition, we have a cron job running twice a day to back up our DB → there is so much data to process that our application slows down, because our db is busy preparing the backups.

*The software utility cron is a time-based job scheduler in Unix-like computer operating systems; a task scheduled with it is called a cron job. Users who set up and maintain software environments use cron to schedule jobs (commands or shell scripts) to run periodically at fixed times, dates, or intervals.

What would you suggest to resolve this issue?

To fix this, we first look at what types of actions are being performed → are there more reads than writes, or vice versa?

In this instance there are more reads → therefore we can split our reads and writes across multiple database servers.

We would isolate writes to be processed by the primary db, and reads to be performed by the secondary dbs (replicas). The primary db would continuously replicate its changes to the secondary dbs to keep them up to date.

Whenever we encounter a higher load → we can add more databases to read from.

Keeping copies of the primary’s data on the secondary dbs is called ‘database replication’, and it is what lets us separate reads from writes.

Also, we would have all the backup cron jobs run against the replica dbs instead of the primary db → allowing the primary to focus solely on writes and updates.
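To make the read/write split concrete, here is a small sketch of a router that sends SELECT statements to the replicas in turn and everything else to the primary. The class name and the string-based connection placeholders are invented for this example; a real application would more likely rely on its database driver, ORM or a proxy for this.

```python
import itertools


class ReplicatedDatabaseRouter:
    """Route reads to replicas and writes to the primary (illustrative)."""

    def __init__(self, primary, replicas):
        if not replicas:
            raise ValueError("at least one replica is required")
        self.primary = primary
        # Cycle through the replicas so reads are spread across them.
        self._replicas = itertools.cycle(replicas)

    def connection_for(self, sql: str):
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT":
            return next(self._replicas)   # read → any replica
        return self.primary               # write/update → primary only
```

One thing to keep in mind with this design: if replication is asynchronous, a replica can lag slightly behind the primary, so a read issued right after a write may return stale data.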

Over time we realize that our single application server is not able to hold up under the load that we are getting. How would you resolve this?

Create multiple application servers and add load balancers to distribute the load evenly.
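Round robin is one of the simplest strategies a load balancer can use to spread requests evenly. The sketch below is a toy version that also skips servers marked as down; the class and server names are hypothetical, and in practice this job is usually done by a dedicated load balancer rather than by application code.

```python
class RoundRobinBalancer:
    """Hand out servers in turn, skipping ones marked as down (toy example)."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._next = 0

    def mark_down(self, server):
        self._healthy.discard(server)

    def mark_up(self, server):
        if server in self._servers:
            self._healthy.add(server)

    def pick(self):
        # Try each server at most once, starting from where we left off.
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")
```

Each call to `pick()` returns the next healthy server, so three servers each receive roughly a third of the requests.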

But now we have logs that are scattered across different application servers → when we need to check logs or debug, we have to go through each of these servers.

So instead we would integrate the ELK stack → Elasticsearch, Logstash, Kibana

ELK → instead of only writing the logs to files on each server, we ship the logs to Logstash → from there the logs are indexed into Elasticsearch, and Kibana then lets us search and visualize the logs from all the servers in one place.
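Centralized logging tends to work best when every server emits logs in a structured format that the pipeline can parse without guessing. As a hedged example, the formatter below writes each log record as one JSON object per line; the class name and field names are assumptions, and how these lines reach Logstash (a file shipper, TCP input, and so on) is left out.

```python
import json
import logging
import socket


class JsonLineFormatter(logging.Formatter):
    """Format each log record as a single JSON line (field names assumed)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            # The host tells us which application server produced the line.
            "host": socket.gethostname(),
            "message": record.getMessage(),
        }
        return json.dumps(payload)
```

Attaching this formatter to a handler with `handler.setFormatter(JsonLineFormatter())` would make every app server produce the same shape of log line, which makes searching across servers in Kibana easier.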

That’s it folks! Hope this was a fun exercise and that it helps.

Happy Hacking.
