Cache is king

Picture the scene…

You find yourself with an older web app to look after, and it has had several previous “owners” (devs).

It’s a bit slow in places, and there’s a need to speed it up.

Refactoring the code and/or database queries is always an option, of course, but that can be resource-heavy and hard to justify. There's also that low-lying fear of unintended consequences when refactoring, particularly when you weren't part of the original dev team.

Imagine, then, if there were a low-effort, low-risk and (potentially) high-reward way of speeding things up.

There may be, and it’s called an “in-memory data store”, or a “cache”.

Show me the cache

In the context of a web technology stack, a cache is a service that allows you to store data for future web requests to use. If that sounds a lot like a database, you're right. However, a cache usually sits alongside a database, and it's used for small chunks of data. It may be easiest to think of it as a means of storing results and sharing them between web requests.

In terms of speeding up the performance of a web app, a cache can be used to store computationally-expensive results so that subsequent requests don't have to repeat the same expensive calculation.

One of the most popular caching tools, and the one that I always use, is Redis.

Caching using Redis

You can download Redis from the official website, or you can just use your favourite package manager to install it.

For the sake of our demo, we’ll use an Ubuntu box:

sudo apt-get install -y redis

With that done, you should now be able to interact with Redis via the command line. For example, you should be able to associate a key with a value, and then retrieve that value later using the same key:

$ redis-cli set testkey "hello world"
OK
$ redis-cli get testkey
"hello world"

Right, so that’s jazzy.

Now, assuming that your web app isn't written in bash (!), you're going to want to install the appropriate Redis client library for your language.

For our demo, we’ll use Python.

First, install Python, together with the “pip” package manager:

sudo apt-get install -y python3 python3-pip

Now use pip to install the redis library:

sudo pip3 install redis

OK, now crack open your favourite editor and create a Python script (called redis_demo.py, to match the commands below) like so:

import time

def calc_meaning_of_life():
    time.sleep(5)
    return 42

print(calc_meaning_of_life())

And run it to convince yourself it's slow:

$ time python3 redis_demo.py
42

real    0m5.036s
user    0m0.028s
sys     0m0.004s

So, it takes 5 seconds to calculate the meaning of life.

If this was a web app, every single request would be burdened by that 5 second delay.

Now, let’s deploy some redis goodness to help:

import time
import redis

def calc_meaning_of_life():
    time.sleep(5)
    return 42

def get_meaning_of_life():
    REDIS_KEY = "meaning_of_life"
    r = redis.Redis(host='localhost', port=6379, db=0)
    retval = r.get(REDIS_KEY)
    if retval is None:
        retval = calc_meaning_of_life()
        r.set(REDIS_KEY, retval)
    return int(retval)

print(get_meaning_of_life())

What we've done here is introduce a wrapper around the expensive function: it checks the cache for a previous result first, and only calls the expensive function on a cache miss (storing the result for next time).
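This check-then-compute pattern (often called "cache-aside") crops up so often that it's worth wrapping in a reusable decorator. Here's a minimal sketch — the `cached` decorator and its parameters are my own invention, not part of the redis library, but it works with any client exposing redis-py-style `get`/`set` methods:

```python
import functools
import json

def cached(client, key, ttl=60):
    """Return a decorator that caches a function's JSON-serialisable
    result under `key`, for `ttl` seconds, using `client` (e.g. a
    redis.Redis instance, whose get/set signatures this relies on)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            hit = client.get(key)
            if hit is not None:
                return json.loads(hit)            # cache hit: skip the work
            result = fn(*args, **kwargs)          # cache miss: do the work...
            client.set(key, json.dumps(result), ex=ttl)  # ...and store it
            return result
        return wrapper
    return decorator
```

With that in place, sticking `@cached(r, "meaning_of_life")` on the expensive function would give the same behaviour as the hand-written wrapper above.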

So, let’s run the new version of the script:

$ time python3 redis_demo.py
42

real    0m5.180s
user    0m0.137s
sys     0m0.038s

Aaaaaaannnnnddddd that’s not any better. 😞

But, when we run it a second time, we get:

$ time python3 redis_demo.py
42

real    0m0.093s
user    0m0.079s
sys     0m0.014s

Which is super-speedy. And every future request will be equally speedy. So, basically, the first request takes the performance hit, and all subsequent ones get the benefit of the cached value.

There is just one small fly in the ointment…

Cache Invalidation

The problem with the above code is that, at some point, the cached value is going to be out of date (e.g. the meaning of life may no longer be 42). That means you're eventually going to have to invalidate the cache (i.e. delete the value), so that the next web request that comes in is forced to recalculate.

You basically have two options for cache invalidation:

  1. Explicit Invalidation. This means that you need to update your code to explicitly delete the cached value at all the places you know the underlying data has changed. For example, if you had cached the total sales for the day, you would delete the cached value whenever a new sale was made, thus forcing the total sales figure to be re-calculated.
  2. Implicit Invalidation. This means that you simply set the cached value to automatically expire after a certain time period.
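To make option 1 concrete, here's a sketch of the daily-sales example. The function and key names are made up for illustration, and the database calls are stand-ins passed in as functions; the only cache operations involved are `get`, `set` and `delete` (all of which a redis.Redis client provides):

```python
SALES_KEY = "total_sales_today"

def get_total_sales(cache, compute_total):
    """Return the day's total sales, using the cache-aside pattern."""
    cached_total = cache.get(SALES_KEY)
    if cached_total is not None:
        return float(cached_total)
    total = compute_total()       # the expensive database query
    cache.set(SALES_KEY, total)
    return total

def record_sale(cache, save_sale, amount):
    """Write the sale, then explicitly invalidate the cached total so
    the next call to get_total_sales() is forced to recalculate."""
    save_sale(amount)
    cache.delete(SALES_KEY)
```

The cost of this approach is that the invalidation has to happen at every code path that changes the underlying data — miss one, and you serve stale totals.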

Since we're talking about minimising changes and minimising risk here, implicit invalidation is probably the way to go. In fact, all you have to do is add one more line of code to tell Redis how long to store the value for, and it will take care of the rest.

import time
import redis

def calc_meaning_of_life():
    time.sleep(5)
    return 42

def get_meaning_of_life():
    REDIS_KEY = "meaning_of_life"
    r = redis.Redis(host='localhost', port=6379, db=0)
    retval = r.get(REDIS_KEY)
    if retval is None:
        retval = calc_meaning_of_life()
        r.set(REDIS_KEY, retval)
        r.expire(REDIS_KEY, 60)   # expire after 60 secs
    return int(retval)

print(get_meaning_of_life())

This means the value will be recalculated (at most) once a minute, but any requests that arrive within each one-minute window will be super-speedy!
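One refinement worth knowing about: redis-py's set() accepts an ex argument, so the write and the expiry can be done in a single atomic command. That avoids the (small) risk of the process dying between set() and expire() and leaving a stale value cached forever. A sketch of the lookup function using it — taking the client as a parameter, but otherwise identical to the version above:

```python
def calc_meaning_of_life():
    return 42   # stands in for the 5-second calculation above

def get_meaning_of_life(r):
    REDIS_KEY = "meaning_of_life"
    retval = r.get(REDIS_KEY)
    if retval is None:
        retval = calc_meaning_of_life()
        # SET with EX writes the value and the expiry in one atomic
        # command, so the key can never be left behind without a TTL
        r.set(REDIS_KEY, retval, ex=60)
    return int(retval)
```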

Now go and get your cache on.