django-planet
March 20, 2023

Caching and Django Rest Framework

in blog Screaming At My Screen

One of my current projects involves an API. Not surprising in the year 2023. During business hours data is primarily read. Payloads are large and the underlying data model is as complex as it needs to be. Lots of data and a complex data model are usually a good way to make sure your API is slow, and this project was no exception. While there are many ways to approach this, I opted for the one that required the smallest possible change to the codebase and the least change to the existing infrastructure.

Let us start with a small list of reasons why this approach made sense for this particular project. I would not advocate using it as a general design pattern every time you run into performance issues.

  • Data is mostly refreshed during night time. There might be four or five manual edits during day time for every 100,000 entities stored.
  • The project is still in its early MVP / prototyping phase. Not introducing additional service requirements simplifies deployment and workflows.
  • The API is read only.
  • Not introducing additional services and hunting data through different data stores also means debugging potential issues is simple.
  • Optimising database queries helped, denormalising would have helped more. But serialising that amount of data to JSON was also a bottleneck that was not easy to overcome.
  • The complexity of the data model and the amount of data cannot be reduced without impacting functionality.

We will talk a bit about steps to take this to production later. I obviously did not use any code from the project itself, so the data model in the example might be a bit hard to read. I opted to check the SQLite database into the project as seeding can take some time. All benchmark numbers are highly unscientific with a test server running on my workstation and a curl request being issued from my client.

The Data Model

Let us assume Foo is the main entity we care about. One Foo can have hundreds to thousands of Bar which can have tens to hundreds of Zab which can have tens of Baz.

from django.db import models


class Foo(models.Model):
    name = models.CharField(max_length=50)
    # Cached JSON representation of this Foo and everything below it.
    pre_serialized = models.JSONField(blank=True, null=True)


class Bar(models.Model):
    foo = models.ForeignKey(Foo, on_delete=models.DO_NOTHING)
    name = models.CharField(max_length=50)


class Baz(models.Model):
    name = models.CharField(max_length=50)


class Zab(models.Model):
    bar = models.ForeignKey(Bar, on_delete=models.DO_NOTHING)
    name = models.CharField(max_length=50)
    bazs = models.ManyToManyField(Baz)

For some instances of Foo we are talking about 1 kB of data, for others one to two MB. And having Foo without Bar or Zab makes no sense from a business logic perspective. So the data has to come out of the system one way or another, be it by optimising the Foo API endpoint or by requiring multiple calls to fetch resources individually and merging them on the client.

You might have noticed the pre_serialized field on Foo. This is the only field which is not part of the actual business-relevant data, and it already gives away what my solution will be.

We serialize the data and store it. Also known as caching. This works well as the data is not volatile, is mostly written outside of business hours and is slow to query and serialise.

One thing that is important to note is the use of JSONField. You really want to use this over TextField, otherwise additional steps will be required to make the response from your API an actual JSON response.
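To illustrate the difference, here is a small sketch using only Python's standard json module. No Django is involved and the payload is made up; the point is only that a value coming back from storage as text gets encoded a second time by the response renderer, while a value coming back as a decoded structure is encoded once.

```python
import json

payload = {"name": "foo-1", "bar_set": [{"name": "bar-1"}]}

# What a TextField hands back: the JSON document as a plain string.
from_text_field = json.dumps(payload)
# What a JSONField hands back: the already decoded Python structure.
from_json_field = json.loads(from_text_field)

# The API renderer encodes whatever the serializer returns.
double_encoded = json.dumps(from_text_field)  # a JSON string holding escaped JSON
encoded_once = json.dumps(from_json_field)    # the JSON object clients expect

# The TextField variant needs an extra decoding step before it is usable.
repaired = json.dumps(json.loads(from_text_field))
```

With a TextField, that extra json.loads call would have to happen on every cached response.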

The Serializer

Our focus will be on the serializer class for Foo. The other serializers involved are standard DRF ModelSerializers you can find here.

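from rest_framework.serializers import ModelSerializer


class FooSerializer(ModelSerializer):
    bar_set = BarSerializer(many=True)

    class Meta:
        model = Foo
        fields = ("name", "bar_set")

    # Serve the cached representation when there is one, otherwise fall
    # back to the slow, live serialization.
    def to_representation(self, instance):
        if instance.pre_serialized:
            return instance.pre_serialized
        return super().to_representation(instance)

    # Always serializes live; used to build or refresh the cached value.
    @property
    def data_skip_cache(self):
        return super().to_representation(self.instance)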

I would encourage you to read through Django Rest Framework's sources for serializers and generic views if you have not already. Understanding how data is handled internally is extremely useful, especially when you are not dealing with a read-only API. Simply overwriting methods without knowing the context in which they are used rarely ends well.

In to_representation we check if pre_serialized is set. If this is the case we return the data as is, otherwise we call the parent's to_representation and wait a lot longer for data to be returned.

The only purpose of data_skip_cache is to skip the cache check and call the standard to_representation implementation. Without this helper property we would not be able to refresh the cache once it is set, as we would always return cached data.
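The helper that populates the cache is not shown in this post. A minimal sketch of what it could look like follows; the signature is an assumption, and it only relies on the instance having a pre_serialized field and a Django-style save(), and on the serializer exposing data_skip_cache as above.

```python
def pre_serialize(instance, serializer_class):
    """Render `instance` live and store the result in its cache field.

    Hypothetical helper: the name and signature are assumptions, not the
    project's actual code.
    """
    # data_skip_cache bypasses the cache check in to_representation, so
    # this works whether or not a cached value already exists.
    instance.pre_serialized = serializer_class(instance).data_skip_cache
    # Only write the cache column and leave the business fields untouched.
    instance.save(update_fields=["pre_serialized"])
    return instance.pre_serialized
```

A nightly import job could then call pre_serialize(foo, FooSerializer) for every Foo it touched.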

One field on the data model and seven lines of code - ignoring the helper method to actually populate the cache. That is all it took to make the API usable. Response times for the list endpoint went down from nearly 30s to below 1s. The endpoint for individual entities, which took nearly 10s, now responds in less than 300ms. Not perfect, but more than good enough.

Sidenote: it is no secret that I enjoy working with Django and Django Rest Framework. The whole stack makes me extremely productive. And being able to do something like this with just a few lines of code is exactly the reason why.

I have worked with more than enough systems where bolting caching onto an API would have been a 100 LOC change that makes your life miserable and the system way harder to maintain.

I simply appreciate Django and DRF for what they are and how easy they make my life and I feel like I am not saying this often enough.

If you only want to use cached values for an expensive list endpoint but have individual items always serve live data you can create a subclass of ListSerializer and overwrite its to_representation method.
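One way this could look, written as a mixin so the sketch carries no imports. The class name is made up; the mixin assumes it is combined with DRF's ListSerializer, which provides self.child, and that items carry the pre_serialized field.

```python
class CachedListMixin:
    """Mix into a DRF ListSerializer subclass to serve cached items in lists.

    Hypothetical name; relies on `self.child` from ListSerializer.
    """

    def to_representation(self, data):
        # DRF may pass a manager or a queryset here; both offer .all().
        items = data.all() if hasattr(data, "all") else data
        return [
            # Use the cached payload when present, else render the item live.
            item.pre_serialized or self.child.to_representation(item)
            for item in items
        ]
```

To wire it up you would declare something like class FooListSerializer(CachedListMixin, ListSerializer), set list_serializer_class = FooListSerializer in FooSerializer.Meta, and drop the cache check from FooSerializer.to_representation so that detail requests always render live.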

To Production

Let us talk about things to do before pushing this to production.

Use Redis to store your cached data - or whatever fast key value store you fancy. Your database will not get faster by throwing more data at it. No matter how well Postgres scales, not putting easily avoidable load on the database is a pretty good idea. You should also see a performance increase when fetching data from Redis instead of Postgres. A combination of model name and primary key usually makes a good Redis key. (That being said, if you carefully benchmark and monitor your system you might get away with using your database.)

Cache on read. Right now we solely rely on having all data hot in cache. But what if we hit the third line of to_representation, the fallback to live serialization? We likely had an issue during pre serialization or it did not happen yet. Instead of continuously degrading API performance you could store the freshly serialized data in your cache and speed up future requests.
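A minimal sketch of cache on read with a key value store. The function names are made up, and cache is assumed to be any object with get(key) returning None on a miss and set(key, value, timeout), which is the shape of Django's cache framework API.

```python
def cache_key(instance):
    """Combine model name and primary key, e.g. 'foo:42'."""
    return f"{type(instance).__name__.lower()}:{instance.pk}"


def cached_representation(instance, render, cache, timeout=None):
    """Return the cached payload, rendering and storing it on a miss.

    `render` is the slow path, for example the serializer's standard
    to_representation. With Django's cache API a timeout of None means
    the entry never expires.
    """
    key = cache_key(instance)
    data = cache.get(key)
    if data is None:
        # Cache miss: pay the serialization cost once, then store the result.
        data = render(instance)
        cache.set(key, data, timeout)
    return data
```

Inside the serializer, to_representation could then delegate to cached_representation(instance, super().to_representation, cache).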

Use background workers to warm your cache. If you can keep your cache warm by overwriting Model.save() you either might not have this problem, have a completely different problem this will not solve or have way too forgiving users. Adding ten seconds to each model save will not make for a good user experience. Pushing the work to background workers obviously means you will serve some stale data for a few seconds or longer, depending on how fast your job system can catch up. But the improvements in response time we have seen usually outweigh this drawback.
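To show the shape of this without pulling in a job system, here is a sketch that uses a thread and a queue from the standard library as a stand-in for Celery, RQ or whatever you run in production. All names are made up. The save path only enqueues a primary key and returns immediately; the worker does the expensive re-serialization.

```python
import queue
import threading

refresh_queue = queue.Queue()


def request_refresh(pk):
    """Called from the save path: cheap, returns immediately."""
    refresh_queue.put(pk)


def run_worker(refresh):
    """Re-serialize queued entities until a None sentinel arrives."""
    while True:
        pk = refresh_queue.get()
        try:
            if pk is None:
                return
            refresh(pk)  # the slow part, e.g. load the Foo and rebuild its cache
        finally:
            refresh_queue.task_done()


def start_worker(refresh):
    """Start the stand-in worker on a daemon thread and return the thread."""
    thread = threading.Thread(target=run_worker, args=(refresh,), daemon=True)
    thread.start()
    return thread
```

Between request_refresh and the worker catching up, readers still see the old cached value, which is the stale window mentioned above.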

Abstract the caching implementation if you need this for multiple models. You should not have to change four or five implementations of to_representation and pre_serialize if you decide to move to a different key value store or if you want to add telemetry for cache hits and misses. However...
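One possible shape for such an abstraction, as a sketch with made-up names. The backend is assumed to be anything with get(key) returning None on a miss and set(key, value); swapping the store or adding telemetry then touches one class instead of every serializer.

```python
class SerializedCache:
    """Single place that knows where pre-serialized payloads live.

    Hypothetical wrapper; `backend` is any object with get(key) and
    set(key, value).
    """

    def __init__(self, backend):
        self.backend = backend
        self.hits = 0
        self.misses = 0

    def get(self, key):
        data = self.backend.get(key)
        # Count hits and misses here so no serializer has to.
        if data is None:
            self.misses += 1
        else:
            self.hits += 1
        return data

    def set(self, key, data):
        self.backend.set(key, data)
```

In a real system the counters would more likely be sent to your metrics system than kept on the instance.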

Do NOT Cache All The Things.

Emphasis on the second word please.

The numbers we have seen above look pretty good if you ask me. But keep in mind that this is a solution for a very specific problem. More often than not all you need to do is overwrite get_queryset and use prefetch_related and select_related to fix performance issues. Maybe add one or two database indexes. Any caching implementation adds complexity to your system and, when using a key value store, an additional point of failure.
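For the example data model that could look like the following sketch. The function name is made up, and it assumes the nested serializers walk bar_set, zab_set and bazs using Django's default related names. From Foo downwards there are only reverse and many-to-many relations, so prefetch_related is the tool here; select_related only applies to forward foreign keys and one-to-one relations.

```python
def with_related(queryset):
    """Load Foo rows and everything below them in a fixed number of queries.

    Without the prefetch, each Foo triggers a query for its Bars, each Bar
    one for its Zabs, and each Zab one for its Bazs.
    """
    return queryset.prefetch_related("bar_set__zab_set__bazs")
```

In a DRF view, get_queryset would then return with_related(super().get_queryset()).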

All that being said, I am pretty happy with this solution. A few lines of code solved all performance problems for the foreseeable future.