django-planet
April 6, 2023

Filtering rows every 5 days with the Django ORM

in blog Better Simple
original entry Filtering rows every 5 days with the Django ORM

When you’re working with timestamped data, eventually you’ll come across the need to fetch records that occur every X days. This post attempts to explain how to do that from within the Django ORM.

If you’re here for code, then this part is for you!


from django.db.models import F, Cast, IntegerField
from django.db.models.functions import ExtractDay, Mod
from django.utils import timezone

from yourproject.models import MyModel

# If you want to fetch data starting with a day other
# than today, this variable needs to change.
today = timezone.now().date()
# Changing the period gives you the ability to fetch 
# data every 5, 7 or X days.
period = 5

instances = (
    MyModel.objects
    .annotate(
        date_diff_mod=Mod(
            Cast(
                ExtractDay(today - F("date_field"),
                output_field=IntegerField()
            ),
            period,
         )
     )
    .filter(date_diff_mod=0)
)

I finally proved my math teachers correct, I do use math in the real world!

I haven’t used the modulo operator often in my career, but when I do I feel like I’m pulling a fast one on the universe. The most recent time was when I was writing an API that needed to fetch scores for a user every 7 day or every 30 days.

But for our example let’s be different. Let’s say we’re trying to determine how many logins occur every 5 days.

This implies we are storing each login for a user. A simple version of this could be the following model:1

class Login(models.Model):
    created = models.DateTimeField(auto_now_add=True)
    created_date = models.DateField()
    user = models.ForeignKey(get_user_model(), on_delete=models.CASCADE)

As we deal with periodic events we’ll find that the modulo operator can be helpful. In our case, we’re looking at the number of days between two dates, then filtering on those that are multiples of our period. This means we can find records that have a date 5, 10, 15, 20, 25, etc days from today (or some other fixed date).

days_between % period == 0

Our task is now down to two operations, finding the number of days between two dates and applying a modulo operation.

We can find the number of days between two dates using date subtraction in PostgreSQL. According to the docs:

date - date -> integer Subtract dates, producing the number of days elapsed

Unfortunately in my experience, Django causes PostgreSQL to return an interval rather than an integer. To get the integer value for the number of days between the dates, we’d have to use the EXTRACT SQL function. Thankfully, this already exists within the Django ORM as the ExtractDay expression2. Putting it all together, we have a way to calculate the number of days between two dates: ExtractDay(today - F("created_date"))

The second operation, the modulus, can be accomplished using the Mod expression.

Running the code:

from django.db.models import F
from django.db.models.functions import ExtractDay, Mod

from .models import Login

today = timezone.now().date()
period = 5

logins = (
    Login.objects
    .annotate(days_between=ExtractDay(today - F("created_date"))
    .annotate(date_diff_mod=Mod("days_between", period))
    .filter(date_diff_mod=0)
)

Returns the following error:

ProgrammingError: function mod(double precision, integer) does not exist
LINE 1: SELECT COUNT(*) FROM (SELECT MOD(EXTRACT('created_date' FROM (interva...
                                 	^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

This means that ExtractDay is returning a double rather than an integer in the SQL. We need to structure the query to cast that to an integer. Our code now becomes:


from django.db.models import F, Cast, IntegerField
from django.db.models.functions import ExtractDay, Mod

from .models import Login

today = timezone.now().date()
period = 5

logins = (
    Login.objects
    .annotate(
        days_between=Cast(
            ExtractDay(today - F("created_date"),
            output_field=IntegerField()
        )
     )
    .annotate(date_diff_mod=Mod("days_between", period))
    .filter(date_diff_mod=0)
)

Boom! Now we’re fetching logins that occurred every 5 days, starting with today.

Edit 2023-04-05:

It was pointed out that the code examples were using .count(). I’ve removed that code and renamed variables so that the focus of the post is on the filtering. Readers can use their imagination on how to best use a QuerySet that’s filtered to data with a periodic date.

  1. This model is denormalized to avoid having to cast the datetime into a date and dealing with timezones. 

  2. I’m curious if I could use the alias function instead of annotate to reduce the size of the SELECT clause in the SQL query.