in blog | Better Simple |
---|---|
original entry | Filtering rows every 5 days with the Django ORM |
When you’re working with timestamped data, eventually you’ll come across the need to fetch records that occur every X days. This post attempts to explain how to do that from within the Django ORM.
If you’re here for code, then this part is for you!
from django.db.models import F, Cast, IntegerField
from django.db.models.functions import ExtractDay, Mod
from django.utils import timezone
from yourproject.models import MyModel
# If you want to fetch data starting with a day other
# than today, this variable needs to change.
today = timezone.now().date()
# Changing the period gives you the ability to fetch
# data every 5, 7 or X days.
period = 5
instances = (
MyModel.objects
.annotate(
date_diff_mod=Mod(
Cast(
ExtractDay(today - F("date_field"),
output_field=IntegerField()
),
period,
)
)
.filter(date_diff_mod=0)
)
I finally proved my math teachers correct, I do use math in the real world!
I haven’t used the modulo operator often in my career, but when I do I feel like I’m pulling a fast one on the universe. The most recent time was when I was writing an API that needed to fetch scores for a user every 7 day or every 30 days.
But for our example let’s be different. Let’s say we’re trying to determine how many logins occur every 5 days.
This implies we are storing each login for a user. A simple version of this could be the following model:1
class Login(models.Model):
created = models.DateTimeField(auto_now_add=True)
created_date = models.DateField()
user = models.ForeignKey(get_user_model(), on_delete=models.CASCADE)
As we deal with periodic events we’ll find that the modulo operator can be helpful. In our case, we’re looking at the number of days between two dates, then filtering on those that are multiples of our period. This means we can find records that have a date 5, 10, 15, 20, 25, etc days from today (or some other fixed date).
days_between % period == 0
Our task is now down to two operations, finding the number of days between two dates and applying a modulo operation.
We can find the number of days between two dates using date subtraction in PostgreSQL. According to the docs:
date - date -> integer Subtract dates, producing the number of days elapsed
Unfortunately in my experience, Django causes PostgreSQL to return an interval
rather than an
integer. To get the integer value for the number of days between the dates, we’d have to
use the EXTRACT
SQL function. Thankfully, this already exists within the Django ORM
as the ExtractDay
expression2.
Putting it all together, we have a way to calculate the number of days between two dates:
ExtractDay(today - F("created_date"))
The second operation, the modulus, can be accomplished using the
Mod
expression.
Running the code:
from django.db.models import F
from django.db.models.functions import ExtractDay, Mod
from .models import Login
today = timezone.now().date()
period = 5
logins = (
Login.objects
.annotate(days_between=ExtractDay(today - F("created_date"))
.annotate(date_diff_mod=Mod("days_between", period))
.filter(date_diff_mod=0)
)
Returns the following error:
ProgrammingError: function mod(double precision, integer) does not exist
LINE 1: SELECT COUNT(*) FROM (SELECT MOD(EXTRACT('created_date' FROM (interva...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
This means that ExtractDay
is returning a double rather than an integer in the SQL. We need
to structure the query to cast that to an integer. Our code now becomes:
from django.db.models import F, Cast, IntegerField
from django.db.models.functions import ExtractDay, Mod
from .models import Login
today = timezone.now().date()
period = 5
logins = (
Login.objects
.annotate(
days_between=Cast(
ExtractDay(today - F("created_date"),
output_field=IntegerField()
)
)
.annotate(date_diff_mod=Mod("days_between", period))
.filter(date_diff_mod=0)
)
Boom! Now we’re fetching logins that occurred every 5 days, starting with today.
Edit 2023-04-05:
It was pointed out that the code examples were using .count()
. I’ve removed that code
and renamed variables so that the focus of the post is on the filtering. Readers can use
their imagination on how to best use a QuerySet
that’s filtered to data with a periodic
date.