Solving concurrency issue with postgresql advisory locks

 Before we start solving anything, let's understand the problem first.


The problem

We have a website were users can reserve servers, they choose the datacenter, server specs, start date and end date. So our reserve table looks something like this:


create table reserved_servers
(
	datacenter_id integer,
	server_id integer,
	user_id integer,
	start_date timestamp,
	end_date timestamp
);
-- primary key, proper indexes of course will be needed for prod env

And our Django/Flask whatever we use, basically will follow the next steps:

  1. check for overlaps
    1. if not found insert the row
    2.  if found, skip the row.
  2. return response

Let's say that black friday is coming and we have a big discount so that everyone want's to buy servers from us to host cat pictures.

To prevent overlaps we have the SELECT statement and then we do an if on the result, so it shouldn't be possible to get overlaps.
The code below simulates 10 requests made at the same time to our rest api.

#!/usr/bin/env python3

import os
import sys
import threading
from random import randint

import psycopg2


def django_flask_prod_code():
    """"
    This function simulates using POST with RESTApi to reserve a new server
    """
    post_data = {
        'datacenter_id': '223',
        'server_id': '345',
        'start_date': '2021-01-01 14:45',
        'end_date': '2021-01-04 14:45',
        'user_id': randint(0, 9999)
    }
    connection = psycopg2.connect(user="postgres",
                                  password=os.environ['password'],
                                  host=os.environ['host'],
                                  port="5432",
                                  database="postgres")

    cursor = connection.cursor()
    cursor.execute(
        "SELECT true FROM reserved_servers WHERE server_id= %(server_id)s and datacenter_id=%(datacenter_id)s "
        "and start_date <= %(end_date)s and end_date >= %(start_date)s LIMIT 1",
        {
            'server_id': post_data['server_id'],
            'datacenter_id': post_data['datacenter_id'],
            'start_date': post_data['start_date'],
            'end_date': post_data['end_date'],
        })
    record = cursor.fetchone()
    if not record:
        print("Overlap not detected, inserting new row.")
        cursor.execute("INSERT INTO reserved_servers (datacenter_id, server_id, user_id, start_date, end_date) "
                       "VALUES (%(datacenter_id)s, %(server_id)s, %(user_id)s, %(start_date)s, %(end_date)s)", {
                           'server_id': post_data['server_id'],
                           'datacenter_id': post_data['datacenter_id'],
                           'start_date': post_data['start_date'],
                           'end_date': post_data['end_date'],
                           'user_id': post_data['user_id'],
                       })
    else:
        print("Overlap detected, skipping.")
    connection.commit()
    cursor.close()
    connection.close()


def main():
    threads = [threading.Thread(target=django_flask_prod_code) for _ in range(10)]
    for t in threads:
        t.start()
    return 0


if __name__ == '__main__':
    sys.exit(main())

We start 10 threads to simmulate concurrent requests and if we run the program we have a surprise:


It's saying that 10 rows have been inserted, but is this true? Let's check the table:


Let's dig deeper, here are the logs from PostgreSQL

2021-01-14 13:31:49 UTC:REDACTED(58071):postgres@postgres:[13855]:LOG: statement: BEGIN
2021-01-14 13:31:49 UTC:REDACTED(58077):postgres@postgres:[13856]:LOG: statement: BEGIN
2021-01-14 13:31:49 UTC:REDACTED(58076):postgres@postgres:[13859]:LOG: statement: BEGIN
2021-01-14 13:31:49 UTC:REDACTED(58071):postgres@postgres:[13855]:LOG: statement: SELECT true FROM reserved_servers WHERE server_id= '345' and datacenter_id='223' and start_date <= '2021-01-04 14:45' and end_date >= '2021-01-01 14:45' LIMIT 1
2021-01-14 13:31:49 UTC:REDACTED(58075):postgres@postgres:[13861]:LOG: statement: BEGIN
2021-01-14 13:31:49 UTC:REDACTED(58070):postgres@postgres:[13858]:LOG: statement: BEGIN
2021-01-14 13:31:49 UTC:REDACTED(58078):postgres@postgres:[13860]:LOG: statement: BEGIN
2021-01-14 13:31:49 UTC:REDACTED(58073):postgres@postgres:[13857]:LOG: statement: BEGIN
2021-01-14 13:31:49 UTC:REDACTED(58072):postgres@postgres:[13863]:LOG: statement: BEGIN
2021-01-14 13:31:49 UTC:REDACTED(58074):postgres@postgres:[13862]:LOG: statement: BEGIN
2021-01-14 13:31:49 UTC:REDACTED(58069):postgres@postgres:[13864]:LOG: statement: BEGIN
2021-01-14 13:31:49 UTC:REDACTED(58077):postgres@postgres:[13856]:LOG: statement: SELECT true FROM reserved_servers WHERE server_id= '345' and datacenter_id='223' and start_date <= '2021-01-04 14:45' and end_date >= '2021-01-01 14:45' LIMIT 1
2021-01-14 13:31:49 UTC:REDACTED(58076):postgres@postgres:[13859]:LOG: statement: SELECT true FROM reserved_servers WHERE server_id= '345' and datacenter_id='223' and start_date <= '2021-01-04 14:45' and end_date >= '2021-01-01 14:45' LIMIT 1
2021-01-14 13:31:49 UTC:REDACTED(58071):postgres@postgres:[13855]:LOG: statement: INSERT INTO public.reserved_servers (datacenter_id, server_id, user_id, start_date, end_date) VALUES ('223', '345', 3126, '2021-01-01 14:45', '2021-01-04 14:45')
2021-01-14 13:31:49 UTC:REDACTED(58070):postgres@postgres:[13858]:LOG: statement: SELECT true FROM reserved_servers WHERE server_id= '345' and datacenter_id='223' and start_date <= '2021-01-04 14:45' and end_date >= '2021-01-01 14:45' LIMIT 1
2021-01-14 13:31:49 UTC:REDACTED(58078):postgres@postgres:[13860]:LOG: statement: SELECT true FROM reserved_servers WHERE server_id= '345' and datacenter_id='223' and start_date <= '2021-01-04 14:45' and end_date >= '2021-01-01 14:45' LIMIT 1
2021-01-14 13:31:49 UTC:REDACTED(58075):postgres@postgres:[13861]:LOG: statement: SELECT true FROM reserved_servers WHERE server_id= '345' and datacenter_id='223' and start_date <= '2021-01-04 14:45' and end_date >= '2021-01-01 14:45' LIMIT 1
2021-01-14 13:31:49 UTC:REDACTED(58073):postgres@postgres:[13857]:LOG: statement: SELECT true FROM reserved_servers WHERE server_id= '345' and datacenter_id='223' and start_date <= '2021-01-04 14:45' and end_date >= '2021-01-01 14:45' LIMIT 1
2021-01-14 13:31:49 UTC:REDACTED(58072):postgres@postgres:[13863]:LOG: statement: SELECT true FROM reserved_servers WHERE server_id= '345' and datacenter_id='223' and start_date <= '2021-01-04 14:45' and end_date >= '2021-01-01 14:45' LIMIT 1
2021-01-14 13:31:49 UTC:REDACTED(58074):postgres@postgres:[13862]:LOG: statement: SELECT true FROM reserved_servers WHERE server_id= '345' and datacenter_id='223' and start_date <= '2021-01-04 14:45' and end_date >= '2021-01-01 14:45' LIMIT 1
2021-01-14 13:31:49 UTC:REDACTED(58077):postgres@postgres:[13856]:LOG: statement: INSERT INTO public.reserved_servers (datacenter_id, server_id, user_id, start_date, end_date) VALUES ('223', '345', 6635, '2021-01-01 14:45', '2021-01-04 14:45')
2021-01-14 13:31:49 UTC:REDACTED(58069):postgres@postgres:[13864]:LOG: statement: SELECT true FROM reserved_servers WHERE server_id= '345' and datacenter_id='223' and start_date <= '2021-01-04 14:45' and end_date >= '2021-01-01 14:45' LIMIT 1
2021-01-14 13:31:49 UTC:REDACTED(58071):postgres@postgres:[13855]:LOG: statement: COMMIT
2021-01-14 13:31:49 UTC:REDACTED(58076):postgres@postgres:[13859]:LOG: statement: INSERT INTO public.reserved_servers (datacenter_id, server_id, user_id, start_date, end_date) VALUES ('223', '345', 2052, '2021-01-01 14:45', '2021-01-04 14:45')
2021-01-14 13:31:49 UTC:REDACTED(58070):postgres@postgres:[13858]:LOG: statement: INSERT INTO public.reserved_servers (datacenter_id, server_id, user_id, start_date, end_date) VALUES ('223', '345', 5498, '2021-01-01 14:45', '2021-01-04 14:45')
2021-01-14 13:31:49 UTC:REDACTED(58078):postgres@postgres:[13860]:LOG: statement: INSERT INTO public.reserved_servers (datacenter_id, server_id, user_id, start_date, end_date) VALUES ('223', '345', 3983, '2021-01-01 14:45', '2021-01-04 14:45')
2021-01-14 13:31:49 UTC:REDACTED(58073):postgres@postgres:[13857]:LOG: statement: INSERT INTO public.reserved_servers (datacenter_id, server_id, user_id, start_date, end_date) VALUES ('223', '345', 9359, '2021-01-01 14:45', '2021-01-04 14:45')
2021-01-14 13:31:49 UTC:REDACTED(58075):postgres@postgres:[13861]:LOG: statement: INSERT INTO public.reserved_servers (datacenter_id, server_id, user_id, start_date, end_date) VALUES ('223', '345', 3932, '2021-01-01 14:45', '2021-01-04 14:45')
2021-01-14 13:31:49 UTC:REDACTED(58072):postgres@postgres:[13863]:LOG: statement: INSERT INTO public.reserved_servers (datacenter_id, server_id, user_id, start_date, end_date) VALUES ('223', '345', 184, '2021-01-01 14:45', '2021-01-04 14:45')
2021-01-14 13:31:49 UTC:REDACTED(58077):postgres@postgres:[13856]:LOG: statement: COMMIT
2021-01-14 13:31:49 UTC:REDACTED(58074):postgres@postgres:[13862]:LOG: statement: INSERT INTO public.reserved_servers (datacenter_id, server_id, user_id, start_date, end_date) VALUES ('223', '345', 7032, '2021-01-01 14:45', '2021-01-04 14:45')
2021-01-14 13:31:49 UTC:REDACTED(58069):postgres@postgres:[13864]:LOG: statement: INSERT INTO public.reserved_servers (datacenter_id, server_id, user_id, start_date, end_date) VALUES ('223', '345', 8423, '2021-01-01 14:45', '2021-01-04 14:45')
2021-01-14 13:31:49 UTC:REDACTED(58076):postgres@postgres:[13859]:LOG: statement: COMMIT
2021-01-14 13:31:49 UTC:REDACTED(58078):postgres@postgres:[13860]:LOG: statement: COMMIT
2021-01-14 13:31:49 UTC:REDACTED(58070):postgres@postgres:[13858]:LOG: statement: COMMIT
2021-01-14 13:31:49 UTC:REDACTED(58073):postgres@postgres:[13857]:LOG: statement: COMMIT
2021-01-14 13:31:49 UTC:REDACTED(58075):postgres@postgres:[13861]:LOG: statement: COMMIT
2021-01-14 13:31:49 UTC:REDACTED(58074):postgres@postgres:[13862]:LOG: statement: COMMIT
2021-01-14 13:31:49 UTC:REDACTED(58072):postgres@postgres:[13863]:LOG: statement: COMMIT
2021-01-14 13:31:49 UTC:REDACTED(58069):postgres@postgres:[13864]:LOG: statement: COMMIT

We can see that our SELECT statement has indeed run.

What happened here is that when all the SELECT statements have been run by the 10 threads they all returned 0 rows. We can see that the first COMMIT happens right after the last SELECT runs, so all 10 threads are doing 10 inserts.

How do we prevent this? Well, we can't use SELECT... FOR UPDATE because of this case right here, if 2 request come at the same time and no rows are found there is nothing to lock, we end up with 2 INSERT, meaning overlaps.

Solution: Advisory Locks

We could create an totatly unrelated table with dummy data just to do SELECT...FOR UPDATE but there must be a better way, right?
Well, PostgreSQL calls this way Advisory Locks.

While a flag stored in a table could be used for the same purpose, advisory locks are faster, avoid table bloat, and are automatically cleaned up by the server at the end of the session.  ( from PostgreSQL doc )

We will be using transaction level advisory locks.

How to obtain such lock? Well, just call pg_advisory_xact_lock with the key.

pg_advisory_xact_lock(key bigint)

pg_advisory_xact_lock(key1 int, key2 int)

This is the function signature, what key should we use now? Well, we can use the datacenter_id, but when 2 users will reserve servers on the same datacenter but diferent servers they will wait to accquire the lock, this doesn't sound that great. 

We could use pg_advisory_xact_lock(datacenter_id, server_id), but I'm going to pretend that we have a more complex key that is a string and contains more than 2 columns.

Using fnv algorithm to create the key from a string

Using the fnv algorithm we can calculate an integer from the combined key and that will be used as key when calling pg_advisory_xact_lock.

There is a python package implemeting the fnv algorithm.
Right from the example of the package we have this line:
fnv.hash(data, algorithm=fnv.fnv_1a, bits=64) # uses fnv.fnv_1a algorithm

Is the bits value of 64 good or should we change it?
The key parameter in PostgreSQL is bigint, meaning is using 8 bytes for storage , so 8 bytes to bits is 64, so yes, the value 64 bits is good, we could use 32 bits, but the posibility to get collisions is higher and using more than 64 bits is not supported by PostgreSQL.

To generate our key, we first combine the datacenter_id and the server_id and then use the .hash function from the fnv package. (Did you know that python used a modified fnv algorithm for generating the hashes by the builtin function hash() ? )

    key = f"{post_data['datacenter_id']} {post_data['server_id']}"
    key = fnv.hash(key.encode(), algorithm=fnv.fnv_1a, bits=64)

Some also use sha256 and truncate the result to get only the first 8 bytes but I don't see why waste resources when we can calculate a hash exactly of 8 bytes.

If you need more performance than this, look into pyhash, is implemented in C not in python and has more algorithm available besides fnv.

Testing our solution

We will modify the previous code and make use of advisory locks.

import os
import sys
import threading
from random import randint

import fnv
import psycopg2


def django_flask_prod_code():
    """"
    This function simulates using POST with RESTApi to book a new server
    """
    post_data = {
        'datacenter_id': '223',
        'server_id': '345',
        'start_date': '2021-01-01 14:45',
        'end_date': '2021-01-04 14:45',
        'user_id': randint(0, 9999)
    }
    connection = psycopg2.connect(user="postgres",
                                  password=os.environ['password'],
                                  host=os.environ['host'],
                                  port="5432",
                                  database="postgres")

    cursor = connection.cursor()
    key = f"{post_data['datacenter_id']} {post_data['server_id']}"
    key = fnv.hash(key.encode(), algorithm=fnv.fnv_1a, bits=64)
    cursor.execute("SELECT pg_advisory_xact_lock(%s)", [key])
    cursor.execute(
        "SELECT true FROM reserved_servers WHERE server_id= %(server_id)s and datacenter_id=%(datacenter_id)s "
        "and start_date <= %(end_date)s and end_date >= %(start_date)s LIMIT 1",
        {
            'server_id': post_data['server_id'],
            'datacenter_id': post_data['datacenter_id'],
            'start_date': post_data['start_date'],
            'end_date': post_data['end_date'],
        })
    record = cursor.fetchone()
    if not record:
        print("Overlap not detected, inserting new row.")
        cursor.execute("INSERT INTO reserved_servers (datacenter_id, server_id, user_id, start_date, end_date) "
                       "VALUES (%(datacenter_id)s, %(server_id)s, %(user_id)s, %(start_date)s, %(end_date)s)", {
                           'server_id': post_data['server_id'],
                           'datacenter_id': post_data['datacenter_id'],
                           'start_date': post_data['start_date'],
                           'end_date': post_data['end_date'],
                           'user_id': post_data['user_id'],
                       })
    else:
        print("Overlap detected, skipping.")
    connection.commit()
    cursor.close()
    connection.close()


def main():
    threads = [threading.Thread(target=django_flask_prod_code) for _ in range(10)]
    for t in threads:
        t.start()
    return 0


if __name__ == '__main__':
    sys.exit(main())

Now if we run the program we get our expected result, only one row is insterted.




2021-01-14 14:14:26 UTC:REDACTED(59255):postgres@postgres:[13495]:LOG:  statement: BEGIN
2021-01-14 14:14:26 UTC:REDACTED(59255):postgres@postgres:[13495]:LOG:  statement: SELECT pg_advisory_xact_lock(3755351481708176604)
2021-01-14 14:14:26 UTC:REDACTED(59262):postgres@postgres:[13497]:LOG:  statement: BEGIN
2021-01-14 14:14:26 UTC:REDACTED(59261):postgres@postgres:[13499]:LOG:  statement: BEGIN
2021-01-14 14:14:26 UTC:REDACTED(59257):postgres@postgres:[13496]:LOG:  statement: BEGIN
2021-01-14 14:14:26 UTC:REDACTED(59254):postgres@postgres:[13498]:LOG:  statement: BEGIN
2021-01-14 14:14:26 UTC:REDACTED(59256):postgres@postgres:[13501]:LOG:  statement: BEGIN
2021-01-14 14:14:26 UTC:REDACTED(59253):postgres@postgres:[13502]:LOG:  statement: BEGIN
2021-01-14 14:14:26 UTC:REDACTED(59260):postgres@postgres:[13500]:LOG:  statement: BEGIN
2021-01-14 14:14:26 UTC:REDACTED(59259):postgres@postgres:[13504]:LOG:  statement: BEGIN
2021-01-14 14:14:26 UTC:REDACTED(59258):postgres@postgres:[13503]:LOG:  statement: BEGIN
2021-01-14 14:14:26 UTC:REDACTED(59255):postgres@postgres:[13495]:LOG:  statement: SELECT true FROM reserved_servers WHERE server_id= '345' and datacenter_id='223' and start_date <= '2021-01-04 14:45' and end_date >= '2021-01-01 14:45' LIMIT 1
2021-01-14 14:14:26 UTC:REDACTED(59262):postgres@postgres:[13497]:LOG:  statement: SELECT pg_advisory_xact_lock(3755351481708176604)
2021-01-14 14:14:26 UTC:REDACTED(59257):postgres@postgres:[13496]:LOG:  statement: SELECT pg_advisory_xact_lock(3755351481708176604)
2021-01-14 14:14:26 UTC:REDACTED(59261):postgres@postgres:[13499]:LOG:  statement: SELECT pg_advisory_xact_lock(3755351481708176604)
2021-01-14 14:14:26 UTC:REDACTED(59254):postgres@postgres:[13498]:LOG:  statement: SELECT pg_advisory_xact_lock(3755351481708176604)
2021-01-14 14:14:26 UTC:REDACTED(59256):postgres@postgres:[13501]:LOG:  statement: SELECT pg_advisory_xact_lock(3755351481708176604)
2021-01-14 14:14:26 UTC:REDACTED(59253):postgres@postgres:[13502]:LOG:  statement: SELECT pg_advisory_xact_lock(3755351481708176604)
2021-01-14 14:14:26 UTC:REDACTED(59260):postgres@postgres:[13500]:LOG:  statement: SELECT pg_advisory_xact_lock(3755351481708176604)
2021-01-14 14:14:26 UTC:REDACTED(59259):postgres@postgres:[13504]:LOG:  statement: SELECT pg_advisory_xact_lock(3755351481708176604)
2021-01-14 14:14:26 UTC:REDACTED(59258):postgres@postgres:[13503]:LOG:  statement: SELECT pg_advisory_xact_lock(3755351481708176604)
2021-01-14 14:14:26 UTC:REDACTED(59255):postgres@postgres:[13495]:LOG:  statement: INSERT INTO public.reserved_servers (datacenter_id, server_id, user_id, start_date, end_date) VALUES ('223', '345', 1833, '2021-01-01 14:45', '2021-01-04 14:45')
2021-01-14 14:14:26 UTC:REDACTED(59255):postgres@postgres:[13495]:LOG:  statement: COMMIT
2021-01-14 14:14:26 UTC:REDACTED(59262):postgres@postgres:[13497]:LOG:  statement: SELECT true FROM reserved_servers WHERE server_id= '345' and datacenter_id='223' and start_date <= '2021-01-04 14:45' and end_date >= '2021-01-01 14:45' LIMIT 1
2021-01-14 14:14:26 UTC:REDACTED(59262):postgres@postgres:[13497]:LOG:  statement: COMMIT
2021-01-14 14:14:26 UTC:REDACTED(59257):postgres@postgres:[13496]:LOG:  statement: SELECT true FROM reserved_servers WHERE server_id= '345' and datacenter_id='223' and start_date <= '2021-01-04 14:45' and end_date >= '2021-01-01 14:45' LIMIT 1
2021-01-14 14:14:26 UTC:REDACTED(59257):postgres@postgres:[13496]:LOG:  statement: COMMIT
2021-01-14 14:14:26 UTC:REDACTED(59261):postgres@postgres:[13499]:LOG:  statement: SELECT true FROM reserved_servers WHERE server_id= '345' and datacenter_id='223' and start_date <= '2021-01-04 14:45' and end_date >= '2021-01-01 14:45' LIMIT 1
2021-01-14 14:14:26 UTC:REDACTED(59261):postgres@postgres:[13499]:LOG:  statement: COMMIT
2021-01-14 14:14:27 UTC:REDACTED(59254):postgres@postgres:[13498]:LOG:  statement: SELECT true FROM reserved_servers WHERE server_id= '345' and datacenter_id='223' and start_date <= '2021-01-04 14:45' and end_date >= '2021-01-01 14:45' LIMIT 1
2021-01-14 14:14:27 UTC:REDACTED(59254):postgres@postgres:[13498]:LOG:  statement: COMMIT
2021-01-14 14:14:27 UTC:REDACTED(59256):postgres@postgres:[13501]:LOG:  statement: SELECT true FROM reserved_servers WHERE server_id= '345' and datacenter_id='223' and start_date <= '2021-01-04 14:45' and end_date >= '2021-01-01 14:45' LIMIT 1
2021-01-14 14:14:27 UTC:REDACTED(59256):postgres@postgres:[13501]:LOG:  statement: COMMIT
2021-01-14 14:14:27 UTC:REDACTED(59253):postgres@postgres:[13502]:LOG:  statement: SELECT true FROM reserved_servers WHERE server_id= '345' and datacenter_id='223' and start_date <= '2021-01-04 14:45' and end_date >= '2021-01-01 14:45' LIMIT 1
2021-01-14 14:14:27 UTC:REDACTED(59253):postgres@postgres:[13502]:LOG:  statement: COMMIT
2021-01-14 14:14:27 UTC:REDACTED(59260):postgres@postgres:[13500]:LOG:  statement: SELECT true FROM reserved_servers WHERE server_id= '345' and datacenter_id='223' and start_date <= '2021-01-04 14:45' and end_date >= '2021-01-01 14:45' LIMIT 1
2021-01-14 14:14:27 UTC:REDACTED(59260):postgres@postgres:[13500]:LOG:  statement: COMMIT
2021-01-14 14:14:27 UTC:REDACTED(59259):postgres@postgres:[13504]:LOG:  statement: SELECT true FROM reserved_servers WHERE server_id= '345' and datacenter_id='223' and start_date <= '2021-01-04 14:45' and end_date >= '2021-01-01 14:45' LIMIT 1
2021-01-14 14:14:27 UTC:REDACTED(59259):postgres@postgres:[13504]:LOG:  statement: COMMIT
2021-01-14 14:14:27 UTC:REDACTED(59258):postgres@postgres:[13503]:LOG:  statement: SELECT true FROM reserved_servers WHERE server_id= '345' and datacenter_id='223' and start_date <= '2021-01-04 14:45' and end_date >= '2021-01-01 14:45' LIMIT 1
2021-01-14 14:14:27 UTC:REDACTED(59258):postgres@postgres:[13503]:LOG:  statement: COMMIT

From the logs we can see that only one INSERT is made and this is exatcly what we wanted, this is just one use case of advisory locks.

There is room for improvement, if 2 users reserve the same server from the same datacenter but is not an overlap, they will still wait for the lock, the key could be more complex but it's up to you to decide if is worth the time and effort to come up with a better key.

While setting up the database for this article I found an bug in DataGrip, seems to be fixed now.

Did you know that PostgreSQL is the dbms of the year 2020?

https://db-engines.com/en/blog_post/85

https://www.percona.com/blog/2021/01/07/postgresql-db-engines-coms-database-of-the-year/


Comments