5 comments

  • anvevoice 44 minutes ago

    One pattern I've found useful alongside this: Postgres advisory locks (pg_advisory_xact_lock) for cases where the contention isn't row-level but logic-level. For example, if two requests try to create the "first" resource of a type - there's no existing row to SELECT FOR UPDATE against.

    Advisory locks let you serialize on an arbitrary key (like a hash of the entity type + parent ID) without needing a dummy row or separate lock table. They auto-release on transaction end, so no cleanup.
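
    A minimal sketch of the shape, assuming node-postgres (the two-int form of pg_advisory_xact_lock and the FNV-1a hash helper are just one way to derive the key):

        import { Pool } from "pg";

        const pool = new Pool();

        // Tiny FNV-1a hash so (entity type, parent ID) fit the two-int32
        // form of pg_advisory_xact_lock(key1 int, key2 int).
        function hash32(s: string): number {
          let h = 0x811c9dc5;
          for (let i = 0; i < s.length; i++) {
            h ^= s.charCodeAt(i);
            h = Math.imul(h, 0x01000193);
          }
          return h | 0;
        }

        async function createFirstResource(entityType: string, parentId: string) {
          const client = await pool.connect();
          try {
            await client.query("BEGIN");
            // Serializes concurrent creators on an arbitrary key; released
            // automatically at COMMIT/ROLLBACK, so no cleanup.
            await client.query("SELECT pg_advisory_xact_lock($1, $2)", [
              hash32(entityType),
              hash32(parentId),
            ]);
            // ...check that no row exists yet, then INSERT...
            await client.query("COMMIT");
          } catch (err) {
            await client.query("ROLLBACK");
            throw err;
          } finally {
            client.release();
          }
        }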

    The barrier testing approach from the article would work nicely here too - inject the barrier between acquiring the advisory lock and the subsequent insert, then verify the second transaction blocks until the first commits.

    • lirbank 39 minutes ago

      Nice - that's a good case for barriers too. When there's no row to SELECT FOR UPDATE against, you'd inject the barrier after acquiring the advisory lock and verify the second transaction blocks until the first commits.

    • throwaway2ge5hg 35 minutes ago

      Postgres has a SERIALIZABLE transaction isolation level. Just use it and you'll never have to worry about any of these race conditions.

      And if for some reason you refuse to, then this "barrier" or "hooks" approach to testing will not help in practice. It requires you to already know the potential race conditions, and if you're aware of them, you'll write your code to avoid them in the first place. It's the non-obvious race conditions that should scare you.

      To find these, you should use randomized testing that runs many iterations of different interleavings of transaction steps. You can build such a framework that will hook directly into your individual DB query calls. Then you don't have to add any "hooks" at all.
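
      A toy sketch of the idea - taskA, taskB, and assertInvariant are placeholders, and a seeded PRNG keeps failing interleavings reproducible:

          // Seeded PRNG (mulberry32) so a failing seed can be replayed.
          function mulberry32(a: number) {
            return () => {
              let t = (a += 0x6d2b79f5);
              t = Math.imul(t ^ (t >>> 15), t | 1);
              t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
              return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
            };
          }

          // Route every DB call in the code under test through this wrapper;
          // the random pause shifts where the call lands relative to the
          // competing task.
          async function runQuery<T>(rng: () => number, q: () => Promise<T>): Promise<T> {
            await new Promise((r) => setTimeout(r, Math.floor(rng() * 10)));
            return q();
          }

          // for (let seed = 0; seed < 1000; seed++) {
          //   const rng = mulberry32(seed);
          //   await Promise.all([taskA(rng), taskB(rng)]);
          //   await assertInvariant(); // e.g. exactly one "first" resource exists
          // }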

      But even that won't find all race condition bugs, because it is possible to have race conditions surface even within a single database query.

      You really should just use SERIALIZABLE and save yourself the hassle of spending hours writing all these tests.

      • lirbank 29 minutes ago

        Good call, SERIALIZABLE is a strong option - it eliminates a whole class of bugs at the isolation level. The trade-off is that your app needs to handle serialization failures with retry logic, which introduces its own complexity. That retry logic itself needs testing, and barriers work for that too.

        On randomized testing - it actually has the same limitation you mentioned about barriers: you need to know where to point it. And without coordination, the odds of two operations overlapping at exactly the wrong moment are slim. You'd need enormous pressure to trigger the race reliably, and even then a passing run doesn't prove much. Barriers make the interleaving deterministic, so a pass actually means something.
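
        For reference, the retry wrapper mentioned above is usually just a loop that replays the transaction on SQLSTATE 40001 (serialization_failure) - a rough sketch, assuming node-postgres:

            import { Pool, PoolClient } from "pg";

            // Rough sketch; maxRetries and the lack of backoff are arbitrary.
            async function withSerializableRetry<T>(
              pool: Pool,
              fn: (client: PoolClient) => Promise<T>,
              maxRetries = 5,
            ): Promise<T> {
              for (let attempt = 0; ; attempt++) {
                const client = await pool.connect();
                try {
                  await client.query("BEGIN ISOLATION LEVEL SERIALIZABLE");
                  const result = await fn(client);
                  await client.query("COMMIT");
                  return result;
                } catch (err: any) {
                  await client.query("ROLLBACK");
                  // 40001 = serialization_failure; anything else is a real error.
                  if (err?.code !== "40001" || attempt >= maxRetries) throw err;
                } finally {
                  client.release();
                }
              }
            }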

      • scottlamb 50 minutes ago

        It'd be interesting to see a version of this that tries all the different interleavings of PostgreSQL operations between the two (or N) tasks. https://crates.io/crates/loom does something like this for Rust code that uses synchronization primitives.

        • lirbank 43 minutes ago

          Interesting! The barrier approach is more targeted: you specify the exact interleaving you want to test rather than exploring all of them. Trade-off is you need to know which interleavings matter, but you get deterministic tests that run against a real database instead of a simulated runtime. Exploring exhaustive interleaving testing against a real Postgres instance could be a fun follow-up - I'd be curious if it's practical.

        • haliliceylan 1 hour ago

          That's not a PostgreSQL problem, that's your code.

          IMHO you should never write code like that. You can either do UPDATE employees SET salary = salary + 500 WHERE employee_id = 101;

          Or, if it's more complex, just use a STORED PROCEDURE. There's no point in using a database if you're going to do all the transactional work in JS.

          • lirbank 1 hour ago

            Here's a real-world example where atomic updates aren't an option - an order status transition that reads the current status from one table, validates the transition, and inserts into another:

                await db().transaction(async (tx) => {
                  await hooks?.onTxBegin?.();

                  const [order] = await tx.select().from(orders)
                    .where(eq(orders.id, input.id))
                    .for("update");

                  const [status] = await tx.select().from(orderStatuses)
                    .where(eq(orderStatuses.orderId, input.id))
                    .orderBy(desc(orderStatuses.createdAt))
                    .limit(1);

                  if (input.status === status.code)
                    throw new Error("Status already set");

                  await tx.insert(orderStatuses).values({ ... });
                });

            You need the transaction + SELECT FOR UPDATE because the validation depends on current state, and two concurrent requests could both pass the duplicate check. The hooks parameter is the barrier injection point from the article - that's how you test that the lock actually prevents the race.
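
            To make that concrete, a test could look something like this sketch - a one-shot promise gate stands in for the article's barrier helper, updateOrderStatus wraps the transaction above, orderId is a placeholder, and onAfterLock is an illustrative hook fired right after the SELECT ... FOR UPDATE (jest/vitest-style test and expect):

                // Placeholders: the transaction above, wrapped with a hooks param.
                declare function updateOrderStatus(
                  input: { id: string; status: string },
                  hooks?: { onAfterLock?: () => Promise<void> },
                ): Promise<void>;
                declare const orderId: string;

                // One-shot gate: open() releases everyone awaiting wait().
                function makeGate() {
                  let open!: () => void;
                  const opened = new Promise<void>((resolve) => (open = resolve));
                  return { open, wait: () => opened };
                }

                test("row lock serializes concurrent status updates", async () => {
                  const locked = makeGate();  // tx A signals it holds the row lock
                  const release = makeGate(); // test tells tx A to proceed and commit

                  // Tx A pauses while holding the lock.
                  const a = updateOrderStatus(
                    { id: orderId, status: "shipped" },
                    { onAfterLock: async () => { locked.open(); await release.wait(); } },
                  );
                  await locked.wait();

                  // Tx B starts while A holds the lock; it should park on the
                  // SELECT ... FOR UPDATE rather than settle.
                  const b = updateOrderStatus({ id: orderId, status: "shipped" });
                  const first = await Promise.race([
                    b.then(() => "b settled", () => "b settled"),
                    new Promise<string>((r) => setTimeout(() => r("still blocked"), 100)),
                  ]);
                  expect(first).toBe("still blocked");

                  release.open();
                  await a; // A commits first
                  await expect(b).rejects.toThrow("Status already set"); // B sees A's write
                });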

            • erpellan 58 minutes ago

              The standard pattern to avoid select for update (which can cause poor performance under load) is to use optimistic concurrency control.

              Add a numeric version column to the table being updated, read and increment it in the application layer, and use the value you saw as part of the WHERE clause in the update statement. If you see ‘0 rows updated’, it means you were beaten in a race and should replay the operation.
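
              In code the loop looks roughly like this (node-postgres; the version column on orders and computeNextStatus are illustrative):

                  import { Pool } from "pg";

                  declare function computeNextStatus(status: string): string; // placeholder

                  async function updateStatusOptimistically(pool: Pool, orderId: string) {
                    for (;;) {
                      // Plain read, no lock: grab the current state and its version.
                      const { rows } = await pool.query(
                        "SELECT status, version FROM orders WHERE id = $1",
                        [orderId],
                      );
                      const { status, version } = rows[0];
                      const next = computeNextStatus(status); // app logic between read and write

                      // Write only if nobody bumped the version since our read.
                      const res = await pool.query(
                        "UPDATE orders SET status = $1, version = $2 WHERE id = $3 AND version = $4",
                        [next, version + 1, orderId, version],
                      );
                      if (res.rowCount === 1) return; // we won
                      // 0 rows updated: beaten in the race - re-read and replay.
                    }
                  }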

              • tux3 40 minutes ago

                I don't think such a broad recommendation will serve most people well - it really depends.

                Optimistic updates look great when there is no contention, and they will beat locking in a toy benchmark, but if you're not very careful they can cause insane retry amplification under load.

                It's a similar trap to spinlocks. People keep rediscovering this great performance hack that avoids the slow locks in the standard library, and then some day the system has a spike that creates contention - now you have 25 instances, with 24 of them spinning like crazy and slowing the only one that could be making progress to a crawl.

                It's possible to implement this pattern correctly, and it can be better in some specific situations. But a standard FOR UPDATE lock will beat the average badly implemented retry loop nine times out of ten.

                • lirbank 53 minutes ago

                  Good point. The barrier pattern from the article applies to both approaches - whether you're using pessimistic locks or optimistic version checks, it's good to verify that the concurrency handling actually works. Barriers let you test that your version check correctly rejects the stale update, the same way they test that your lock prevents the race.

                • codys 50 minutes ago

                  Seems like you could use a single SQL statement for that particular formulation. Something like the following, using CTEs, is possible, though alternatively one could rewrite them as subqueries. (Note: I'm not sure how the select of orders is intended to be used, so the version below doesn't use it, but it does obtain it as an expression that could be used.)

                      WITH
                       o AS (
                        SELECT * FROM orders
                        WHERE orders.id = $1
                       ),
                       os AS (
                        SELECT code FROM orderStatuses
                        WHERE orderStatuses.orderId = $1
                        ORDER BY orderStatuses.createdAt DESC
                        LIMIT 1
                       )
                      INSERT INTO orderStatuses (...)
                      SELECT ...
                      WHERE EXISTS (SELECT 1 FROM os WHERE os.code != $2)
                      RETURNING ...something including the status-differs check...
                  
                  Does something like this work with Postgres's default behavior?

                  • lirbank 35 minutes ago

                    Absolutely - if you can express the whole operation as a single atomic statement, that's the best outcome. No locks needed, no race to test for. The article is about what comes next: when the logic can't collapse into one query, how do you verify your concurrency handling actually works?

                • lirbank 1 hour ago

                  Fair point - atomic updates like SET salary = salary + 500 sidestep the race condition entirely for simple cases. The examples are intentionally simplified to isolate the concurrency behavior. The barrier pattern is more relevant when you have read-modify-write operations that involve application logic between the read and the write - those can't always collapse into a single UPDATE.

                • HackerThemAll 39 minutes ago

                  JavaScript developers learning the kindergarten basics of transactions and SQL. LOL. Is this the "we don't need a degree to be programmers" camp?