> Huh. SERIALIZABLE shouldn't allow two transactions to see no result row > for a given ID and then insert a result row for that ID. One of those > transactions should have to roll back, because otherwise it'd be > incompatible with both serial orderings of the two transactions. Sorry for the misunderstanding - I wasn't suggesting that. Even at the serializable level, W2 can see a row that is unlocked by W1's commit despite W2's snapshot being taken before W1 commits. Carrying on my example, W2 would indeed fail to insert a result(id=1) row. > Conceptually, the thing you really need to lock for this to work is the > result row that isn't there yet, so that some overlapping transaction > doesn't try to lock the same absent thing. Unfortunately, our system for > locking things that aren't there isn't there either. > Some articles on serializability talk about "materialising the conflict", > which means locking some other surrogate thing that "covers" a gap you are > interested in. You might think the job row would do the trick, but since > we don't recheck the condition (that is, recheck that there is no > corresponding result because you don't update the job row), no cigar. I like the concept of "materialising the conflict", that’s a useful way of thinking about it - thanks. > You could also use plain old pg_try_advisory_xact_lock(id), because it just > locks integers, and they're always there. Yeah, I tried this, and might give it another go. A naïve attempt failed for a similar reason. > > SERIALIZABLE deals with that type of magic internally (it locks gaps in > key ranges by predicate-locking a physical btree or hash page that you'd > need to write on to insert a row with a matching key, which is how it > discovers a conflict between one transaction that went looking for key=42 > but didn't find it and another that later writes key=42), but, as > mentioned, SERIALIZABLE doesn't really allow concurrency with this > workload, and you specified that you wanted concurrency with SKIP LOCKED > (but I think you'd have the same problem without it; SKIP LOCKED just gets > you the wrong answer faster). > > There are various ways you could deal with this, but I'd probably go for a > simple scheme where you only have to consult a single row to know if you > can claim it. You could still put the results into a separate table, but > use job.state to find work, and set it to DONE when you insert the result. > It may also be possible to add no new columns but do a dummy update to the > job row to get the join qual rechecked, but I'm not sure if that'd work. > Another reason to add a state column to the job table is so that you can > put a conditional index on it so you can find jobs to be done very > quickly, if you're not planning to remove the ones that are done. Thanks. I rejected the idea of doing a dummy update to the locked row as I wanted to avoid too much extra WAL - the real table originally had quite a few more columns than the toy example, but it's much slimmer now so this could be a viable option.