On Tue, 8 Sep 2020 at 06:05, Raj <raj@xxxxxxxx> wrote:
>
> > This would not exactly look like a bug, because the message says "to
> > be locked", so at least it's not allowing two workers to lock the same
> > tuple. But it seems that the skip-locked mode should not make an error
> > out of this, but treat it as the tuple was already locked. Why would
> > it want to lock the tuple (representing the job) if another worker has
> > already finished his UPDATE of the job to mark it as "done" (which is
> > what makes the tuple move to the "completed" partition.)

(It's not very clear who wrote the above text, since the quote does not
name the author and the original email doesn't appear to have made it to
the list.)

It's not a bug. I think the quoted text is expecting a bit too much from
the database. It does not know that a tuple which has been updated and
moved to another partition can be safely ignored. For all the database
knows, the new version of the tuple sitting in the new partition still
matches the query's WHERE clause and should be locked. If we just went
and ignored moved-off tuples, we could miss processing tuples that still
need to be processed.

It's perhaps not impossible to make this work slightly better if there
were some way to inform heapam_tuple_lock() that it's operating on a
partition, that the query was against a partitioned table, and that all
but one partition was removed by partition pruning. In that case we
could be certain the new version of the tuple can't match the SELECT's
WHERE clause, since partition pruning already determined that no other
partition can match it. However, that's: a) a pretty horrid thing to
have to teach heapam_tuple_lock() about, and b) only going to work when
exactly one partition survives partition pruning, which is pretty
fragile, since an ATTACH PARTITION could suddenly cause your queries to
start failing randomly again.

If you had 3 partitions, one each for "pending", "retry" and "complete",
and you wanted to lock all rows that are in a "pending" or "retry"
state, then when we encounter an updated row in the "pending" partition,
we have no way of knowing whether it was moved into the "retry" or the
"complete" partition. If it's in "retry", then we do want to find it and
process it, but if it's in "complete", then it no longer matches the
query's WHERE clause and we could ignore it. Since we don't know which,
we can't make assumptions and must force the user to try again, hence
the serialisation failure error.

> > Either the SELECT for jobs to do returned a wrong tuple, which was
> > already updated, or there is some lapse in the locking.
> >
> > Either way it would seem to be a waste of time throwing all these
> > errors when the tuple should not even have been selected for update
> > and locking.
> >
> > I wonder if anybody knows anything about that issue? Of course you'll
> > want to see the DDL and SQL queries, etc. but you can't really try it
> > out unless you do some massively parallel magic.

I already mentioned why this cannot work that way [1]. If you have some
idea of how to make it work correctly, it would be interesting to hear
it. Otherwise, I'm sorry to say that we can't just ignore these tuples
because doing so happens to suit your use case. The solution is just to
make the application retry on serialisation failures.

David

[1] https://www.postgresql.org/message-id/CAApHDvrDH6TQeLxTqnnAnhjrs55ru5g2_QMG=ME+WvD5MmpHQg@xxxxxxxxxxxxxx
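
In case it helps, here is a minimal sketch of the sort of schema and
worker transaction being discussed. The table, column and partition
names are invented purely for illustration; the point is only that the
application should treat serialisation failures (SQLSTATE 40001) as a
signal to run the whole transaction again:

    -- Hypothetical job queue, list-partitioned on the job's status.
    CREATE TABLE jobs (
        job_id  bigint NOT NULL,
        status  text   NOT NULL,
        payload text
    ) PARTITION BY LIST (status);

    CREATE TABLE jobs_pending  PARTITION OF jobs FOR VALUES IN ('pending');
    CREATE TABLE jobs_retry    PARTITION OF jobs FOR VALUES IN ('retry');
    CREATE TABLE jobs_complete PARTITION OF jobs FOR VALUES IN ('complete');

    -- Worker transaction.  If another session's UPDATE moves a row
    -- we're about to lock into a different partition, the SELECT raises
    -- "tuple to be locked was already moved to another partition due to
    -- concurrent update" (a serialisation failure, SQLSTATE 40001), and
    -- the application should simply retry the whole transaction.
    BEGIN;

    SELECT job_id, payload
      FROM jobs
     WHERE status IN ('pending', 'retry')
     ORDER BY job_id
     LIMIT 1
       FOR UPDATE SKIP LOCKED;

    -- ... do the work for the returned job, then mark it done, which
    -- moves the row into the jobs_complete partition:
    UPDATE jobs SET status = 'complete' WHERE job_id = 42;  -- id from above

    COMMIT;

Rows that are merely locked by another worker are still skipped silently
by SKIP LOCKED; the serialisation failure only appears when a row was
concurrently updated in a way that moved it to another partition, and on
retry the query will either pick up the new version (if it still
matches) or not see it at all.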