"Ed L." <pgsql@xxxxxxxxxxxxx> writes: > Can someone explain why 6508 has a transactionid ExclusiveLock, > but now is waiting on a transactionid ShareLock? That seems > unintuitive. It would seem that if you hold a more exclusive > lock, getting a less exclusive lock would not be a problem. They're not on the same object. Every transaction starts by taking out an exclusive lock on its own XID. (This will never block, because at that instant there is no reason for anyone else to have any lock on that XID.) Subsequently, if there is a need for any transaction to wait for the completion of some specific other transaction, it implements this by trying to acquire share lock on that other transaction's XID. The reason for using share lock is that if several transactions want to wait for the same other transaction, there is no reason for them to block each other: once the other transaction commits, we might as well release them all at the same time. So this is a bit of an abuse of the lock type scheme --- we use ExclusiveLock and ShareLock here because they have the right blocking semantics, not because there's any notion that locking someone else's XID is meaningful in itself. The larger point here is that all this occurs when someone wants to update or lock a specific table row that some other transaction-in-progress already updated or locked. The simple and logically clean way to handle that would be to take out lock manager locks on each individual row modified by any transaction. But that sucks performance-wise, not least because a transaction that changes any large number of rows would quickly exhaust the lock manager's limited shared memory. By transposing block-for-a-row-lock into block-for-a-transaction-ID-lock, we can reduce the number of actively locked objects to something that's practical. And if you want every last gory detail, see the comments for heap_lock_tuple(): * NOTES: because the shared-memory lock table is of finite size, but users * could reasonably want to lock large numbers of tuples, we do not rely on * the standard lock manager to store tuple-level locks over the long term. * Instead, a tuple is marked as locked by setting the current transaction's * XID as its XMAX, and setting additional infomask bits to distinguish this * usage from the more normal case of having deleted the tuple. When * multiple transactions concurrently share-lock a tuple, the first locker's * XID is replaced in XMAX with a MultiTransactionId representing the set of * XIDs currently holding share-locks. * * When it is necessary to wait for a tuple-level lock to be released, the * basic delay is provided by XactLockTableWait or MultiXactIdWait on the * contents of the tuple's XMAX. However, that mechanism will release all * waiters concurrently, so there would be a race condition as to which * waiter gets the tuple, potentially leading to indefinite starvation of * some waiters. The possibility of share-locking makes the problem much * worse --- a steady stream of share-lockers can easily block an exclusive * locker forever. To provide more reliable semantics about who gets a * tuple-level lock first, we use the standard lock manager. The protocol * for waiting for a tuple-level lock is really * LockTuple() * XactLockTableWait() * mark tuple as locked by me * UnlockTuple() * When there are multiple waiters, arbitration of who is to get the lock next * is provided by LockTuple(). 
And if you want every last gory detail, see the comments for
heap_lock_tuple():

 * NOTES: because the shared-memory lock table is of finite size, but users
 * could reasonably want to lock large numbers of tuples, we do not rely on
 * the standard lock manager to store tuple-level locks over the long term.
 * Instead, a tuple is marked as locked by setting the current transaction's
 * XID as its XMAX, and setting additional infomask bits to distinguish this
 * usage from the more normal case of having deleted the tuple. When
 * multiple transactions concurrently share-lock a tuple, the first locker's
 * XID is replaced in XMAX with a MultiTransactionId representing the set of
 * XIDs currently holding share-locks.
 *
 * When it is necessary to wait for a tuple-level lock to be released, the
 * basic delay is provided by XactLockTableWait or MultiXactIdWait on the
 * contents of the tuple's XMAX. However, that mechanism will release all
 * waiters concurrently, so there would be a race condition as to which
 * waiter gets the tuple, potentially leading to indefinite starvation of
 * some waiters. The possibility of share-locking makes the problem much
 * worse --- a steady stream of share-lockers can easily block an exclusive
 * locker forever. To provide more reliable semantics about who gets a
 * tuple-level lock first, we use the standard lock manager. The protocol
 * for waiting for a tuple-level lock is really
 *      LockTuple()
 *      XactLockTableWait()
 *      mark tuple as locked by me
 *      UnlockTuple()
 * When there are multiple waiters, arbitration of who is to get the lock
 * next is provided by LockTuple(). However, at most one tuple-level lock
 * will be held or awaited per backend at any time, so we don't risk
 * overflow of the lock table. Note that incoming share-lockers are
 * required to do LockTuple as well, if there is any conflict, to ensure
 * that they don't starve out waiting exclusive-lockers. However, if there
 * is not any active conflict for a tuple, we don't incur any extra
 * overhead.

			regards, tom lane
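The tuple-level arbitration that comment describes can also be observed
from SQL. Again a sketch, using the same hypothetical "accounts" table
and sessions as above:

    -- while session B is still blocked behind session A
    SELECT pid, locktype, relation::regclass AS rel, page, tuple,
           mode, granted
    FROM pg_locks
    WHERE locktype = 'tuple';
    -- the blocked updater shows up with a lock on the specific
    -- (page, tuple) it is queued for: that is the LockTuple() step
    -- which decides who gets the row next

    -- share-locking: run this in two separate sessions
    BEGIN;
    SELECT * FROM accounts WHERE id = 1 FOR SHARE;
    -- neither session blocks the other

    -- then, from any session
    SELECT xmax FROM accounts WHERE id = 1;
    -- with a single share-locker, xmax is simply that transaction's
    -- XID; once a second share-locker joins, it is replaced by a
    -- MultiTransactionId covering both XIDs (the value alone does not
    -- tell you which case you are looking at; the tuple's infomask
    -- bits do, but those are not visible from plain SQL)

Note that the 'tuple' entry in pg_locks is transient: it exists only
while somebody is actively queued for the row, which is why a bulk
update of millions of rows does not bloat the lock table.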