"Ed L." <pgsql@xxxxxxxxxxxxx> writes: > Can someone explain why 6508 has a transactionid ExclusiveLock, > but now is waiting on a transactionid ShareLock? That seems > unintuitive. It would seem that if you hold a more exclusive > lock, getting a less exclusive lock would not be a problem. They're not on the same object. Every transaction starts by taking out an exclusive lock on its own XID. (This will never block, because at that instant there is no reason for anyone else to have any lock on that XID.) Subsequently, if there is a need for any transaction to wait for the completion of some specific other transaction, it implements this by trying to acquire share lock on that other transaction's XID. The reason for using share lock is that if several transactions want to wait for the same other transaction, there is no reason for them to block each other: once the other transaction commits, we might as well release them all at the same time. So this is a bit of an abuse of the lock type scheme --- we use ExclusiveLock and ShareLock here because they have the right blocking semantics, not because there's any notion that locking someone else's XID is meaningful in itself. The larger point here is that all this occurs when someone wants to update or lock a specific table row that some other transaction-in-progress already updated or locked. The simple and logically clean way to handle that would be to take out lock manager locks on each individual row modified by any transaction. But that sucks performance-wise, not least because a transaction that changes any large number of rows would quickly exhaust the lock manager's limited shared memory. By transposing block-for-a-row-lock into block-for-a-transaction-ID-lock, we can reduce the number of actively locked objects to something that's practical. And if you want every last gory detail, see the comments for heap_lock_tuple(): * NOTES: because the shared-memory lock table is of finite size, but users * could reasonably want to lock large numbers of tuples, we do not rely on * the standard lock manager to store tuple-level locks over the long term. * Instead, a tuple is marked as locked by setting the current transaction's * XID as its XMAX, and setting additional infomask bits to distinguish this * usage from the more normal case of having deleted the tuple. When * multiple transactions concurrently share-lock a tuple, the first locker's * XID is replaced in XMAX with a MultiTransactionId representing the set of * XIDs currently holding share-locks. * * When it is necessary to wait for a tuple-level lock to be released, the * basic delay is provided by XactLockTableWait or MultiXactIdWait on the * contents of the tuple's XMAX. However, that mechanism will release all * waiters concurrently, so there would be a race condition as to which * waiter gets the tuple, potentially leading to indefinite starvation of * some waiters. The possibility of share-locking makes the problem much * worse --- a steady stream of share-lockers can easily block an exclusive * locker forever. To provide more reliable semantics about who gets a * tuple-level lock first, we use the standard lock manager. The protocol * for waiting for a tuple-level lock is really * LockTuple() * XactLockTableWait() * mark tuple as locked by me * UnlockTuple() * When there are multiple waiters, arbitration of who is to get the lock next * is provided by LockTuple(). 
And if you want every last gory detail, see the comments for
heap_lock_tuple():

 * NOTES: because the shared-memory lock table is of finite size, but users
 * could reasonably want to lock large numbers of tuples, we do not rely on
 * the standard lock manager to store tuple-level locks over the long term.
 * Instead, a tuple is marked as locked by setting the current transaction's
 * XID as its XMAX, and setting additional infomask bits to distinguish this
 * usage from the more normal case of having deleted the tuple. When
 * multiple transactions concurrently share-lock a tuple, the first locker's
 * XID is replaced in XMAX with a MultiTransactionId representing the set of
 * XIDs currently holding share-locks.
 *
 * When it is necessary to wait for a tuple-level lock to be released, the
 * basic delay is provided by XactLockTableWait or MultiXactIdWait on the
 * contents of the tuple's XMAX. However, that mechanism will release all
 * waiters concurrently, so there would be a race condition as to which
 * waiter gets the tuple, potentially leading to indefinite starvation of
 * some waiters. The possibility of share-locking makes the problem much
 * worse --- a steady stream of share-lockers can easily block an exclusive
 * locker forever. To provide more reliable semantics about who gets a
 * tuple-level lock first, we use the standard lock manager. The protocol
 * for waiting for a tuple-level lock is really
 *      LockTuple()
 *      XactLockTableWait()
 *      mark tuple as locked by me
 *      UnlockTuple()
 * When there are multiple waiters, arbitration of who is to get the lock
 * next is provided by LockTuple(). However, at most one tuple-level lock
 * will be held or awaited per backend at any time, so we don't risk
 * overflow of the lock table. Note that incoming share-lockers are
 * required to do LockTuple as well, if there is any conflict, to ensure
 * that they don't starve out waiting exclusive-lockers. However, if there
 * is not any active conflict for a tuple, we don't incur any extra
 * overhead.

			regards, tom lane
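The tuple-level arbitration that comment describes can also be observed
from SQL. Again a sketch, using the same hypothetical "accounts" table
and sessions as above:

    -- while session B is still blocked behind session A
    SELECT pid, locktype, relation::regclass AS rel, page, tuple,
           mode, granted
    FROM pg_locks
    WHERE locktype = 'tuple';
    -- the blocked updater shows up with a lock on the specific
    -- (page, tuple) it is queued for: that is the LockTuple() step
    -- which decides who gets the row next

    -- share-locking: run this in two separate sessions
    BEGIN;
    SELECT * FROM accounts WHERE id = 1 FOR SHARE;
    -- neither session blocks the other

    -- then, from any session
    SELECT xmax FROM accounts WHERE id = 1;
    -- with a single share-locker, xmax is simply that transaction's
    -- XID; once a second share-locker joins, it is replaced by a
    -- MultiTransactionId covering both XIDs (the value alone does not
    -- tell you which case you are looking at; the tuple's infomask
    -- bits do, but those are not visible from plain SQL)

Note that the 'tuple' entry in pg_locks is transient: it exists only
while somebody is actively queued for the row, which is why a bulk
update of millions of rows does not bloat the lock table.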