Re: Fwd: Fwd: Postgres attach partition: AccessExclusive lock set on different tables depending on how attaching is performed

Alvaro Herrera <alvherre@xxxxxxxxxxxxxx> · Wed, 13 Nov 2024 12:49:49 +0100

On 2024-Nov-10, Tom Lane wrote:

> This surprised me a bit too, because I thought we took a
> slightly-less-than-exclusive lock for FK additions or deletions.
> Tracing through it, I find that CloneFkReferencing opens the
> referenced relation with ShareRowExclusiveLock as I expected.
> But then we conclude that we can drop the existing FK enforcement
> triggers for the table being attached.  That causes us to take
> AccessExclusiveLock on the trigger itself, which is fine because
> nobody's really paying attention to that.  But then RemoveTriggerById
> takes AccessExclusiveLock on the trigger's table.  We already had
> that on the table being attached, but not on the other table.

Oooh.

> I wonder whether it'd be all right for RemoveTriggerById to take
> only ShareRowExclusiveLock on the trigger's table.  This seems
> OK in terms of basic semantics: that's enough to lock out
> anything that might want to fire triggers on the table.  However,
> this comment for AlterTableGetLockLevel gives me pause:
> 
>  * Also note that pg_dump uses only an AccessShareLock, meaning that anything
>  * that takes a lock less than AccessExclusiveLock can change object definitions
>  * while pg_dump is running. Be careful to check that the appropriate data is
>  * derived by pg_dump using an MVCC snapshot, rather than syscache lookups,
>  * otherwise we might end up with an inconsistent dump that can't restore.
> 
> I think pg_dump uses pg_get_triggerdef, which is probably not
> safe in these terms.

Looking at pg_get_triggerdef_worker, I see that it uses not syscache but a
systable scan, which uses the catalog snapshot.  A catalog snapshot is
indeed implemented as an MVCC snapshot (so strictly speaking it _is_ an
MVCC snapshot), but its invalidation rules differ from those of a normal
MVCC snapshot, so AFAIU it's still unsafe.
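
To make the lock-level question concrete, here is a small sketch of the
table-level lock conflict matrix as given in the PostgreSQL documentation,
written as plain Python rather than taken from the server source.  The names
MODES, CONFLICTS and conflicts() are made up for this illustration.  It shows
why ShareRowExclusiveLock blocks writers (which take RowExclusiveLock) but
not pg_dump (which takes AccessShareLock), and so why the snapshot question
above matters.

```python
# Table-level lock modes, weakest to strongest.
MODES = [
    "AccessShare", "RowShare", "RowExclusive", "ShareUpdateExclusive",
    "Share", "ShareRowExclusive", "Exclusive", "AccessExclusive",
]

# For each mode, the set of modes it conflicts with (transcribed from the
# documented conflict table; illustrative only, not server code).
CONFLICTS = {
    "AccessShare": {"AccessExclusive"},
    "RowShare": {"Exclusive", "AccessExclusive"},
    "RowExclusive": {"Share", "ShareRowExclusive", "Exclusive",
                     "AccessExclusive"},
    "ShareUpdateExclusive": {"ShareUpdateExclusive", "Share",
                             "ShareRowExclusive", "Exclusive",
                             "AccessExclusive"},
    "Share": {"RowExclusive", "ShareUpdateExclusive", "ShareRowExclusive",
              "Exclusive", "AccessExclusive"},
    "ShareRowExclusive": {"RowExclusive", "ShareUpdateExclusive", "Share",
                          "ShareRowExclusive", "Exclusive",
                          "AccessExclusive"},
    "Exclusive": set(MODES) - {"AccessShare"},
    "AccessExclusive": set(MODES),
}


def conflicts(held, requested):
    """Return True if a lock in mode `requested` must wait for `held`."""
    return requested in CONFLICTS[held]


# Writers (INSERT/UPDATE/DELETE take RowExclusive) are blocked either way.
print(conflicts("ShareRowExclusive", "RowExclusive"))
print(conflicts("AccessExclusive", "RowExclusive"))
# Readers and pg_dump (AccessShare) are blocked only by AccessExclusive.
print(conflicts("ShareRowExclusive", "AccessShare"))
print(conflicts("AccessExclusive", "AccessShare"))
```

So weakening the lock would let plain SELECTs through on the referenced
table, at the cost of also letting pg_dump read trigger definitions while
they are being changed.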

> An alternative answer might be what Alvaro was muttering about
> the other day: redesign FKs for partitioned tables so that we
> do not have to change the set of triggers when attaching/detaching.

Hmm, I hadn't thought about this idea in those terms, but perhaps we
could reimplement this by not having one trigger for each RI check, but
instead a single trigger which internally determines which FK
constraints exist on the table and does the necessary work in a single
pass.  Then we wouldn't need to add/drop triggers all the time; we'd just
add the trigger with the first FK in the table, and remove it when dropping
the last FK.
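
As a rough illustration of that lifecycle, here is a toy Python model.  It
is not PostgreSQL code and nothing like the real catalog structures; the
Table class, its fields and the `checks` mapping are all hypothetical.  The
only point is that the trigger is created with the first FK, dropped with
the last one, and that one firing walks every FK on the table.

```python
class Table:
    """Toy stand-in for a relation with a single RI dispatcher trigger."""

    def __init__(self, name):
        self.name = name
        self.fks = []                # names of FK constraints on this table
        self.has_ri_trigger = False  # is the single dispatcher installed?
        self.trigger_ddl = 0         # number of trigger create/drop events

    def add_fk(self, fk_name):
        if not self.fks:
            # First FK on the table: install the dispatcher trigger.
            self.has_ri_trigger = True
            self.trigger_ddl += 1
        self.fks.append(fk_name)

    def drop_fk(self, fk_name):
        self.fks.remove(fk_name)
        if not self.fks:
            # Last FK gone: the dispatcher trigger is no longer needed.
            self.has_ri_trigger = False
            self.trigger_ddl += 1

    def fire_ri_trigger(self, row, checks):
        """Run every FK check in one pass; return the violated FK names."""
        if not self.has_ri_trigger:
            return []
        return [fk for fk in self.fks if not checks[fk](row)]


orders = Table("orders")
for fk in ("orders_customer_fk", "orders_product_fk", "orders_region_fk"):
    orders.add_fk(fk)
# Three FKs were added, but only one trigger-creation event happened.
print(orders.trigger_ddl)
```

In this model, adding or dropping an FK on a table that already has one
never touches the trigger at all, which is the property that would matter
for attach/detach.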

For tables with many FKs, this could be a win, because we'd only go
through the trigger machinery once.  If a table has both outgoing and
incoming FKs, maybe we could have _one_ single trigger.

(I think this would be orthogonal to the project to stop using SPI for
RI triggers.)

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/