A user's interpretation (and thoughts) of the WAL replay bug in 9.3

David Johnston <polobo@xxxxxxxxx> · Mon, 17 Mar 2014 13:24:15 -0700 (PDT)

I'm trying to follow the discussion on -hackers and decided I'd try putting
everything I'm reading into my own words.  It is probable some or even all
of the following is simply wrong so please do not go acting on it without
other people providing supporting evidence or comments.  I am a database
user, not a database programmer, but feel I do have a solid ability to
understand and learn and enough experience to adequately represent a
moderately skilled DBA who might see the release notes and scratch their
head.

During the replay (application/restoration) of WAL, index modifications may
not be performed correctly, resulting in physical rows/keys that have no
matching index entries.  Any subsequent attempt to add a duplicate of an
existing physical key will therefore succeed, leaving a duplicate physical
record in the table and causing any future attempt to REINDEX the unique
index column(s) to fail.
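
To make the failure mode above concrete, here is a minimal Python sketch.
It is a toy model, not PostgreSQL internals: ToyTable, lose_index_entry and
the dict-based "index" are hypothetical names invented for illustration, and
the lost index entry simply stands in for whatever the faulty replay does.

```python
class UniqueViolation(Exception):
    """Raised when a key that must be unique is seen twice."""


class ToyTable:
    """Toy heap plus unique index; purely illustrative, not PostgreSQL code."""

    def __init__(self):
        self.heap = []    # physical rows: list of (key, payload)
        self.index = {}   # unique index: key -> position in the heap

    def insert(self, key, payload):
        # Uniqueness is checked only by consulting the index.
        if key in self.index:
            raise UniqueViolation(key)
        self.heap.append((key, payload))
        self.index[key] = len(self.heap) - 1

    def lose_index_entry(self, key):
        # Assumption: stands in for a replay that leaves the row in the
        # heap but without a usable index entry.
        del self.index[key]

    def reindex(self):
        # Rebuild the unique index from the heap; fails on duplicates.
        rebuilt = {}
        for pos, (key, _payload) in enumerate(self.heap):
            if key in rebuilt:
                raise UniqueViolation(key)
            rebuilt[key] = pos
        self.index = rebuilt


def demo():
    table = ToyTable()
    table.insert(1, "original")
    table.lose_index_entry(1)        # simulated bad replay
    table.insert(1, "duplicate")     # succeeds: the index no longer sees key 1
    try:
        table.reindex()
        reindex_failed = False
    except UniqueViolation:
        reindex_failed = True
    return len(table.heap), reindex_failed
```

In this model demo() ends with two physical rows for key 1 and a failed
rebuild of the index, which is the sequence of events described above.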

A typical replay scenario would first have a row on the PK side of a
relationship updated (though not any of the key columns - or any other
indexed columns - since this is hot-related...?).  If this update takes
sufficiently long (or concurrency is otherwise extremely high) that another
transaction attempts to take a lock on the PK row (e.g., so that it can
validate an FK relationship), then a "tuple-lock" operation is performed and
added to the WAL.  The replay of this WAL entry causes the locked index row
to be effectively invisible.

The core alteration in 9.3 that exposed this bug is a concurrency
improvement: while a row is being updated in a way that does not touch any
indexed columns (i.e., hot-update capable), other non-updating transactions
may simultaneously acquire a lock that is sufficient to ensure the core row
elements (those that are indexed) remain unaltered, without caring whether
specific non-indexed attributes are altered.  Prior to 9.3 the update lock
was strong enough (i.e., exclusive) to make the other sessions wait, so two
sessions never held a lock on the row simultaneously.  In that situation the
WAL replay was effectively serialized with respect to the transactions and
the index entry modifications were unnecessary but not incorrect.
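
The locking change can be sketched as a small conflict table.  The lock mode
names below are not from the discussion I am summarizing; they are the
row-level lock modes as I understand them from the 9.3 documentation, where
an UPDATE that leaves key columns alone takes FOR NO KEY UPDATE and an FK
check takes FOR KEY SHARE.  The function name must_wait is hypothetical.

```python
# Row-level lock conflicts: CONFLICTS[held] is the set of requested modes
# that have to wait while `held` is held by another transaction.
CONFLICTS = {
    "FOR KEY SHARE": {"FOR UPDATE"},
    "FOR SHARE": {"FOR NO KEY UPDATE", "FOR UPDATE"},
    "FOR NO KEY UPDATE": {"FOR SHARE", "FOR NO KEY UPDATE", "FOR UPDATE"},
    "FOR UPDATE": {
        "FOR KEY SHARE", "FOR SHARE", "FOR NO KEY UPDATE", "FOR UPDATE",
    },
}


def must_wait(held, requested):
    """Return True if `requested` has to wait for `held` to be released."""
    return requested in CONFLICTS[held]


# Before 9.3 (assumption: update behaves like FOR UPDATE, FK check like
# FOR SHARE): the FK check waits, so the two locks are never held together.
pre_93_fk_check_waits = must_wait("FOR UPDATE", "FOR SHARE")

# In 9.3: a non-key update and an FK check can hold their locks at once.
v93_fk_check_waits = must_wait("FOR NO KEY UPDATE", "FOR KEY SHARE")
```

The second case is the new one: both transactions hold a lock on the same
row at the same time, which is the situation in which the "tuple-lock" WAL
record discussed above gets written.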

The most obvious test-and-correct mechanism would be to try to create new
indexes and, if the creation fails, manually remove the duplicate rows from
the table.  A table re-write and/or re-index could only work if no
duplicate entries were made (I am not sure of this line of thought...) so,
for instance, if the PK column is tied to a sequence then no duplicates
would ever have been entered.  The unknown, for me, is whether MVCC
duplication is impacted, in which case any update could introduce a
duplicate.
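
The test-and-correct loop above might look roughly like the following.  This
is only a sketch: it uses Python's built-in sqlite3 module as a stand-in so
that it can run anywhere, the table and index names (parent, parent_id_key)
are made up, and rowid is SQLite-specific (on PostgreSQL I would expect to
need something like the ctid system column instead).  Keeping the copy with
the lowest rowid is an arbitrary choice; on a real system someone has to
decide which copy is the correct one.

```python
import sqlite3


def find_and_remove_duplicates(conn, table, key):
    """Report duplicated keys, then keep one row per key (lowest rowid)."""
    duplicates = conn.execute(
        f"SELECT {key}, COUNT(*) FROM {table} "
        f"GROUP BY {key} HAVING COUNT(*) > 1"
    ).fetchall()
    conn.execute(
        f"DELETE FROM {table} WHERE rowid NOT IN "
        f"(SELECT MIN(rowid) FROM {table} GROUP BY {key})"
    )
    return duplicates


def demo():
    conn = sqlite3.connect(":memory:")
    # No unique constraint yet, so the duplicate can be inserted, much as
    # it could on a table whose unique index has lost an entry.
    conn.execute("CREATE TABLE parent (id INTEGER, note TEXT)")
    conn.executemany(
        "INSERT INTO parent VALUES (?, ?)",
        [(1, "original"), (1, "duplicate"), (2, "fine")],
    )
    try:
        conn.execute("CREATE UNIQUE INDEX parent_id_key ON parent (id)")
        first_attempt_ok = True
    except sqlite3.DatabaseError:
        # Index creation fails while duplicate keys are present.
        first_attempt_ok = False
    duplicates = find_and_remove_duplicates(conn, "parent", "id")
    # With the duplicates gone the same statement now succeeds.
    conn.execute("CREATE UNIQUE INDEX parent_id_key ON parent (id)")
    remaining = conn.execute("SELECT COUNT(*) FROM parent").fetchone()[0]
    conn.close()
    return first_attempt_ok, duplicates, remaining
```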

Regardless of the duplicate record issue, as soon as the replay happens the
system cannot find the corresponding entry in the index, so any select
queries that use that index are immediately at risk of silently returning
incorrect results.  I suppose any related FK constraints would also be
broken, and that breakage too would remain invisible until the next forced
validation of the constraint.
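
The "silently incorrect" part can be illustrated with the same kind of toy
model: a lookup that trusts the index misses a row that a full scan of the
table still finds.  Again this is only a sketch with hypothetical names
(lookup_via_index, lookup_via_scan, rows_missing_from_index), not a
description of how PostgreSQL executes queries.

```python
def lookup_via_index(heap, index, key):
    """Return the row the index points at for `key`, or None if no entry."""
    pos = index.get(key)
    return None if pos is None else heap[pos]


def lookup_via_scan(heap, key):
    """Return every physical row whose key matches, ignoring the index."""
    return [row for row in heap if row[0] == key]


def rows_missing_from_index(heap, index):
    """Return physical rows that no index entry points at."""
    indexed_positions = set(index.values())
    return [row for pos, row in enumerate(heap) if pos not in indexed_positions]


# Two physical rows, but the index entry for key 1 was lost.
heap = [(1, "parent row"), (2, "other row")]
index = {2: 1}

via_index = lookup_via_index(heap, index, 1)    # the index sees nothing
via_scan = lookup_via_scan(heap, 1)             # the row is still there
orphans = rows_missing_from_index(heap, index)  # rows the index cannot reach
```

Nothing in the index-based lookup raises an error; it simply reports that
the row does not exist, which is why the problem can go unnoticed.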

David J.
