Re: Serialization errors despite KEY SHARE/NO KEY UPDATE

On 10/05/2015 11:17 PM, Kevin Grittner wrote:
Jim Nasby <Jim.Nasby@xxxxxxxxxxxxxx> wrote:
On 10/2/15 11:44 AM, Olivier Dony wrote:
On 10/02/2015 12:28 AM, Jim Nasby wrote:
On 9/29/15 9:47 AM, Olivier Dony wrote:

-- Setup tables
CREATE TABLE users ( id serial PRIMARY KEY,
                      name varchar,
                      date timestamp );
CREATE TABLE orders ( id serial PRIMARY KEY,
                       name varchar,
                       user_id int REFERENCES users (id) );
INSERT INTO users (id, name) VALUES (1, 'foo');
INSERT INTO orders (id, name) VALUES (1, 'order 1');


-- Run 2 concurrent transactions: T1 and T2
               T1                                  T2
|--------------------------------|----------------------------------|
  BEGIN ISOLATION LEVEL
        REPEATABLE READ;

  UPDATE orders
     SET name = 'order of foo',
         user_id = 1
   WHERE id = 1;

                                    BEGIN ISOLATION LEVEL
                                          REPEATABLE READ;

                                    UPDATE users
                                       SET date = now()
                                     WHERE id = 1;

                                    COMMIT;

  UPDATE orders
     SET name = 'order of foo (2)',
         user_id = 1
   WHERE id = 1;

T1 fails with:
ERROR:  could not serialize access due to concurrent update
CONTEXT:  SQL statement "SELECT 1 FROM ONLY "public"."users" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
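
(The CONTEXT line is the internal query run by the foreign-key trigger.
For reference, a sketch of the equivalent explicit lock against the schema
above; FOR KEY SHARE should be compatible with the FOR NO KEY UPDATE lock
taken by T2's update of the non-key column "date", which is what makes this
failure surprising:)

-- Sketch: the row lock the FK trigger takes on the referenced row,
-- equivalent to the internal query shown in the CONTEXT line above.
BEGIN;
SELECT 1 FROM ONLY users x WHERE id = 1 FOR KEY SHARE OF x;
ROLLBACK;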

This isn't a locking failure; it's a serialization failure. I'm not sure
why it's happening, though... is there an index on date?

I don't think so. I can reproduce the problem with the queries quoted
above, and the only index that seems to be present is the PK.
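
One way to double-check from the catalogs (a sketch):

-- List indexes on the test tables; only the primary keys should appear.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('users', 'orders');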

I'm not sure. Perhaps Kevin Grittner (author of the serializable patch)
can shed some light.

This has nothing to do with the serializable patch[1], but it is
related to the new multixact code in 9.3.  I thought I remembered
that this was a regression from 9.2 behavior which was fixed in a
patch after being reported. Sure enough, tests of the tips of the
branches showed that prior to 9.3, T2 blocked until T1 committed
rather than generating a serialization failure, while on 9.5 and
master there was no blocking or error.
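
(A quick way to observe the pre-9.3 blocking, sketched: while T2's UPDATE
is waiting, a third session can list the ungranted lock request.)

-- Run from a third session while T2 is blocked (pre-9.3 behavior only);
-- the wait shows up as an ungranted lock request.
SELECT locktype, relation::regclass AS relation, mode, granted
FROM pg_locks
WHERE NOT granted;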

Using `git bisect` to see where this was fixed shows:

commit 05315498012530d44cd89a209242a243374e274d
Author: Alvaro Herrera <alvherre@xxxxxxxxxxxxxx>
Date:   Tue Jul 29 15:41:06 2014 -0400

     Avoid uselessly looking up old LOCK_ONLY multixacts

     Commit 0ac5ad5134f2 removed an optimization in multixact.c that skipped
     fetching members of MultiXactId that were older than our
     OldestVisibleMXactId value.  The reason this was removed is that it is
     possible for multixacts that contain updates to be older than that
     value.  However, if the caller is certain that the multi does not
     contain an update (because the infomask bits say so), it can pass this
     info down to GetMultiXactIdMembers, enabling it to use the old
     optimization.

     Pointed out by Andres Freund in 20131121200517.GM7240@xxxxxxxxxxxxxxxxx
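
(For anyone who wants to look at the multixact on the row directly, the
pgrowlocks contrib module can show it; a minimal sketch, run from a third
session while T1's first UPDATE is still open:)

-- Requires the contrib extension; "multi" is true when the locker is a
-- MultiXactId, and "modes" lists the lock modes held by its members.
CREATE EXTENSION IF NOT EXISTS pgrowlocks;
SELECT locked_row, multi, modes
FROM pgrowlocks('users');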

That's exactly it, thanks a lot for tracking down the root cause!
I understand (a little bit) better now.


I don't know whether, after being in 9.5 for over a year, it could
be considered for a back-patch to 9.3 and 9.4; I leave that to
Álvaro and the others who have been repairing the multixact
problems to determine.

From what I understand of the original discussion between Andres, Álvaro and Tom, a back-patch seemed desirable because this issue has performance consequences. I should add that in our case it also causes spurious serialization failures that were not present in 9.2 and are not present in 9.5 either. The backwards-compatibility concern with the patch might, however, now apply to both 9.3 and 9.4...

For what it's worth, our system uses many long-running transactions (background jobs, batch data imports, etc.) that are frequently interrupted and rolled back by micro-transactions coming from users who just happen to update minor data on their records (such as their last login date).

Having to replay those long transactions is particularly expensive and they stand a good chance of being interrupted again during the replay, etc.

We would happily skip the micro-transactions (as a performance workaround) if there were a way to detect this situation, but we couldn't find one in 9.3. Before 9.3, we used SELECT FOR UPDATE NOWAIT to guard similar cases.
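
Roughly, the guard looked like this (a sketch against the test schema above,
not our exact queries):

-- Micro-transaction guard (pre-9.3): fail immediately if the row is locked,
-- rather than blocking and later forcing the long transaction to be replayed.
BEGIN;
SELECT 1 FROM users WHERE id = 1 FOR UPDATE NOWAIT;
UPDATE users SET date = now() WHERE id = 1;
COMMIT;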

If there is any way I could help to make the back-patch happen, please let me know!


--
Olivier




[1]  The serializable patch has no effect if there are no
serializable transactions in use, and it doesn't generate errors
"due to concurrent update" in any new situations.





