On 10/05/2015 11:17 PM, Kevin Grittner wrote:
Jim Nasby <Jim.Nasby@xxxxxxxxxxxxxx> wrote:
On 10/2/15 11:44 AM, Olivier Dony wrote:
On 10/02/2015 12:28 AM, Jim Nasby wrote:
On 9/29/15 9:47 AM, Olivier Dony wrote:
-- Setup tables
CREATE TABLE users ( id serial PRIMARY KEY,
name varchar,
date timestamp );
CREATE TABLE orders ( id serial PRIMARY KEY,
name varchar,
user_id int REFERENCES users (id) );
INSERT INTO users (id, name) VALUES (1, 'foo');
INSERT INTO orders (id, name) VALUES (1, 'order 1');
-- Run 2 concurrent transactions: T1 and T2
T1 T2
|-----------------------------|----------------------------------|
BEGIN ISOLATION LEVEL
REPEATABLE READ;
UPDATE orders
SET name = 'order of foo',
user_id = 1
WHERE id = 1;
BEGIN ISOLATION LEVEL
REPEATABLE READ;
UPDATE users
SET date = now()
WHERE id = 1;
COMMIT;
UPDATE orders
SET name = 'order of foo (2)',
user_id = 1
WHERE id = 1;
T1 fails with:
ERROR: could not serialize access due to concurrent update
CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."users" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
This isn't a locking failure, it's a serialization failure. I'm not sure
why it's happening though... is there an index on date?
I don't think so. I can reproduce the problem with the queries quoted
above, and the only index that seems to be present is the PK
I'm not sure. Perhaps Kevin Grittner (author of serializable patch) can
shed some light.
This has nothing to do with the serializable patch[1], but it is
related to the new multixact code in 9.3. I thought I remembered
that this was a regression from 9.2 behavior which was fixed in a
patch after being reported, and sure enough tests of the tips of
branches showed that prior to 9.3 T2 blocked until T1 committed,
rather than generating a serialization failure, while on 9.5 and
master there was no blocking or error.
Using `git bisect` to see where this was fixed shows:
commit 05315498012530d44cd89a209242a243374e274d
Author: Alvaro Herrera <alvherre@xxxxxxxxxxxxxx>
Date: Tue Jul 29 15:41:06 2014 -0400
Avoid uselessly looking up old LOCK_ONLY multixacts
Commit 0ac5ad5134f2 removed an optimization in multixact.c that skipped
fetching members of MultiXactId that were older than our
OldestVisibleMXactId value. The reason this was removed is that it is
possible for multixacts that contain updates to be older than that
value. However, if the caller is certain that the multi does not
contain an update (because the infomask bits say so), it can pass this
info down to GetMultiXactIdMembers, enabling it to use the old
optimization.
Pointed out by Andres Freund in 20131121200517.GM7240@xxxxxxxxxxxxxxxxx
That's exactly it, thanks a lot for tracking down the root cause!
I understand (a little bit) better now.
I don't know whether, after being in 9.5 for over a year it could
be considered for a back-patch to 9.3 and 9.4; I leave that to
Álvaro and others who have been repairing the multixact problems to
determine.
From what I understand of the original discussion between Andres, Álvaro and
Tom, a back-patch seemed desirable because this issue has performance
consequences. I have to add that in our case it also causes spurious
serialization failures that were not present in 9.2 and aren't either in 9.5.
The backwards-compatibility problem of the patch might however apply to both
9.3 and 9.4 now...
For what it's worth, our system uses many long-running transactions (background
jobs, batch data imports, etc.) that are frequently interrupted and rolled back
by micro-transactions coming from users who just happen to update minor data on
their records (such as their last login date).
Having to replay those long transactions is particularly expensive and they
stand a good chance of being interrupted again during the replay, etc.
We would happily skip the micro-transactions (as a perf workaround) if there
was a way to detect this situation, but we couldn't find a way to do that in
9.3. <9.3 we used SELECT FOR UPDATE NOWAIT to guard similar cases.
If there is any way I could help to make the back-patch happen, please let me know!
--
Olivier
[1] The serializable patch has no effect if there are not
serializable transactions in use, and it doesn't generate errors
"due to concurrent update" in any new situations.
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general