An update: following the recommendations on this list, I ran a number of experiments: - I ran with all foreign keys deleted. There was a 4% drop in the rate of deadlocks/transaction, which went from 0.32 per transaction to 0.31. So we still have pretty much the same failure rate. One interesting side effect was that throughput went up by 9%. So maintaining (a lot of) foreign constraints costs us a lot - I ran with max_pred_locks_per_tran as high as 26,400. No difference - I rebuilt the database with fillfactor=15 for all the tables and indexes that are involved in the transactions that fail. This was to see if the problem is PGSQL upgrading row level locks to page level locks. With a low fillfactor, the chances of two transactions landing on the same page is low. We have around 9 rows per data page instead of the original ~60. (The index pages are more tightly packed). I ran a range of thread counts from 5 to 60 for the threads that issue transactions. The failure rate per transaction dropped to around half for the thread count of 5, but that's misleading since with a fillfactor of15, our database size went up by around 6X, reducing the effectiveness of PGSQL and OS disk caches, resulting in a throughput of around half of what we used to see. So the reduced failure rate is just a result of fewer threads competing for resources. When I run with enough threads to max out the system with fillfactor=15 or 100, I get the same failure rates - In case folks hadn't noticed, Ryan Johnson is getting very similar failure rates with dbt-5. So this isn't a case of our having made a silly mistake in our coding of the app or the stored procedures Above experiments were the easy one. I am now working on rewriting the app code and the 6 stored procedures to see if I can execute the whole transaction in a single stored procedure Thanks, Reza > -----Original Message----- > From: Craig Ringer [mailto:craig@xxxxxxxxxxxxxxx] > Sent: Sunday, July 27, 2014 8:58 PM > To: Reza Taheri; pgsql-performance@xxxxxxxxxxxxxx > Subject: Re: High rate of transaction failure with the Serializable > Isolation Level > > On 07/26/2014 02:58 AM, Reza Taheri wrote: > > Hi Craig, > > > >> According to the attached SQL, each frame is a separate phase in the > operation and performs many different operations. > >> There's a *lot* going on here, so identifying possible > >> interdependencies isn't something I can do in a ten minute skim read over > my morning coffee. > > > > You didn't think I was going to bug you all with a trivial problem, > > did you? :-) :-) > > One can hope, but usually in vain... > > > Yes, I am going to have to take an axe to the code and see what pops out. > Just to put this in perspective, the transaction flow and its statements are > borrowed verbatim from the TPC-E benchmark. There have been dozens of > TPC-E disclosures with MS SQL Server, and there are Oracle and DB2 kits that, > although not used in public disclosures for various non-technical reasons, are > used internally in by the DB and server companies. These 3 products, and > perhaps more, were used extensively in the prototyping phase of TPC-E. > > > > So, my hope is that if there is a "previously unidentified interdependency > between transactions" as you point out, it will be due to a mistake we made > in coding this for PGSQL. Otherwise, we will have a hard time convincing all > the council member companies that we need to change the schema or the > business logic to make the kit work with PGSQL. > > Hopefully so. > > Personally I think it's moderately likely that PostgreSQL's much stricter > enforcement of serializable isolation is detecting anomalies that other > products do not, so it's potentially preventing errors. > > It would be nice to have the ability to tune this; sometimes there are > anomalies you wish to ignore or permit. At present it is an all or nothing affair > - no predicate locking (REPEATABLE READ isolation) or strict predicate locking > (SERIALIZABLE isolation). > > I recommend running some of the examples in the SERIALIZABLE > documentation on other products. If they don't fail where they do in Pg, > then the other products either have less strict (and arguably therefor less > correct) serialisable isolation enforcement or they rely on blocking predicate > locks. In the latter case it should be easy to tell because statements will block > on locks where no ordinary row or table level lock could be taken. > > If you do run comparative tests I would be quite interested in seeing the > results. > > -- > Craig Ringer > https://urldefense.proofpoint.com/v1/url?u=http://www.2ndquadrant.com > /&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=b9TKmA0CPjroD2HLPTH > U27nI9PJr8wgKO2rU9QZyZZU%3D%0A&m=jyJSmq%2BgXoPIgY%2BNtgswlUg > zHSm45s%2FmevjxBmPKrIs%3D%0A&s=401415fb62d6f76b22bc76469cbefb85 > 8342612f1fea2d359fe2bb3f18cab1ab > PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance