On Fri, Nov 12, 2010 at 03:47:30PM +0200, Kyriacos Kyriacou wrote:

> This is my first post to this mailing list and I would like to raise an issue that in my opinion is causing performance problems for PostgreSQL, especially in a transaction processing environment. In my company we have been using PostgreSQL for the last 8 years for our in-house developed billing system (telecom). In the last few months we started considering moving to another RDBMS just because of this issue.
>
> After all these years, I believe that the biggest improvement that could be made, and that would boost overall performance especially for enterprise applications, would be to improve the Multiversion Concurrency Control (MVCC) mechanism. In theory this improves performance for SELECT queries, but on tables with very intensive and frequent updates even that is not fully true, because of the fragmentation of data caused by MVCC. I have seen cases where a SELECT COUNT(*) on an empty (!!!) table (used as a buffer) took more than 40 minutes to return a result! VACUUM is not a solution in my opinion, even though the situation got much better after the introduction of the autovacuum daemon.
>
> PROBLEM DESCRIPTION
> -------------------
> By definition of MVCC, when an UPDATE is performed, PostgreSQL creates a new copy of the row in a new location. Any SELECT queries within the same session access the new version of the row, while all queries from other users still access the old version. When the transaction is COMMITted, PostgreSQL marks the new version of the row as the "active" row and expires the old row, which remains "dead"; it is then up to the VACUUM procedure to recover the space of the "dead" rows and make it available to the database engine again. If the transaction is ROLLBACKed, the space reserved for the new version of the row is released. The result is huge fragmentation of the table space, unnecessary updates in all affected indexes, unnecessary costly I/O operations, poor performance on SELECTs that retrieve big record sets (i.e. reports etc.), and slower updates. As an example, consider updating the "live" balance of a customer for each phone call, where the entire customer record has to be duplicated again and again on each call just to modify a numeric value!
>
> SUGGESTION
> ----------
> 1) When a row UPDATE is performed, store all "new row versions" either in a separate temporary table space or in a reserved space at the end of each table (which can be allocated dynamically), etc.
> 2) Any SELECT queries within the same session will again access the new version of the row.
> 3) Any SELECT queries from other users will still access the old version.
> 4) When the UPDATE transaction is ROLLBACKed, just release the space used in the new temporary location.
> 5) When the UPDATE transaction is COMMITted, try to LOCK the old version and overwrite it at the same physical location (NO FRAGMENTATION).
> 6) A similar mechanism can be applied to INSERTs and DELETEs.
> 7) In case the transaction was COMMITted, the temporary location can be either released or archived/cleaned on a pre-scheduled basis. This would possibly allow the introduction of a TRANSACTION LOG backup mechanism as a next step.
> 8) After that, VACUUM will have to deal only with deletions!!!
>
> I understand that my suggestion seems too simplified and also that there are many implementation details and difficulties that I am not aware of.
> I strongly believe that the outcome of the discussion regarding this issue will be helpful.
>
> Best Regards,
>
> Kyriacos Kyriacou
> Senior Developer/DBA

I cannot speak to your suggestion, but it sounds like you are not vacuuming enough, and a lot of the bloat/randomization would be helped by making use of HOT updates, in which the updates all stay in the same page and are reclaimed almost immediately.

Regards,
Ken
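
To make the "live balance" example above concrete, here is a minimal sketch using a hypothetical customer table (all names and values are made up). Each per-call UPDATE writes a whole new row version and leaves the old one behind as a dead tuple until (auto)vacuum runs, which can be watched in pg_stat_user_tables:

    -- Hypothetical schema for the per-call balance update pattern.
    CREATE TABLE customer (
        customer_id bigint PRIMARY KEY,
        name        text,
        address     text,
        balance     numeric(12,2) NOT NULL DEFAULT 0
    );

    -- Every UPDATE writes a complete new row version; the old version stays
    -- behind as a dead tuple until (auto)vacuum reclaims it.
    UPDATE customer SET balance = balance - 0.05 WHERE customer_id = 42;

    -- Dead-tuple accumulation and the last (auto)vacuum run are visible here:
    SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
      FROM pg_stat_user_tables
     WHERE relname = 'customer';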
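
And a minimal sketch of the direction Ken suggests, assuming the same hypothetical customer table; the fillfactor and autovacuum thresholds below are examples only, not recommendations. Because balance is not indexed, updates that change only that column can qualify as HOT updates once the pages have free space for them:

    -- Leave free space in each heap page so that updates touching no indexed
    -- column can be written as HOT updates on the same page and be reclaimed
    -- without waiting for a table-wide VACUUM. (Only affects pages written
    -- from now on; existing pages stay full until the table is rewritten.)
    ALTER TABLE customer SET (fillfactor = 70);

    -- Have autovacuum visit this heavily updated table more aggressively
    -- than the global defaults (example thresholds, tune to the workload).
    ALTER TABLE customer SET (
        autovacuum_vacuum_scale_factor = 0.02,
        autovacuum_vacuum_threshold    = 1000
    );

    -- A one-time rewrite removes the bloat that has already accumulated
    -- (takes an exclusive lock while it runs).
    VACUUM FULL customer;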