Re: MVCC performance issue

Andy Colson <andy@xxxxxxxxxxxxxxx> · Fri, 12 Nov 2010 10:21:55 -0600

On 11/12/2010 7:47 AM, Kyriacos Kyriacou wrote:

SUGGESTION
--------------
1) When a raw UPDATE is performed, store all "new raw versions" either
in separate temporary table space
    or in a reserved space at the end of each table (can be allocated
dynamically) etc

Your use of "raw" is confusing.  I'll just ignore the word.  New row 
versions are already stored in a dynamically allocated spot, right along 
with the other versions of the table.  You are assuming that getting to 
the "correct" version of the row is very slow?  That's only going to be 
the case if you have lots and lots of versions.  And your solution will 
not actually help if there are lots of versions.  While one person who 
is hitting the most recent version might be ok, everyone else will still 
have to search for theirs.  Just as they do now.

2) Any SELECT queries within the same session will be again accessing
the new version of the row

I don't see how this is different from what we currently have.  "same 
session" could have been dropped from your separate table space, and 
then you'd have to go search through previous versions of the row... 
exactly like you do now.

And worse, if you dont want to drop your version of the row from the 
separate table space until you commit/rollback, then no other user can 
start a transaction on that table until your done!  oh no!  You have 
reads and writes blocking each other.

3) Any SELECT queries from other users will still be accessing the old
version

Again.. the same.

4) When UPDATE transaction is ROLLBACK just release the space used in
new temporary location

current layout makes rollback very very fast.

5) When UPDATE transaction is COMMIT then try to LOCK the old version
and overwrite it at the same physical location (NO FRAGMENTATION).

Not sure what you mean by lock, but lock requires single user access and 
slow's things down.  Right now we just bump the "most active transaction 
number", which is very efficient, and requires no locks.  As soon as you 
lock anything, somebody, by definition, has to wait.

6) Similar mechanism can be applied on INSERTS and DELETES
7) In case that transaction was COMMIT, the temporary location can be
either released or archived/cleaned on a pre-scheduled basis. This will
possibly allow the introduction of a TRANSACTION LOG backup mechanism as
a next step.

You are kind of assuming there will only ever be one new transaction, 
and one old transaction.  What about a case where 10 people start a 
transaction, and there are 10 versions of the row?

It seems to me like you are using very long transactions, which is 
causing lots of row versions to show up.  Have you run explain analyze 
on your slow querys to find out the problems?

Have you checked to see if you are cpu bound or io bound?  If you are 
dealing with lots of row versions, I'd assume you are cpu bound.  If you 
check your system though, and see you are io bound, I think that might 
invalidate your assumptions above.

MVCC makes multi user access very nice because readers and writers dont 
block each other, and there are very few locks.  It does come with some 
kinks (gotta vacuum, keep transactions short, you must commit, etc).

select count(*) for example is always going to be slow... just expect 
it, lets not destroy what works well about the database just to make it 
fast.  Instead, find a better alternative so you dont have to run it.

Just like any database, you have to work within MVCC's good points and 
try to avoid the bad spots.

-Andy

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance