On 2/27/2012 10:29 PM, Andy Colson wrote:
> On 02/27/2012 06:55 PM, Dave Vitek wrote:
>> Hi all,
>>
>> I have a relation where a tuple typically undergoes a lifecycle
>> something like:
>> 1) Created
>> 2) Updated maybe thousands of times (no updates to indexed columns
>> though)
>> 3) Rarely or never modified again
>>
>> The following query takes about 100 minutes (3 seconds per tuple):
>> SELECT count(id) from T
>>
>> (2) is causing a lot of autovacuum/analyze activity (which is OK).
>> HOT seems to get used for about 90% of the updates, but there are
>> enough updates that don't use it to cause issues. I'm using pg
>> version 9.0.3 on a 32-bit Windows XP machine with 3GB of RAM. The
>> .conf file is using default settings.
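
(For anyone reproducing this: the settings actually in effect can be
pulled from pg_settings; a generic sketch, nothing schema-specific:)

  -- vacuum-related settings the server is running with
  SELECT name, setting, unit
  FROM pg_settings
  WHERE name LIKE 'autovacuum%' OR name LIKE 'vacuum_cost%'
  ORDER BY name;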
>> Table Info:
>> n_live_tup        1799
>> n_dead_tup        191
>> pg_relation_size  2343 MB
>> indexsize         10 MB
>> toastsize         552 kB
>> toastindexsize    16 kB
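
(Roughly how numbers like these can be gathered -- a sketch, assuming
the table was created with an unquoted name, so it appears as "t" in
the catalogs:)

  -- live/dead tuple counts as seen by the stats collector
  SELECT n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
  WHERE relname = 't';

  -- heap size vs. total index size (pg_indexes_size is new in 9.0)
  SELECT pg_size_pretty(pg_relation_size('t')) AS heap_size,
         pg_size_pretty(pg_indexes_size('t')) AS index_size;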
>> This reports under 10 kB for most tuples:
>> psql -A -c "select * from T where id=123" | wc -c
>>
>> Autovacuum and autoanalyze both ran yesterday with default
>> settings. There are only one or two new tuples since yesterday. The
>> database is fairly old (it was probably created using pg_restore
>> around the time 9.0.3 came out).
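
(Timestamps like "ran yesterday" are visible in the same stats view;
a generic sketch:)

  -- when vacuum/analyze last touched the table, manual or automatic
  SELECT relname, last_vacuum, last_autovacuum,
         last_analyze, last_autoanalyze
  FROM pg_stat_user_tables
  WHERE relname = 't';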
>> Here is the output from VACUUM VERBOSE:
>> INFO:  vacuuming "public.T"
>> 0 pages are entirely empty.
>> CPU 0.00s/0.00u sec elapsed 0.56 sec.
>> VACUUM
>> I imagine CLUSTERing the table would make things happier, but I'm
>> hoping for a permanent solution that avoids periodic downtime.
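
(Concretely, the one-off fix would be something like the following;
the index name is a guess on my part, and CLUSTER holds an ACCESS
EXCLUSIVE lock for the duration, which is exactly the downtime I'd
like to avoid:)

  CLUSTER T USING t_pkey;  -- rewrite the heap in index order, discarding dead space
  ANALYZE T;               -- refresh planner statistics afterwards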
>> One thought was to partition so rows that are still changing live in
>> a separate table from the more stable rows. I imagine the cardinality
>> of (2) rarely exceeds 10. Can I still get into performance trouble
>> with a table that small after enough updates? Anyone have other ideas?
>>
>> - Dave
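
(To make the partitioning idea concrete: on 9.0 there is no
declarative partitioning, so this would mean inheritance plus some
routing logic; a minimal sketch with invented child-table names:)

  -- children inherit T's columns; hot rows go to t_active,
  -- settled rows to t_stable
  CREATE TABLE t_active () INHERITS (t);
  CREATE TABLE t_stable () INHERITS (t);
  -- a SELECT against t scans both children; a periodic job (or
  -- trigger) would move rows from t_active to t_stable once they
  -- stop changing, so update bloat stays confined to the small table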
> I'm a little confused... what's the problem?

Sequential scans of this table seem to be pathologically slow for a
table with only 2000 rows.
>> The following query takes about 100 minutes (3 seconds per tuple):
>> SELECT count(id) from T
> so table T has 18,000 rows? (100 * 60 * 3)

No: 100 minutes * 60 (sec/min) / 3 (sec/tuple) = 2000 tuples.
>> Table Info:
>> n_live_tup 1799
> Oh, so the table has 1,800 rows?

Correct.
>> This reports under 10 kB for most tuples:
>> psql -A -c "select * from T where id=123" | wc -c
> 10 kb? what? So that's one row? And that one row is about 10,000 bytes?

Correct. I mentioned this since pg_relation_size would suggest that one
tuple is using over a megabyte including overhead, fragmentation, and
free space. So only about 1% of the space is used by the payload, which
might be a red flag?
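
(Spelling that out: 2343 MB of heap over 1799 live rows is about
1.3 MB per row, versus a ~10 kB payload -- roughly 1% utilization.
The same calculation in SQL, as a sketch:)

  -- average heap bytes consumed per live row
  SELECT pg_relation_size('t') / NULLIF(n_live_tup, 0) AS bytes_per_live_row
  FROM pg_stat_user_tables
  WHERE relname = 't';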
> So you have a large row, who knows how many, and the problem is
> "SELECT count(id) from T" is slow?

Correct. I think anything requiring a sequential scan of the table
takes 100 minutes to run. However, I wanted to be careful about drawing
conclusions and just present the data.
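
(One way to pin that on the scan itself is plan timing; the BUFFERS
option is available as of 9.0 and shows how many pages get read:)

  -- if heap bloat is the culprit, the Seq Scan node touches a huge
  -- number of buffers relative to the ~1800 rows it returns
  EXPLAIN (ANALYZE, BUFFERS) SELECT count(id) FROM T;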
> You say:
>> 2) Updated maybe thousands of times (no updates to indexed columns
>> though)
> Are the updates slow? How slow, and how fast do you need them to be?

The updates are perfectly responsive.
> Is ID a unique key? Or does "update ... where id=123" update more
> than one row?

It's the primary key, it's unique, and it updates a single row.
> What does the actual table look like? Looks like it has lots of
> indexes, but I'm just guessing! Are you sure you need all those
> indexes? Updating a row has to update all those indexes... I assume,
> but you give no details.

It has 5 indices, which are warranted, but writes are performing fine --
it's just sequential scans that seem to be pathological. Index scans
are fine too.
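
(For completeness, per-index sizes -- a generic catalog query:)

  -- ~10 MB of indexes vs. 2.3 GB of heap is consistent with index
  -- scans staying fast while sequential scans crawl
  SELECT indexrelname, pg_size_pretty(pg_relation_size(indexrelid)) AS size
  FROM pg_stat_user_indexes
  WHERE relname = 't';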
> Also... in the end, you're on Windows, so you probably can't tell me
> if you are IO bound, but you probably need a better IO subsystem. And
> stop running on RAID 5 (which I'm assuming since you don't mention
> anything).

I think this is typical consumer hardware from maybe 3 years ago.
Copying a 1G file on the disk takes... 11 minutes when the machine is
under heavy load. There are many instances of this database schema+app,
so making it work well on low-end hardware is important.
> Wow. Re-reading that I seem to be angry. Please don't take it that
> way. Blood sugar is high, makes me an angry person. Others will be
> along to scold me. Bad Andy! Bad! Go to your room until you find
> your happy face.

No problem, I should have been clearer about sequential scans being the
issue.
> -Andy
>
> ps: based on you're email gramma, I almost changed all your to
> you're, just to see if it annoyed you. Sounds like you work for
> grammar technology.

How embarrassing! After writing code all day I tend to have some
natural language issues.