On 2/27/2012 10:29 PM, Andy Colson wrote:
> On 02/27/2012 06:55 PM, Dave Vitek wrote:
>> Hi all,
>>
>> I have a relation where a tuple typically undergoes a lifecycle
>> something like:
>> 1) Created
>> 2) Updated maybe thousands of times (no updates to indexed columns
>> though)
>> 3) Rarely or never modified again
>>
>> The following query takes about 100 minutes (3 seconds per tuple):
>> SELECT count(id) from T
>>
>> (2) is causing a lot of autovacuum/analyze activity (which is OK).
>> HOT seems to get used for about 90% of the updates, but there are
>> enough updates that don't use it to cause issues. I'm using pg
>> version 9.0.3 on a 32-bit Windows XP machine with 3GB of RAM. The
>> .conf file is using default settings.
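
(For anyone reproducing this: the settings actually in effect can be
pulled from pg_settings; a generic sketch, nothing schema-specific:)

  -- vacuum-related settings the server is running with
  SELECT name, setting, unit
  FROM pg_settings
  WHERE name LIKE 'autovacuum%' OR name LIKE 'vacuum_cost%'
  ORDER BY name;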
>> Table Info:
>> n_live_tup        1799
>> n_dead_tup        191
>> pg_relation_size  2343 MB
>> indexsize         10 MB
>> toastsize         552 kB
>> toastindexsize    16 kB
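
(Roughly how numbers like these can be gathered -- a sketch, assuming
the table was created with an unquoted name, so it appears as "t" in
the catalogs:)

  -- live/dead tuple counts as seen by the stats collector
  SELECT n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
  WHERE relname = 't';

  -- heap size vs. total index size (pg_indexes_size is new in 9.0)
  SELECT pg_size_pretty(pg_relation_size('t')) AS heap_size,
         pg_size_pretty(pg_indexes_size('t')) AS index_size;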
>> This reports under 10 kB for most tuples:
>> psql -A -c "select * from T where id=123" | wc -c
>>
>> Autovacuum and autoanalyze both ran yesterday with default
>> settings. There are only one or two new tuples since yesterday. The
>> database is fairly old (it was probably created using pg_restore
>> around the time 9.0.3 came out).
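
(Timestamps like "ran yesterday" are visible in the same stats view;
a generic sketch:)

  -- when vacuum/analyze last touched the table, manual or automatic
  SELECT relname, last_vacuum, last_autovacuum,
         last_analyze, last_autoanalyze
  FROM pg_stat_user_tables
  WHERE relname = 't';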
>> Here is the output from VACUUM VERBOSE:
>> INFO:  vacuuming "public.T"
>> 0 pages are entirely empty.
>> CPU 0.00s/0.00u sec elapsed 0.56 sec.
>> VACUUM
>> I imagine CLUSTERing the table would make things happier, but I'm
>> hoping for a permanent solution that avoids periodic downtime.
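
(Concretely, the one-off fix would be something like the following;
the index name is a guess on my part, and CLUSTER holds an ACCESS
EXCLUSIVE lock for the duration, which is exactly the downtime I'd
like to avoid:)

  CLUSTER T USING t_pkey;  -- rewrite the heap in index order, discarding dead space
  ANALYZE T;               -- refresh planner statistics afterwards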
>> One thought was to partition so rows that are still changing live in
>> a separate table from the more stable rows. I imagine the cardinality
>> of (2) rarely exceeds 10. Can I still get into performance trouble
>> with a table that small after enough updates? Anyone have other ideas?
>>
>> - Dave
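
(To make the partitioning idea concrete: on 9.0 there is no
declarative partitioning, so this would mean inheritance plus some
routing logic; a minimal sketch with invented child-table names:)

  -- children inherit T's columns; hot rows go to t_active,
  -- settled rows to t_stable
  CREATE TABLE t_active () INHERITS (t);
  CREATE TABLE t_stable () INHERITS (t);
  -- a SELECT against t scans both children; a periodic job (or
  -- trigger) would move rows from t_active to t_stable once they
  -- stop changing, so update bloat stays confined to the small table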
> I'm a little confused... what's the problem?

Sequential scans of this table seem to be pathologically slow for a
table with only 2000 rows.
>> The following query takes about 100 minutes (3 seconds per tuple):
>> SELECT count(id) from T
> so table T has 18,000 rows? (100 * 60 * 3)

No: 100 minutes * 60 (sec/min) / 3 (sec/tuple) = 2000 tuples.
>> Table Info:
>> n_live_tup 1799
> Oh, so the table has 1,800 rows?

Correct.
>> This reports under 10 kB for most tuples:
>> psql -A -c "select * from T where id=123" | wc -c
> 10 kb? what? So that's one row? And that one row is about 10,000 bytes?

Correct. I mentioned this since pg_relation_size would suggest that one
tuple is using over a megabyte including overhead, fragmentation, and
free space. So only about 1% of the space is used by the payload, which
might be a red flag?
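
(Spelling that out: 2343 MB of heap over 1799 live rows is about
1.3 MB per row, versus a ~10 kB payload -- roughly 1% utilization.
The same calculation in SQL, as a sketch:)

  -- average heap bytes consumed per live row
  SELECT pg_relation_size('t') / NULLIF(n_live_tup, 0) AS bytes_per_live_row
  FROM pg_stat_user_tables
  WHERE relname = 't';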
> So you have a large row, who knows how many, and the problem is
> "SELECT count(id) from T" is slow?

Correct. I think anything requiring a sequential scan of the table
takes 100 minutes to run. However, I wanted to be careful about drawing
conclusions and just present the data.
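
(One way to pin that on the scan itself is plan timing; the BUFFERS
option is available as of 9.0 and shows how many pages get read:)

  -- if heap bloat is the culprit, the Seq Scan node touches a huge
  -- number of buffers relative to the ~1800 rows it returns
  EXPLAIN (ANALYZE, BUFFERS) SELECT count(id) FROM T;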
> You say:
>> 2) Updated maybe thousands of times (no updates to indexed columns
>> though)
> Are the updates slow? How slow, and how fast do you need them to be?

The updates are perfectly responsive.
> Is ID a unique key? Or does "update ... where id=123" update more
> than one row?

It's the primary key, it's unique, and it updates a single row.
> What does the actual table look like? Looks like it has lots of
> indexes, but I'm just guessing! Are you sure you need all those
> indexes? Updating a row has to update all those indexes... I assume,
> but you give no details.

It has 5 indices, which are warranted, but writes are performing fine --
it's just sequential scans that seem to be pathological. Index scans
are fine too.
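
(For completeness, per-index sizes -- a generic catalog query:)

  -- ~10 MB of indexes vs. 2.3 GB of heap is consistent with index
  -- scans staying fast while sequential scans crawl
  SELECT indexrelname, pg_size_pretty(pg_relation_size(indexrelid)) AS size
  FROM pg_stat_user_indexes
  WHERE relname = 't';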
> Also... in the end, you're on Windows, so you probably can't tell me
> if you are IO bound, but you probably need a better IO subsystem. And
> stop running on RAID 5 (which I'm assuming since you don't mention
> anything).

I think this is typical consumer hardware from maybe 3 years ago.
Copying a 1G file on the disk takes... 11 minutes when the machine is
under heavy load. There are many instances of this database schema+app,
so making it work well on low-end hardware is important.
> Wow. Re-reading that I seem to be angry. Please don't take it that
> way. Blood sugar is high, makes me an angry person. Others will be
> along to scold me. Bad Andy! Bad! Go to your room until you find
> your happy face.

No problem, I should have been clearer about sequential scans being the
issue.
> -Andy
>
> ps: based on you're email gramma, I almost changed all your to
> you're, just to see if it annoyed you. Sounds like you work for
> grammar technology.

How embarrassing! After writing code all day I tend to have some
natural language issues.