Re: Performance with many updates

"Tomas Vondra" <tv@xxxxxxxx> · Mon, 29 Aug 2011 22:38:06 +0200

On 29 Srpen 2011, 11:13, Tasdassa Asdasda wrote:
> Hi. I have a table called work (id bigserial, userid int4, kind1 enum,
> kind2 enum, kind3 enim, value bigint, modified timestamp)
> Table will have about 2*10^6 rows (at same time - overall it can have
> higher IDs but old records are eventually deleted (moved to separate
> archive table) therefore the IDs can grow huge). After insert on table
> work, every row will be updated (value will be reduced, till value = 0
> (work is finished)). For each row there will be from 1 to
> maybe 10 updates on two cells (value, modified). After work is completed
> (value = 0) it's record will be moved to archive table.
> kind1 is an enum with two values (a and b)

OK, how many clients are updating the table concurrently? Is there a
single client or multiple ones?

> i'm using:
> - alter table work set fillfactor 50
> - btree index on value, fillfactor 50
> - btree index on kind1, fillfactor 50

I'd use significantly higher fillfactor - I'd probably start with 90 and
see if decreasing it improves the performance. My guess is it won't or
maybe it will even hurt performance. Fillfactor 50 means only 50% of the
space is used initially, so the table occupies almost 2x the space (so
more data needs to be read/written etc).

Prepare a short simulation of your workload and run it with various
fillfactor settings - that's the best way to see the effect.

> my question:
> 1. what can i do to perform this selects faster:
> SELECT id, value FROM work WHERE value>=$1 AND kind1=$2 AND kind2=$3 AND
> kind3=$4 FOR UPDATE;
> SELECT id, value FROM work WHERE userid=$1 AND kind1=$1 AND kind2=$3 AND
> kind3=$4 FOR UPDATE;

Well, that really depends on the data. What is the selectivity of the
conditions, i.e. how many rows match each part? You can either create an
index on each column separately or one index on multiple columns. Try this

INDEX ON (kind1), INDEX ON (kind2), INDEX ON (kind3)
INDEX ON (kind1, kind2, kind3)

How does the 'value' relate to the other columns? You could create an
index on this column too, but that would prevent HOT and thus the
fillfactor is pointless.

> 2. How about inheriting and partitioning? I'm thinking about creating two
> tables, one for kind1(a) and second for kind1(b), will it help in
> performance?

It could help, especially if constraint_exclusion is on.

> 3. Is btree best for index on enum?

The real problem here is selectivity - how many rows match the condition.
If too many rows match it, random access is ineffective.

Try it - the only other option is 'hash' indexes, and there are serious
disadvantages (just equality, no crash safety etc.).

Or you can try partial indexes:

  http://www.postgresql.org/docs/8.4/static/indexes-partial.html

i.e. instead of

CREATE INDEX ... ON table (kind1, kind2, kind3);

do something like

CREATE INDEX index_a ON table (kind2, kind3) WHERE (kind1 = 'a');
CREATE INDEX index_b ON table (kind2, kind3) WHERE (kind1 = 'b');

> 4. How about creating index on complex keys like
> (user_id,kind1,kind2,kind3)  and (price,kind1,kind2,kind3)?

Well, that's one of the options. But really, given the small amount of
information you've provided, this whole e-mail is rather a speculation
based on my imagination of what the statistical features of the data might
be.

The best solution is to try that - create the various indexes, run EXPLAIN
ANALYZE and post it to http://explain.depesz.com (so that we can see the
results).

Tomas

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance