Re: Batch update query performance

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 04/07/2014 03:06 PM, Albe Laurenz wrote:
Hans Drexler wrote:
Postgres needs close to 50
minutes to process the same query on the same data. Sometimes, Postgres
needs more than 2 hours.

The application performs an update query on every row
of the table. The exact SQL of this query is:

update t67cdi_nl_cmp_descr set is_grc_002='Y'

[...]

We tried removing all indexes. That reduces the runtime to ~3 minutes.
When we start to put indexes back, the run time of the query increases
again with each index added.

Do I read that right that the duration of the update is reduced from
50 or 120 minutes to 3 when you drop all the indexes?

If that's true, you might be able to drop and re-create the indexes as part of the same transaction, and come out ahead. DROP/CREATE INDEX is transactional in PostgreSQL, so you can do:

BEGIN;
DROP INDEX index1;
...
DROP INDEX index15;
UPDATE t67cdi_nl_cmp_descr SET is_grc_002='Y'
CREATE INDEX index1 ...;
...
CREATE INDEX index15 ...;
COMMIT;

This will take an AccessExclusiveLock on the table, though, so the table will be inaccessible to concurrent queries while it's running.

Actually, since you are effectively rewriting the table anyway, you could create a new table with same structure, insert all rows from the old table, with is_grc_002 set to 'Y', drop the old table, and rename the new table into its place.

Do all the rows really need to be updated? If some of the rows already have is_grc_002='Y', you can avoid rewriting those rows by adding a WHERE-clause: WHERE NOT is_grc_002='Y' OR is_grc_002 IS NULL.

You could also play tricks with partitioning. Don't store the is_grc_002 row in the table at all. Instead, create two tables, one for the rows that implicitly have is_grc_002='Y' and another for all the other rows. Then create a view on the union of the two tables, which adds the is_grc_002 column. Instead of doing a full-table update, you can just alter the view to display is_grc_002='Y' for both tables (and add a new table to hold new rows with is_grc_002<>'Y').

Hypothesis
we have tried many things to solve this problem ourselves, but to no
avail so far. Our hypothesis is that
the Postgres creates new records for all rows and then needs to update
all 15 indexes to make them point to the new rows. There does not seem
to be a way to avoid that.

Question:
- Is our hypothesis correct?
- Can the forum please advise us on possible ways to make the query
faster?

Your hypothesis may be correct.

Yeah, sounds about right. A full-table UPDATE like that is pretty much the worst-case scenario for PostgreSQL's MVCC system, unfortunately.

- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux