
Re: Optimizing bulk update performance

Yang Zhang <yanghatespam@xxxxxxxxx> writes:
> It currently takes up to 24h for us to run a large set of UPDATE
> statements on a database, all of the form:

>     UPDATE table SET field1 = constant1, field2 = constant2, ...  WHERE
>     id = constid

> (We're just overwriting fields of objects identified by ID.)

Forgive the obvious question, but you do have an index on "id", right?
Have you checked that it's being used (i.e., run EXPLAIN ANALYZE on one of these)?
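For instance, something like this (keeping the placeholder names from your
example; "table", field1, constid, etc. stand in for the real identifiers)
should show an Index Scan on the id index rather than a Seq Scan:

    -- placeholder names taken from the quoted query
    EXPLAIN ANALYZE
    UPDATE table SET field1 = constant1, field2 = constant2
        WHERE id = constid;

If the plan says Seq Scan, that alone could account for most of the runtime.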

> The tables have handfuls of indices each and no foreign key constraints.

How many is a "handful"?

> It takes 2h to import a `pg_dump` of the entire DB.  This seems like a
> baseline we should reasonably target.

Well, maybe.  You didn't say what percentage of the DB you're updating.

But the thing that comes to mind here is that you're probably incurring
a network round trip for each row, and maybe a query-planning round as
well, so you really can't expect that this is going to be anywhere near
as efficient as a bulk load operation.  You could presumably get rid of
the planner overhead by using a prepared statement.  Cutting the network
overhead is going to require a bit more ingenuity --- could you move
some logic into a stored procedure, perhaps, so that one command from
the client is sufficient to update multiple rows?
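For instance, a prepared statement might look like this (the parameter
types are guesses, since you didn't show the schema, and "table" and the
field names are again the placeholders from your example):

    PREPARE upd(int, text, text) AS
        UPDATE table SET field1 = $2, field2 = $3 WHERE id = $1;
    EXECUTE upd(42, 'foo', 'bar');

And one way to update many rows per command, short of writing a stored
procedure, is to join the target table against a VALUES list:

    -- sample literal values; in practice each batch would carry many rows
    UPDATE table AS t
    SET field1 = v.field1, field2 = v.field2
    FROM (VALUES
        (1, 'a1', 'b1'),
        (2, 'a2', 'b2')
    ) AS v(id, field1, field2)
    WHERE t.id = v.id;

That costs you one network round trip and one plan per batch instead of
per row.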

			regards, tom lane

