We're using PostgreSQL 9.1.9 on Ubuntu 12.04 on EBS volumes on m1.xlarge
instances, which have:

  15 GiB memory
  8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
  64-bit platform

(Yes, we're moving to EBS-Optimized instances + Provisioned IOPS volumes,
but preliminary benchmarks suggest this won't give us as much of a boost
as refactoring the way we're executing these bulk updates in our
application might.)

On Fri, Apr 26, 2013 at 5:27 PM, Gavin Flower
<GavinFlower@xxxxxxxxxxxxxxxxx> wrote:
> On 27/04/13 12:14, Yang Zhang wrote:
>
> It currently takes up to 24h for us to run a large set of UPDATE
> statements on a database, which are of the form:
>
> UPDATE table SET field1 = constant1, field2 = constant2, ... WHERE
> id = constid
>
> (We're just overwriting fields of objects identified by ID.)
>
> The tables each have a handful of indexes and no foreign key constraints.
> No COMMIT is made till the end.
>
> It takes 2h to import a `pg_dump` of the entire DB. This seems like a
> baseline we should reasonably target.
>
> Short of producing a custom program that somehow reconstructs a dataset
> for PostgreSQL to re-import, is there anything we can do to bring the
> bulk UPDATE performance closer to that of the import? (This is an area
> that we believe log-structured merge trees handle well, but we're
> wondering if there's anything we can do within PostgreSQL.)
>
> Some ideas:
>
> - dropping all non-ID indexes and rebuilding them afterward?
> - increasing checkpoint_segments, but does this actually help sustained
>   long-term throughput?
> - using the techniques mentioned here? (Load the new data as a table,
>   then "merge in" the old data where the ID is not found in the new data.)
>
>   <http://www.postgresql.org/message-id/3a0028490809301807j59498370m1442d8f5867e9668@xxxxxxxxxxxxxx>
>
> Basically there are a bunch of things to try, and we're not sure which
> are the most effective or whether we're overlooking anything else. We'll
> be spending the next few days experimenting, but we thought we'd ask
> here as well.
>
> Thanks.
>
>
> People will need to know your version of Postgres and which operating
> system, etc., plus details of CPU, RAM, and disks, as well as what
> changes you have made to postgresql.conf...
>
> I would be inclined to DROP all indexes and reCREATE them later.
>
> Updating a row might lead to a new row version being added in a new disk
> page, so I suspect that updates will hit every index associated with the
> table (with the possible exception of partial indexes).
>
> Running too many updates in one transaction may mean that Postgres needs
> to use disk work files.
>
> Depending on RAM etc., it may pay to increase some variables that affect
> how Postgres uses RAM; some of these are per-session.
>
>
> Cheers,
> Gavin

--
Yang Zhang
http://yz.mit.edu/
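
P.S. To make the third idea above concrete, here is roughly the
staging-table merge we have in mind. All the names below (the table
mytable(id, field1, field2), the staging and mytable_new tables, and the
CSV path) are placeholders for illustration, not our real schema:

    -- Bulk-load the new values into a temporary staging table.
    -- (The path is hypothetical; \copy from psql would also work.)
    CREATE TEMP TABLE staging (
        id     integer PRIMARY KEY,
        field1 text,
        field2 text
    );
    COPY staging FROM '/path/to/new_values.csv' WITH (FORMAT csv);

    -- Build a replacement table: all the new rows, plus the old rows
    -- whose IDs do not appear in the new data.
    CREATE TABLE mytable_new AS
        SELECT id, field1, field2 FROM staging
        UNION ALL
        SELECT t.id, t.field1, t.field2
          FROM mytable AS t
         WHERE NOT EXISTS (SELECT 1 FROM staging AS s WHERE s.id = t.id);

    -- Swap the new table in; indexes and constraints then get recreated
    -- on it in single passes, and the old table is dropped.
    BEGIN;
    ALTER TABLE mytable RENAME TO mytable_old;
    ALTER TABLE mytable_new RENAME TO mytable;
    COMMIT;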
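
And the first idea (dropping the non-ID indexes around whichever bulk
approach we end up using) would amount to something like this, again
with made-up index names:

    -- Drop the secondary indexes so the bulk update only has to
    -- maintain the primary key.
    DROP INDEX mytable_field1_idx;
    DROP INDEX mytable_field2_idx;

    -- ... run the bulk UPDATE (or the merge sketched above) ...

    -- Rebuild each index in a single pass afterward.
    CREATE INDEX mytable_field1_idx ON mytable (field1);
    CREATE INDEX mytable_field2_idx ON mytable (field2);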