Tim Cross <theophilusx@xxxxxxxxx> writes:
> Jason Ralph <jralph@xxxxxxxxxxxxxxxxxxxxx> writes:
>> I am planning an update on a table with 20 million records, and I have
>> been researching the best practices.  I will remove all indexes and
>> foreign keys prior to the update, but I am not sure whether I should
>> use a transaction or not.
>> My thought process is that a transaction would be easier to recover
>> from if something fails; however, it would take more time to write to
>> the WAL log in a transaction.
>>
>> Would it make sense to make a backup of the table and then execute the
>> update without a transaction?  How would you guys do it?

> Just because you have 20M records, it doesn't mean that you have to do
> anything 'special'.

Indeed.  That's really not a lot of data, on any modern machine.

Just for experiment, I tried updating a table of 20M rows on my
development machine (no powerhouse by any serious standard, plus it's
running a debug build which is likely 30% or more slower than
production):

regression=# \timing
Timing is on.
regression=# create table foo as select x, 'what' as w from generate_series(1,20000000) x;
SELECT 20000000
Time: 23573.818 ms (00:23.574)
regression=# update foo set w = 'other';
UPDATE 20000000
Time: 60878.418 ms (01:00.878)

OK, that's cheating, because I had no indexes, so ...

regression=# create index on foo(w);
CREATE INDEX
Time: 11774.094 ms (00:11.774)
regression=# update foo set w = x::text;
UPDATE 20000000
Time: 186425.606 ms (03:06.426)

That may still be cheating, because I had no foreign-key constraints or
other possibly-expensive features.  And you might have lots of indexes,
not just one.  Still, this doesn't seem to be territory where you need
to spend days researching the best alternative.  I'm not even convinced
that it's worth your time to remove and rebuild indexes.

But the real reason I'm commenting is that the original question seems
to embed a whole lot of misunderstanding about how Postgres works.
There is no such thing as updating "outside a transaction".  You can
use an explicit transaction (ie BEGIN + COMMIT) if you'd like to
preserve the option to roll back after the UPDATE command proper, but
that will make absolutely no difference to the performance of the
UPDATE.  Likewise, there is no such thing as not writing WAL (unless
you're using an UNLOGGED table, which you should only ever do for
throwaway data).  And also, there's unlikely to be any performance
benefit from carving up the update into smaller pieces.  If you're
seriously short of disk space you might need to split up the update
(because just doing it in one go will about double the table's
size-on-disk) and vacuum between the stages to reclaim disk space.
But that will take more time, not less ... and again, on any modern
machine, such steps shouldn't be necessary for only 20M rows.

			regards, tom lane
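
[Editorial note: for concreteness, the explicit-transaction form mentioned
above is just a sketch like the following, reusing the throwaway table
"foo" from the timing example; it changes nothing about WAL or speed, it
only keeps the option of rolling back open until you commit:]

    BEGIN;
    UPDATE foo SET w = 'other';
    -- inspect the result (row count, spot-check some rows), then either
    COMMIT;
    -- or, if something looks wrong, instead of COMMIT:
    -- ROLLBACK;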
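
[Editorial note: and if disk space really were the constraint, the
"split up and vacuum between the stages" approach could look roughly like
this sketch, again against the hypothetical "foo" table, with arbitrary
batch boundaries on its x column; note each UPDATE must commit before the
VACUUM, since VACUUM cannot run inside a transaction block:]

    -- first slice
    UPDATE foo SET w = x::text WHERE x BETWEEN 1 AND 5000000;
    VACUUM foo;   -- make the dead row versions reusable before the next slice
    -- second slice
    UPDATE foo SET w = x::text WHERE x BETWEEN 5000001 AND 10000000;
    VACUUM foo;
    -- ... and so on for the remaining slices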