On 02/24/2018 12:27 AM, ldh@xxxxxxxxxxxxxxxxxx wrote:
> Hello
>
> I work with a large and wide table (about 300 million rows, about 50
> columns), and from time to time, we get business requirements to make
> some modifications. But sometimes, it's just some plain mistake. This
> happened to us a few weeks ago, when someone made a mistake and we
> had to update a single column of a large and wide table. Literally, the
> source data screwed up a zip code and we had to patch it on our end.
>
> Anyways… the query run was:
>
> update T set source_id = substr(sourceId, 2, 10);
>
> It took about 10h and created hundreds of millions of dead tuples,
> causing another couple of hours of vacuum.
>
> This was done during a maintenance window, and that table is read-only
> except when we ETL data to it on a weekly basis, so I was just
> wondering why I should pay the "bloat" penalty for this type of
> transaction. Is there a trick that could be used here?

Yes, there is a trick I like to use here, as long as you don't mind
locking the table (even against reads).

I'll assume T.source_id is of type text. If it's not, use whatever the
actual type is.

ALTER TABLE T ALTER COLUMN source_id TYPE text USING substr(sourceId, 2, 10);

I copied what you had verbatim; I earnestly hope you don't have two
columns, source_id and sourceId, in your table.

This will rewrite the entire table, just the same as a VACUUM FULL after
your UPDATE would.

Don't forget to VACUUM ANALYZE the table after the operation. Even
though there will be no dead rows, you still need to VACUUM it to
generate the visibility map, and you need to ANALYZE it for statistics
on your "new" column.

Foreign keys remain intact with this solution, and you don't pay for
double WAL logging the way you would with an UPDATE.

-- 
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
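
P.S. For completeness, here is a rough sketch of the full
maintenance-window sequence. It assumes the target type really is text
and that "sourceId" in your UPDATE was just a typo for source_id;
adjust the type and expression to your actual schema:

    -- Rewrites the whole table in place, leaving no dead tuples.
    -- Note this takes an ACCESS EXCLUSIVE lock, blocking all reads
    -- and writes for the duration.
    ALTER TABLE T
        ALTER COLUMN source_id TYPE text
        USING substr(source_id, 2, 10);

    -- Rebuild the visibility map and refresh planner statistics.
    -- (VACUUM cannot run inside a transaction block.)
    VACUUM ANALYZE T;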