> -----Original Message----- > From: Vik Fearing [mailto:vik.fearing@xxxxxxxxxxxxxxx] > Sent: Friday, March 02, 2018 20:56 > To: ldh@xxxxxxxxxxxxxxxxxx; pgsql-performance@xxxxxxxxxxxxxxxxxxxx > Cc: Stephen Frost <sfrost@xxxxxxxxxxx> > Subject: Re: Updating large tables without dead tuples > > On 02/24/2018 12:27 AM, ldh@xxxxxxxxxxxxxxxxxx wrote: > > Hello > > > > > > > > I work with a large and wide table (about 300 million rows, about 50 > > columns), and from time to time, we get business requirements to make > > some modifications. But sometimes, it's just some plain mistake. This > > has happened to us a few weeks ago where someone made a mistake and we > > had to update a single column of a large and wide table. Literally, > > the source data screwed up a zip code and we had to patch on our end. > > > > > > > > Anyways. Query ran was: > > > > update T set source_id = substr(sourceId, 2, 10); > > > > Took about 10h and created 100's of millions of dead tuples, causing > > another couple of hours of vacuum. > > > > > > > > This was done during a maintenance window, and that table is read-only > > except when we ETL data to it on a weekly basis, and so I was just > > wondering why I should pay the "bloat" penalty for this type of > > transaction. Is there a trick that could be use here? > Yes, there is a trick I like to use here, as long as you don't mind locking the > table (even against reads). > > I'll assume T.source_id is of type text. If it's not, use whatever the actual type > is. > > ALTER TABLE T > ALTER COLUMN source_id TYPE text USING substr(sourceId, 2, 10); > > I copied what you had verbatim, I earnestly hope you don't have two columns > source_id and sourceId in your table. > > This will rewrite the entire table just the same as a VACUUM FULL after your > UPDATE would. > > Don't forget to VACUUM ANALYZE this table after the operation. Even though > there will be no dead rows, you still need to VACUUM it to generate the > visibility map and you need to ANALYZE it for statistics on your "new" column. > > Foreign keys remain intact with this solution and you don't have double wal > logging like for an UPDATE. > -- > Vik Fearing +33 6 46 75 15 36 > http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support [Laurent Hasson] Yes, sorry... only a single column source_id. I understand your idea... Is that because a TEXT field (vs a varchar) would be considered TOAST and be treated differently? Thanks, Laurent.