RE: Updating large tables without dead tuples

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Vik Fearing [mailto:vik.fearing@xxxxxxxxxxxxxxx]
> Sent: Friday, March 02, 2018 20:56
> To: ldh@xxxxxxxxxxxxxxxxxx; pgsql-performance@xxxxxxxxxxxxxxxxxxxx
> Cc: Stephen Frost <sfrost@xxxxxxxxxxx>
> Subject: Re: Updating large tables without dead tuples
> 
> On 02/24/2018 12:27 AM, ldh@xxxxxxxxxxxxxxxxxx wrote:
> > Hello
> >
> >
> >
> > I work with a large and wide table (about 300 million rows, about 50
> > columns), and from time to time, we get business requirements to make
> > some modifications. But sometimes, it's just some plain mistake. This
> > has happened to us a few weeks ago where someone made a mistake and we
> > had to update a single column of a large and wide table. Literally,
> > the source data screwed up a zip code and we had to patch on our end.
> >
> >
> >
> > Anyways. Query ran was:
> >
> >     update T set source_id = substr(sourceId, 2, 10);
> >
> > Took about 10h and created 100's of millions of dead tuples, causing
> > another couple of hours of vacuum.
> >
> >
> >
> > This was done during a maintenance window, and that table is read-only
> > except when we ETL data to it on a weekly basis, and so I was just
> > wondering why I should pay the "bloat" penalty for this type of
> > transaction. Is there a trick that could be use here?
> Yes, there is a trick I like to use here, as long as you don't mind locking the
> table (even against reads).
> 
> I'll assume T.source_id is of type text.  If it's not, use whatever the actual type
> is.
> 
> ALTER TABLE T
>     ALTER COLUMN source_id TYPE text USING substr(sourceId, 2, 10);
> 
> I copied what you had verbatim, I earnestly hope you don't have two columns
> source_id and sourceId in your table.
> 
> This will rewrite the entire table just the same as a VACUUM FULL after your
> UPDATE would.
> 
> Don't forget to VACUUM ANALYZE this table after the operation.  Even though
> there will be no dead rows, you still need to VACUUM it to generate the
> visibility map and you need to ANALYZE it for statistics on your "new" column.
> 
> Foreign keys remain intact with this solution and you don't have double wal
> logging like for an UPDATE.
> --
> Vik Fearing                                          +33 6 46 75 15 36
> http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

[Laurent Hasson] 
Yes, sorry... only a single column source_id. I understand your idea... Is that because a TEXT field (vs a varchar) would be considered TOAST and be treated differently?

Thanks,
Laurent.





[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux