Re: Options for more aggressive space reclamation in vacuuming?

Wells Oliver <wells.oliver@xxxxxxxxx> · Mon, 15 May 2023 10:05:36 -0700

The billion row table _is_ the monthly partition! We are in the process of moving this scale of data to Redshift anyway, but that's a topic for another mailing list. Just trying to nail down what I can in the interim. I appreciate all the insights in this thread.

On Mon, May 15, 2023 at 10:04 AM Ron <ronljohnsonjr@xxxxxxxxx> wrote:

    (A billion row table really should be partitioned.  That ship sailed
    long ago, of course...)

    Sure you can "manually" adjust it.  cron is your friend!!

    But... how long does it take to vacuum a freshly VACUUM FULLed
    table?  Because I'm wondering if it would be more efficient to
    proactively keep the problem at bay by, for example, doing a plain
    VACUUM every... 2, 3, 4, 6 (take your pick) hours, along with a
    plain VACUUM right after the monthly purge.

    On 5/15/23 11:26, Wells Oliver wrote:

      This brings to mind another question.. is there a
        clever solution for the situation where .001 vacuum factor makes
        sense for a table at the end of the month where it's reached its
        general capacity, e.g. maybe 1bn rows, but at the beginning of
        the month is too aggressive and would cause too frequent
        vacuuming given the significantly lower volume of records?
        Simple manual adjustment through a month, or something easier? 

        On Mon, May 15, 2023 at
          8:21 AM Wolfgang Wilhelm <wolfgang20121964@xxxxxxxx>
          wrote:

              Hi,

              what do you mean by "agressively reclaiming space"?
                Do you expect to see more usable space in the file
                system? If this assumption is true you are mistaken
                about the way autovacuum works. It deletes dead tuples
                _in_ the blocks and frees space only when at least a
                certain amount of blocks at the end of the file with
                tuple data are empty.

              Regards
              WW

                   Am Montag, 15. Mai 2023 um 16:40:47 MESZ hat
                    Jeff Janes <jeff.janes@xxxxxxxxx>
                    Folgendes geschrieben: 

                          On Sun, May 14, 2023 at 5:03 PM
                            Wells Oliver <wells.oliver@xxxxxxxxx>
                            wrote:

                              I think our use-case here
                                is nothing special: just very large
                                tables that are deleted from and
                                re-inserted with some frequency. They
                                store monthly data and exist as
                                partitions. As an example, July of 2022
                                (summer months are peak)
                                had 1,630,558,336 rows. We delete from
                                this and insert daily. 

                            You delete and reinsert every row every
                              day, or just some lesser number of rows
                              each day?  In one giant transaction daily,
                              or in a series of short mostly
                              non-overlapping transactions throughout
                              the day?

                              We rarely get new, relevant
                                data once the month is over. The
                                n_dead_tup from pg_stat_all_tables here
                                was over 7m rows, and clearing that out
                                gave us back nearly 50GB, and the file
                                size estimate on this partition was ~200
                                GB. 

                            Are these sizes for the entire
                              relation size, or just for the core table
                              (not TOAST, not indexes)?  Index bloat is
                              harder to prevent than table bloat, but is
                              easier to deal with after the fact
                              (reindex concurrently)

                              These tables get
                                auto-vacuumed but clearly it's not,
                                well, aggressively reclaiming space.

                            50 GB out of 200 GB seems roughly in
                              accord with the default setting
                              of autovacuum_vacuum_scale_factor.  So
                              maybe just lowering that from 0.2 to, say,
                              0.05 would  be good enough, either
                              globally or just for these tables.

                                Given that they are partitions,
                                  VACUUM FULL locks the parent table, so
                                  new writes/deletes can't happen, and
                                  anyway people like to run queries
                                  against these tables quite frequently.

                            Since you rarely get new data once the
                              month is over, you could lock the
                              partition against concurrent changes (but
                              still allow reads) once it is no
                              longer "on the run" while you copy the
                              data into a new tightly-packed table and
                              index that and add constraints.  Then you
                              need only a very brief access exclusive
                              lock while you detach/drop the old
                              partition and attach the new table in its
                              place.  Of course you can't drop the
                              weaker lock while you acquire the stronger
                              one lest the old table change during the
                              gap so lock management can be a bit
                              tedious, but it is surely less restrictive
                              than a VACUUM FULL.

                              Cheers,

                              Jeff

      -- 

          Wells Oliver

            wells.oliver@xxxxxxxxx

    -- 

      Born in Arizona, moved to Babylonia.

-- 
Wells Oliver
wells.oliver@xxxxxxxxx