The billion row table _is_ the monthly partition! We are in the process of moving this scale of data to Redshift anyway, but that's a topic for another mailing list. Just trying to nail down what I can in the interim. I appreciate all the insights in this thread.
On Mon, May 15, 2023 at 10:04 AM Ron <ronljohnsonjr@xxxxxxxxx> wrote:
(A billion row table really should be partitioned. That ship sailed long ago, of course...)
Sure you can "manually" adjust it. cron is your friend!!
But... how long does it take to vacuum a freshly VACUUM FULLed table? Because I'm wondering if it would be more efficient to proactively keep the problem at bay by, for example, doing a plain VACUUM every... 2, 3, 4, 6 (take your pick) hours, along with a plain VACUUM right after the monthly purge.
On 5/15/23 11:26, Wells Oliver wrote:
This brings to mind another question.. is there a clever solution for the situation where .001 vacuum factor makes sense for a table at the end of the month where it's reached its general capacity, e.g. maybe 1bn rows, but at the beginning of the month is too aggressive and would cause too frequent vacuuming given the significantly lower volume of records? Simple manual adjustment through a month, or something easier?
On Mon, May 15, 2023 at 8:21 AM Wolfgang Wilhelm <wolfgang20121964@xxxxxxxx> wrote:
what do you mean by "agressively reclaiming space"? Do you expect to see more usable space in the file system? If this assumption is true you are mistaken about the way autovacuum works. It deletes dead tuples _in_ the blocks and frees space only when at least a certain amount of blocks at the end of the file with tuple data are empty.
Am Montag, 15. Mai 2023 um 16:40:47 MESZ hat Jeff Janes <jeff.janes@xxxxxxxxx> Folgendes geschrieben:
On Sun, May 14, 2023 at 5:03 PM Wells Oliver <wells.oliver@xxxxxxxxx> wrote:
I think our use-case here is nothing special: just very large tables that are deleted from and re-inserted with some frequency. They store monthly data and exist as partitions. As an example, July of 2022 (summer months are peak) had 1,630,558,336 rows. We delete from this and insert daily.
You delete and reinsert every row every day, or just some lesser number of rows each day? In one giant transaction daily, or in a series of short mostly non-overlapping transactions throughout the day?We rarely get new, relevant data once the month is over. The n_dead_tup from pg_stat_all_tables here was over 7m rows, and clearing that out gave us back nearly 50GB, and the file size estimate on this partition was ~200 GB.
Are these sizes for the entire relation size, or just for the core table (not TOAST, not indexes)? Index bloat is harder to prevent than table bloat, but is easier to deal with after the fact (reindex concurrently)These tables get auto-vacuumed but clearly it's not, well, aggressively reclaiming space.
50 GB out of 200 GB seems roughly in accord with the default setting of autovacuum_vacuum_scale_factor. So maybe just lowering that from 0.2 to, say, 0.05 would be good enough, either globally or just for these tables.
Given that they are partitions, VACUUM FULL locks the parent table, so new writes/deletes can't happen, and anyway people like to run queries against these tables quite frequently.
Since you rarely get new data once the month is over, you could lock the partition against concurrent changes (but still allow reads) once it is no longer "on the run" while you copy the data into a new tightly-packed table and index that and add constraints. Then you need only a very brief access exclusive lock while you detach/drop the old partition and attach the new table in its place. Of course you can't drop the weaker lock while you acquire the stronger one lest the old table change during the gap so lock management can be a bit tedious, but it is surely less restrictive than a VACUUM FULL.
Born in Arizona, moved to Babylonia.