Thank you very much for this complete explanation and opinion, Keith.
The first goal of posting this mail to the list was to find out how to measure free space precisely: pgstattuple is the right tool for that.
The second goal is in progress: a manual vacuum so that this large free area can be reused. (I will post pgstattuple output for the table when this vacuum finishes.)
And the third goal is to release this huge allocation of space back to the operating system: schedule an outage to run VACUUM FULL.
Again, I am truly grateful for your help.
And I will post pg_total_relation_size() here before and after running VACUUM FULL when we schedule that outage (roughly the queries sketched below).
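For reference, the checks look roughly like this (this assumes the pgstattuple extension is installed, and 'public.mytable' is just a placeholder for the real table name):

CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Reusable free space and dead-tuple figures inside the table's pages
SELECT free_space, free_percent, dead_tuple_count, dead_tuple_percent
FROM pgstattuple('public.mytable');

-- Total on-disk size (table + indexes), to compare before and after VACUUM FULL
SELECT pg_size_pretty(pg_total_relation_size('public.mytable'));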
Best Regards,
Sidney Pryor
From: Keith Fiske <keith.fiske@xxxxxxxxxxxxxxx>
Sent: Tuesday, July 28, 2020 10:55
To: Sidney Aloisio Ferreira Pryor <sidney@xxxxxxxxxxx>
Cc: Ron <ronljohnsonjr@xxxxxxxxx>; pgsql-admin@xxxxxxxxxxxxxxxxxxxx <pgsql-admin@xxxxxxxxxxxxxxxxxxxx>
Subject: Re: How can i be certain autovacuum is causing reuse if table still grows

Again, I would caution against going with partitioning until you first try to tune the table itself. I have quite a bit of experience with it, as I'm the author of pg_partman, so you can see I have nothing against partitioning itself; I'm only for using it when it's necessary. If you do decide to go that route, it would be a good tool to help you automate much of the maintenance work of partitioning either by time or by integer (a rough sketch follows).
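Purely for illustration, if you eventually went that route on a newer version, the setup call looks roughly like this (this assumes pg_partman 4.x installed in a schema named partman and a parent table already declared with native PARTITION BY; the table and control column names are placeholders):

SELECT partman.create_parent(
    p_parent_table := 'public.mytable',   -- placeholder parent table
    p_control      := 'created_at',       -- placeholder partitioning column
    p_type         := 'native',
    p_interval     := 'monthly'
);

-- Scheduled periodically, this creates new child partitions ahead of time
SELECT partman.run_maintenance();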
I say this because, while it could possibly help make vacuum more efficient, there are many caveats to partitioning. Since you're on 9.6, there is no native partitioning, so you'd have to do trigger-based partitioning, which has a HUGE impact on write performance (at least 2x slower, if not worse). Even with native partitioning in 10+, there is a performance penalty during tuple routing, and you would want to test that impact, especially if you are doing updates that would move data across child partitions (updates that move data prior to PG10 are very, very complicated and not supported by pg_partman). Also, if you have any unique keys to maintain, they are not enforced across the partition set, and in native partitioning they are only supported if they also include the partition key itself. You're also just adding to the general complexity of your database overall.
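To make the unique-key caveat concrete, here is a minimal native-partitioning sketch (PG11 or later for the primary key on the partitioned table; all table and column names are made up for the example):

CREATE TABLE public.mytable (
    id         bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    text,
    PRIMARY KEY (id, created_at)   -- must include the partition key column
) PARTITION BY RANGE (created_at);

-- Each child partition covers one month of data
CREATE TABLE public.mytable_2020_07 PARTITION OF public.mytable
    FOR VALUES FROM ('2020-07-01') TO ('2020-08-01');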
I would say in general that the primary reason to consider partitioning in PG is if you're trying to expire old data. Dropping a table is much more efficient than running large/many DELETE statements, due to the very issue you're encountering now: bloat. But it sounds like you are not expiring data, so you also have to take into account how many partitions you will have long term. It wasn't until PG12 that having more than even a few hundred partitions stopped being a major performance problem overall. So, before even considering partitioning, you would want to get onto the latest major release.
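For comparison, expiring a month of data from a partitioned table versus a plain table looks something like this (names and dates are placeholders); the DROP is effectively instant and leaves no bloat behind, while the DELETE leaves dead tuples for vacuum to clean up:

-- Partitioned: expire old data by dropping the child table
DROP TABLE public.mytable_2019_01;

-- Unpartitioned: bulk DELETE, which creates dead tuples (bloat)
DELETE FROM public.mytable
WHERE created_at >= '2019-01-01' AND created_at < '2019-02-01';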
Partitioning can help with vacuum and general maintenance, but really only when the actual live data in a single table starts getting very, very large. And in that case you may want to consider your hardware resources first before going down the road of partitioning, since that may be the better benefit long term. You've got nearly 2.3TB of space to clean up in this table, so, again, you want to solve that problem first. If you had that much free space before this vacuum finished, you will likely have even more once it finishes. And just note that a regular vacuum is likely not going to return this space to the operating system. It may return some, but vacuum only gives space back to the system in very specific circumstances (when the pages at the tail end of the table are empty). You could use something like pg_repack or pg_squeeze to do this without a long outage, but with bloat this size you may run into issues with how long those tools would need to run, especially if the table is still in use. So, again, I would recommend an outage to do a VACUUM FULL, which will lock and rewrite the table. I'd be curious to see what the total size of the entire table is before and after. The following function will give that total size (table + indexes):
select pg_total_relation_size('public.mytable');
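For the outage itself, the before/after check could look roughly like this ('public.mytable' is again a placeholder; VACUUM FULL takes an exclusive lock for the duration of the rewrite):

-- Size of table + indexes before the rewrite
SELECT pg_size_pretty(pg_total_relation_size('public.mytable'));

-- Rewrites the table and all of its indexes; blocks reads and writes while it runs
VACUUM (FULL, VERBOSE, ANALYZE) public.mytable;

-- Size of table + indexes after the rewrite
SELECT pg_size_pretty(pg_total_relation_size('public.mytable'));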
So, while partitioning could possibly be the answer long term, I would not recommend it until you've cleaned up the existing bloat on this table and its indexes and tried to tune autovacuum to run more efficiently. If that works, you've saved yourself a great deal of complexity. And again, you'd definitely want to be on the latest major version of PG, where partitioning has improved tremendously, so if a major version upgrade isn't in the works for a while, I wouldn't even consider it.
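One common per-table tuning sketch (the numbers are only illustrative starting points, not recommendations for your workload): lowering the scale factor makes autovacuum visit a large table far more often than the 20% default, and a higher cost limit lets each run do more work before sleeping:

ALTER TABLE public.mytable SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- vacuum after ~1% of rows are dead instead of 20%
    autovacuum_vacuum_cost_limit   = 2000   -- allow each autovacuum run more I/O before throttling
);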
--
On Tue, Jul 28, 2020 at 6:18 AM Sidney Aloisio Ferreira Pryor <sidney@xxxxxxxxxxx> wrote: