Re: Millions of tables

On 9/29/16 6:11 AM, Alex Ignatov (postgrespro) wrote:
> With millions of tables you have to set autovacuum_max_workers
> sky-high =). We have seen situations where, with thousands of tables,
> autovacuum can't vacuum all the tables that need it. It simply vacuums
> some of the most modified tables and never reaches the others. Only a
> manual vacuum can help in this situation. With the wraparound issue it
> can be a nightmare

Specifically, autovac isn't going to start worrying about anti-wraparound vacuums until tables start hitting autovacuum_freeze_max_age (or autovacuum_multixact_freeze_max_age). Any table that hits that threshold goes to the front of the line for being vacuumed. (But keep in mind that there is no universal line, just what each worker computes on its own when it's started.)
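If you want to see how close you are, a query along these lines (pg_class, age(), and relfrozenxid are stock Postgres; the LIMIT and the cast are just for illustration) shows the tables nearest the threshold:

    SELECT c.oid::regclass AS table_name,
           age(c.relfrozenxid) AS xid_age,
           current_setting('autovacuum_freeze_max_age')::int AS freeze_max_age
    FROM pg_class c
    WHERE c.relkind IN ('r', 't', 'm')   -- plain, toast, and materialized
    ORDER BY age(c.relfrozenxid) DESC
    LIMIT 50;

Note that per-table reloptions can override autovacuum_freeze_max_age, so treat the global setting as an approximation.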

Where things will completely fall apart for you is if a lot of tables all have roughly the same relfrozenxid (or relminmxid), as they would immediately after a large load. In that scenario you'll suddenly have loads of work for autovac to do, all at the same time. That will make the database, the DBAs, and you Very Unhappy (tm).
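One way to defuse that (my sketch, not something autovac does for you) is to stagger the ages yourself: explicitly freeze one slice of the tables now, the next slice some transactions later, and so on, so they never all come due at once. With psql 9.6's \gexec that can be as simple as:

    -- Freeze the 1000 oldest tables in this pass;
    -- run again later for the next slice.
    SELECT format('VACUUM (FREEZE) %s', c.oid::regclass)
    FROM pg_class c
    WHERE c.relkind = 'r'
    ORDER BY age(c.relfrozenxid) DESC
    LIMIT 1000 \gexec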

Somehow, some way, you *must* do a vacuum of the entire database. Luckily the freeze map in 9.6 means you'd only have to do that one time (assuming the data really is static). In any older version, (auto)vacuum will need to eventually *read everything in every table* at least once every ~2B transactions.
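Concretely, that one-time pass is just a database-wide freeze (on 9.5+ you can also parallelize it from the shell with vacuumdb --freeze --jobs=N rather than doing it in SQL):

    -- One-time, database-wide freeze; with the 9.6 freeze map,
    -- later vacuums can skip pages that are already all-frozen.
    VACUUM (FREEZE);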

The only impact the number of tables is going to have on this is granularity. If you have a small number of large tables, you'll have (auto)vacuum processes that will need to run *uninterrupted* for a long time to move the freeze threshold on each table. If you have tons of small tables, you'll need tons of separate (auto)vacuums, but each one will run for a shorter interval, and if one gets interrupted it won't be as big a deal.

There is one potentially significant difference between autovac and manual vacuums here: autovac treats toast tables as just another table, with their own stats and their own freeze needs. If you're generating a lot of toast records, that might make a difference.
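To check whether it's the toast tables that are aging (a sketch; the join via reltoastrelid is how the catalogs link them):

    SELECT c.oid::regclass AS main_table,
           t.oid::regclass AS toast_table,
           age(t.relfrozenxid) AS toast_xid_age
    FROM pg_class c
    JOIN pg_class t ON t.oid = c.reltoastrelid
    WHERE c.relkind = 'r'
    ORDER BY age(t.relfrozenxid) DESC
    LIMIT 20;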

When it comes to vacuum, you might find https://www.pgcon.org/2015/schedule/events/829.en.html useful.

On a different topic... I didn't see anything in the thread about what you're storing, but with the row counts you're talking about I'm guessing it's time-series data. https://github.com/ElephantStack/ElephantStack is a project exploring the idea of using Postgres array types as a far more efficient way to store that kind of data: instead of an overhead of 24 bytes per row (plus indexes), arrays give you essentially zero overhead per row. There's no code yet, but a few of us have done testing on some real world data (see the google group referenced from the README).
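To make that concrete (a purely illustrative schema, not anything from the ElephantStack README), the idea is to batch, say, an hour of readings per sensor into one row:

    -- One row per sample: ~24 bytes of tuple header per reading.
    CREATE TABLE readings_rows (
        sensor_id int,
        ts        timestamptz,
        value     float8
    );

    -- One row per sensor-hour: the 24-byte header is amortized
    -- across every element of the arrays.
    CREATE TABLE readings_arrays (
        sensor_id int,
        hour      timestamptz,   -- start of the hour
        ts        timestamptz[], -- parallel arrays of timestamps...
        value     float8[]       -- ...and values
    );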
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

