Re: Millions of tables

On 9/29/16 6:11 AM, Alex Ignatov (postgrespro) wrote:
> With millions of tables you have to set autovacuum_max_workers
> sky-high =). We have seen situations where, with thousands of tables,
> autovacuum can't vacuum all the tables that need it. It simply vacuums
> some of the most modified tables and never reaches the others. Only a
> manual vacuum can help in this situation. With the wraparound issue it
> can be a nightmare

Specifically, autovac isn't going to start worrying about anti-wraparound vacuums until tables start hitting autovacuum_freeze_max_age (or autovacuum_multixact_freeze_max_age). Any table that hits that threshold goes to the front of the line for being vacuumed. (But keep in mind that there is no universal line, just what each worker computes on its own when it's started.)
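If you want to see how close you are, a query along these lines (pg_class, age(), and relfrozenxid are stock Postgres; the LIMIT and the cast are just for illustration) shows the tables nearest the threshold:

    SELECT c.oid::regclass AS table_name,
           age(c.relfrozenxid) AS xid_age,
           current_setting('autovacuum_freeze_max_age')::int AS freeze_max_age
    FROM pg_class c
    WHERE c.relkind IN ('r', 't', 'm')   -- plain, toast, and materialized
    ORDER BY age(c.relfrozenxid) DESC
    LIMIT 50;

Note that per-table reloptions can override autovacuum_freeze_max_age, so treat the global setting as an approximation.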

Where things will completely fall apart for you is if a lot of tables all have roughly the same relfrozenxid (or relminmxid), as they would immediately after a large load. In that scenario you'll suddenly have loads of work for autovac to do, all at the same time. That will make the database, the DBAs, and you Very Unhappy (tm).
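One way to defuse that (my sketch, not something autovac does for you) is to stagger the ages yourself: explicitly freeze one slice of the tables now, the next slice some transactions later, and so on, so they never all come due at once. With psql 9.6's \gexec that can be as simple as:

    -- Freeze the 1000 oldest tables in this pass;
    -- run again later for the next slice.
    SELECT format('VACUUM (FREEZE) %s', c.oid::regclass)
    FROM pg_class c
    WHERE c.relkind = 'r'
    ORDER BY age(c.relfrozenxid) DESC
    LIMIT 1000 \gexec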

Somehow, some way, you *must* do a vacuum of the entire database. Luckily the freeze map in 9.6 means you'd only have to do that one time (assuming the data really is static). In any older version, (auto)vacuum will need to eventually *read everything in every table* at least once every ~2B transactions.
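Concretely, that one-time pass is just a database-wide freeze (on 9.5+ you can also parallelize it from the shell with vacuumdb --freeze --jobs=N rather than doing it in SQL):

    -- One-time, database-wide freeze; with the 9.6 freeze map,
    -- later vacuums can skip pages that are already all-frozen.
    VACUUM (FREEZE);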

The only impact the number of tables is going to have on this is granularity. If you have a small number of large tables, you'll have (auto)vacuum processes that will need to run *uninterrupted* for a long time to move the freeze threshold on each table. If you have tons of small tables, you'll need tons of separate (auto)vacuums, but each one will run for a shorter interval, and if one gets interrupted it won't be as big a deal.

There is one potentially significant difference between autovac and manual vacuums here: autovac treats toast tables as just another table, with their own stats and their own freeze needs. If you're generating a lot of toast records, that might make a difference.
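To check whether it's the toast tables that are aging (a sketch; the join via reltoastrelid is how the catalogs link them):

    SELECT c.oid::regclass AS main_table,
           t.oid::regclass AS toast_table,
           age(t.relfrozenxid) AS toast_xid_age
    FROM pg_class c
    JOIN pg_class t ON t.oid = c.reltoastrelid
    WHERE c.relkind = 'r'
    ORDER BY age(t.relfrozenxid) DESC
    LIMIT 20;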

When it comes to vacuum, you might find https://www.pgcon.org/2015/schedule/events/829.en.html useful.

On a different topic... I didn't see anything in the thread about what you're storing, but with the row counts you're talking about I'm guessing it's time-series data. https://github.com/ElephantStack/ElephantStack is a project exploring the idea of using Postgres array types as a far more efficient way to store that kind of data: instead of an overhead of 24 bytes per row (plus indexes), arrays give you essentially zero overhead per row. There's no code yet, but a few of us have done testing on some real world data (see the google group referenced from the README).
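To make that concrete (a purely illustrative schema, not anything from the ElephantStack README), the idea is to batch, say, an hour of readings per sensor into one row:

    -- One row per sample: ~24 bytes of tuple header per reading.
    CREATE TABLE readings_rows (
        sensor_id int,
        ts        timestamptz,
        value     float8
    );

    -- One row per sensor-hour: the 24-byte header is amortized
    -- across every element of the arrays.
    CREATE TABLE readings_arrays (
        sensor_id int,
        hour      timestamptz,   -- start of the hour
        ts        timestamptz[], -- parallel arrays of timestamps...
        value     float8[]       -- ...and values
    );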
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

