Re: transaction wrap around

Thomas Munro <thomas.munro@xxxxxxxxxxxxxxxx> · Wed, 6 Dec 2017 14:50:40 +1300

On Tue, Dec 5, 2017 at 5:43 PM, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
> On Mon, Dec 4, 2017 at 5:52 PM, John R Pierce <pierce@xxxxxxxxxxxx> wrote:
>> On 12/4/2017 2:21 PM, chris kim wrote:
>>> How would I investigate if my database is nearing a transaction wrap
>>> around.
>>
>> it would be screaming bloody murder in the log, for one.
>>
>
> Unfortunately, that comes far too late to repair the problem without a
> substantial service interruption, on very high transaction throughput
> installations.
>
> Also, people usually consult the logs to figure out what the problem is,
> once they become aware that one exists.  That is also too late.

The problem is that our logic (1) focuses on when we should *start*
freezing, not by when we'd like to be finished, and (2) is defined in
such a way that many tables are likely to reach the trigger point at
the same time.  Even if your system can handle the load, you might not
like the disruption to regular vacuuming and analyze work.

An ideal system would estimate how long it's going to take and how
long we've got (current tx consumption rate, xids remaining) before
autovacuum_freeze_max_age  is reached and then spread the work out so
that we get it done just in time with minimal impact.  Getting
reliable estimates to control that seems hard though.

Perhaps we could add a much simpler first defence that tries to
prevent autovacuum_freeze_max_age (and its multixact cousin) from
being reached like this: consider launching at most one wraparound
vacuum for any relation that is *half way* to
autovacuum_freeze_max_age.  That gives the system a chance to handle
each partition of a monster partitioned table calmly in series even if
they have the same age dating back to schema creation/data load time.
Perhaps it could consider adding more vacuum backends as you approach
autovacuum_freeze_max_age, or something.  Hopefully you'd never
actually reach it.

Of course you can do what I just said with a cron job, and there may
be better heuristics than that, but it'd be nice to find *some* way to
make freeze max age more gradual by default on large databases, until
such time as we can kill it with 64 bit xids or other major efforts.
My understanding is that even with the new freeze map, most big
systems will still pay the full price for the first wraparound vacuum
freeze, so I still expect to encounter 20TB production databases in
the wild that have gone into a wraparound frenzy confounding their
owners.

-- 
Thomas Munro
http://www.enterprisedb.com