Re: why pg_class.relfrozenxid needs to be updated for frozen tables (where all rows have xmin=2)?

Jim Nasby <Jim.Nasby@xxxxxxxxxxxxxx> · Mon, 2 Feb 2015 19:36:15 -0600

On 2/2/15 7:01 PM, Slava Mudry wrote:

Please don't top-post. It's much better to answer questions inline in an 
email.

I am running PostgreSQL 9.3.2 on linux. Freeze values are defaults.
We cannot rely on autovacuum on our current hardware, so it's turned
down to 2 workers with naptime=10min. But we are running weekly vacuum
on whole db and daily vacuum freeze on new partitions.

While you may not be able to rely on autovac for all your needs, 
changing the naptime is unlikely to help much, unless you have an 
extremely large number of tables (like, 100k or more).

I agree that system is designed pretty bad in a way that it creates high
transaction volume instead of batching updates/inserts. However I still
feel that postgres should be able to do something more optimal in such
cases.

Currently the fact that it needs to go back to old tables and FTS them
every 2B transactions (or rely on autovacuum for this) and you can't do
anything about it (like permanently freeze the tables) seems like a big
scalability issue. Does it not?

Unfortunately it's not terribly easy to fix this. The problem is if we 
try to play games here, we must have a 100% reliable method for changing 
relfrozenxid as soon as someone inserts a new tuple in the relation. It 
might be possible to tie this into the visibility map, but no one has 
looked at this yet.

Perhaps you'd be willing to investigate this, or sponsor the work?

Thank you.

On Fri, Jan 30, 2015 at 5:28 PM, Jim Nasby <Jim.Nasby@xxxxxxxxxxxxxx
<mailto:Jim.Nasby@xxxxxxxxxxxxxx>> wrote:

    On 1/30/15 5:44 PM, Slava Mudry wrote:

        Hi,
        I have to deal with badly written system which regularly suffers
        from
        transaction wraparound issue. This issue is happenning every
        10-14 days
        and forces me to take system offline and vacuum in single-user mode.
        Main causes for this are (afaik):
        1) heavy transaction traffic + 100+GB of stale tables
        2) slow i/o (rotational drives)
        3) autovacuum can't keep up.

        Basically the database stores events data in daily partitioned table
        "daily_events".
        What I did, was - I ran vaccum freeze on all partitions (the
        tables are
        never touched after they're done for a day). I have also scheduled
        vacuum-freeze for a partition after it's done writing.

        This essentially set xmin in each partition to "frozen" value of
        "2".
        However, to my surprise, this was not enough!
        Postgres stores relfrozenxid in pg_class and this value
        apparently is
        getting old pretty fast (due to high volume of transactions).
        And it seems that it doesn't really matter that xmin is frozen for a
        table, the relfrozenxid is what causing transaction wraparound.

    relfrozenxid is only part of the picture. A database-wide freeze
    vacuum will be controlled by pg_database.datfrozenxid.

    What version is this? You may also be suffering from multixact wrap.

        Why is that? and most importantly - why updating
        pg_class.relfrozenxid
        requires huge amount of i/o by vacuum process for tables that
        are never
        updated?

    Because it has to scan the entire table to see what the oldest XID
    is. We don't check to see if relfrozenxid is already 2, though I
    suppose we could add that.

        Is it safe to just update pg_class.relfrozenxid for tables where
        xmin=2
        for all rows? Same for linked toast table?

    That would be a great way to lose data...

    You need to look at relations where relfrozenxid is >= 3 and see why
    relfrozenxid isn't advancing fast enough on them. Check your cost
    delay settings as well as the *freeze* settings. It's very likely
    that on a system this busy autovac would never keep up with default
    settings.

    Also, keep in mind that transaction and multixact IDs are
    cluster-wide, so this is going to affect all databases in that
    instance. You should think about ways to move the heaviest
    transaction workload to a separate cluster; possibly putting the raw
    updates there and having a separate process that aggregates that
    data into fewer transactions for the main cluster.
    --
    Jim Nasby, Data Architect, Blue Treble Consulting
    Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance