Freeze break request: koji autovacuum_freeze_max_age

Kevin Fenzi <kevin@xxxxxxxxx> · Sun, 1 Mar 2020 11:26:00 -0800

Greetings. 

Last night koji alerted due to slowness. It was not backups or anything,
but rather the database hitting the limit I raised in commit c678f73b:

-autovacuum_freeze_max_age = 200000000   # maximum XID age before forced vacuum 
+autovacuum_freeze_max_age = 300000000   # maximum XID age before forced vacuum 

What this means is basicially: postgres records the xid (transaction id)
that can 'see' other transactions in the table rows. However, xid is a
32 bit value, meaning there can only be about 2.1billion transactions
before it 'wraps around'. When it does so, all the 'old' XID's need to
be gone or it will confuse it. It removes the old xids by marking old
transactions as 'frozen' (so any other transaction should see them). 

So, this value tells the autovacuumer to start processing the table for
old xids and frezzing them, so by the time the wrap around happens
everything will be set. 

Unfortunately, it's doing this on the buildroot_listing table, which is:
 public | buildroot_listing         | table | koji  | 219 GB     | 

So, the i/o load is heavy and koji is slow to respond to real requests. 

There's (at least) tree things we could do: 

1. Bump the autovacuum_freeze_max_age up to 600million. The 100million
bump I did in january gave us about 1.5 months, so if we do 600, we
might last until june, when we will be migrating to the new datacenter.
600million is still a long way from 2.1 billion, so it should be fine. 
At that point I hope to move db-koji01 to a rhel8 instance and much
newer postgresql. We could also run the vacuum duing downtime and let it
finish. 

2. Just let it finish now. Things will be slow, I don't know for how
long. Users will complain and it will take longer for people to get
things done, but at the end we should be in better shape and there's
basically no action we need to take (other than handling complaints)

3. Schedule an outage and take the db offline and run the vacuum. This
might be quicker than letting the autovac finish, I am not sure. 

Thoughts? please +1 the freeze break of the option you thing is best, or
feel free to ask more info or suggest other options. 

kevin
Attachment:
signature.asc

Description: PGP signature
_______________________________________________
infrastructure mailing list -- infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to infrastructure-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@xxxxxxxxxxxxxxxxxxxxxxx