On Sun, 2020-03-01 at 11:26 -0800, Kevin Fenzi wrote: > Greetings. > > Last night koji alerted due to slowness. It was not backups or anything, > but rather the database hitting the limit I raised in commit c678f73b: > > -autovacuum_freeze_max_age = 200000000 # maximum XID age before forced vacuum > +autovacuum_freeze_max_age = 300000000 # maximum XID age before forced vacuum > > What this means is basicially: postgres records the xid (transaction id) > that can 'see' other transactions in the table rows. However, xid is a > 32 bit value, meaning there can only be about 2.1billion transactions > before it 'wraps around'. When it does so, all the 'old' XID's need to > be gone or it will confuse it. It removes the old xids by marking old > transactions as 'frozen' (so any other transaction should see them). > > So, this value tells the autovacuumer to start processing the table for > old xids and frezzing them, so by the time the wrap around happens > everything will be set. > > Unfortunately, it's doing this on the buildroot_listing table, which is: > public | buildroot_listing | table | koji | 219 GB | > > So, the i/o load is heavy and koji is slow to respond to real requests. > > There's (at least) tree things we could do: > > 1. Bump the autovacuum_freeze_max_age up to 600million. The 100million > bump I did in january gave us about 1.5 months, so if we do 600, we > might last until june, when we will be migrating to the new datacenter. > 600million is still a long way from 2.1 billion, so it should be fine. > At that point I hope to move db-koji01 to a rhel8 instance and much > newer postgresql. We could also run the vacuum duing downtime and let it > finish. > > 2. Just let it finish now. Things will be slow, I don't know for how > long. Users will complain and it will take longer for people to get > things done, but at the end we should be in better shape and there's > basically no action we need to take (other than handling complaints) > > 3. Schedule an outage and take the db offline and run the vacuum. This > might be quicker than letting the autovac finish, I am not sure. > > Thoughts? please +1 the freeze break of the option you thing is best, or > feel free to ask more info or suggest other options. I'm fine with 1 or 2, but think we should definitely *not* do 3 unless a) we know quite precisely how long it will take and b) that is substantially faster than the online autovac. -- Adam Williamson Fedora QA Community Monkey IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net http://www.happyassassin.net _______________________________________________ infrastructure mailing list -- infrastructure@xxxxxxxxxxxxxxxxxxxxxxx To unsubscribe send an email to infrastructure-leave@xxxxxxxxxxxxxxxxxxxxxxx Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@xxxxxxxxxxxxxxxxxxxxxxx