Strange week last week. Many of you noticed a bunch of nagios outages, so I thought I'd send a roundup of what happened.

1) The big one was what seems to be a corrupt database table. For some reason, running a vacuum on a table (which was only 66M in size) was taking a long time, and even after it finished the disks would thrash, sometimes for another 10 minutes. This caused outages in lots of our systems, like the account system, on which other systems depend. The job ran hourly, so that's why it kept happening. We were able to reproduce this on another host but never quite figured out what was going on. A dump, drop, and restore fixed the issue. So far we haven't had time to revisit the cause, but it hasn't happened since.

2) Strange network issues towards the end of the week. It seems our round-trip time to ServerBeach went up, causing nagios to flag some hosts as dead. I've not yet had time to look into this either. The network seems fine and I don't think we're seeing any functional issues from it, but it was different.

3) pkgdb's home page started taking longer to load, causing our balancer to flag it as dead and throw 503s. We only recently moved it to haproxy, so this could be normal behavior that we just hadn't seen before. I've raised the allowed response time for the front page from 2 seconds to 5.

-Mike