I was going to wait until after freeze for this, but with us slipping a week I think it might be worth doing now.

For the last few weeks we have been having issues with db-koji01. The problem started when I moved its backend storage from one iscsi/pv to another iscsi/pv. The load has been high since then and it's not as performant as it was.

Effects:

* koji alerts in nagios mean we need to restart httpd on koji01 (which we can do without an outage, but it means a human has to wake up and go do it).
* If koji01 httpd isn't restarted, kojira will sometimes time out and not launch newrepos. (We worked around this by increasing the timeout, but it's only a matter of time before it hits this again.)
* Pages on koji that need lots of db access are slower than they were/need to be.

Cause:

Not entirely sure what the base cause is. lvdisplay shows the guest is on the right iscsi volume, and there are no iscsi errors or the like. The host did have stale lvm data due to lvmetad running, but that shouldn't have affected the running guest(s). I can only think there's something still trying to hit the old, no-longer-used iscsi volume and causing extra load.

What I would like to do:

* Stop postgres on db-koji01. This will cause the hub to show db down to anyone looking.
* rsync /var/lib/pgsql off to backup03. This should take less than 10min.
* Shut down db-koji01 and dhcp01.
* Reboot bvirthost09.
* See if the issue clears up.

If something happens and db-koji01 doesn't come back up right, we can make a new one, sync /var/lib/pgsql back to it, and be back up pretty quickly. Hopefully it won't come to that.

I'd like to schedule this possibly over the weekend, off hours, when koji isn't all that busy.

Thoughts? +1s?

kevin
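
P.S. For anyone who wants the proposed steps spelled out in order, here is a rough dry-run sketch in Python. It only prints the commands; it does not run anything. Several details are assumptions on my part and not settled: the backup path on backup03 is hypothetical, the postgres unit is assumed to be a systemd unit named "postgresql", and the guests are assumed to be libvirt domains named after their hostnames.

```python
import shlex

# Hypothetical destination on backup03; the real path is not decided yet.
BACKUP_DEST = "backup03:/srv/backup/db-koji01/pgsql/"


def build_plan(backup_dest=BACKUP_DEST):
    """Return the outage steps, in order, as (host, argv) pairs."""
    return [
        # Assumption: postgres runs as a systemd unit named "postgresql".
        ("db-koji01", ["systemctl", "stop", "postgresql"]),
        # Copy the stopped data directory off-host before touching the guest.
        ("db-koji01", ["rsync", "-a", "/var/lib/pgsql/", backup_dest]),
        # Assumption: guests are libvirt domains named after their hostnames.
        ("bvirthost09", ["virsh", "shutdown", "db-koji01"]),
        ("bvirthost09", ["virsh", "shutdown", "dhcp01"]),
        # Reboot the virt host once both guests are down.
        ("bvirthost09", ["systemctl", "reboot"]),
    ]


def render(plan):
    """Format each step as 'host$ command' for review; nothing is executed."""
    return [f"{host}$ {shlex.join(argv)}" for host, argv in plan]


if __name__ == "__main__":
    for line in render(build_plan()):
        print(line)
```

The point of keeping it as a list of (host, command) pairs is that the order is easy to review on the list before anyone runs anything by hand.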