I thought I would send out a note on what we know so far and the status of the koji db.

Last night the vm that runs koji's database went unresponsive. Rebooting it brought it up in a very degraded state where it couldn't find some of its disks and never fully started. An additional reboot brought it back up, but as soon as it started serving database requests it went into a state where it was waiting for i/o and not processing.

We rebooted the virthost that runs that vm a number of times, and engaged networking and storage folks to look at those things.

In working on this issue we:

* ran a database vacuum on the koji db.
* fixed a misconfiguration in our multipathd config.
* fixed a configuration issue on a related virthost that made its vms unable to connect to the storage network.

Finally things seemed to settle down early this morning and we were able to bring the database back online. Later in the morning there was another short period of heavy i/o wait, but it recovered without intervention on our part.

The root cause seems to be that the iscsi netapp volume the instance was defined on had some connectivity or loading issues and wasn't able to handle the load for the vm. We have storage folks looking for issues on the netapp side of things and we are closely watching the server end.

Hopefully we are all back on track now.

kevin