At approximately 01:00 UTC today, clients requesting the mirror list started getting timeouts, then HTTP 503 errors generated by the MirrorManager mirror list processes. On the Fedora Infrastructure application servers, load spiked, the out-of-memory killer started firing, and chaos ensued.

The proximate cause of this failure appears to be invalid data in the MirrorManager database: specifically, the bandwidth value for several servers was NULL, which should not be possible. I say proximate, not root, because I have not yet been able to reproduce the incorrect behavior using the invalid data. That remains to be done. However, since fixing the invalid data, we have not seen further failures.

There are fixes in the MirrorManager 1.4 (unreleased) branch to prevent this kind of invalid data from being stored, but these were not present in the 1.3 version currently in production. Additional fixes have been put into the MM 1.4 branch tonight to further ensure this type of invalid data cannot affect the mirrorlist_server process.

Thanks to Stephen Smoogen and Kevin Fenzi for their quick work to identify the failing systems and minimize the impact to other Fedora services.

Thanks,
Matt