On Wed, May 18, 2011 at 13:51, Adam M. Dutko <dutko.adam@xxxxxxxxx> wrote:
>> I think this is a good test to see what is the problem. The deadlocks
>> and OOM's seem to happen at 0400 when other virtual systems are
>
> Hrm... so all of these are xen instances and they're doing backups at
> the same time. If the rsync processes are going into a D state I'd
> think it's an I/O exhaustion problem. Would it be possible to alter
> the backup schedule and stagger them if the scheduler change doesn't
> work?

I believe the backup jobs are staggered. The issue is more with the
0400 window, when the daily jobs in cron.daily fire off. This does not
seem to be the cause every time, but it has occurred on at least a
couple of occasions. My guess is that I/O exhaustion is going on, and
that the OOM happens because the kernel could not talk to swap for over
120 s and went on a killing frenzy.

> -Adam
> _______________________________________________
> infrastructure mailing list
> infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
> https://admin.fedoraproject.org/mailman/listinfo/infrastructure
>

--
Stephen J Smoogen.
"The core skill of innovators is error recovery, not failure avoidance."
Randy Nelson, President of Pixar University.
"Let us be kind, one to another, for most of us are fighting a hard
battle." -- Ian MacLaren
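
One way to test the I/O exhaustion guess discussed above is to record which processes are in the D (uninterruptible sleep) state around 0400. The following is only a minimal sketch, assuming a Linux guest with /proc mounted. The helper names parse_stat and d_state_processes are hypothetical and do not come from this thread.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The comm field is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the LAST ')' instead of on
    whitespace.
    """
    left, _, right = stat_line.rpartition(")")
    pid_text, _, comm = left.partition("(")
    state = right.split()[0]
    return int(pid_text), comm, state


def d_state_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            # The process exited between listdir() and open(),
            # or its stat file could not be read or parsed.
            continue
        if state == "D":
            blocked.append((pid, comm))
    return blocked
```

Calling d_state_processes() every few seconds from shortly before 0400 and logging the result with a timestamp would show whether rsync or the cron.daily jobs pile up in D state before the OOM killer fires. It only samples the process state, so it cannot by itself tell whether the stall is on swap or on ordinary disk I/O.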