On Thu, 20 Sep 2012 11:55:02 -0600 Chris Friesen <chris.friesen@xxxxxxxxxxx>
wrote:

> On 09/20/2012 10:52 AM, Chris Friesen wrote:
> >
> > Hi,
> >
> > I've got a fairly beefy (32 cpus, 64GB ram, isci-based SAS disks,
> > etc.) embedded system running 2.6.27.
> >
> > We're seeing issues where disk operations suddenly seem to stall. In
> > the most recent case we had the hung-task watchdog indicate that
> > md1_resync was stuck for more than 120sec in raise_barrier().
> >
> > There are a bunch of "normal" tasks also stuck in wait_barrier(), so
> > based on that I assume we're stuck in the second call to
> > wait_event_lock_irq().
> >
> > Has anyone seen anything like this? Could commit 73d5c38 be related?
> > What about 1d9d524?
>
> Could d6b42dc be related?

That last one seems more likely.
Does the scenario fit your config, i.e. is your RAID1 being used under LVM?
If it does, then I would say it is very likely this issue.

> Also, what's the meaning of RESYNC_DEPTH?

The maximum number of resync requests that can be concurrently active.

RESYNC_WINDOW should really be
   RESYNC_BLOCK_SIZE * RESYNC_DEPTH
I wonder why it isn't.

NeilBrown
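A minimal sketch of the relationship described above, not the actual raid1.c code. The numeric values of RESYNC_BLOCK_SIZE and RESYNC_DEPTH are assumptions chosen for illustration (check the source of the kernel you are running). The struct barrier_state type and the resync_may_proceed() helper are hypothetical simplifications of the counters the resync thread waits on. The sketch derives RESYNC_WINDOW from the other two constants, as suggested. It also shows why one normal request that stays counted in nr_pending and never completes would keep the resync side blocked indefinitely, which fits the symptom of md1_resync stuck in raise_barrier().

```c
#include <stdbool.h>

/* ASSUMED values, for illustration only; not quoted from a specific kernel. */
#define RESYNC_BLOCK_SIZE (64 * 1024) /* bytes covered by one resync request */
#define RESYNC_DEPTH      32          /* max concurrently active resync requests */

/* Derive the window from the other two constants instead of hard-coding it. */
#define RESYNC_WINDOW     (RESYNC_BLOCK_SIZE * RESYNC_DEPTH)

/* HYPOTHETICAL, simplified stand-in for the barrier counters in raid1's conf. */
struct barrier_state {
    int barrier;    /* active resync requests holding the barrier */
    int nr_pending; /* normal I/O requests counted as in flight */
};

/* Simplified wait condition: resync may go ahead only when no normal I/O is
 * pending and fewer than RESYNC_DEPTH resync requests are active. If some
 * request stays in nr_pending forever, this never becomes true. */
static bool resync_may_proceed(const struct barrier_state *s)
{
    return s->nr_pending == 0 && s->barrier < RESYNC_DEPTH;
}
```

With the assumed values, RESYNC_WINDOW works out to 2048 KiB. An idle array ({0, 0}) lets resync proceed. A state with one stuck pending request ({1, 1}) or with RESYNC_DEPTH resync requests already active does not.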