Re: hung in raise_barrier() in raid1.c -- any ideas?

On 09/20/2012 03:27 PM, NeilBrown wrote:
> On Thu, 20 Sep 2012 11:55:02 -0600 Chris Friesen <chris.friesen@xxxxxxxxxxx>
> wrote:
>
> > On 09/20/2012 10:52 AM, Chris Friesen wrote:
> >
> > > Hi,
> > >
> > > I've got a fairly beefy (32 cpus, 64GB ram, isci-based SAS disks,
> > > etc.) embedded system running 2.6.27.
> > >
> > > We're seeing issues where disk operations suddenly seem to stall.  In
> > > the most recent case we had the hung-task watchdog indicate that
> > > md1_resync was stuck for more than 120sec in raise_barrier().
> > >
> > > There are a bunch of "normal" tasks also stuck in wait_barrier(), so
> > > based on that I assume we're stuck in the second call to
> > > wait_event_lock_irq().
> > >
> > > Has anyone seen anything like this?  Could commit 73d5c38 be related?
> > > What about 1d9d524?
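For readers unfamiliar with the raid1 barrier, here is a simplified userspace model of the counters that raise_barrier() and wait_barrier() wait on. Pthreads stand in for the kernel's resync_lock spinlock and wait_barrier wait queue. This is a sketch based on a reading of raid1.c from around that era, not the kernel code. The struct, model_init(), resync_may_start() and the RESYNC_DEPTH value of 32 are assumptions for illustration. The point it shows: the second wait in raise_barrier() only proceeds once nr_pending drops to zero. nr_pending counts normal requests that have passed wait_barrier() but not yet called allow_barrier(), so a request that never completes leaves md1_resync stuck there while every later normal request blocks on the raised barrier.

```c
#include <pthread.h>
#include <stdbool.h>

/* Assumed value for illustration; check RESYNC_DEPTH in the raid1.c you run. */
#define RESYNC_DEPTH 32

/* Hypothetical userspace stand-in for the barrier fields of raid1's conf. */
struct barrier_model {
    pthread_mutex_t lock;  /* plays the role of conf->resync_lock */
    pthread_cond_t wait;   /* plays the role of conf->wait_barrier */
    int barrier;           /* resync requests that have raised the barrier */
    int nr_pending;        /* normal I/O that passed wait_barrier() */
    int nr_waiting;        /* normal I/O blocked in wait_barrier() */
};

static void model_init(struct barrier_model *m)
{
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->wait, NULL);
    m->barrier = 0;
    m->nr_pending = 0;
    m->nr_waiting = 0;
}

/* Condition of the second wait in raise_barrier(): no normal I/O in
 * flight and the resync depth limit not reached. Caller holds m->lock. */
static bool resync_may_start(const struct barrier_model *m)
{
    return m->nr_pending == 0 && m->barrier < RESYNC_DEPTH;
}

/* Resync side: called once per resync request. */
static void raise_barrier(struct barrier_model *m)
{
    pthread_mutex_lock(&m->lock);
    while (m->nr_waiting != 0)       /* first wait: no normal I/O blocked */
        pthread_cond_wait(&m->wait, &m->lock);
    m->barrier++;                     /* from now on, new normal I/O blocks */
    while (!resync_may_start(m))      /* second wait: drain pending I/O */
        pthread_cond_wait(&m->wait, &m->lock);
    pthread_mutex_unlock(&m->lock);
}

/* Resync side: called when a resync request completes. */
static void lower_barrier(struct barrier_model *m)
{
    pthread_mutex_lock(&m->lock);
    m->barrier--;
    pthread_mutex_unlock(&m->lock);
    pthread_cond_broadcast(&m->wait);
}

/* Normal I/O side: called before a request is issued. */
static void wait_barrier(struct barrier_model *m)
{
    pthread_mutex_lock(&m->lock);
    if (m->barrier) {
        m->nr_waiting++;
        while (m->barrier)
            pthread_cond_wait(&m->wait, &m->lock);
        m->nr_waiting--;
    }
    m->nr_pending++;
    pthread_mutex_unlock(&m->lock);
}

/* Normal I/O side: called when the request completes. */
static void allow_barrier(struct barrier_model *m)
{
    pthread_mutex_lock(&m->lock);
    m->nr_pending--;
    pthread_mutex_unlock(&m->lock);
    pthread_cond_broadcast(&m->wait);
}
```

In a multithreaded run, a request that calls wait_barrier() but never reaches allow_barrier() keeps nr_pending above zero. raise_barrier() then sits in its second loop indefinitely, and every later wait_barrier() caller blocks on barrier, which matches the symptom described above.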

> > Could d6b42dc be related?
>
> That last one seems more likely.  Does the scenario fit your config?
> i.e. is your RAID1 being used under LVM?
>
> If it does, then I would say it is very likely this issue.


Yes, we're using it under LVM. I've added some instrumentation to tell whether we're hitting that case. The current->bio_list handling is a bit different in 2.6.27, but I think I've figured out the equivalent of the patch.

Interesting that it took this long to fix that issue.
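The LVM question points at a stacking deadlock. The usual shape of it, which is an interpretation here and should be checked against the actual d6b42dc commit, is as follows. When md sits under another block driver such as device-mapper, bios submitted from inside a make_request function are queued on current->bio_list and only issued after the outer call returns. Suppose a task has already passed wait_barrier() for one request (so nr_pending > 0), and that request's member-disk bios are still parked on the task's own current->bio_list behind a second request for the same md device. If resync raises the barrier in between, the second request blocks in wait_barrier() waiting for the barrier to drop, while the resync thread waits for nr_pending to reach zero, and neither makes progress. The sketch below models only the wait_barrier() admission test as a pure function. The struct, the function names and the has_deferred_bios flag (standing in for a non-empty current->bio_list) are hypothetical, and the relaxed rule is an assumption about the shape of the fix, not a copy of it.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state wait_barrier() looks at. */
struct admission_state {
    int barrier;             /* resync requests holding the barrier */
    int nr_pending;          /* normal I/O already admitted */
    bool has_deferred_bios;  /* stands in for a non-empty current->bio_list */
};

/* Strict rule: normal I/O may proceed only when no barrier is raised. */
static bool may_proceed_strict(const struct admission_state *s)
{
    return s->barrier == 0;
}

/* Relaxed rule (assumed shape of the fix): also admit the request if
 * earlier requests are still pending and this task is holding deferred
 * bios, since those bios may be what nr_pending is waiting on. */
static bool may_proceed_relaxed(const struct admission_state *s)
{
    return s->barrier == 0 ||
           (s->nr_pending > 0 && s->has_deferred_bios);
}
```

With barrier = 1, nr_pending = 1 and deferred bios present, the strict rule blocks (the deadlock case) while the relaxed rule admits the request; a task with nothing deferred still waits for resync as before.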


> > Also, what's the meaning of RESYNC_DEPTH?
>
> The maximum number of resync requests that can be concurrently active.

And each request would resync a single block?
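A rough worked bound for the RESYNC_DEPTH question, under stated assumptions: it assumes RESYNC_DEPTH is 32 and that each resync request covers at most RESYNC_BLOCK_SIZE bytes, taken here as 64 KiB (a request can be shorter near the end of the device or where the bitmap skips an in-sync region). Both constants should be checked against the 2.6.27 tree being debugged. If they hold, one request spans many 512-byte sectors rather than a single block, and the product of the two gives an upper bound on resync data in flight. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Assumed constants; verify against drivers/md/raid1.c in your tree. */
#define RESYNC_DEPTH      32
#define RESYNC_BLOCK_SIZE (64 * 1024)  /* upper bound on bytes per resync request */
#define SECTOR_SIZE       512

/* Upper bound on resync data in flight with `depth` requests active. */
static uint64_t max_resync_bytes_in_flight(uint64_t depth, uint64_t block_size)
{
    return depth * block_size;
}

/* Sectors covered by one full-sized resync request. */
static uint64_t sectors_per_request(uint64_t block_size)
{
    return block_size / SECTOR_SIZE;
}
```

Under these assumptions, max_resync_bytes_in_flight(RESYNC_DEPTH, RESYNC_BLOCK_SIZE) is 2097152 bytes (2 MiB), and a full-sized request covers 128 sectors.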

Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html