Re: Process stuck in md_flush_request (state: D)

> On Feb 27, 2017, at 1:28 PM, Shaohua Li <shli@xxxxxxxxxx> wrote:
> 
> On Mon, Feb 27, 2017 at 09:49:59AM -0500, Les Stroud wrote:
>> After a period of a couple of weeks with one of our test instances having this problem every other day, they were all nice enough to operate without an issue for 9 days.  It finally reoccurred last night on one of the machines.  
>> 
>> It exhibits the same symptoms and the call traces look as they did previously.  This particular instance is configured with a deadline scheduler.  I was able to capture the inflight you requested:
>> 
>> $ cat /sys/block/xvd[abcde]/inflight
>>        0        0
>>        0        0
>>        0        0
>>        0        0
>>        0        0
>> 
>> I’ve had this happen on instances with the deadline scheduler and the noop scheduler.  So far, I have not had it happen on an instance that uses noop and has the raid filesystem (ext4) mounted with nobarrier.  The noop/nobarrier instances have not been running long enough for me to conclude that this works around the problem. Frankly, I’m not sure I understand the interaction between ext4 barriers and raid0 block flushes well enough to theorize whether it should or shouldn’t make a difference.
> 
> With nobarrier, ext4 doesn't send flush requests.

So, could ext4’s flush request deadlock with an md_flush_request?  Do they share a mutex of some sort? Could one of them be failing to acquire a mutex and not handling the failure?

> 
>> Does any of this help with identifying the bug?  Is there any more information I can get that would be useful?  
> 
> 
> Unfortunately I can't find anything fishy. Does the xvdx disk correctly
> handle flush requests? For example, you can run the same test with a single such
> disk and check whether anything goes wrong.

Until recently, we had a number of these systems set up without raid0.  This issue never occurred on those systems.  Unfortunately, I can’t find a way to make it happen other than to stand a server up and let it run.
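
To exercise the flush path on a single disk more aggressively than normal load does, something like the following could be left running. This is only a rough sketch: it assumes that fsync() on an ext4 filesystem mounted with barriers results in a flush request to the underlying device, and the function name and target directory are placeholders of mine, not anything from the setup described in this thread. If a flush never completes, the script would simply hang in fsync() rather than report an error.

```python
import os
import tempfile
import time


def fsync_loop(directory, iterations=100, block=b"x" * 4096):
    """Write a block and fsync it repeatedly.

    Returns the list of per-iteration latencies in seconds.
    """
    latencies = []
    # Create a scratch file inside the directory under test.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(iterations):
            start = time.monotonic()
            os.write(fd, block)
            # Assumption: with barriers enabled, this fsync causes ext4
            # to send a flush request down to the block device.
            os.fsync(fd)
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


if __name__ == "__main__":
    # Placeholder: replace with a directory on the filesystem under test.
    target_dir = tempfile.gettempdir()
    lat = fsync_loop(target_dir)
    print("iterations=%d max=%.4fs" % (len(lat), max(lat)))
```

Pointing target_dir first at a filesystem on a single xvd disk and then at the raid0 filesystem would give a like-for-like comparison.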

I suppose I could try a different filesystem and see if that makes a difference (maybe ext3, xfs, etc).
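
For the next occurrence, the inflight counters quoted earlier could be collected by a small script instead of by hand. A minimal sketch, assuming each /sys/block/<dev>/inflight file holds two whitespace-separated counters (reads, then writes); the function names are my own and the glob pattern just mirrors the command quoted above.

```python
import glob


def parse_inflight(text):
    """Return (reads, writes) parsed from the contents of an inflight file."""
    reads, writes = text.split()[:2]
    return int(reads), int(writes)


def sample_inflight(pattern="/sys/block/xvd[abcde]/inflight"):
    """Read every inflight file matching the pattern.

    Returns a dict mapping each path to its (reads, writes) tuple.
    """
    result = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            result[path] = parse_inflight(f.read())
    return result


if __name__ == "__main__":
    for path, (reads, writes) in sample_inflight().items():
        print("%s reads=%d writes=%d" % (path, reads, writes))
```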


> 
> Thanks,
> Shaohua
