Re: Process stuck in md_flush_request (state: D)

Les Stroud <les@xxxxxxxxxxxxx> · Fri, 17 Feb 2017 15:40:03 -0500

It’ll take a day or two for it to happen again.  When it does, I’ll pull the inflight stats.  Anything else I should grab while I’m at it?

Thanx,
LES

> On Feb 17, 2017, at 3:06 PM, Shaohua Li <shli@xxxxxxxxxx> wrote:
> 
> On Fri, Feb 17, 2017 at 02:05:49PM -0500, Les Stroud wrote:
>> 
>> I have a problem with processes entering an uninterruptible sleep state in md_flush_request and never returning. I am having trouble identifying the underlying issue. I’m hoping someone here may be able to help.
>> 
>> The servers in question are running in AWS (Xen HVM) with kernel 3.8.13-118.16.2.el6uek.x86_64.  Each server has two mounts.  The first is vanilla ext4.  The second is a software RAID0 array, striped with 256k chunks, built with md.  Its file system is ext4.
>> 
>> The most immediate and obvious symptom of the issue is kernel errors of the form “kernel: INFO task [some process]: blocked for more than 120 seconds.”  Shortly thereafter, other processes start entering the same uninterruptible wait state (D). This almost always impacts ssh logins.
>> 
>> The problem does not occur when the system is under load, or was recently under load.  It happens when the system is quiet (no cpu, very little io).
> 
> This seems to suggest we have a missed blk-plug flush under a light workload. Can
> you check the output of /sys/block/disk-name/inflight for both md and the
> underlying disks? This will let us know if there is IO pending.
> Also, it would be great if you could test an upstream kernel.
> 
> Thanks,
> Shaohua
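
For reference, the inflight check requested above could be scripted so it is ready to run the next time the hang occurs. This is only a rough sketch, not something from the thread: the function names and the "md0" default are placeholders, and it assumes the array's member devices are listed under the array's slaves/ directory in sysfs. Each inflight file holds two counters, reads in flight followed by writes in flight.

```python
import os
import sys


def parse_inflight(text):
    """Parse the two counters in an inflight file: reads, then writes."""
    reads, writes = text.split()[:2]
    return int(reads), int(writes)


def collect_inflight(md_name, sys_root="/sys/class/block"):
    """Return {device: (reads, writes)} for an md array and its members.

    Members are taken from <sys_root>/<md_name>/slaves (assumed layout).
    """
    slaves_dir = os.path.join(sys_root, md_name, "slaves")
    members = sorted(os.listdir(slaves_dir)) if os.path.isdir(slaves_dir) else []
    result = {}
    for dev in [md_name] + members:
        with open(os.path.join(sys_root, dev, "inflight")) as f:
            result[dev] = parse_inflight(f.read())
    return result


if __name__ == "__main__":
    # "md0" is a placeholder; pass the real array name as the first argument.
    name = sys.argv[1] if len(sys.argv) > 1 else "md0"
    for dev, (reads, writes) in sorted(collect_inflight(name).items()):
        print("%s: reads=%d writes=%d" % (dev, reads, writes))
```

Non-zero counters that stay unchanged across several runs while the system is idle would point to IO that is stuck rather than merely in progress.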
