Re: Process stuck in md_flush_request (state: D)

Shaohua Li <shli@xxxxxxxxxx> · Fri, 17 Feb 2017 12:06:44 -0800

On Fri, Feb 17, 2017 at 02:05:49PM -0500, Les Stroud wrote:
> 
> I have a problem with processes entering an uninterruptible sleep state in md_flush_request and never returning. I am having trouble identifying the underlying issue. I’m hoping someone on here may be able to help.
> 
> The servers in question are running in AWS (Xen HVM) with kernel 3.8.13-118.16.2.el6uek.x86_64.  The server has two mounts.  The first is vanilla ext4.  The second is a software RAID0 array, striped with 256k chunks, built with md.  Its file system is ext4. 
> 
> The most immediate and obvious symptom of the issue is the kernel error “kernel: INFO task [some process]: blocked for more than 120 seconds.”.  Shortly thereafter, other processes start entering the same uninterruptible wait state (D). This almost always impacts ssh logins.
> 
> The problem does not occur when the system is under load, or was recently under load.  It happens when the system is quiet (no cpu, very little io).

This seems to suggest we have a missed blk-plug flush under a light workload.
Can you check the output of /sys/block/disk-name/inflight for both the md
device and the underlying disks? That will tell us whether there is IO pending.
Also, it would be great if you could test an upstream kernel.
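
If it helps, here is a rough Python sketch for collecting those counters in
one go. It is only an illustration, not a tested tool: "md0" is a placeholder
array name, and the script assumes the member devices show up under the
array's "slaves" directory in sysfs. It reads through /sys/class/block
instead of /sys/block so that members which are partitions resolve too; that
choice is my assumption, adjust as needed. Each inflight file holds two
numbers, reads in flight and writes in flight.

```python
import os
import sys

# Assumption: /sys/class/block has an entry for whole disks and partitions.
SYSFS_ROOT = "/sys/class/block"


def parse_inflight(text):
    """Parse the two counters of an inflight file: (reads, writes)."""
    reads, writes = (int(field) for field in text.split())
    return reads, writes


def read_inflight(name, sysfs_root=SYSFS_ROOT):
    """Return (reads, writes) in flight for one block device."""
    with open(os.path.join(sysfs_root, name, "inflight")) as f:
        return parse_inflight(f.read())


def member_devices(md_name, sysfs_root=SYSFS_ROOT):
    """List the devices found in the array's slaves directory."""
    return sorted(os.listdir(os.path.join(sysfs_root, md_name, "slaves")))


def report(md_name, sysfs_root=SYSFS_ROOT):
    """Collect inflight counters for the array and each member device."""
    names = [md_name] + member_devices(md_name, sysfs_root)
    return {name: read_inflight(name, sysfs_root) for name in names}


if __name__ == "__main__":
    # "md0" is a hypothetical default; pass the real array name as argv[1].
    md = sys.argv[1] if len(sys.argv) > 1 else "md0"
    for name, (reads, writes) in report(md).items():
        print(f"{name}: reads in flight={reads} writes in flight={writes}")
```

Running it a few times while a task is stuck in D state should show whether
the counters stay non-zero on the md device or on any member disk.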

Thanks,
Shaohua