It’ll take a day or two for it to happen again. When it does, I’ll pull the inflight stats. Anything else I should grab while I’m at it?

Thanx,
LES

> On Feb 17, 2017, at 3:06 PM, Shaohua Li <shli@xxxxxxxxxx> wrote:
>
> On Fri, Feb 17, 2017 at 02:05:49PM -0500, Les Stroud wrote:
>>
>> I have a problem with processes entering an uninterruptible sleep state in md_flush_request and never returning. I am having trouble identifying the underlying issue. I’m hoping someone on here may be able to help.
>>
>> The servers in question are running in aws (xen hvm) with kernel 3.8.13-118.16.2.el6uek.x86_64. The server has two mounts. The first is vanilla ext4. The second is a software RAID0 array, striped with 256k chunks, built with md. Its file system is ext4.
>>
>> The most immediate and obvious symptom of the issue is kernel errors such as “kernel: INFO: task [some process]: blocked for more than 120 seconds.”. Shortly thereafter, other processes start entering the same uninterruptible wait state (D). This almost always impacts ssh logins.
>>
>> The problem does not occur when the system is under load, or was recently under load. It happens when the system is quiet (no cpu, very little io).
>
> This seems to suggest we have a missed blk-plug flush under a light workload. Can
> you check the output of /sys/block/disk-name/inflight for both md and the
> underlying disks? This will let us know if there is IO pending.
> Also it would be great if you can test an upstream kernel.
>
> Thanks,
> Shaohua
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
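
For anyone collecting the inflight numbers requested in this thread, here is a minimal Python sketch that snapshots them for an md array and its member devices in one pass. It is only an illustration, not something posted by either participant. The array name `md0` and the helper names are assumptions, and the sketch assumes the usual sysfs layout: members listed under /sys/block/<md>/slaves, and a two-column inflight file (reads, then writes) under /sys/class/block/<dev>/.

```python
import os
import time


def parse_inflight(text):
    """Return (reads, writes) from the two-column contents of an inflight file."""
    reads, writes = text.split()[:2]
    return int(reads), int(writes)


def member_devices(md_name, sys_root="/sys"):
    """List the member devices of an md array from its slaves/ directory."""
    slaves_dir = os.path.join(sys_root, "block", md_name, "slaves")
    return sorted(os.listdir(slaves_dir))


def read_inflight(dev_name, sys_root="/sys"):
    """Read the counters via /sys/class/block, which also covers partitions."""
    path = os.path.join(sys_root, "class", "block", dev_name, "inflight")
    with open(path) as f:
        return parse_inflight(f.read())


def snapshot(md_name, sys_root="/sys"):
    """Map the array and each member device to its (reads, writes) in flight."""
    devices = [md_name] + member_devices(md_name, sys_root)
    return {dev: read_inflight(dev, sys_root) for dev in devices}


def report(md_name="md0", sys_root="/sys"):
    """Format one timestamped line per device; 'md0' is a hypothetical name."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return [
        f"{stamp} {dev}: reads={reads} writes={writes}"
        for dev, (reads, writes) in snapshot(md_name, sys_root).items()
    ]
```

Calling `print("\n".join(report("md0")))` while processes are stuck in D state would show whether any requests are still pending on the array or on the disks beneath it. Running it a few times some seconds apart helps distinguish IO that is merely slow from IO that never completes.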