Ilya,

> We've recently fixed two major long-standing bugs in this area.

If you could elaborate more, it would be helpful for the community.
Are there any pointers?

Cheers,
Shinobu

----- Original Message -----
From: "Ilya Dryomov" <idryomov@xxxxxxxxx>
To: "Randy Orr" <randy.orr@xxxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Wednesday, March 2, 2016 8:40:42 PM
Subject: Re: blocked i/o on rbd device

On Tue, Mar 1, 2016 at 10:57 PM, Randy Orr <randy.orr@xxxxxxxxxx> wrote:
> Hello,
>
> I am running the following:
>
> ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
> ubuntu 14.04 with kernel 3.19.0-49-generic #55~14.04.1-Ubuntu SMP
>
> For this use case I am mapping and mounting an rbd using the kernel client
> and exporting the ext4 filesystem via NFS to a number of clients.
>
> Once or twice a week we've seen disk io "stuck" or "blocked" on the rbd
> device. When this happens iostat shows avgqu-sz at a constant number with
> utilization at 100%. All i/o operations via NFS block, though I am able to
> traverse the filesystem locally on the nfs server and read/write data. If I
> wait long enough the device will eventually recover and avgqu-sz goes to
> zero.
>
> The only issue I could find that was similar to this is:
> http://tracker.ceph.com/issues/8818 - However, I am not seeing the error
> messages described and I am running a more recent version of the kernel that
> should contain the fix from that issue. So, I assume this is likely a
> different problem.
>
> The ceph cluster reports as healthy the entire time, all pgs up and in,
> there was no scrubbing going on, no osd failures or anything like that.
>
> I ran echo t > /proc/sysrq-trigger and the output is here:
> https://gist.github.com/anonymous/89c305443080149e9f45
>
> Any ideas on what could be going on here? Any additional information I can
> provide?

Hi Randy,

We've recently fixed two major long-standing bugs in this area.
Currently, the only kernel that has fixes for both is 4.5-rc6, but backports
are on their way - both patches will be in 4.4.4. I'll make sure those
patches are queued for the ubuntu 3.19 kernel as well, but it'll take some
time for them to land.

Could you try either 4.5-rc6 or 4.4.4 after it comes out? It's likely that
your problem is fixed.

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com