Are you exporting (or mounting) the NFS as async or sync?
How much memory does the server have?

Jan

> On 02 Mar 2016, at 12:54, Shinobu Kinjo <skinjo@xxxxxxxxxx> wrote:
>
> Ilya,
>
>> We've recently fixed two major long-standing bugs in this area.
>
> If you could elaborate more, it would be reasonable for the community.
> Is there any pointer?
>
> Cheers,
> Shinobu
>
> ----- Original Message -----
> From: "Ilya Dryomov" <idryomov@xxxxxxxxx>
> To: "Randy Orr" <randy.orr@xxxxxxxxxx>
> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Wednesday, March 2, 2016 8:40:42 PM
> Subject: Re: blocked i/o on rbd device
>
> On Tue, Mar 1, 2016 at 10:57 PM, Randy Orr <randy.orr@xxxxxxxxxx> wrote:
>> Hello,
>>
>> I am running the following:
>>
>> ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>> ubuntu 14.04 with kernel 3.19.0-49-generic #55~14.04.1-Ubuntu SMP
>>
>> For this use case I am mapping and mounting an rbd using the kernel client
>> and exporting the ext4 filesystem via NFS to a number of clients.
>>
>> Once or twice a week we've seen disk io "stuck" or "blocked" on the rbd
>> device. When this happens iostat shows avgqu-sz at a constant number with
>> utilization at 100%. All i/o operations via NFS block, though I am able to
>> traverse the filesystem locally on the nfs server and read/write data. If I
>> wait long enough the device will eventually recover and avgqu-sz goes to
>> zero.
>>
>> The only issue I could find that was similar to this is:
>> http://tracker.ceph.com/issues/8818 - However, I am not seeing the error
>> messages described and I am running a more recent version of the kernel that
>> should contain the fix from that issue. So, I assume this is likely a
>> different problem.
>>
>> The ceph cluster reports as healthy the entire time, all pgs up and in,
>> there was no scrubbing going on, no osd failures or anything like that.
>>
>> I ran echo t > /proc/sysrq-trigger and the output is here:
>> https://gist.github.com/anonymous/89c305443080149e9f45
>>
>> Any ideas on what could be going on here? Any additional information I can
>> provide?
>
> Hi Randy,
>
> We've recently fixed two major long-standing bugs in this area.
> Currently, the only kernel that has fixes for both is 4.5-rc6, but
> backports are on their way - both patches will be in 4.4.4. I'll make
> sure those patches are queued for the ubuntu 3.19 kernel as well, but
> it'll take some time for them to land.
>
> Could you try either 4.5-rc6 or 4.4.4 after it comes out? It's likely
> that your problem is fixed.
>
> Thanks,
>
> Ilya
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
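The stuck-queue symptom Randy describes (iostat showing a constant avgqu-sz with utilization pinned at 100% while NFS I/O hangs) can also be watched for by sampling /proc/diskstats directly. The Python sketch below is a hypothetical helper, not part of Ceph, sysstat, or nfs-utils. It assumes the counter layout described in the kernel's iostats documentation, and the device name rbd0 and the 99% busy threshold are illustrative assumptions. It flags a device when requests are in flight, none completed during the interval, and the device was busy for the whole interval.

```python
# Hypothetical helper (not part of Ceph, sysstat, or nfs-utils): detect a
# block device whose request queue appears frozen, using /proc/diskstats.
# Assumes the field layout from the kernel's iostats documentation.
import time
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DiskSample:
    completed: int  # reads completed + writes completed (fields 1 and 5)
    in_flight: int  # I/Os currently in progress (field 9)
    io_ticks: int   # milliseconds spent doing I/Os (field 10)
    weighted: int   # weighted milliseconds spent doing I/Os (field 11)


def parse_diskstats(text: str, device: str) -> Optional[DiskSample]:
    """Return the counters for `device`, or None if it is not listed."""
    for line in text.splitlines():
        parts = line.split()
        # major, minor, name, then at least 11 counters
        if len(parts) >= 14 and parts[2] == device:
            f = [int(x) for x in parts[3:14]]  # f[0] is field 1
            return DiskSample(completed=f[0] + f[4], in_flight=f[8],
                              io_ticks=f[9], weighted=f[10])
    return None


def avg_queue_size(a: DiskSample, b: DiskSample, interval_ms: float) -> float:
    """Average number of queued I/Os over the interval (cf. avgqu-sz)."""
    return (b.weighted - a.weighted) / interval_ms


def looks_stuck(a: DiskSample, b: DiskSample, interval_ms: float,
                busy_threshold: float = 0.99) -> bool:
    """True if I/Os are in flight, none completed, and the device was busy
    for (almost) the whole interval. The 0.99 threshold is an assumption."""
    utilization = (b.io_ticks - a.io_ticks) / interval_ms
    return (b.in_flight > 0 and b.completed == a.completed
            and utilization >= busy_threshold)


def sample(device: str, interval_s: float = 5.0) -> Tuple[float, bool]:
    """Read /proc/diskstats twice, interval_s apart, and return
    (avgqu-sz, looks_stuck) for `device` (e.g. the assumed name "rbd0")."""
    def read() -> Optional[DiskSample]:
        with open("/proc/diskstats") as fh:
            return parse_diskstats(fh.read(), device)

    start = time.monotonic()
    first = read()
    time.sleep(interval_s)
    second = read()
    elapsed_ms = (time.monotonic() - start) * 1000
    if first is None or second is None:
        raise LookupError(f"{device} not found in /proc/diskstats")
    return (avg_queue_size(first, second, elapsed_ms),
            looks_stuck(first, second, elapsed_ms))
```

Run periodically on the NFS server (for example, printing sample("rbd0") every few minutes), it would record whether the rbd queue was frozen at the moment clients hung. Per the iostats documentation, the weighted counter grows by the number of in-flight I/Os times the elapsed time, so N requests that never complete would keep avgqu-sz flat at N, which would explain the constant value seen in iostat.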