Re: blocked i/o on rbd device

Are you exporting (or mounting) the NFS share as async or sync?
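
For reference, roughly what the two modes look like in /etc/exports (the
path and client subnet below are placeholders, not your actual values):

    # sync: the server commits each write to the rbd-backed filesystem before replying
    /srv/nfs/rbd0  10.0.0.0/24(rw,sync,no_subtree_check)

    # async: the server may acknowledge writes before they reach the disk
    /srv/nfs/rbd0  10.0.0.0/24(rw,async,no_subtree_check)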

How much memory does the server have?

Jan


> On 02 Mar 2016, at 12:54, Shinobu Kinjo <skinjo@xxxxxxxxxx> wrote:
> 
> Ilya,
> 
>> We've recently fixed two major long-standing bugs in this area.
> 
> If you could elaborate a bit more, it would be helpful for the community.
> Is there any pointer?
> 
> Cheers,
> Shinobu 
> 
> ----- Original Message -----
> From: "Ilya Dryomov" <idryomov@xxxxxxxxx>
> To: "Randy Orr" <randy.orr@xxxxxxxxxx>
> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Wednesday, March 2, 2016 8:40:42 PM
> Subject: Re:  blocked i/o on rbd device
> 
> On Tue, Mar 1, 2016 at 10:57 PM, Randy Orr <randy.orr@xxxxxxxxxx> wrote:
>> Hello,
>> 
>> I am running the following:
>> 
>> ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>> ubuntu 14.04 with kernel 3.19.0-49-generic #55~14.04.1-Ubuntu SMP
>> 
>> For this use case I am mapping and mounting an rbd using the kernel client
>> and exporting the ext4 filesystem via NFS to a number of clients.
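>> 
>> For context, the setup looks roughly like this (the pool/image names,
>> mount point and export options below are illustrative, not the exact
>> ones we use):
>> 
>>     rbd map mypool/myimage                 # kernel client, creates /dev/rbdN
>>     mkfs.ext4 /dev/rbd0                    # done once, when the image was created
>>     mount /dev/rbd0 /srv/nfs/rbd0
>>     echo '/srv/nfs/rbd0 *(rw,no_subtree_check)' >> /etc/exports
>>     exportfs -ra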
>> 
>> Once or twice a week we've seen disk io "stuck" or "blocked" on the rbd
>> device. When this happens iostat shows avgqu-sz at a constant number with
>> utilization at 100%. All i/o operations via NFS block, though I am able to
>> traverse the filesystem locally on the nfs server and read/write data. If I
>> wait long enough the device will eventually recover and avgqu-sz goes to
>> zero.
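>> 
>> (The queue numbers above come from watching something like the
>> following; the rbd device name is just an example:)
>> 
>>     iostat -x /dev/rbd0 5    # watch the avgqu-sz and %util columns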
>> 
>> The only issue I could find that was similar to this is:
>> http://tracker.ceph.com/issues/8818 - However, I am not seeing the error
>> messages described and I am running a more recent version of the kernel that
>> should contain the fix from that issue. So, I assume this is likely a
>> different problem.
>> 
>> The ceph cluster reports as healthy the entire time, all pgs up and in,
>> there was no scrubbing going on, no osd failures or anything like that.
>> 
>> I ran echo t > /proc/sysrq-trigger and the output is here:
>> https://gist.github.com/anonymous/89c305443080149e9f45
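>> 
>> For completeness, the dump was gathered roughly like this (the paths are
>> the standard ones, the output file name is arbitrary):
>> 
>>     echo 1 > /proc/sys/kernel/sysrq    # make sure SysRq is enabled
>>     echo t > /proc/sysrq-trigger       # dump all task states to the kernel log
>>     dmesg > sysrq-t.txt                # collect the output for the gist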
>> 
>> Any ideas on what could be going on here? Any additional information I can
>> provide?
> 
> Hi Randy,
> 
> We've recently fixed two major long-standing bugs in this area.
> Currently, the only kernel that has fixes for both is 4.5-rc6, but
> backports are on their way - both patches will be in 4.4.4.  I'll make
> sure those patches are queued for the ubuntu 3.19 kernel as well, but
> it'll take some time for them to land.
> 
> Could you try either 4.5-rc6 or 4.4.4 after it comes out?  It's likely
> that your problem is fixed.
> 
> Thanks,
> 
>                Ilya
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


