Re: blocked i/o on rbd device

Ilya,

> We've recently fixed two major long-standing bugs in this area.

If you could elaborate, it would be helpful for the community.
Is there a pointer to the fixes?

Cheers,
Shinobu 

----- Original Message -----
From: "Ilya Dryomov" <idryomov@xxxxxxxxx>
To: "Randy Orr" <randy.orr@xxxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Wednesday, March 2, 2016 8:40:42 PM
Subject: Re:  blocked i/o on rbd device

On Tue, Mar 1, 2016 at 10:57 PM, Randy Orr <randy.orr@xxxxxxxxxx> wrote:
> Hello,
>
> I am running the following:
>
> ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
> ubuntu 14.04 with kernel 3.19.0-49-generic #55~14.04.1-Ubuntu SMP
>
> For this use case I am mapping and mounting an rbd using the kernel client
> and exporting the ext4 filesystem via NFS to a number of clients.
>
> Once or twice a week we've seen disk io "stuck" or "blocked" on the rbd
> device. When this happens iostat shows avgqu-sz at a constant number with
> utilization at 100%. All i/o operations via NFS block, though I am able to
> traverse the filesystem locally on the nfs server and read/write data. If I
> wait long enough the device will eventually recover and avgqu-sz goes to
> zero.
>
> The only issue I could find that was similar to this is:
> http://tracker.ceph.com/issues/8818 - However, I am not seeing the error
> messages described and I am running a more recent version of the kernel that
> should contain the fix from that issue. So, I assume this is likely a
> different problem.
>
> The ceph cluster reports as healthy the entire time, all pgs up and in,
> there was no scrubbing going on, no osd failures or anything like that.
>
> I ran echo t > /proc/sysrq-trigger and the output is here:
> https://gist.github.com/anonymous/89c305443080149e9f45
>
> Any ideas on what could be going on here? Any additional information I can
> provide?

Hi Randy,

We've recently fixed two major long-standing bugs in this area.
Currently, the only kernel that has fixes for both is 4.5-rc6, but
backports are on their way - both patches will be in 4.4.4.  I'll make
sure those patches are queued for the ubuntu 3.19 kernel as well, but
it'll take some time for them to land.

Could you try either 4.5-rc6 or 4.4.4 after it comes out?  It's likely
that your problem is fixed.
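
To check quickly whether a client is already on one of those kernels, a
rough Python sketch follows. The helper name has_both_fixes is made up for
illustration, and the check assumes an upstream-style version string: distro
kernels (Ubuntu's included) carry their own backports, so for anything older
than the 4.4 series it reports "unknown" rather than guessing.

```python
import platform
import re


def has_both_fixes(release):
    """Guess from a kernel release string whether both rbd fixes are present.

    Returns True or False for upstream 4.4.x / 4.5 and later, based only on
    the versions named in this thread (4.5-rc6 and 4.4.4). Returns None when
    the string alone cannot tell (older series, distro backports).
    """
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        return None
    major, minor = int(m.group(1)), int(m.group(2))
    patch = int(m.group(3) or 0)
    rc = re.search(r"rc(\d+)", release)  # e.g. "4.5.0-rc6"

    if (major, minor) > (4, 5):
        return True
    if (major, minor) == (4, 5):
        # 4.5 final, or a release candidate at rc6 or later
        return rc is None or int(rc.group(1)) >= 6
    if (major, minor) == (4, 4):
        # assumption: vanilla stable numbering, fixes land in 4.4.4
        return patch >= 4
    return None  # e.g. ubuntu 3.19: depends on whether backports landed


if __name__ == "__main__":
    print(platform.release(), has_both_fixes(platform.release()))
```

A None result on the 3.19 kernel means only that the version string cannot
answer the question; the package changelog is the place to look there.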

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


