Re: Kernel RBD hang on OSD Failure

We aren't running NFS, but we regularly use the kernel driver to map RBDs and mount filesystems on them, and we see very similar behavior across nearly all kernel versions we've tried.  In my experience only a very few versions of the kernel driver survive any sort of CRUSH map change or update while something is mapped.  In the last two years I think I've only seen this work on one kernel version; unfortunately it's badly out of date and we can't run it in our environment anymore (I think it was a 3.0 kernel running on Ubuntu 12.04).

We have just recently started trying to find a kernel that will survive OSD outages or changes to the cluster.  We're on Ubuntu 14.04 and have tried 3.16, 3.19.0-25, 4.2, and 4.3 in the last week without success.  We only map 1-3 RBDs per client machine at a time, but we regularly get processes stuck in D state while accessing the filesystem inside the RBD and have to hard reboot the RBD client machine.  This is always associated with a cluster change of some kind: reweighting OSDs, rebooting an OSD host, restarting an individual OSD, adding OSDs, and removing OSDs all cause the kernel client to hang.  If no change is made to the cluster, the kernel client will be happy for weeks.
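
When a client gets into this state, a quick way to see what it is actually stuck on is to check for D-state processes and for requests the kernel Ceph client still has in flight; the libceph osd client reports the latter through per-client osdc files in debugfs.  Below is a minimal sketch of that check (assuming debugfs is mounted at /sys/kernel/debug and it is run as root; the exact debugfs layout can vary by kernel version):

#!/usr/bin/env python
# Minimal sketch: list D-state processes and any requests the kernel Ceph
# client is still waiting on.  Assumes debugfs is mounted at
# /sys/kernel/debug; run as root.
import glob
import os

def d_state_processes():
    """Yield (pid, comm) for processes in uninterruptible sleep (D state)."""
    for status in glob.glob('/proc/[0-9]*/status'):
        try:
            with open(status) as f:
                fields = dict(line.split(':', 1) for line in f if ':' in line)
        except IOError:
            continue  # process exited while we were reading
        if fields.get('State', '').strip().startswith('D'):
            yield status.split('/')[2], fields.get('Name', '').strip()

def in_flight_osd_requests():
    """Yield (client, request) lines from the libceph osd client."""
    for osdc in glob.glob('/sys/kernel/debug/ceph/*/osdc'):
        with open(osdc) as f:
            for line in f:
                if line.strip():
                    yield os.path.basename(os.path.dirname(osdc)), line.strip()

if __name__ == '__main__':
    for pid, comm in d_state_processes():
        print('D state: pid=%s comm=%s' % (pid, comm))
    for client, req in in_flight_osd_requests():
        print('in flight (%s): %s' % (client, req))

Each osdc line is an outstanding request, including the OSD it is targeting, which is usually enough to tie the hang back to the OSD that just went down or was restarted.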

On Mon, Dec 7, 2015 at 2:55 PM, Blair Bethwaite <blair.bethwaite@xxxxxxxxx> wrote:
Hi Matt,

(CC'ing in ceph-users too - similar reports there:
http://www.spinics.net/lists/ceph-users/msg23037.html)

We've seen something similar for KVM [lib]RBD clients acting as NFS
gateways within our OpenStack cloud: the NFS services were locking up
and causing client timeouts whenever we started doing Ceph
maintenance. We eventually realised we'd somehow set the pool min_size
== size, so any single OSD outage was blocking client IO - *oops*.
Your issue sounds like something different, but NFS does seem to be
very touchy and lacking any graceful recovery from issues with the
underlying FS.
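
For anyone wanting to double-check that on their own cluster, here is a minimal sketch that flags pools where min_size has ended up equal to size (any single OSD outage then blocks IO to that pool until it recovers).  It assumes the ceph CLI is on PATH with admin credentials and parses the JSON from "ceph osd dump":

#!/usr/bin/env python
# Minimal sketch: flag pools whose min_size equals size.  Assumes the "ceph"
# CLI is available with enough privileges to run "ceph osd dump".
import json
import subprocess

def pools():
    out = subprocess.check_output(['ceph', 'osd', 'dump', '--format', 'json'])
    return json.loads(out.decode('utf-8'))['pools']

if __name__ == '__main__':
    for p in pools():
        warn = '  <-- min_size == size: one OSD down blocks IO' \
            if p['min_size'] >= p['size'] else ''
        print('%-24s size=%d min_size=%d%s'
              % (p['pool_name'], p['size'], p['min_size'], warn))

If one turns up, "ceph osd pool set <pool> min_size <n>" with n below size is the usual fix; what value is sensible depends on your replication level.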


On 8 December 2015 at 07:56, Matt Conner <matt.conner@xxxxxxxxxxxxxx> wrote:
> Hi,
>
> We have a Ceph cluster in which we have been having issues with RBD
> clients hanging when an OSD failure occurs. We are using a NAS gateway
> server which maps RBD images to filesystems and serves the filesystems
> out via NFS. The gateway server has close to 180 NFS clients and
> almost every time even 1 OSD goes down during heavy load, the NFS
> exports lock up and the clients are unable to access the NAS share via
> NFS. When the OSD fails, Ceph recovers without issue, but the gateway
> kernel RBD module appears to get stuck waiting on the now failed OSD.
> Note that this works correctly when under lighter loads.
>
> From what we have been able to determine, the NFS server daemon hangs
> waiting for I/O from the OSD that went out and never recovers.
> Similarly, attempting to access files from the exported FS locally on
> the gateway server will result in a similar hang. We also noticed that
> Ceph health details will continue to report blocked I/O on the now
> down OSD until either the OSD is recovered or the gateway server is
> rebooted.  Based on a few kernel logs from NFS and PVS, we were able
> to trace the problem to the RBD kernel module.
>
> Unfortunately, the only way we have been able to recover our gateway
> is by hard rebooting the server.
>
> Has anyone else encountered this issue and/or have a possible solution?
> Are there suggestions for getting more detailed debugging information
> from the RBD kernel module?
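
One avenue worth trying: the rbd and libceph modules log through pr_debug, so on a kernel built with CONFIG_DYNAMIC_DEBUG the dynamic debug interface can turn on verbose client logging (it goes to dmesg and is very chatty), and each mapped image exposes its attributes under /sys/bus/rbd/devices.  A minimal sketch, assuming debugfs is mounted at /sys/kernel/debug:

#!/usr/bin/env python
# Minimal sketch: dump mapped-image attributes and enable verbose rbd/libceph
# logging via dynamic debug.  Assumes debugfs is mounted at /sys/kernel/debug
# and the kernel was built with CONFIG_DYNAMIC_DEBUG; output goes to dmesg.
import glob
import os

DYNAMIC_DEBUG = '/sys/kernel/debug/dynamic_debug/control'

def show(path):
    try:
        with open(path) as f:
            print('%s: %s' % (path, f.read().strip()))
    except IOError as e:
        print('%s: unreadable (%s)' % (path, e))

if __name__ == '__main__':
    # Per-image attributes for every mapped RBD device.
    for dev in sorted(glob.glob('/sys/bus/rbd/devices/*')):
        for attr in ('pool', 'name', 'client_id'):
            show(os.path.join(dev, attr))
    # One command per write; '-p' instead of '+p' turns the logging back off.
    for module in ('rbd', 'libceph'):
        try:
            with open(DYNAMIC_DEBUG, 'w') as f:
                f.write('module %s +p\n' % module)
        except IOError as e:
            print('could not enable dynamic debug for %s: %s' % (module, e))

Combined with the per-client osdc files under /sys/kernel/debug/ceph/ (which list the requests the client is still waiting on and the OSDs they target), that is usually enough to see where a hung request is pinned.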
>
>
> Few notes on our setup:
> We are using Kernel RBD on a gateway server that exports filesystems via NFS
> The exported filesystems are XFS on LVM volumes which are each striped
> across 16 RBD images (NFS->XFS->LVM->PVs->RBD)
> There are currently 176 mapped RBD images on the server (11
> filesystems, 16 mapped RBD images per FS)
> Gateway Kernel: 3.18.6
> Ceph version: 0.80.9
> Note - We've tried using different kernels all the way up to 4.3.0 but
> the problem persists.
>
> Thanks,
> Matt Conner
> Keeper Technology
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
Cheers,
~Blairo
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
