Re: Kernel RBD hang on OSD Failure

We haven't submitted a ticket as we've simply avoided using the kernel client.  We've periodically tried with various kernels and various versions of Ceph over the last two years, but have given up each time and reverted to using rbd-fuse, which, although not super stable, at least doesn't hang the client box.  We now find ourselves in the position where, for additional functionality, we *need* an actual block device, so we have to find a kernel client that works.  I will certainly keep you posted and can produce the output you've requested.
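
(For context, a minimal sketch of the two access paths in question, assuming a pool named "rbd" and an existing image named "myimage"; exact flags may differ between releases:)

FUSE path (what we fall back to today):

# mkdir -p /mnt/rbd-fuse
# rbd-fuse -p rbd /mnt/rbd-fuse

Kernel client path (the actual block device we need):

# rbd map rbd/myimage           # creates /dev/rbdN via the krbd driver
# mkfs.xfs /dev/rbd0            # first use only
# mount /dev/rbd0 /mnt/rbd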

I'd also be willing to run an early 4.5 version in our test environment.

On Tue, Dec 8, 2015 at 3:35 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
On Tue, Dec 8, 2015 at 10:57 AM, Tom Christensen <pavera@xxxxxxxxx> wrote:
> We aren't running NFS, but we regularly use the kernel driver to map RBDs
> and mount filesystems on them.  We see very similar behavior across nearly
> all the kernel versions we've tried.  In my experience, very few versions
> of the kernel driver survive any sort of CRUSH map change/update while
> something is mapped.  In fact, in the last two years I think I've only seen
> this work on one kernel version; unfortunately it's badly out of date and
> we can't run it in our environment anymore.  I think it was a 3.0 kernel
> running on Ubuntu 12.04.  We have just recently started trying to find a
> kernel that will survive OSD outages or changes to the cluster.  We're on
> Ubuntu 14.04 and have tried 3.16, 3.19.0-25, 4.3, and 4.2 without success
> in the last week.  We only map 1-3 RBDs per client machine at a time, but
> we regularly get processes stuck in D state while accessing the filesystem
> inside the RBD, and we have to hard reboot the RBD client machine.  This is
> always associated with a cluster change of some kind: reweighting OSDs,
> rebooting an OSD host, restarting an individual OSD, adding OSDs, and
> removing OSDs all cause the kernel client to hang.  If no change is made to
> the cluster, the kernel client is happy for weeks.

There are a couple of known bugs in the remap/resubmit area, but those
are supposedly corner cases (like *all* the OSDs going down and then
back up, etc).  I had no idea it was that severe and went that far back.
Apparently triggering it requires a heavier load, as we've never seen
anything like that in our tests.

For unrelated reasons, remap/resubmit code is getting entirely
rewritten for kernel 4.5, so, if you've been dealing with this issue
for the last two years (I don't remember seeing any tickets listing
that many kernel versions and not mentioning NFS), I'm afraid the best
course of action for you would be to wait for 4.5 to come out and try
it.  If you'd be willing to test out an early version on one or more of
your client boxes, I can ping you when it's ready.

I'll take a look at 3.0 vs 3.16 with an eye on remap code.  Did you
happen to try 3.10?

It sounds like you can reproduce this pretty easily.  Can you get it to
lock up and do:

# cat /sys/kernel/debug/ceph/*/osdmap
# cat /sys/kernel/debug/ceph/*/osdc
$ ceph status

a bunch of times?  I have a hunch that the kernel client simply fails to
request enough new osdmaps after the cluster topology changes under
load.
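
(A sketch of one way to capture that output repeatedly while reproducing the hang; the loop interval and log file name are arbitrary.  The first line of the debugfs osdmap file shows the epoch the kernel client has, which can then be compared with the osdmap epoch reported by ceph status:)

while true; do
    date
    cat /sys/kernel/debug/ceph/*/osdmap   # client's view: epoch, pools, OSD addresses
    cat /sys/kernel/debug/ceph/*/osdc     # in-flight requests, useful for spotting stuck ops
    ceph status
    echo '---'
    sleep 10
done 2>&1 | tee -a rbd-hang-debug.log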

Thanks,

                Ilya

