On Tue, Dec 8, 2015 at 10:57 AM, Tom Christensen <pavera@xxxxxxxxx> wrote:
> We aren't running NFS, but regularly use the kernel driver to map RBDs and
> mount filesystems in same. We see very similar behavior across nearly all
> kernel versions we've tried. In my experience only very few versions of the
> kernel driver survive any sort of crush map change/update while something is
> mapped. In fact in the last 2 years I think I've only seen this work on 1
> kernel version unfortunately its badly out of date and we can't run it in
> our environment anymore, I think it was a 3.0 kernel version running on
> ubuntu 12.04. We have just recently started trying to find a kernel that
> will survive OSD outages or changes to the cluster. We're on ubuntu 14.04,
> and have tried 3.16, 3.19.0-25, 4.3, and 4.2 without success in the last
> week. We only map 1-3 RBDs per client machine at a time but we regularly
> will get processes stuck in D state which are accessing the filesystem
> inside the RBD and will have to hard reboot the RBD client machine. This is
> always associated with a cluster change in some way, reweighting OSDs,
> rebooting an OSD host, restarting an individual OSD, adding OSDs, and
> removing OSDs all cause the kernel client to hang. If no change is made to
> the cluster, the kernel client will be happy for weeks.

There are a couple of known bugs in the remap/resubmit area, but those
are supposedly corner cases (like *all* the OSDs going down and then
back up, etc).  I had no idea it was that severe and goes that far back.
Apparently triggering it requires a heavier load, as we've never seen
anything like that in our tests.

For unrelated reasons, the remap/resubmit code is getting entirely
rewritten for kernel 4.5, so, if you've been dealing with this issue for
the last two years (I don't remember seeing any tickets listing that
many kernel versions and not mentioning NFS), I'm afraid the best course
of action for you would be to wait for 4.5 to come out and try it.
If you'd be willing to test out an early version on one or more of your
client boxes, I can ping you when it's ready.

I'll take a look at 3.0 vs 3.16 with an eye on the remap code.  Did you
happen to try 3.10?

It sounds like you can reproduce this pretty easily.  Can you get it to
lock up and run

# cat /sys/kernel/debug/ceph/*/osdmap
# cat /sys/kernel/debug/ceph/*/osdc
$ ceph status

a bunch of times?  I have a hunch that the kernel client simply fails to
request enough new osdmaps after the cluster topology changes under
load.

Thanks,

                Ilya
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
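
The capture requested above (osdmap, osdc and cluster status, repeated a
few times while the client is hung) could be scripted roughly as
follows.  This is only a sketch, assuming a POSIX shell and debugfs
mounted at /sys/kernel/debug; the function name snapshot_ceph_debug, its
two arguments and the output layout are hypothetical and not part of the
original message.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not from the original message):
# write one snapshot of the kernel client's osdmap and in-flight OSD
# requests (osdc), plus "ceph status" if the CLI is installed, to a file.
#
# usage: snapshot_ceph_debug [DEBUGFS_DIR] [OUT_FILE]
snapshot_ceph_debug() {
    debugfs_dir=${1:-/sys/kernel/debug/ceph}
    out_file=${2:-ceph-debug.txt}

    {
        echo "### snapshot taken at $(date -u +%Y-%m-%dT%H:%M:%SZ)"

        # One directory per mapped client instance; unmatched globs stay
        # literal and are skipped by the readability check.
        for f in "$debugfs_dir"/*/osdmap "$debugfs_dir"/*/osdc; do
            [ -r "$f" ] || continue
            echo "=== $f ==="
            cat "$f"
        done

        # Cluster-side view; skipped when the ceph CLI is not available.
        if command -v ceph >/dev/null 2>&1; then
            echo "=== ceph status ==="
            ceph status 2>&1 || echo "(ceph status failed)"
        fi
    } > "$out_file"
}

# Example (as root, since debugfs is normally root-only): five snapshots,
# ten seconds apart, one file each.
#
#   for i in 1 2 3 4 5; do
#       snapshot_ceph_debug /sys/kernel/debug/ceph "ceph-debug-$i.txt"
#       sleep 10
#   done
```

Keeping one file per snapshot makes it easy to diff the osdmap epoch and
the list of outstanding requests between captures.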