On Sun, Sep 9, 2018 at 6:31 AM David Turner <drakonstein@xxxxxxxxx> wrote:
>
> The problem is with the kernel pagecache. If that is still shared in a containerized environment with the OSDs in containers and RBDs which are mapped on the node outside of containers, then it is indeed still a problem. I would guess that's the case, but I do not know for certain. Using rbd-nbd instead of krbd bypasses this problem and you can ignore it. Only using krbd is problematic.

How is the nbd client in the kernel different from the rbd client in the kernel (i.e. krbd)? They are both network block devices; the only difference is that the latter talks directly to the OSDs while the former has to go through a proxy. It'll be the same kernel either way if you choose to co-locate, so I don't think using rbd-nbd bypasses this problem. On the contrary, there is more opportunity for breakage with an additional daemon in the I/O path.

The kernel is evolving and I haven't seen a report of such a deadlock in quite a while. I think it's still there, but it's probably harder to hit than it used to be.

Thanks,

                Ilya