The systems on which the `rbd map` hang occurred are definitely not
under memory stress. I don't believe they are doing a lot of disk I/O
either.

Here's the basic set-up:

* all nodes in the "data-plane" are identical
* they each host an OSD instance, sharing one of the drives
* I'm running Docker containers using an RBD volume plugin and Docker
  Compose
* when the hang happens, the most visible behavior is that `docker ps`
  hangs
* then I run `systemctl status` and see an `rbd map` process spawned by
  the RBD volume plugin
* I then ran `strace -f -p <pid of rbd map>`; that process promptly
  exited (with RC 0) and the hang resolved itself

I'll try to capture the strace output the next time I run into it and
share it with the mailing list.

Thanks, Ilya.

-kc

> On May 9, 2016, at 2:21 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>
> On Mon, May 9, 2016 at 12:19 AM, K.C. Wong <kcwong@xxxxxxxxxxx> wrote:
>>
>>> As the tip said, you should not use rbd via kernel module on an OSD host
>>>
>>> However, using it with userspace code (librbd etc, as in kvm) is fine
>>>
>>> Generally, you should not have both:
>>> - "server" in userspace
>>> - "client" in kernelspace
>>
>> If `librbd` would help avoid this problem, then switching to `rbd-fuse`
>> should do the trick, right?
>>
>> The reason for my line of questioning is that I've seen an occasional
>> freeze-up of `rbd map` that's resolved by a 'slight tap' by way of an
>> strace. There is definitely great attractiveness to not having
>> specialized nodes and making every one the same as the next one on
>> the rack.
>
> The problem with placing the kernel client on the OSD node is the
> potential deadlock under heavy I/O when memory becomes scarce. It's
> not recommended, but people are doing it - if you don't stress your
> system too much, it'll never happen.
>
> "rbd map" freeze is definitely not related to the above. Did the actual
> command hang? Could you describe what you saw in more detail and how
> did strace help? It could be that you ran into
>
> http://tracker.ceph.com/issues/14737
>
> Thanks,
>
> Ilya

K.C. Wong
kcwong@xxxxxxxxxxx
4096R/B8995EDE  E527 CBE8 023E 79EA 8BBB  5C77 23A6 92E9 B899 5EDE
hkps://hkps.pool.sks-keyservers.net
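
For capturing the trace the next time the hang shows up, here is a minimal sketch of one way to do it. It is not part of the original report: the function names, the `/tmp` output path, and the `ps`-based matching are all assumptions made for illustration. The only thing taken from the thread is the `strace -f -p <pid>` invocation; `-tt` (timestamps) and `-o` (write to a file) are added so the output survives even if the process exits right after strace attaches.

```python
import subprocess
import time


def find_rbd_map_pids(ps_output):
    """Return PIDs whose command line looks like an `rbd ... map` call.

    Expects the text produced by `ps -eo pid,args`. The match is a
    heuristic (hypothetical helper, not a Ceph tool): the executable's
    basename must be `rbd` and `map` must appear among its arguments.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # Skip the header line and anything without a numeric PID.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        argv = parts[1].split()
        if argv[0].rsplit("/", 1)[-1] == "rbd" and "map" in argv[1:]:
            pids.append(int(parts[0]))
    return pids


def strace_command(pid, out_path):
    """Build the strace argv: follow children, timestamp, log to a file."""
    return ["strace", "-f", "-tt", "-o", out_path, "-p", str(pid)]


def capture_all():
    """Attach strace to every matching process (typically needs root)."""
    ps_output = subprocess.run(
        ["ps", "-eo", "pid,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid in find_rbd_map_pids(ps_output):
        # Assumed output location; adjust to taste.
        out_path = "/tmp/rbd-map-%d-%d.strace" % (pid, int(time.time()))
        print("capturing", pid, "to", out_path)
        subprocess.run(strace_command(pid, out_path))
```

Calling `capture_all()` while `docker ps` is hanging would leave one trace file per stuck `rbd map` process, which can then be attached to a follow-up message.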
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com