On Tue, May 24, 2022 at 3:57 PM Sopena Ballesteros Manuel <manuel.sopena@xxxxxxx> wrote:
>
> Dear ceph user community,
>
> I am trying to install and configure a node with a ceph cluster. The Linux kernel we have does not include the rbd kernel module, so we built it ourselves:
>
> zypper install -y 'ceph-common > 15'
> zypper install -y 'kernel-source = 5.3.18-24.75_10.0.189_2.1_20.4__g0388af5bc3.shasta'
> cp /boot/config-5.3.18-24.75_10.0.189-cray_shasta_c /usr/src/linux/.config
> chown root:root /usr/src/linux/.config
> chmod 0644 /usr/src/linux/.config
> cd /usr/src/linux
> sed -i 's/^# CONFIG_BLK_DEV_RBD is not set/CONFIG_BLK_DEV_RBD=m/g' .config && echo 'CONFIG_TCM_RBD=m' >> .config
> make drivers/block/rbd.ko
> cp /usr/src/linux/drivers/block/rbd.ko /lib/modules/5.3.18-24.75_10.0.189-cray_shasta_c/extra/rbd.ko
> chown root:root /lib/modules/5.3.18-24.75_10.0.189-cray_shasta_c/extra/rbd.ko
> chmod 0644 /lib/modules/5.3.18-24.75_10.0.189-cray_shasta_c/extra/rbd.ko
>
> My issue is that the rbd command sometimes hangs and we don't know why. It does not happen every time, but it happens quite frequently. I googled a bit but could not find any relevant solution, so I am looking for advice.
>
> What could cause the rbd command to hang?

Hi Manuel,

Did you check whether the RBD device gets mapped anyway? If the mapping succeeds despite the hang, it is probably hanging waiting for udev to do its job. That could be somehow related to the stripped-down kernel you are using or, if you are running "rbd map" from a container, to issues with netlink event propagation.

Try the "noudev" mapping option:

$ rbd map -o noudev noir-nvme-meta/nid001388

> Below is an strace from when we try to run an rbd command:
>
> nid001388:~ # strace rbd -n client.noir map noir-nvme-meta/nid001388
> execve("/usr/bin/rbd", ["rbd", "-n", "client.noir", "map", "noir-nvme-meta/nid001388"], 0x7ffe8c35b7b0 /* 62 vars */) = 0
>
> [...]
>
> add_key("ceph", "client.noir", "--REDACTED--", 28, KEY_SPEC_PROCESS_KEYRING) = 201147173
> access("/run/udev/control", F_OK) = 0
> socket(AF_NETLINK, SOCK_RAW|SOCK_CLOEXEC|SOCK_NONBLOCK, NETLINK_KOBJECT_UEVENT) = 3
> setsockopt(3, SOL_SOCKET, SO_RCVBUFFORCE, [1048576], 4) = 0
> setsockopt(3, SOL_SOCKET, SO_ATTACH_FILTER, {len=13, filter=0x7ffd2f2179c0}, 16) = 0
> bind(3, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000002}, 12) = 0
> getsockname(3, {sa_family=AF_NETLINK, nl_pid=21421, nl_groups=0x000002}, [12]) = 0
> setsockopt(3, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
> pipe2([4, 5], O_NONBLOCK) = 0
> mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7ff1e954f000
> mprotect(0x7ff1e9550000, 8388608, PROT_READ|PROT_WRITE) = 0
> clone(child_stack=0x7ff1e9d4a230, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[21425], tls=0x7ff1e9d4f700, child_tidptr=0x7ff1e9d4f9d0) = 21425
> poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}], 2, -1

This doesn't tell us anything definitive, as the actual mapping is done from a thread. Pass -f to strace so that child processes and threads are traced as well.

Thanks,

Ilya
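A minimal sketch combining the two suggestions above, reusing the client name and pool/image from the thread (nothing else is assumed):

# Trace child processes and threads too (-f), so the thread that does
# the actual mapping shows up in the strace output:
strace -f rbd -n client.noir map noir-nvme-meta/nid001388

# If the hang is udev-related, mapping with the noudev option
# should return promptly instead of blocking:
rbd -n client.noir map -o noudev noir-nvme-meta/nid001388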