Re: rbd command hangs

On Tue, May 24, 2022 at 3:57 PM Sopena Ballesteros Manuel
<manuel.sopena@xxxxxxx> wrote:
>
> Dear ceph user community,
>
>
> I am trying to install and configure a node to work with a ceph cluster. The Linux kernel we have does not include the rbd kernel module, so we installed it ourselves:
>
>
> zypper install -y 'ceph-common>15'
> zypper install -y 'kernel-source=5.3.18-24.75_10.0.189_2.1_20.4__g0388af5bc3.shasta'
> cp /boot/config-5.3.18-24.75_10.0.189-cray_shasta_c /usr/src/linux/.config
> chown root:root /usr/src/linux/.config
> chmod 0644 /usr/src/linux/.config
> cd /usr/src/linux
> sed -i 's/^# CONFIG_BLK_DEV_RBD is not set/CONFIG_BLK_DEV_RBD=m/g' .config && echo 'CONFIG_TCM_RBD=m' >> .config
> make drivers/block/rbd.ko
> cp /usr/src/linux/drivers/block/rbd.ko /lib/modules/5.3.18-24.75_10.0.189-cray_shasta_c/extra/rbd.ko
> chown root:root /lib/modules/5.3.18-24.75_10.0.189-cray_shasta_c/extra/rbd.ko
> chmod 0644 /lib/modules/5.3.18-24.75_10.0.189-cray_shasta_c/extra/rbd.ko
>
>
> My issue is that the rbd command sometimes hangs and we don't know why. It does not happen every time, but quite frequently. I googled a bit but could not find any relevant solution, so I am looking for advice.
>
>
> What could cause rbd command to hang?

Hi Manuel,

Did you check if the RBD device gets mapped anyway?  If the mapping
succeeds despite the hang, it is probably hanging waiting for udev to
do its job.  It could be somehow related to the stripped-down kernel
you are using or, if you are running "rbd map" from a container, there
may be issues with netlink event propagation.  Try the "noudev" mapping
option:

$ rbd map -o noudev noir-nvme-meta/nid001388
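
For example, while "rbd map" appears to hang, you can check from
another shell whether the device actually showed up:

$ rbd showmapped
$ ls /dev/rbd*

If it did get mapped, you can also run this alongside the map attempt
to see whether the kernel uevents for the device are emitted and
handled by udev at all (just a way to narrow it down, not specific to
your setup):

$ udevadm monitor --kernel --udev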

>
>
> Below is an strace of when we try to run an rbd command:
>
>
> nid001388:~ # strace rbd -n client.noir map noir-nvme-meta/nid001388
> execve("/usr/bin/rbd", ["rbd", "-n", "client.noir", "map", "noir-nvme-meta/nid001388"], 0x7ffe8c35b7b0 /* 62 vars */) = 0
>
> [...]
>
> add_key("ceph", "client.noir", "--REDACTED--", 28, KEY_SPEC_PROCESS_KEYRING) = 201147173
> access("/run/udev/control", F_OK)       = 0
> socket(AF_NETLINK, SOCK_RAW|SOCK_CLOEXEC|SOCK_NONBLOCK, NETLINK_KOBJECT_UEVENT) = 3
> setsockopt(3, SOL_SOCKET, SO_RCVBUFFORCE, [1048576], 4) = 0
> setsockopt(3, SOL_SOCKET, SO_ATTACH_FILTER, {len=13, filter=0x7ffd2f2179c0}, 16) = 0
> bind(3, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000002}, 12) = 0
> getsockname(3, {sa_family=AF_NETLINK, nl_pid=21421, nl_groups=0x000002}, [12]) = 0
> setsockopt(3, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
> pipe2([4, 5], O_NONBLOCK)               = 0
> mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7ff1e954f000
> mprotect(0x7ff1e9550000, 8388608, PROT_READ|PROT_WRITE) = 0
> clone(child_stack=0x7ff1e9d4a230, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[21425], tls=0x7ff1e9d4f700, child_tidptr=0x7ff1e9d4f9d0) = 21425
> poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}], 2, -1

This doesn't tell us anything definitive, as the actual mapping is done
from a thread.  Pass -f to strace to also trace child threads and
processes.
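
For example, re-running your command under strace with -f:

$ strace -f rbd -n client.noir map noir-nvme-meta/nid001388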

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


