On Wed, Feb 22, 2023 at 2:38 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>
> On Wed, Feb 22, 2023 at 1:17 PM Aleksandr Mikhalitsyn
> <aleksandr.mikhalitsyn@xxxxxxxxxxxxx> wrote:
> >
> > Hi folks,
> >
> > Recently we've run into a problem [1] with the kernel ceph client/rbd.
> >
> > Writing to /sys/bus/rbd/add_single_major can in some cases take a long
> > time, so on the userspace side we had a timeout and sent a fatal signal
> > to the "rbd map" process to interrupt it. This was working perfectly
> > well, but afterwards it was impossible to perform "rbd map" again
> > because we always got an EBLOCKLISTED error.
>
> Hi Aleksandr,

Hi Ilya! Thanks a lot for such a fast reply.

> I'm not sure if there is a causal relationship between "rbd map"
> getting sent a fatal signal by LXC and these EBLOCKLISTED errors. Are
> you saying that that was confirmed to be the root cause, meaning that
> no such errors were observed after [1] got merged?

AFAIK, no. After [1] was merged we haven't seen any issues with rbd.
I think Stephane will correct me if I'm wrong.

I also can't be fully sure that there is a strict logical relationship
between the EBLOCKLISTED error and the fatal signal. After I got a report
from the LXD folks about this, I tried to analyse the kernel code and
find the places where EBLOCKLISTED (ESHUTDOWN|EBLOCKLISTED|EBLACKLISTED)
can be returned to userspace. I was surprised to find that there is no
place in the kernel ceph/rbd client where this error is generated; it can
only be received from a ceph monitor as a reply to a kernel client
request. But we have a lot of checks like this:

        if (rc == -EBLOCKLISTED)
                fsc->blocklisted = true;

so, if we receive this error once, it is saved in struct ceph_fs_client
without any chance to clear it. Maybe this is the reason why all
"rbd map" attempts are failing?
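To make the failure mode I suspect concrete, here is a minimal compilable
sketch of that pattern (hypothetical names, not the actual fs/ceph code;
EBLOCKLISTED is defined here only so the sketch is self-contained, the
kernel treats it as an alias for ESHUTDOWN):

        #include <errno.h>
        #include <stdbool.h>

        /* Defined here so the sketch compiles on its own. */
        #define EBLOCKLISTED ESHUTDOWN

        /* Stand-in for the shared client state (struct ceph_fs_client). */
        struct fs_client_sketch {
                bool blocklisted;
        };

        /* Reply path: the first -EBLOCKLISTED sets the flag, and no
         * code path ever clears it again. */
        static int handle_reply(struct fs_client_sketch *fsc, int rc)
        {
                if (rc == -EBLOCKLISTED)
                        fsc->blocklisted = true;        /* sticky */
                return rc;
        }

        /* If some later code path checks the flag before sending, every
         * new request would fail up front, even after the blocklist
         * entry has been removed on the monitor side -- which would
         * match the behaviour we are seeing. */
        static int submit_request(struct fs_client_sketch *fsc)
        {
                if (fsc->blocklisted)
                        return -EBLOCKLISTED;
                /* ... would actually send the request here ... */
                return 0;
        }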
> > We've done some brief analysis of the kernel side.
> >
> > Kernel-side call stack:
> > sysfs_write [/sys/bus/rbd/add_single_major]
> >   add_single_major_store
> >     do_rbd_add
> >       rbd_add_acquire_lock
> >         rbd_acquire_lock
> >           rbd_try_acquire_lock <- EBLOCKLISTED comes from there for the
> >           2nd and further attempts
> >
> > Most probably the place at which it was interrupted by a signal:
> >
> > static int rbd_add_acquire_lock(struct rbd_device *rbd_dev)
> > {
> > ...
> >
> >         rbd_assert(!rbd_is_lock_owner(rbd_dev));
> >         queue_delayed_work(rbd_dev->task_wq, &rbd_dev->lock_dwork, 0);
> >         ret = wait_for_completion_killable_timeout(&rbd_dev->acquire_wait,
> >             ceph_timeout_jiffies(rbd_dev->opts->lock_timeout)); <=== signal arrives
> >
> > As far as I understand, we had been receiving the EBLOCKLISTED errno
> > because ceph_monc_blocklist_add() successfully sent the
> > "osd blocklist add" command to the ceph monitor.
>
> RBD doesn't use ceph_monc_blocklist_add() to blocklist itself. It's
> there to blocklist some _other_ RBD client that happens to be holding
> the lock and isn't responding to this RBD client's requests to release
> it.

Got it. Thanks for clarifying this.

> > We had removed the client from the blocklist [2].
>
> This is very dangerous and generally shouldn't ever be done.
> Blocklisting is Ceph's term for fencing. Manually lifting the fence
> without fully understanding what is going on in the system is a fast
> ticket to data corruption.
>
> I see that [2] does say "Doing this may put data integrity at risk" but
> not nearly as strongly as it should. Also, it's for CephFS, not RBD.
>
> > But we still weren't able to perform the rbd map. It looks like some
> > extra state is saved on the kernel client side and blocks us.
>
> By default, all RBD mappings on the node share the same "RBD client"
> instance. Once it's blocklisted, all existing mappings are affected.
> Unfortunately, new mappings don't check for that and just attempt to
> reuse that instance as usual.
>
> This sharing can be disabled by passing "-o noshare" to "rbd map" but
> I would recommend cleaning up existing mappings instead.

So, we need to execute (on a client node):

$ rbd showmapped

and then

$ rbd unmap ...

for each mapping (for example, with a small loop like the one sketched
at the end of this mail), correct?

> Thanks,
>
>                 Ilya

> > What do you think about it?
> >
> > Links:
> > [1] https://github.com/lxc/lxd/pull/11213
> > [2] https://docs.ceph.com/en/quincy/cephfs/eviction/#advanced-un-blocklisting-a-client
> >
> > Kind regards,
> > Alex
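P.S. Spelling out the cleanup loop referenced above -- an untested
sketch, assuming the /dev/rbdX device node is the last column of the
"rbd showmapped" output and that no mapping is still in use:

        # Unmap every currently mapped RBD device on this node.
        # "rbd showmapped" prints a header line and one line per
        # mapping; "rbd unmap" accepts the device node directly.
        for dev in $(rbd showmapped | awk 'NR > 1 { print $NF }'); do
                rbd unmap "$dev"
        done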