Re: EBLOCKLISTED error after rbd map was interrupted by fatal signal

On Wed, Feb 22, 2023 at 1:17 PM Aleksandr Mikhalitsyn
<aleksandr.mikhalitsyn@xxxxxxxxxxxxx> wrote:
>
> Hi folks,
>
> Recently we've met a problem [1] with the kernel ceph client/rbd.
>
> Writing to /sys/bus/rbd/add_single_major can in some cases take a long
> time, so on the userspace side we had a timeout and sent a fatal signal
> to the rbd map process to interrupt it.
> This works perfectly well, but afterwards it's impossible to perform
> rbd map again because we always get an EBLOCKLISTED error.

Hi Aleksandr,

I'm not sure if there is a causal relationship between "rbd map"
getting sent a fatal signal by LXC and these EBLOCKLISTED errors.  Are
you saying that that was confirmed to be the root cause, meaning that
no such errors were observed after [1] got merged?

>
> We've done some brief analysis of the kernel side.
>
> Kernelside call stack:
> sysfs_write [/sys/bus/rbd/add_single_major]
> add_single_major_store
> do_rbd_add
> rbd_add_acquire_lock
> rbd_acquire_lock
> rbd_try_acquire_lock <- EBLOCKLISTED comes from there for 2nd and
> further attempts
>
> The most likely place at which it was interrupted by a signal:
> static int rbd_add_acquire_lock(struct rbd_device *rbd_dev)
> {
> ...
>
>         rbd_assert(!rbd_is_lock_owner(rbd_dev));
>         queue_delayed_work(rbd_dev->task_wq, &rbd_dev->lock_dwork, 0);
>         ret = wait_for_completion_killable_timeout(&rbd_dev->acquire_wait,
>                 ceph_timeout_jiffies(rbd_dev->opts->lock_timeout)); <=== signal arrives
>
> As far as I understand, we kept receiving the EBLOCKLISTED errno
> because ceph_monc_blocklist_add() had successfully sent the
> "osd blocklist add" command to the ceph monitor.

RBD doesn't use ceph_monc_blocklist_add() to blocklist itself.  It's
there to blocklist some _other_ RBD client that happens to be holding
the lock and isn't responding to this RBD client's requests to release
it.
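
One way to tell which side actually got blocklisted is to compare the
node's own client address against the cluster's blocklist.  A rough
sketch, assuming a Quincy-era CLI and a kernel new enough to expose the
client_addr sysfs attribute (exact output formats may differ):

    # list the current blocklist entries (addr:port/nonce plus expiry)
    $ ceph osd blocklist ls

    # the kernel client's own address, one entry per mapping; mappings
    # that share a client instance report the same address
    $ cat /sys/bus/rbd/devices/*/client_addr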

> We had removed the client from blocklist [2].

This is very dangerous and generally shouldn't ever be done.
Blocklisting is Ceph's term for fencing.  Manually lifting the fence
without fully understanding what is going on in the system is a fast
ticket to data corruption.

I see that [2] does say "Doing this may put data integrity at risk", but
that warning is not nearly as strong as it should be.  Also, it's for
CephFS, not RBD.
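
For reference, the "advanced" procedure in [2] essentially boils down to
something like the following (the address/nonce here is made up, and
again, this lifts the fence, so don't do it unless you understand why
the client was blocklisted in the first place):

    # find the entry corresponding to the evicted client
    $ ceph osd blocklist ls

    # remove it; this is exactly the dangerous part
    $ ceph osd blocklist rm 192.168.1.23:0/3556259852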

> But we still weren't able to perform the rbd map. It looks like some
> extra state is saved on the kernel client side and blocks us.

By default, all RBD mappings on the node share the same "RBD client"
instance.  Once it's blocklisted, all existing mappings are affected.
Unfortunately, new mappings don't check for that and just attempt to
reuse that instance as usual.
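
You can see the sharing by comparing the client instance of each
mapping, for example (sysfs attribute name assumed, it has been around
for a long time):

    # with the default sharing, every mapping on the node reports the
    # same ceph client id
    $ for d in /sys/bus/rbd/devices/*; do
          echo "$(basename "$d"): $(cat "$d"/client_id)"
      done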

This sharing can be disabled by passing "-o noshare" to "rbd map" but
I would recommend cleaning up existing mappings instead.
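
Roughly (device and image names below are made up):

    # see what is currently mapped
    $ rbd showmapped

    # unmap the affected mappings ("-o force" if a stale user holds one open)
    $ rbd unmap /dev/rbd0

    # map again; once the old client instance is released, a fresh one
    # (with a fresh session to the cluster) is created for the node
    $ rbd map mypool/myimage

    # or, as a last resort, give this one mapping its own client instance
    $ rbd map -o noshare mypool/myimage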

Thanks,

                Ilya

>
> What do you think about it?
>
> Links:
> [1] https://github.com/lxc/lxd/pull/11213
> [2] https://docs.ceph.com/en/quincy/cephfs/eviction/#advanced-un-blocklisting-a-client
>
> Kind regards,
> Alex


