Re: rbd kernel client fencing

Hi,

As long as you blacklist the old owner by ip, you should be fine. Do
note that rbd lock remove also blacklists implicitly, unless you pass
it the --rbd_blacklist_on_break_lock=false option. (That is, I think
"ceph osd blacklist add a.b.c.d <interval>" translates into
blacklisting a.b.c.d:0/0, which should block every client with source
ip a.b.c.d.)
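
For example (interval in seconds; a.b.c.d stands in for the old
owner's ip):

  # blacklist every client with source ip a.b.c.d for an hour;
  # this becomes the entity addr a.b.c.d:0/0, i.e. any port/nonce
  ceph osd blacklist add a.b.c.d 3600
  # check the entry is there
  ceph osd blacklist ls
  # and drop it again once the old host is known to be safe
  ceph osd blacklist rm a.b.c.d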

Regardless, I believe the client taking out the lock (the rbd cli) and
the kernel client mapping the rbd will have different entity addrs
(same ip, but different port and nonce). So even if it were possible
to blacklist a specific client by (ip, port, nonce), it wouldn't do
you much good here, since different clients handle the locking and the
actual IO/mapping (the rbd cli and the kernel, respectively).
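
You can compare the two entity addrs yourself; the locker's addr is
recorded on the lock, and the kernel client's addr shows up as a
watcher (rbd status lists watchers, if your rbd cli is new enough):

  # (ip:port/nonce) of whoever took the lock via the rbd cli
  rbd lock list <img>
  # (ip:port/nonce) of the kernel client actually mapping the image,
  # listed under watchers
  rbd status <img>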

We do a variation of what you are suggesting, with one addition: we
first check for watches, and if the image is still watched we give up
and complain rather than blacklist. If the previous lock was held by
our own ip, we silently reclaim it.
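
In shell terms the takeover is roughly this (untested sketch;
<owner_ip>, <owner_addr> and <lockid> stand in for values parsed out
of rbd lock list, and our real implementation differs):

  img=<pool>/<img>
  # give up and complain if anyone still watches the image header
  if rbd status "$img" | grep -q 'watcher='; then
      echo "$img is still watched, refusing to fence" >&2
      exit 1
  fi
  if [ "<owner_ip>" = "$(hostname -i)" ]; then
      # our own stale lock: reclaim it without blacklisting ourselves
      rbd lock remove "$img" <lockid> <owner_addr> \
          --rbd_blacklist_on_break_lock=false
  else
      # someone else's lock: fence the old owner before breaking it
      ceph osd blacklist add <owner_ip>
      rbd lock remove "$img" <lockid> <owner_addr>
  fi
  rbd lock add "$img" <lockid>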

The hosts themselves run a process watching for blacklist entries; if
a host sees itself blacklisted, it commits suicide and reboots. On
boot, the machine removes its blacklist entry and reclaims any locks
it used to hold before starting the things that might map rbd images.
There are some warts in there, but for the most part it works well.
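
The blacklist watcher amounts to little more than this (again a
sketch; the poll interval and logging are arbitrary):

  my_ip=$(hostname -i)
  while sleep 30; do
      # blacklist entries print one per line as ip:port/nonce
      # followed by the expiry time
      if ceph osd blacklist ls 2>/dev/null | grep -q "^${my_ip}:"; then
          logger "we are blacklisted, committing suicide"
          reboot -f
      fi
  done

The boot-time half is essentially "ceph osd blacklist rm $my_ip"
followed by re-taking our locks.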

If you are going the fencing route, I would strongly advise you to
also ensure your process can't end up in cascading blacklists: in
addition to being highly disruptive, it causes osdmap churn, since
blacklist entries live in the osdmap and every add bumps the epoch.
(We accidentally did this, and almost ran our monitors out of disk.)
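
A cheap guard is to alarm on blacklist size before it gets that far
(the threshold is arbitrary):

  # entries print one per line as ip:port/nonce plus expiry,
  # so count the lines containing a '/'
  entries=$(ceph osd blacklist ls 2>/dev/null | grep -c '/')
  if [ "$entries" -gt 50 ]; then
      echo "osd blacklist has $entries entries - possible cascade" >&2
  fi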

Cheers,
KJ

On Wed, Apr 19, 2017 at 2:35 AM, Chaofan Yu <chaofanyu@xxxxxxxxxxx> wrote:
> Hi list,
>
>   I wonder if someone can help with rbd kernel client fencing (aimed
> at avoiding simultaneous rbd maps of the same image on different hosts).
>
> I know the exclusive-lock rbd image feature was added later to avoid
> manual rbd lock CLIs, but I want to understand the earlier blacklist
> solution.
>
> The official workflow I’ve got is listed below (without the
> exclusive-lock feature):
>
>  - identify old rbd lock holder (rbd lock list <img>)
>  - blacklist old owner (ceph osd blacklist add <addr>)
>  - break old rbd lock (rbd lock remove <img> <lockid> <addr>)
>  - lock rbd image on new host (rbd lock add <img> <lockid>)
>  - map rbd image on new host
>
>
> The blacklisted entry is identified by entity_addr_t (ip, port, nonce).
>
> However, as far as I know, the ceph kernel client will reconnect its
> socket if the connection fails, so I wonder whether the workflow breaks
> in this scenario:
>
> 1. the old client's network goes down for a while
> 2. we perform the steps below on the new host to achieve failover:
>  - identify old rbd lock holder (rbd lock list <img>)
>  - blacklist old owner (ceph osd blacklist add <addr>)
>  - break old rbd lock (rbd lock remove <img> <lockid> <addr>)
>  - lock rbd image on new host (rbd lock add <img> <lockid>)
>  - map rbd image on new host
>
> 3. the old client's network comes back and it reconnects to the osds
> with a newly created socket, i.e. a new (ip, port, nonce) tuple
>
> as a result, both the new and the old client can write to the same rbd
> image, which could cause data corruption.
>
> So does this mean that fencing is not possible if the kernel client
> does not support the exclusive-lock image feature?
>



-- 
Kjetil Joergensen <kjetil@xxxxxxxxxxxx>
SRE, Medallia Inc
Phone: +1 (650) 739-6580