Hi,

As long as you blacklist the old owner by IP, you should be fine.

Note that rbd lock remove implicitly also blacklists, unless you pass it the --rbd_blacklist_on_break_lock=false option. (That is, I think "ceph osd blacklist add a.b.c.d interval" translates into blacklisting a.b.c.d:0/0 - which should block every client with source IP a.b.c.d.)

Regardless, I believe the client taking out the lock (the rbd CLI) and the kernel client mapping the rbd will be different (port, nonce). So even if it were possible to blacklist a specific client by (ip, port, nonce), it wouldn't do you much good here, where different clients deal with the locking and the actual IO/mapping (the rbd CLI and the kernel).

We do a variation of what you are suggesting, except that we additionally check for watches: if the image is watched, we give up and complain rather than blacklist. If the previous lock was held by our own IP, we just silently reclaim it.

The hosts themselves run a process watching for blacklist entries; if they see themselves blacklisted, they commit suicide and reboot. On boot, the machine removes its blacklist entry and reclaims any locks it used to hold, before starting the things that might map rbd images. There are some warts in there, but for the most part it works well.

If you are going the fencing route, I would strongly advise you to also ensure your process can't end up with cascading blacklists; in addition to being highly disruptive, it causes osd(?) map churn. (We accidentally did this, and ended up almost running our monitors out of disk.)

Cheers,
KJ

On Wed, Apr 19, 2017 at 2:35 AM, Chaofan Yu <chaofanyu@xxxxxxxxxxx> wrote:
> Hi list,
>
> I wonder whether someone can help with rbd kernel client fencing (aimed at avoiding simultaneous rbd map on different hosts).
>
> I know the exclusive-lock rbd image feature was added later to avoid manual rbd lock CLIs, but I want to understand the earlier blacklist-based solution.
>
> The official workflow I’ve got is listed below (without the exclusive-lock feature):
>
> - identify the old rbd lock holder (rbd lock list <img>)
> - blacklist the old owner (ceph osd blacklist add <addr>)
> - break the old rbd lock (rbd lock remove <img> <lockid> <addr>)
> - lock the rbd image on the new host (rbd lock add <img> <lockid>)
> - map the rbd image on the new host
>
> The blacklisted entry is identified by entity_addr_t (ip, port, nonce).
>
> However, as far as I know, the ceph kernel client will reconnect the socket if a connection fails. So I wonder whether fencing breaks in this scenario:
>
> 1. the old client's network goes down for a while
> 2. the steps below are performed on a new host to achieve failover:
>    - identify the old rbd lock holder (rbd lock list <img>)
>    - blacklist the old owner (ceph osd blacklist add <addr>)
>    - break the old rbd lock (rbd lock remove <img> <lockid> <addr>)
>    - lock the rbd image on the new host (rbd lock add <img> <lockid>)
>    - map the rbd image on the new host
> 3. the old client's network comes back, and it reconnects to the osds with a newly created socket client, i.e. a new (ip, port, nonce) tuple
>
> As a result, both the new and the old client can write to the same rbd image, which might potentially cause data corruption.
>
> So does this mean that if the kernel client does not support the exclusive-lock image feature, fencing is not possible?
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Kjetil Joergensen <kjetil@xxxxxxxxxxxx>
SRE, Medallia Inc
Phone: +1 (650) 739-6580
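
P.S. For the archives, the fencing sequence discussed in this thread might look roughly like the sketch below. The image name, lock id, and locker/address values are placeholders (the locker id and address would normally be parsed out of the rbd lock list output), and the watch check is our local variation, not part of the official steps - this requires a running cluster and is not a drop-in script:

```shell
#!/bin/sh
# Sketch of the blacklist-based fencing sequence. IMG, LOCK_ID, LOCKER
# and the IP are hypothetical placeholder values.
IMG=myimage
LOCK_ID=myhost-lock

# 1. Identify the old lock holder; note the locker id and its address
#    from the output, e.g. "client.4235 ... 1.2.3.4:0/123456789".
rbd lock list "$IMG"
LOCKER=client.4235

# (Our variation: check for watchers first; if the image is still
# watched, give up and complain instead of blacklisting.)
rbd status "$IMG"

# 2. Blacklist the old owner by IP - a.b.c.d:0/0 should block every
#    client with that source IP, regardless of port/nonce.
ceph osd blacklist add 1.2.3.4

# 3. Break the old lock. This also blacklists by default unless
#    --rbd_blacklist_on_break_lock=false is passed.
rbd lock remove "$IMG" "$LOCK_ID" "$LOCKER"

# 4. Take the lock ourselves and map the image on the new host.
rbd lock add "$IMG" "$LOCK_ID"
rbd map "$IMG"
```

A recovering host would do the inverse on boot before starting anything that maps images: remove its own blacklist entry (ceph osd blacklist rm <addr>) and reclaim the locks it used to hold.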