Hi,

On Wed, Apr 19, 2017 at 9:08 PM, Chaofan Yu <chaofanyu@xxxxxxxxxxx> wrote:
> Thank you so much.
>
> The blacklist entries are stored in the osd map, which is supposed to be
> tiny and clean. So we are doing similar cleanups after reboot.

In the face of churn this won't necessarily matter, as I believe some
osdmap history is stored; it'll eventually fall off. This may also have
improved - my bad experiences were from around hammer.

> I'm quite interested in how the host commits suicide and reboots.

echo b >/proc/sysrq-trigger # This is about as brutal as it gets

The machine is blacklisted; it has no hope of reading anything from or
writing anything to an rbd device. There are a couple of caveats that come
with this:

- Your workload needs to structure its writes in such a way that it can
  recover from this kind of failure.
- You need to engineer your workload in such a way that it can tolerate a
  machine falling off the face of the earth (i.e. a combination of a
  workload scheduler like Mesos/Aurora/Kubernetes and some HA where
  necessary).
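To make that concrete, a minimal sketch of such a blacklist watchdog might
look like the following. The address, polling interval and grep pattern are
assumptions, and the exact output format of "ceph osd blacklist ls" should
be checked against your release:

  #!/bin/bash
  # Hypothetical watchdog: if this host's address shows up in the osdmap
  # blacklist, assume we have been fenced and reboot as brutally as
  # possible so nothing more can be written to any mapped rbd image.
  MY_IP="10.0.0.17"   # placeholder - the address your ceph clients use

  while sleep 10; do
      # "ceph osd blacklist ls" lists entries as <ip>:<port>/<nonce>;
      # an entry added by bare IP shows up as <ip>:0/0.
      if ceph osd blacklist ls 2>/dev/null | grep -q "^${MY_IP}:"; then
          logger -t rbd-fence "found ${MY_IP} in osd blacklist, rebooting"
          echo b > /proc/sysrq-trigger
      fi
  done

The boot-time counterpart would be the mirror image: remove our own entry
with "ceph osd blacklist rm" and reclaim any locks we held, before anything
is allowed to map rbd images again.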
> can you successfully umount the folder and unmap the rbd block device
> after it is blacklisted?
>
> I wonder whether the IO will hang and the umount process will get stuck
> in D state, so the host cannot be shut down since it is waiting for the
> umount to finish.

No, see previous comment.

> ==============================
>
> And now that the CentOS 7.3 kernel supports the exclusive-lock feature,
> could anyone give out the new flow of failover?

This may not be what you think it is, see e.g.:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-September/004857.html
(And I can't really provide you with much more context; I've primarily
registered that it isn't made for fencing image access. It's all about
arbitrating modification, in support of e.g. object-map.)

> Thanks.

>> On 20 Apr 2017, at 6:31 AM, Kjetil Jørgensen <kjetil@xxxxxxxxxxxx> wrote:
>>
>> Hi,
>>
>> As long as you blacklist the old owner by IP, you should be fine. Do
>> note that rbd lock remove implicitly also blacklists unless you also
>> pass rbd lock remove the --rbd_blacklist_on_break_lock=false option.
>> (That is, I think "ceph osd blacklist add a.b.c.d interval" translates
>> into blacklisting a.b.c.d:0/0, which should block every client with
>> source IP a.b.c.d.)
>>
>> Regardless, I believe the client taking out the lock (the rbd CLI) and
>> the kernel client mapping the rbd will be different (port, nonce), so
>> even if it is possible to blacklist a specific client by (ip, port,
>> nonce), it wouldn't do you much good where you have different clients
>> dealing with the locking and doing the actual IO/mapping (rbd CLI and
>> kernel).
>>
>> We do a variation of what you are suggesting, although additionally we
>> check for watches; if watched, we give up and complain rather than
>> blacklist. If the previous lock was held by my IP, we just silently
>> reclaim. The hosts themselves run a process watching for blacklist
>> entries, and if they see themselves blacklisted they commit suicide and
>> reboot. On boot, the machine removes its blacklist entry and reclaims
>> any locks it used to hold before starting the things that might map rbd
>> images. There are some warts in there, but for the most part it works
>> well.
>>
>> If you are going the fencing route, I would strongly advise you to also
>> ensure your process can't end up with the possibility of cascading
>> blacklists; in addition to being highly disruptive, it causes osd(?)
>> map churn. (We accidentally did this and ended up almost running our
>> monitors out of disk.)
>>
>> Cheers,
>> KJ
>>
>> On Wed, Apr 19, 2017 at 2:35 AM, Chaofan Yu <chaofanyu@xxxxxxxxxxx> wrote:
>>> Hi list,
>>>
>>> I wonder whether someone can help with rbd kernel client fencing (aimed
>>> at avoiding simultaneous rbd map on different hosts).
>>>
>>> I know the exclusive-lock rbd image feature was added later to avoid
>>> manual rbd lock CLIs, but I want to know about the previous blacklist
>>> solution.
>>>
>>> The official workflow I've got is listed below (without the
>>> exclusive-lock feature):
>>>
>>> - identify old rbd lock holder (rbd lock list <img>)
>>> - blacklist old owner (ceph osd blacklist add <addr>)
>>> - break old rbd lock (rbd lock remove <img> <lockid> <addr>)
>>> - lock rbd image on new host (rbd lock add <img> <lockid>)
>>> - map rbd image on new host
>>>
>>> The blacklisted entry is identified by entity_addr_t (ip, port, nonce).
>>>
>>> However, as far as I know, the ceph kernel client will reconnect its
>>> socket if the connection fails. So I suspect it won't work in this
>>> scenario:
>>>
>>> 1. the old client's network goes down for a while
>>> 2. the steps below are performed on the new host to achieve failover:
>>>    - identify old rbd lock holder (rbd lock list <img>)
>>>    - blacklist old owner (ceph osd blacklist add <addr>)
>>>    - break old rbd lock (rbd lock remove <img> <lockid> <addr>)
>>>    - lock rbd image on new host (rbd lock add <img> <lockid>)
>>>    - map rbd image on new host
>>> 3. the old client's network comes back and it reconnects to the osds
>>>    with a newly created socket client, i.e. a new (ip, port, nonce)
>>>    tuple
>>>
>>> As a result, both the new and the old client can write to the same rbd
>>> image, which might potentially cause data corruption.
>>>
>>> So does this mean that if the kernel client does not support the
>>> exclusive-lock image feature, fencing is not possible?
>>
>> --
>> Kjetil Joergensen <kjetil@xxxxxxxxxxxx>
>> SRE, Medallia Inc
>> Phone: +1 (650) 739-6580

--
Kjetil Joergensen <kjetil@xxxxxxxxxxxx>
SRE, Medallia Inc
Phone: +1 (650) 739-6580
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
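For completeness, the manual pre-exclusive-lock failover flow discussed in
this thread could be sketched roughly as below. The image name, lock ids,
locker entity and IP are hypothetical placeholders filled in from "rbd lock
list" output, and the watch check / reclaim-if-mine logic described above
is deliberately left out:

  #!/bin/bash
  # Hypothetical end-to-end sketch of the manual failover flow; no error
  # handling, and all values below are operator-supplied placeholders.
  set -e

  IMG="rbd/myimage"        # pool/image to take over
  OLD_LOCK_ID="oldhost"    # lock id shown by "rbd lock list $IMG"
  LOCKER="client.4235"     # locker entity shown by "rbd lock list $IMG"
  OLD_IP="10.0.0.16"       # address of the old lock holder

  # 1. identify the old lock holder
  rbd lock list "$IMG"

  # 2. blacklist the old owner by IP; a bare IP becomes <ip>:0/0, which
  #    should block every client with that source address
  ceph osd blacklist add "$OLD_IP"

  # 3. break the old lock (note: this implicitly blacklists the locker as
  #    well, unless --rbd_blacklist_on_break_lock=false is passed)
  rbd lock remove "$IMG" "$OLD_LOCK_ID" "$LOCKER"

  # 4. take the lock under our own id and map the image on the new host
  rbd lock add "$IMG" "$(hostname -s)"
  rbd map "$IMG"

Note that, as discussed above, this only helps if the old owner is
blacklisted by IP; blacklisting the exact (ip, port, nonce) of the process
that took the lock would not stop the kernel client doing the actual IO.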