We've had some user reports lately of rbd images being broken by misbehaving clients. The scenario: rbd image I is mounted on computer A, computer A starts misbehaving, and so I is mounted on computer B instead. But because A is misbehaving, it keeps writing to the image and corrupts it horribly. To handle this, we're working on two separate but related features:

1) Advisory RBD image locking. See http://tracker.newdream.net/issues/1480 and the wip-rbd-locking branch. With this addition, clients gain the ability to take shared and exclusive locks on images, which protects against accidentally mounting a disk in two places at once. But because the images are distributed across all the OSDs, this locking is of course entirely advisory, and a misbehaving client is still perfectly capable of writing to a disk it shouldn't. To handle that, we're also looking at...

2) Client fencing. See http://tracker.newdream.net/issues/2531. There is an existing "blacklist" functionality in the OSDs/OSDMap, where you can specify an "entity_addr_t" (consisting of an IP, a port, and a nonce, so essentially unique per process) which is no longer allowed to communicate with the cluster. The problem is that the blacklist is distributed as part of the OSDMap (via gossip), so if you need a point-in-time transition (as with an rbd image), the new client needs every OSD to have updated its map before it starts doing any reads or writes.

The initial idea in the bug was to have some sort of command you could run on a per-image basis, which breaks the locks and blacklists the old locker. But if the problem is a misbehaving hypervisor, you may have to run that for several hundred images, where each command needs to talk to several hundred OSDs. That's super-lame and nobody wants to do it. The alternative is making an admin/script do it on their own, using the existing "ceph osd blacklist" and about-to-exist "rbd lock break" functionality as appropriate.
That's also super-lame, because then they have to come up with some way of spreading the map, and that's difficult to embed in external libraries. So what I'm currently (as of 15 seconds ago) leaning towards is a new rados command which will do the blacklist and make sure the new map is distributed to every OSD ("rados blacklist_and_spread 'address'"), and then requiring the automatic system to:
a) run the blacklist command,
b) individually break the necessary locks,
c) remount the image(s) elsewhere.

Are there any thoughts on these plans? Do they satisfy your needs in this area, or are there holes you can think of?

-Greg
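To make the proposed failover order concrete, here is a toy in-memory model in Python. It assumes a single monitor that owns the authoritative blacklist and OSDs that enforce only the copy of the map they currently hold. Every name in it (Monitor, OSD, Image, blacklist_and_spread, fail_over) is hypothetical: this is not librados or librbd, and a real implementation would presumably also have to cope with slow or unreachable OSDs, which the sketch ignores.

```python
from dataclasses import dataclass, field

# Hypothetical toy model. An "address" is a (ip, port, nonce) tuple standing in
# for entity_addr_t; none of these classes correspond to real Ceph APIs.


@dataclass
class Monitor:
    epoch: int = 0
    blacklist: set = field(default_factory=set)

    def add_blacklist(self, addr):
        # Each blacklist change produces a new map epoch.
        self.epoch += 1
        self.blacklist.add(addr)
        return self.epoch


@dataclass
class OSD:
    epoch: int = 0
    blacklist: set = field(default_factory=set)

    def write(self, addr):
        # An OSD can only reject a client its own (possibly stale) map blacklists.
        # Note that it never looks at image locks: locking is advisory.
        return addr not in self.blacklist


@dataclass
class Image:
    lockers: set = field(default_factory=set)
    exclusive: bool = False

    def lock(self, addr, exclusive):
        # Shared locks coexist; an exclusive lock conflicts with any other holder.
        if self.lockers and (exclusive or self.exclusive):
            return False
        self.lockers.add(addr)
        self.exclusive = exclusive
        return True

    def break_lock(self, addr):
        self.lockers.discard(addr)
        if not self.lockers:
            self.exclusive = False


def blacklist_and_spread(mon, osds, addr):
    # Hypothetical analogue of the proposed command: bump the map, then make
    # sure every OSD holds at least that epoch before returning.
    epoch = mon.add_blacklist(addr)
    for osd in osds:
        if osd.epoch < epoch:
            osd.epoch = epoch
            osd.blacklist = set(mon.blacklist)
    return epoch


def fail_over(mon, osds, images, old, new):
    # a) fence the old client on every OSD first ...
    epoch = blacklist_and_spread(mon, osds, old)
    assert all(osd.epoch >= epoch for osd in osds)
    # b) ... then break its locks image by image ...
    for img in images:
        img.break_lock(old)
    # c) ... and only then let the new client take over.
    return all(img.lock(new, exclusive=True) for img in images)
```

Because Image.lock never consults the OSDs and OSD.write never consults the locks, the model mirrors the point above: the locks only stop well-behaved clients, and it is the fully spread blacklist that actually fences the old one. Calling Monitor.add_blacklist alone leaves every OSD still accepting writes from the old address until its map catches up.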