We didn't fully flesh it out, but I believe we were talking about iSCSI
persistent group reservations (PGRs) and how we could pseudo-blacklist a
(stale) client after the reservation has been updated.

The current approach to handling PGRs requires two round-trips for each
IO: (1) verify the current PGR state and (2) issue the IO if the
reservation is still active. Not only is this slow because of the extra
round-trip, it also relies on some ambiguity in the spec about how
in-flight IOs should behave when a PGR is updated.

To eliminate the extra round-trip, we need some way for the OSDs to
generically enforce the PGR on each IO. The idea was a new "user-managed
epoch" sequence that would be managed by the monitors and enforced by
the OSDs. Given such a system, suppose a target has an in-flight IO
associated with epoch X (e.g. via a new op at the start of the
transaction), and the end-user application issuing IO has a failover
event that results in a PGR update. The target path could then request
epoch X + 1 and know that, from that point on, any old in-flight IO
associated with the older sequence number could not be committed to
disk.

The problem is that you would somehow need to ensure that all the
primary PGs know about the new epoch, hopefully without librbd having to
"ping" each of the primary PGs that could contain the backing objects of
an image. Perhaps this op could implement a small lease window on the
epoch to ensure the OSDs never have an epoch sequence that is more than
X seconds old.

Jason

On Wed, Apr 19, 2017 at 7:26 PM, Josh Durgin <jdurgin@xxxxxxxxxx> wrote:
> On 04/19/2017 06:16 AM, Sage Weil wrote:
>>
>> On Wed, 19 Apr 2017, John Spray wrote:
>>>
>>> I was already pondering where the more detailed blacklist info (e.g.
>>> ids of clients) should go, as it's not something that actually needs
>>> to be shared with all the normal OSDMap subscribers (it's only the
>>> entity that does the blacklist removal that needs to see that). It's
>>> already not ideal imho that we expose the list of all blacklisted
>>> clients to all the other clients -- in general they shouldn't be able
>>> to e.g. learn one another's addresses like this.
>>
>>
>> We already break the OSDMap encoding into two sections: the first part
>> that clients care about, and the second part that is only used by OSDs.
>> (The kernel client doesn't bother to decode the second half.)
>>
>> I suspect it wouldn't take much to send abbreviated maps (and
>> incrementals) to clients that don't include that second half at all...
>
>
> I was talking to Jason and Mike about another use of more generalized
> blacklisting at Vault, but I can't remember the details - do you guys?
>
> IIRC it sounded like that use would make sense as a separate map that
> osds and clients would subscribe to.
>
> Josh

-- 
Jason
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
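
Postscript sketch: the user-managed epoch idea at the top of this
message could look roughly like the toy model below. Everything in it is
hypothetical (the Monitor and Osd classes, bump_epoch, the lease_seconds
parameter, the exception name); none of it is an existing Ceph or librbd
API. It only illustrates that an OSD which re-reads the epoch at least
once per lease window will reject IO tagged with epoch X once X + 1 has
been requested and the window has elapsed.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


class StaleEpochError(Exception):
    """Raised when an IO is tagged with an epoch older than the OSD's view."""


@dataclass
class Monitor:
    """Hypothetical stand-in for the monitors: owns the authoritative epoch."""
    epoch: int = 1

    def bump_epoch(self) -> int:
        # Called by the target path after a PGR update (X -> X + 1).
        self.epoch += 1
        return self.epoch


@dataclass
class Osd:
    """Hypothetical OSD that enforces the epoch on every write."""
    monitor: Monitor
    lease_seconds: float
    clock: Callable[[], float] = time.monotonic
    known_epoch: int = 0
    refreshed_at: float = float("-inf")
    store: Dict[str, bytes] = field(default_factory=dict)

    def refresh(self) -> None:
        # Re-read the epoch from the monitor and restart the lease.
        self.known_epoch = self.monitor.epoch
        self.refreshed_at = self.clock()

    def write(self, obj: str, data: bytes, io_epoch: int) -> None:
        # Lease check: never act on an epoch older than lease_seconds.
        if self.clock() - self.refreshed_at > self.lease_seconds:
            self.refresh()
        # A client ahead of us implies a newer epoch exists, so re-read it.
        if io_epoch > self.known_epoch:
            self.refresh()
        if io_epoch < self.known_epoch:
            raise StaleEpochError(
                f"IO epoch {io_epoch} < current epoch {self.known_epoch}")
        if io_epoch > self.known_epoch:
            raise ValueError("IO epoch is newer than the monitor's epoch")
        self.store[obj] = data


# Usage with a fake clock so the lease window can be stepped by hand.
now = [0.0]
mon = Monitor()
osd = Osd(monitor=mon, lease_seconds=5.0, clock=lambda: now[0])

osd.write("obj1", b"before failover", io_epoch=1)  # epoch 1 is current
new_epoch = mon.bump_epoch()                       # PGR update: 1 -> 2
now[0] += 6.0                                      # lease window elapses

try:
    osd.write("obj1", b"stale in-flight IO", io_epoch=1)
except StaleEpochError as err:
    print("rejected:", err)

osd.write("obj1", b"after failover", io_epoch=new_epoch)
```

Note that in this model a stale write arriving before the lease expires
would still be accepted, so the target would have to wait out the lease
window after requesting X + 1 before treating old IO as fenced.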