Re: Smarter blacklisting?

Jason Dillaman <jdillama@xxxxxxxxxx> · Thu, 20 Apr 2017 09:09:20 -0400

We didn't fully flesh it out, but I believe we were talking about
iSCSI persistent group reservations (PGRs) and how we could
pseudo-blacklist a stale client after the reservation has been
updated. The current approach to handling PGRs requires two
round-trips for each IO: (1) verify the current PGR state and (2)
issue the IO if the reservation is still active. Not only is this slow
because of the extra round-trip, it also relies on some ambiguity in
the spec about how in-flight IOs should behave when a PGR is updated.
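
To make the cost and the race window concrete, here is a minimal
sketch of the check-then-write pattern. It is a toy model, not Ceph or
librbd code: ToyStore, guarded_write and the between_steps hook are
hypothetical names made up for illustration.

```python
class ToyStore:
    """Stand-in for the cluster: holds the reservation and the data."""

    def __init__(self, holder):
        self.holder = holder      # initiator currently holding the reservation
        self.data = {}
        self.round_trips = 0      # counts simulated client/cluster round-trips

    def read_reservation(self):
        self.round_trips += 1
        return self.holder

    def write(self, name, value):
        self.round_trips += 1     # note: no reservation check on this path
        self.data[name] = value


def guarded_write(store, initiator, name, value, between_steps=None):
    """Check-then-write: two round-trips per IO when the check passes."""
    if store.read_reservation() != initiator:
        return False              # reservation lost, IO is not issued
    if between_steps is not None:
        between_steps()           # hypothetical hook: simulate a concurrent PGR update
    store.write(name, value)
    return True


store = ToyStore("A")
guarded_write(store, "A", "obj", b"1")   # two round-trips for one IO


def preempt():
    store.holder = "B"                   # reservation moves mid-flight


# The check passed before the update, so this write still lands.
guarded_write(store, "A", "obj", b"2", between_steps=preempt)
```

The second call shows the window between the check and the write: the
reservation changes hands after step (1), yet the IO from the old
holder is still applied.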

To eliminate the extra round-trip, we need some way for the OSDs to
generically enforce the PGR on each IO. The idea was to add a new
"user-managed epoch" sequence that would be managed by the monitors
and enforced by the OSDs. Suppose a target has an in-flight IO
associated with epoch X (e.g. via a new op at the start of the
transaction), and the end-user application issuing IO has a failover
event that results in a PGR update. The target path could then request
epoch X + 1 and know that, from that point on, any old in-flight IO
associated with the older sequence number could not be committed to
disk.
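
A rough sketch of the OSD-side check, under the assumption that every
op carries the epoch it was issued under. ToyOSD and StaleEpochError
are hypothetical names, and learning a newer epoch from an incoming op
is my simplification; in the proposal the monitors would manage the
sequence.

```python
class StaleEpochError(Exception):
    """Raised when an op carries an epoch older than the one the OSD knows."""


class ToyOSD:
    def __init__(self):
        self.epoch = 0        # latest user-managed epoch this OSD has seen
        self.objects = {}     # object name -> bytes

    def update_epoch(self, new_epoch):
        # Epochs only move forward; older or equal values are ignored.
        self.epoch = max(self.epoch, new_epoch)

    def write(self, op_epoch, name, data):
        if op_epoch < self.epoch:
            # Old in-flight IO from before the PGR update: never committed.
            raise StaleEpochError(
                f"op epoch {op_epoch} is older than current epoch {self.epoch}"
            )
        # Assumption: an op with a newer epoch teaches the OSD that epoch.
        self.update_epoch(op_epoch)
        self.objects[name] = data


osd = ToyOSD()
osd.update_epoch(5)
osd.write(5, "obj", b"before failover")   # accepted under epoch X = 5

osd.update_epoch(6)                       # failover: target requested X + 1
try:
    osd.write(5, "obj", b"stale")         # old in-flight IO is rejected
except StaleEpochError:
    pass
osd.write(6, "obj", b"after failover")    # IO under the new epoch is accepted
```

The check happens in the same op as the write, so there is no separate
verification round-trip and no window between check and commit on that
OSD.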

The problem is that you would somehow need to ensure that all of the
primary PGs know about the new epoch, ideally without librbd having to
"ping" each primary PG that could contain backing objects for the
image. Perhaps this op could implement a small lease window on the
epoch to ensure the OSDs never have an epoch sequence that is more
than N seconds old.
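
One possible reading of the lease idea, sketched as a toy: the OSD
refuses to apply epoch-guarded IO unless its view of the epoch was
refreshed within the lease window. EpochLease, LeaseExpired,
StaleEpoch and the injectable clock are hypothetical names of my own,
and how the refresh is actually delivered is left open.

```python
import time


class LeaseExpired(Exception):
    """The OSD's view of the epoch is too old to trust."""


class StaleEpoch(Exception):
    """The op was issued under an epoch older than the OSD's."""


class EpochLease:
    def __init__(self, lease_seconds, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock            # injectable so the lease can be tested
        self.epoch = None
        self.refreshed_at = None

    def refresh(self, epoch):
        # A refresh with the current or a newer epoch re-arms the lease;
        # an older epoch is ignored so the sequence never moves backwards.
        if self.epoch is None or epoch >= self.epoch:
            self.epoch = epoch
            self.refreshed_at = self.clock()

    def is_fresh(self):
        if self.refreshed_at is None:
            return False
        return self.clock() - self.refreshed_at <= self.lease_seconds

    def check(self, op_epoch):
        """Raise unless an op tagged with op_epoch may be applied now."""
        if not self.is_fresh():
            raise LeaseExpired("epoch must be refreshed before applying IO")
        if op_epoch < self.epoch:
            raise StaleEpoch(f"op epoch {op_epoch} < current epoch {self.epoch}")


lease = EpochLease(lease_seconds=5)
lease.refresh(7)
lease.check(7)    # passes while the lease is fresh and the epoch is current
```

With this shape, a target that bumps the epoch only has to wait out
one lease window to know that every OSD has either learned the new
epoch or stopped applying guarded IO.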

Jason

On Wed, Apr 19, 2017 at 7:26 PM, Josh Durgin <jdurgin@xxxxxxxxxx> wrote:
> On 04/19/2017 06:16 AM, Sage Weil wrote:
>>
>> On Wed, 19 Apr 2017, John Spray wrote:
>>>
>>> I was already pondering where the more detailed blacklist info (e.g.
>>> ids of clients) should go, as it's not something that actually needs
>>> to be shared with all the normal OSDMap subscribers (it's only the
>>> entity that does the blacklist removal that needs to see that).  It's
>>> already not ideal imho that we expose the list of all blacklisted
>>> clients to all the other clients -- in general they shouldn't be able
>>> to e.g. learn one another's addresses like this.
>>
>>
>> We already break the OSDMap encoding into two sections: the first part
>> that clients care about, and the second part that is only used by OSDs.
>> (The kernel client doesn't bother to decode the second half.)
>>
>> I suspect it wouldn't take much to send abbreviated maps (and
>> incrementals) to clients that don't include that second half at all...
>
>
> I was talking to Jason and Mike about another use of more generalized
> blacklisting at Vault, but I can't remember the details - do you guys?
>
> IIRC it sounded like that use would make sense as a separate map that
> osds and clients would subscribe to.
>
> Josh
