Smarter blacklisting?

John Spray <jspray@xxxxxxxxxx> · Tue, 18 Apr 2017 17:37:11 +0100

Currently, when we add an address to the blacklist, we leave it in
there for a set period of time (24 minutes by default, which I suspect
might have been meant to be 24 hours), and then expire it.

Clearly there are two problems with that:
 * We leave things in the list for much longer than necessary most of
the time, when a blacklisted client/node comes back reasonably soon
after a restart
 * We are never 100% guaranteed that a long-halted client won't come
back after its blacklist entry has expired (e.g. a paused VM with
dirty pages, wakes up a day later and writes back to OSDs).

These mostly haven't been too much trouble in practice, but we may be
(optionally) doing a lot more blacklisting on cephfs systems soon[1],
and cephfs clients are perhaps more likely to be VMs than RBD hosts.

One thought is to have an alternative type of blacklist entry that does
not have an expiration, but instead is automatically removed when we
see a client authenticate with the same auth id, from the same IP
address as the blacklist entry, but with a different nonce.
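
To make the matching rule concrete, here is a rough sketch in Python
(not the actual mon/OSDMap code). Every name in it (Addr, StickyEntry,
Blacklist, on_authenticate) is hypothetical and only illustrates the
"same auth id, same IP, different nonce" check described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Addr:
    """Simplified client address: IP plus per-instance nonce (hypothetical)."""
    ip: str
    nonce: int


@dataclass(frozen=True)
class StickyEntry:
    """A blacklist entry with no expiration time (hypothetical)."""
    auth_id: str
    addr: Addr


class Blacklist:
    def __init__(self):
        self.entries = set()

    def add(self, auth_id, addr):
        self.entries.add(StickyEntry(auth_id, addr))

    def is_blacklisted(self, addr):
        return any(e.addr == addr for e in self.entries)

    def on_authenticate(self, auth_id, addr):
        """Drop entries superseded by a new instance of the same client.

        An entry is superseded when the authenticating client has the
        same auth id and the same IP, but a different nonce.
        Returns the set of entries that were removed.
        """
        superseded = {
            e for e in self.entries
            if e.auth_id == auth_id
            and e.addr.ip == addr.ip
            and e.addr.nonce != addr.nonce
        }
        self.entries -= superseded
        return superseded
```

With this rule, client.a restarting on 10.0.0.1 with a new nonce clears
its old entry, while a different auth id on the same IP, or the same
auth id on another IP, leaves the entry in place.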

Flushing out any blacklist entries from a host that never came back
would be an administrative operation, or we could do it automatically
after a *super* long expiration time (like a month), and in other cases
such as when the auth identity associated with the blacklist entry is
removed.
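
The automatic part of that cleanup could look something like the
sketch below. Again this is illustrative Python with hypothetical names
(Entry, should_flush, flush); the 30-day value just stands in for
"like a month" and is an assumption, not an existing config option.

```python
from dataclasses import dataclass

# Assumed stand-in for the "super long" expiration ("like a month").
MAX_AGE_SECONDS = 30 * 24 * 60 * 60


@dataclass(frozen=True)
class Entry:
    """Non-expiring blacklist entry plus the time it was added (hypothetical)."""
    auth_id: str
    ip: str
    nonce: int
    added_at: float  # seconds since the epoch


def should_flush(entry, now, known_auth_ids, max_age=MAX_AGE_SECONDS):
    """True if the entry can be dropped without the client ever returning."""
    # The auth identity behind the entry has been removed.
    if entry.auth_id not in known_auth_ids:
        return True
    # The entry has outlived the very long fallback expiration.
    return now - entry.added_at >= max_age


def flush(entries, now, known_auth_ids):
    """Return the entries that should be kept."""
    return [e for e in entries if not should_flush(e, now, known_auth_ids)]
```

An explicit administrative removal would sit alongside this and is not
shown here.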

Any thoughts?

John

1. https://github.com/ceph/ceph/pull/14610