Xuan Liu recently pointed out that there is a problem with our handling of full clusters/pools: we don't allow any writes when full, including delete operations.

While fixing a separate full issue I ended up making several fixes and cleanups in the full handling code in https://github.com/ceph/ceph/pull/6052. The interesting part of that is that we will allow a write as long as it doesn't increase the overall utilization in bytes or objects (according to the pg stats we're maintaining). That includes remove ops, of course, but will also allow overwrites while full, which seems fair.

However, that's not quite the full story: the client side currently does not send any requests while the full flag is set--it waits until the full flags are cleared before resending things.

We can modify things on the client so that it allows ops it knows will succeed (e.g., a simple remove op; a rough sketch of that gating is below). However, if there is another op already queued on that object *before* it, we should either block the remove op (to preserve ordering) or discard the earlier op when the remove succeeds (on the assumption that any effect it had is now moot). Is the latter option safe? Or should we do something more clever?

Ideally other allowed operations would be let through as well, but unfortunately the client doesn't really know enough to tell whether they will/can succeed. E.g., a class "refcount.put" call might result in a deletion (and in fact there is a class that does just that).

We could also send all such requests and, if we get ENOSPC, keep them queued and retry when the full flag is cleared. That would require a bit more complexity on the OSD side to preserve ordering, but it's doable...

sage
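
To make the ordering question concrete, here is a rough sketch of the sort of client-side gating I have in mind. The types and names here are hypothetical, not the actual Objecter code: while full, only reads and plain removes with nothing queued ahead of them on the same object are sent; everything else waits for the flag to clear.

// Hypothetical sketch, not the real Objecter API: decide what to do with
// an op submitted while the cluster/pool full flag is set.
#include <deque>
#include <map>
#include <string>

enum class OpType { Read, Write, Delete };

struct Op {
  std::string oid;  // target object
  OpType type;
};

enum class FullAction {
  Send,   // safe to send even though we're full
  Queue,  // hold until the full flag clears
};

struct FullGate {
  // ops we are already holding while full, in submission order, per object
  std::map<std::string, std::deque<Op>> queued;

  FullAction submit(const Op& op, bool full) {
    if (!full)
      return FullAction::Send;
    if (op.type == OpType::Read)
      return FullAction::Send;  // reads never consume space
    if (op.type == OpType::Delete && queued[op.oid].empty())
      return FullAction::Send;  // can't increase usage, nothing ahead of it
    // Overwrites, class calls like refcount.put, or removes with earlier
    // queued ops on the same object: wait for the full flag to clear.
    queued[op.oid].push_back(op);
    return FullAction::Queue;
  }
};

This is the conservative "block the remove" option; the alternative discussed above would be to send the remove anyway and discard the earlier queued ops on that object once it succeeds.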