idempotent op (esp delete)

Background:

1) Way back when, we made a task that would thrash the cache modes by 
adding and removing the cache tier while ceph_test_rados was running.  
This mostly worked, but would occasionally fail because we would

 - delete an object from the cache tier
 - a network failure injection would lose the reply
 - we'd disable the cache
 - the delete would be resent to the base tier, where it is not recognized 
   as a dup (different pool, different pg log)
   -> -ENOENT instead of 0
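
To make the failure in (1) concrete, here is a toy model, not Ceph code. 
It assumes dup detection is a lookup of the request id in the log of 
whichever pool receives the op, and that the object is already gone from 
the base tier by the time the resend arrives.  The Pool class, its fields, 
and the reqid string are all made up for illustration.

```python
ENOENT = 2  # errno value; ops return -ENOENT on a missing object


class Pool:
    """Toy pool: a set of object names plus a log of completed request ids."""

    def __init__(self, name):
        self.name = name
        self.objects = set()
        self.pg_log = {}  # reqid -> result of the op that carried it

    def delete(self, reqid, oid):
        # A resend is recognized only if this pool's own log has the reqid.
        if reqid in self.pg_log:
            return self.pg_log[reqid]
        result = 0 if oid in self.objects else -ENOENT
        self.objects.discard(oid)
        self.pg_log[reqid] = result
        return result


cache = Pool("cache")
base = Pool("base")
cache.objects.add("foo")

reqid = "client.4123:17"
first = cache.delete(reqid, "foo")          # applied; pretend the reply is lost
dup_in_cache = cache.delete(reqid, "foo")   # same log: original result replayed
resend_to_base = base.delete(reqid, "foo")  # cache disabled: a different log

print(first, dup_in_cache, resend_to_base)  # 0 0 -2
```

The resend is harmless as long as it lands in the pool that logged the 
original op; it only turns into -ENOENT when it lands somewhere else.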

2) The proxy write code hits a similar problem:

 - delete gets proxied
 - we initiate async promote
 - a network failure injection loses the delete reply
 - delete resends and blocks on promote (or arrives after it finishes)
 - promote finishes
 - delete is handled
  -> -ENOENT instead of 0

The ticket is http://tracker.ceph.com/issues/8935

The problem is partially addressed by

	https://github.com/ceph/ceph/pull/3447

by logging a few request ids on every object_info_t and preserving them on 
promote and flush.

However, it doesn't solve the problem for delete, because a delete throws 
out the object_info_t, so the reqid_t is lost along with it.
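
A rough sketch of why the per-object list helps writes but not deletes.  
This is a toy model, not the code in the pull request: ObjectInfo stands 
in for object_info_t, and the `reqids` field name, MAX_REQIDS, and the 
helper functions are assumptions made for the example.

```python
from dataclasses import dataclass, field

MAX_REQIDS = 3  # "a few"; the real limit is not stated here


@dataclass
class ObjectInfo:
    """Stand-in for object_info_t; `reqids` is a hypothetical field name."""
    reqids: list = field(default_factory=list)


def apply_write(store, oid, reqid):
    info = store.setdefault(oid, ObjectInfo())
    info.reqids.append(reqid)
    del info.reqids[:-MAX_REQIDS]  # keep only the most recent few


def apply_delete(store, oid, reqid):
    # The object info goes away with the object, and the reqid with it.
    store.pop(oid, None)


def promote(base, cache, oid):
    src = base.get(oid)
    if src is None:
        cache[oid] = ObjectInfo()  # whiteout with no request history
    else:
        cache[oid] = ObjectInfo(list(src.reqids))


def is_dup(store, oid, reqid):
    info = store.get(oid)
    return info is not None and reqid in info.reqids


base, cache = {}, {}

apply_write(base, "a", "w1")
promote(base, cache, "a")
write_is_dup = is_dup(cache, "a", "w1")   # True: reqid survived the promote

apply_write(base, "b", "w2")
apply_delete(base, "b", "d1")
promote(base, cache, "b")
delete_is_dup = is_dup(cache, "b", "d1")  # False: nothing left to carry it
```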

I think we have two options, not necessarily mutually exclusive:

1) When promoting an object that doesn't exist (to create a whiteout), 
pull reqids out of the base tier's pg log so that the whiteout is primed 
with request ids.

1.5) When flushing... well, that is harder because we have nowhere to put 
the reqids.  Unless we make a way to cram a list of reqids into a single 
PG log entry...?  In that case, we wouldn't strictly need the per-object 
list since we could pile the base tier's reqids into the promote log entry 
in the cache tier.
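
For the promote case in (1), the idea could look roughly like the 
following (it does not cover the flush case in 1.5).  Again a toy model: 
the pg log is reduced to a list of (oid, reqid) pairs, and the promote() 
signature and the dict layout are invented for the example.

```python
def promote(base_objects, base_pg_log, oid):
    """Return the cache-tier entry for oid.

    base_objects: oid -> list of reqids kept on the object (toy model)
    base_pg_log:  list of (oid, reqid) entries, oldest first (toy model)
    """
    if oid in base_objects:
        return {"whiteout": False, "reqids": list(base_objects[oid])}
    # Object is gone: prime the whiteout from whatever the log still holds.
    primed = [reqid for (logged_oid, reqid) in base_pg_log if logged_oid == oid]
    return {"whiteout": True, "reqids": primed}


base_objects = {}
base_pg_log = [
    ("foo", "client.1:7"),   # write
    ("bar", "client.2:3"),
    ("foo", "client.1:8"),   # delete
]

whiteout = promote(base_objects, base_pg_log, "foo")
resend_is_dup = "client.1:8" in whiteout["reqids"]
```

This only helps for as long as the base tier's log still contains the 
entries for that object.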

2) Make delete idempotent (0 instead of -ENOENT if the object doesn't 
exist).  This will require a delicate compat transition (let's ignore that 
for a moment), but you can preserve the old behavior for callers that care 
by preceding the delete with an assert_exists op.  Most callers don't 
care, but a handful do.  This simplifies the semantics we need to support 
going forward.
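
A small sketch of the semantics proposed in (2), assuming a compound op 
is a list of sub-ops that stops at the first failure.  run_ops() and the 
string op names are hypothetical; this is not the librados interface.

```python
ENOENT = 2  # errno value; ops return -ENOENT on a missing object


def run_ops(store, oid, ops):
    """Apply a compound op to one object; stop at the first failing sub-op."""
    for op in ops:
        if op == "assert_exists":
            if oid not in store:
                return -ENOENT
        elif op == "delete":
            store.pop(oid, None)  # idempotent: a missing object is not an error
        else:
            raise ValueError(f"unknown op: {op}")
    return 0


store = {"foo": b"data"}

first = run_ops(store, "foo", ["delete"])                    # 0
resend = run_ops(store, "foo", ["delete"])                   # 0, not -ENOENT
strict = run_ops(store, "foo", ["assert_exists", "delete"])  # -ENOENT
```

A caller that wants the old behavior opts in with assert_exists; everyone 
else gets a delete that is safe to resend.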

Of course, it's all a bit delicate.  The idempotent op semantics have a 
time horizon so it's all a bit wishy-washy... :/

Thoughts?
sage

