How about handling the DELETE op in the cache tier like this:

1) If the object is in the cache tier, we delete it in the cache tier,
   replace it with a whiteout, and later flush and evict it.
2) If the object is not in the cache tier, we always proxy the delete op.

This can be done after the proxy write code is merged into master.

BTW, for skipping promotion, I proposed a PR that adds a 'SKIP_PROMOTE'
flag to the OpRequest, as we did for 'FORCE_PROMOTE'. This avoids the
extra checks when handling the op. The PR is at
https://github.com/ceph/ceph/pull/3975

-----Original Message-----
From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Sage Weil
Sent: Friday, March 27, 2015 9:51 PM
To: Ning Yao
Cc: ceph-devel
Subject: Re: RBD Discard issue for Cache_tier

On Fri, 27 Mar 2015, Ning Yao wrote:
> Hi all,
>
> I use the kernel rbd with kernel 3.18 and enable the discard option.
> When I use the cache tier mode, the performance is ruined by
> CEPH_OSD_OP_DELETE.
>
> Since someone may delete a large file which is rarely used, the file
> is usually not in the cache pool. So it will promote the object first
> from the cold pool and then replace the object with an empty object.
> After the empty object is flushed and evicted, the content is
> eventually deleted.
>
> But a large file causes lots of object promotions, so the cache
> pool's bandwidth is saturated. We might not need to promote the
> object for a delete: we could skip promotion via can_skip_promote()
> and send a CEPH_OSD_OP_DELETE op to the cold pool through the Objecter
> interface, which would be much better when files are deleted. Is that
> possible?

Yes. The trick right now is that the DELETE op is defined to return
ENOENT if the object doesn't exist, and the code isn't smart enough to
skip the promotion. I think there are two options:

1) Special-case deletion in the promotion code so that it skips most of
   the work. Unfortunately I think this will be fragile and annoying to
   maintain.
2) Set a flag on the client op indicating that it can ignore the delete
   'failure' and skip promotion. There is already a hook for this
   (can_skip_promote) in ReplicatedPG, although it's not quite right:
   the 'FAILOK' flag means that we should proceed with the operation,
   but the per-op return code is still supposed to be -EINVAL to the
   client and we don't do that. I think we actually want an
   'idempotent' flag/arg for delete itself. There's plenty of room in
   the ceph_osd_op args to add this and it should be easy to do in a
   backwards compatible way.

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
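
For reference, the DELETE handling proposed at the top of this thread
(whiteout when the object is cached, proxy the delete otherwise) can be
sketched as a toy model. This is only an illustration and not Ceph code:
ToyTier, WHITEOUT, promote, delete and flush_and_evict are hypothetical
names, and both pools are assumed to be plain Python dicts.

```python
import errno

# Hypothetical marker for "deleted in the cache pool, not yet flushed".
WHITEOUT = object()


class ToyTier:
    """Toy model of a cache pool in front of a base (cold) pool."""

    def __init__(self, base):
        self.base = base      # object name -> data (the cold pool)
        self.cache = {}       # object name -> data or WHITEOUT
        self.promotions = 0   # objects copied up from the base pool

    def promote(self, oid):
        # Copy an object from the base pool into the cache pool.
        self.cache[oid] = self.base[oid]
        self.promotions += 1

    def delete(self, oid):
        if oid in self.cache:
            if self.cache[oid] is WHITEOUT:
                return -errno.ENOENT  # already deleted in the cache
            # Case 1: the object is cached, so replace it with a whiteout.
            self.cache[oid] = WHITEOUT
            return 0
        # Case 2: the object is not cached, so proxy the delete to the
        # base pool instead of promoting it first.
        if oid not in self.base:
            return -errno.ENOENT
        del self.base[oid]
        return 0

    def flush_and_evict(self, oid):
        # A whiteout flushes as a delete; data flushes as a write.
        # Either way the cache entry is dropped afterwards.
        if self.cache.get(oid) is WHITEOUT:
            self.base.pop(oid, None)
        elif oid in self.cache:
            self.base[oid] = self.cache[oid]
        self.cache.pop(oid, None)


tier = ToyTier(base={"hot": b"1", "cold": b"2"})
tier.promote("hot")          # "hot" is now in the cache pool
tier.delete("hot")           # case 1: whiteout in the cache
tier.delete("cold")          # case 2: proxied, no promotion
tier.flush_and_evict("hot")  # whiteout reaches the base pool
```

In this model, deleting an uncached object never increments the
promotion counter, which is the saving the thread is after.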