We've discussed this feature briefly in the past, and it might be time to look at the design a bit. The S3 and Swift features differ quite a bit, so let's have a look at both:

S3:

Object expiration is part of a larger bucket lifecycle management feature. This allows setting rules on a bucket that specify what to do with specific objects (those that have a specified prefix) after an amount of time. Objects can either be removed or transferred into secondary storage. The objects can either be current and expire, or (in the case of versioned buckets) can be non-current. Bucket lifecycle rules can be added and removed, and they affect *all* objects in the bucket, including objects created before the rules were created. An interesting property is that users are not billed for expired objects, even if the (async) removal process has not removed them yet.

Swift:

Swift object expiration is set at the object level. It is possible to set a specific header that will set the expiration time for the object. An async process will then garbage collect the object. An expired object cannot be read anymore (although it is possible that it can still be listed, and removed by the user).

Looking at both features, it is possible to define a superset. That is, provide both the S3 bucket-level lifecycle management and the Swift object-level expiration scheme.

rgw implementation:

Object level expiration, a la Swift:

 - A new maintenance thread, similar to the garbage collector, will be created. The thread will be used to apply deferred operations.
 - A new maintenance log will be created. The log will be sharded, and entries there will be indexed by both timestamp and object id.
 - The maintenance thread will work as follows: try to lock a shard, read shard, operate, unlock
 - An object could be assigned an expiration timestamp. When an object is set to expire, we'll update the maintenance log with its id and the timestamp.
   Note that we'll also keep note of the object instance's tag, so that if the object is overwritten, we won't remove the new instance. When updating the maintenance log, we'll remove any existing entry for the same object.
 - When reading an object, we'll check whether it has expired so that we return a proper response
 - The maintenance thread will read log entries up until the current timestamp, and issue object removal for each of these entries

The S3 object expiration is much more complicated. It will still use the same maintenance thread. Now, we'll need to decide whether we want to provide strict accounting functionality similar to S3 (objects that are due to expire are not accounted for, even if they have not been garbage collected yet), as it will affect the implementation.

Relaxed accounting:

 - The bucket rules list will be versioned. Each rule change will bump up this version. Each rule will have the version in which it was created.
 - When adding a rule on a bucket, create a maintenance job that will add relevant objects in this bucket to the list, along with the rule (and version) that applies to them
 - When removing objects to which a specific rule applies, the maintenance thread will verify that this rule+version is still active
 - Adding an object within a bucket will add an appropriate entry in the maintenance log, if applicable

Strict accounting:

 - Do we really want this?
 - The bucket index will need to add accounting adjustments (by timestamp)
 - An object that is set to expire will be added to the adjustments record (by the timestamp). When the object is removed, it'll be deducted from that record
 - When getting a bucket's stats, we'll also get the adjustment accounting (up until the relevant timestamp)
 - Open question: how to update the quota

Let me know if this makes any sense.

Yehuda
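
To make the object-level (Swift-style) flow above a bit more concrete, here is a minimal in-memory sketch in Python. It is not rgw code: the names MaintenanceLog, Entry, set_expiration, process_shard and is_expired are all made up for illustration, the "store" is just a dict mapping object id to the tag of the current instance, and the per-shard lock stands in for whatever shard locking the real maintenance thread would use.

```python
import bisect
import threading
import zlib
from dataclasses import dataclass


@dataclass(order=True)
class Entry:
    # Ordered by timestamp first, so each shard stays sorted by expiration.
    expires_at: float
    obj_id: str
    tag: str  # tag of the object instance the expiration was set on


class MaintenanceLog:
    """Hypothetical sharded log of deferred removals (illustration only)."""

    def __init__(self, num_shards=4):
        self.shards = [[] for _ in range(num_shards)]   # sorted lists of Entry
        self.by_obj = [{} for _ in range(num_shards)]   # obj_id -> Entry
        self.locks = [threading.Lock() for _ in range(num_shards)]

    def _shard_of(self, obj_id):
        return zlib.crc32(obj_id.encode()) % len(self.shards)

    def set_expiration(self, obj_id, tag, expires_at):
        """Record an expiration, replacing any existing entry for the object."""
        s = self._shard_of(obj_id)
        with self.locks[s]:
            old = self.by_obj[s].pop(obj_id, None)
            if old is not None:
                self.shards[s].remove(old)
            entry = Entry(expires_at, obj_id, tag)
            bisect.insort(self.shards[s], entry)
            self.by_obj[s][obj_id] = entry

    def process_shard(self, s, now, store):
        """One maintenance pass: try to lock, read, operate, unlock."""
        if not self.locks[s].acquire(blocking=False):
            return []  # shard is busy; skip it on this pass
        try:
            removed = []
            entries = self.shards[s]
            # Read entries up until the current timestamp.
            while entries and entries[0].expires_at <= now:
                e = entries.pop(0)
                self.by_obj[s].pop(e.obj_id, None)
                # Only remove the instance the entry was created for, so an
                # overwritten object (different tag) is left alone.
                if store.get(e.obj_id) == e.tag:
                    del store[e.obj_id]
                    removed.append(e.obj_id)
            return removed
        finally:
            self.locks[s].release()


def is_expired(expires_at, now):
    """Read-path check: an object with a past expiration is treated as gone."""
    return expires_at is not None and expires_at <= now
```

The tag comparison in process_shard is the part that keeps a new instance from being removed by a stale entry, and set_expiration dropping the old entry is what keeps a single log entry per object.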