This was discussed yesterday in the CDM. The need is for a log of all changes that happen on a specific rgw zone (and potentially in specific bucket[s]), so that this information can be used by external tools. One such example would be a backup tool that uses these logs to determine which objects changed and backs up everything new to a separate system (for redundancy).

The original thought was to generate logs that are compatible with S3 bucket logging. S3 logging works by enabling logs on a specific bucket and specifying a target bucket that the logs are uploaded to. The logs are written to an object that is generated from time to time; the logs themselves keep a list of the operations that happened on that specific bucket (and look pretty similar to the access logs of a web server). After examining these logs we weren't sure that this specific log format is really something we should pursue. We can still have a similar basic mechanism (logs that hold an aggregated list of changes and are uploaded to an object in a bucket), but we can drop the specific log format (we were thinking of json-encoding the data).

The proposal is as follows: we will create a sync module that handles all object modification operations. The module will be configured with a list of buckets and/or bucket prefixes for which we'd store info about newly created or modified objects. The configuration will also include an S3 endpoint, access keys, and a bucket name (or other path config) into which the logs will be stored. The logs will be stored in a new object that is created periodically.

Implementation details: whenever a sync module write operation is handled, we will store information about it in a temporary sharded omap index in the backing rados store.

- Temporary index

The index will keep information about all the recent object changes in the system. One question is whether we need to keep more than one entry for a single object that is overwritten within the time window. One option is to generate a new entry per write, but this is not going to be of much use (other than for auditing), as the overwritten data is lost at that point. In the simple case where we create one entry in the index per write, we can keep just a single index (keyed by monotonically increasing timestamp + object name + object version). If we only keep a single entry per object (+ object version), then we need to keep two indexes: one index by object + version, and a second index by timestamp (+ object + version), so that we can remove the old entry. It should be possible to fetch keys + data out of these indexes for a specific timeframe (e.g., starting at a specific timestamp and ending at a specific timestamp).

- Collection thread

A collection thread will run periodically. It will take a lease over a single control object, which guarantees that it is the only one doing this work. It will iterate over the shards, take a lease on each, and read all the info that was stored there up until a specific timestamp. This data will be formatted (json) and sent to the backend using the S3 api. If there is too much data we can flush it to the backend periodically using multipart upload. Once the log object is created, the temporary index will be trimmed up to that timestamp.

Some rough sketches of the configuration, the temporary index, and the collection pass follow below, to make this a bit more concrete.

Any thoughts?
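First, a sketch of what the module configuration and a json-encoded change record could look like. Nothing here is final; every field name below is a placeholder and open for discussion:

import json

# Hypothetical sync module configuration (all field names are placeholders):
config = {
    "buckets": ["media", "backups-*"],          # buckets and/or bucket prefixes to track
    "endpoint": "http://logsink.example.com",   # S3 endpoint the logs are uploaded to
    "access_key": "ACCESS_KEY",
    "secret_key": "SECRET_KEY",
    "target_bucket": "rgw-change-logs",         # bucket (or other path config) for the log objects
    "flush_interval": 300,                      # seconds between new log objects
}

# Hypothetical json-encoded change record, one per created/modified object:
record = {
    "timestamp": "2017-02-02T12:34:56.000Z",
    "bucket": "media",
    "object": "photos/img_0001.jpg",
    "version": "v7gJ1x",
    "op": "PUT",                                # PUT / DELETE / multipart complete / ...
    "size": 1048576,
    "etag": "d41d8cd98f00b204e9800998ecf8427e",
}
print(json.dumps(record))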
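Next, a minimal sketch of the single-entry-per-object variant of the temporary index. Plain dicts stand in for the sharded omap indexes in rados, and the key layout (and whether the version is folded into the lookup key) is just one possible encoding:

import hashlib

NUM_SHARDS = 32

# Plain dicts standing in for the per-shard omap indexes in rados.
# index_by_obj:  "<bucket>/<object>" -> current timestamp-index key
# index_by_time: "<timestamp>/<bucket>/<object>/<version>" -> change record
index_by_obj = [dict() for _ in range(NUM_SHARDS)]
index_by_time = [dict() for _ in range(NUM_SHARDS)]

def shard_of(bucket, obj):
    # Hash the object name so entries spread evenly across the shards.
    h = hashlib.md5(("%s/%s" % (bucket, obj)).encode()).hexdigest()
    return int(h, 16) % NUM_SHARDS

def record_write(timestamp, bucket, obj, version, record):
    shard = shard_of(bucket, obj)
    obj_key = "%s/%s" % (bucket, obj)      # the version could be part of this key as well
    time_key = "%020d/%s/%s/%s" % (timestamp, bucket, obj, version)

    # Single entry per object: if the object was already written within this
    # window, remove its old entry from the timestamp index first.
    old_time_key = index_by_obj[shard].get(obj_key)
    if old_time_key is not None:
        index_by_time[shard].pop(old_time_key, None)

    index_by_obj[shard][obj_key] = time_key
    index_by_time[shard][time_key] = record

def entries_up_to(shard, end_timestamp):
    # Fetch keys + data for a specific timeframe (here: everything up until
    # end_timestamp); sorted() stands in for an ordered omap key listing.
    cutoff = "%020d" % end_timestamp
    for key in sorted(index_by_time[shard]):
        if key >= cutoff:
            break
        yield key, index_by_time[shard][key]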
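And a rough sketch of one collection pass over the entries returned by the index above, using boto3 against the configured S3 endpoint. The lease handling is only a comment here, and a single put_object stands in for the multipart upload we'd switch to when there is too much data to buffer:

import json
import boto3

def collect_once(config, entries, end_timestamp):
    """Upload the collected change records as a new log object.

    `entries` is whatever the temporary index returned for the window
    ending at end_timestamp (see the index sketch above).
    """
    # A real implementation would first take a lease over a single control
    # object (and then over each shard) so only one collector does this work.
    s3 = boto3.client("s3",
                      endpoint_url=config["endpoint"],
                      aws_access_key_id=config["access_key"],
                      aws_secret_access_key=config["secret_key"])

    records = [record for _key, record in entries]
    if not records:
        return

    # json-encode the aggregated changes and upload them as a new log object.
    # If there is too much data to hold in memory, this would instead use
    # create_multipart_upload / upload_part / complete_multipart_upload and
    # flush parts to the backend periodically.
    body = "\n".join(json.dumps(r) for r in records)
    log_name = "changes-%d.json" % end_timestamp
    s3.put_object(Bucket=config["target_bucket"], Key=log_name, Body=body)

    # Only once the log object has been created should the caller trim the
    # temporary index up to end_timestamp.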
Yehuda