I have two lines of thought:

1. I intuit that more interaction with concrete workflows (e.g., from a
   mature backup product in a multi-tenant deployment) would be helpful
   to firm up requirements.
   1.1. It appears to me that we may be rushing to a specific design
        without much input from applications.
2. I would like to consider approaches that do not rely on indexed
   storage (i.e., omap); the overhead from indexed updates is currently
   a disproportionate share of RGW workload cost, and I'd love to reduce
   it (and avoid increasing it).

Matt

On Thu, Mar 8, 2018 at 7:51 PM, Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> wrote:
> This was discussed yesterday in the CDM. The need is for a log of all
> changes that happen on a specific rgw zone (and potentially in
> specific bucket[s]), so that this information can be used by external
> tools. One such example would be a backup tool that would use these
> logs to determine what objects changed and back up everything new to
> a separate system (for redundancy).
>
> The original thought was to generate logs that are compatible with S3
> bucket logging. S3 logging works by enabling logs on a specific
> bucket and specifying a target bucket to which the logs are uploaded.
> The logs are written to an object that is generated from time to
> time; the logs themselves keep a list of the operations that happened
> on that specific bucket (which looks pretty similar to the access log
> of a web server). After examining these logs we weren't sure that the
> specific log format is really something that we should pursue. We can
> still have a similar basic mechanism (logs that hold an aggregated
> list of changes and are uploaded to an object in a bucket), but we
> can drop the specific log format (we're thinking of JSON-encoding the
> data).
>
> The proposal is as follows:
>
> We will create a sync module that will handle all object modification
> operations. The module will be configured with a list of buckets
> and/or bucket prefixes for which we'd store info about newly created
> or modified objects. The configuration will also include an S3
> endpoint, access keys, and a bucket name (or other path config) into
> which the logs will be stored. The logs will be stored in a new
> object that will be created periodically.
>
> Implementation details:
>
> Whenever a sync module write operation is handled, we will store
> information about it in a temporary sharded omap index in the backing
> rados store.
>
> - Temporary index
>
> The index will keep information about all the recent object changes
> in the system. One question is whether we need to keep more than one
> entry for a single object that is overwritten within the time window.
> One option is to generate a new entry for each write, but this is not
> going to be of much use (other than for auditing), as the overwritten
> data is lost at that point. In the simple case where we create one
> entry in the index per write, we can keep just a single index (keyed
> by monotonically increasing timestamp + object name + object
> version). If we only keep a single entry per object (+ object
> version), then we need to keep two indexes: one keyed by object +
> version, and a second keyed by timestamp (+ object + version), so
> that we can remove the old entry. It should be possible to fetch
> keys + data out of these indexes for a specific timeframe (e.g.,
> starting at a specific timestamp and ending at a specific timestamp).
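(For illustration, a rough sketch of the two key layouts described in the
quoted paragraph above; the separator and zero-padding choices here are
assumptions, not part of the proposal:)

    # Keys in an omap index sort lexicographically, so a zero-padded
    # timestamp keeps the time index in write order.
    def time_index_key(timestamp_ns, obj_name, obj_version):
        # Single-index case: one entry per write.
        return "%020d_%s_%s" % (timestamp_ns, obj_name, obj_version)

    def object_index_key(obj_name, obj_version):
        # Second index, needed only if we keep one entry per
        # object + version: an overwrite can look up the old timestamp
        # here and remove the stale time-index entry.
        return "%s_%s" % (obj_name, obj_version)

(With the two-index layout, an overwrite would first consult
object_index_key() to find the previous timestamp, delete the matching
time-index entry, and then insert both keys for the new write.)
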
> - Collection thread
>
> A collection thread will run periodically. It will take a lease on a
> single control object, which guarantees that it is the only one doing
> this work. It will then iterate over the shards, take a lease on
> each, and read all the info that was stored there up until a specific
> timestamp. This data will be formatted (JSON) and sent to the backend
> using the S3 API. If there is too much data we can flush it to the
> backend periodically using multipart upload. Once the object is
> created, the temporary index will be trimmed up to that timestamp.
>
> Any thoughts?
>
> Yehuda

--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
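
(For concreteness, a minimal sketch of the collection pass described in
the quoted proposal; the shard and lease helpers and the boto3-style S3
client are assumptions for illustration, not actual RGW interfaces:)

    import json

    def collection_pass(index_shards, s3, log_bucket, cutoff_ts):
        # Assumes the caller already holds the lease on the single
        # control object, so only one collector runs at a time.
        entries = []
        for shard in index_shards:
            with shard.lease():  # hypothetical per-shard lease
                entries.extend(shard.read_until(cutoff_ts))
        # Format and upload; a real implementation would switch to a
        # multipart upload if the batch grows too large.
        s3.put_object(Bucket=log_bucket,
                      Key="changes-%d.json" % cutoff_ts,
                      Body=json.dumps(entries))
        # Trim the temporary index only after the log object is stored.
        for shard in index_shards:
            shard.trim_until(cutoff_ts)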