Re: [RFC] rgw: log zone changes via sync module

I have two lines of thought:

1. I intuit that more interaction with concrete workflows (e.g., with
a mature backup product in a multi-tenant deployment) would be helpful
to firm up requirements
1.1. it appears to me that we may be rushing to a specific design
without much input from applications

2. I would like to consider approaches that do not rely on indexed
storage (i.e., omap); the overhead from indexed updates is currently a
disproportionate share of RGW workload cost, and I'd love to reduce
it (and avoid increasing it)

Matt

On Thu, Mar 8, 2018 at 7:51 PM, Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> wrote:
> This was discussed yesterday in the CDM. The need is for a log of all
> changes that happen on a specific rgw zone (and potentially in
> specific bucket[s]), so that this information can be used by external
> tools. One such example would be a backup tool that would use these
> logs to determine which objects changed and back up everything new
> to a separate system (for redundancy).
>
> The original thought was to generate logs that are compatible with
> S3 bucket logging. S3 logging works by enabling logs on a specific
> bucket and specifying a target bucket into which the logs are
> uploaded. The logs are written to an object that is generated
> periodically and keeps a list of the operations on the source
> bucket; the entries look much like the access logs of a web server.
> After examining these logs we weren't sure that this specific log
> format is really something we should pursue. We can still have a
> similar basic mechanism (logs that hold an aggregated list of
> changes and are uploaded to an object in a bucket), but we can drop
> the specific log format (we were thinking of JSON-encoding the
> data instead), as sketched below.
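>
> For illustration, a minimal sketch of what one JSON-encoded change
> entry could look like (every field name here is hypothetical, not a
> settled schema):
>
>     import json
>
>     # Hypothetical change-log entry; all field names below are
>     # illustrative only, not a final schema.
>     entry = {
>         "timestamp": "2018-03-08T19:51:00.000Z",
>         "bucket": "user-data",
>         "object": "photos/img_0001.jpg",
>         "version_id": "3jbWdSEPR8nBTTIm",
>         "op": "PUT",
>         "size": 524288,
>         "etag": "9e107d9d372bb6826bd81d3542a419d6",
>     }
>     print(json.dumps(entry))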
>
> The proposal is as follows:
>
> We will create a sync module that handles all object modification
> operations. The module will be configured with a list of buckets
> and/or bucket prefixes for which we'd store info about newly
> created or modified objects. The configuration will also include an
> S3 endpoint, access keys, and a bucket name (or other path config)
> into which the logs will be stored. The logs will be stored in a
> new object that is created periodically. A sketch of such a
> configuration follows.
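>
> Something along these lines, where every key name is hypothetical
> rather than an actual tier-config schema:
>
>     # Hypothetical sync module configuration; all key names are
>     # illustrative, not a final schema.
>     log_sync_config = {
>         "buckets": ["important-bucket"],      # explicit buckets to log
>         "bucket_prefixes": ["backup-"],       # or match buckets by prefix
>         "endpoint": "https://logs.example.com",
>         "access_key": "ACCESS_KEY",
>         "secret_key": "SECRET_KEY",
>         "target_bucket": "zone-change-logs",  # where log objects land
>         "prefix": "changes/",                 # key prefix for log objects
>         "flush_period_secs": 300,             # how often to cut a new object
>         "num_shards": 32,                     # shards of the temporary index
>     }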
>
> Implementation details:
>
> Whenever a sync module write operation is handled, we will store
> information about it in a temporary sharded omap index in the backing
> rados store.
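>
> As a rough sketch (the librados Python bindings standing in for the
> actual C++ sync module path; shard count, object names, and key
> layout are all assumptions):
>
>     import json
>     import rados
>     from binascii import crc32
>
>     NUM_SHARDS = 32  # matches the hypothetical num_shards config above
>
>     def shard_for(bucket, obj, num_shards=NUM_SHARDS):
>         # Spread entries across shards by hashing bucket+object so a
>         # hot bucket doesn't serialize on a single omap object.
>         return crc32(("%s/%s" % (bucket, obj)).encode()) % num_shards
>
>     def log_write(ioctx, ts_ns, bucket, obj, version, op):
>         shard_oid = "change_log.%d" % shard_for(bucket, obj)
>         # Zero-padded timestamp first, so keys in a shard sort
>         # chronologically and a timeframe maps to an omap key range.
>         key = "%020d_%s/%s_%s" % (ts_ns, bucket, obj, version)
>         val = json.dumps({"timestamp": ts_ns, "bucket": bucket,
>                           "object": obj, "version": version,
>                           "op": op}).encode()
>         with rados.WriteOpCtx() as write_op:
>             ioctx.set_omap(write_op, (key,), (val,))
>             ioctx.operate_write_op(write_op, shard_oid)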
>
>  - Temporary index
>
> The index will keep information about all recent object changes in
> the system. One question is whether we need to keep more than one
> entry for a single object that is overwritten within the time
> window. One option is to generate a new entry per write, but this
> is not going to be of much use (other than for auditing), as the
> overwritten data is already lost at that point. In the simple case
> where we create one entry in the index per write, we can keep a
> single index (keyed by monotonically increasing timestamp + object
> name + object version). If we only keep a single entry per object
> (+ object version), then we need two indexes: one keyed by object +
> version, and a second keyed by timestamp (+ object + version), so
> that we can remove the old entry; see the key-layout sketch below.
> It should be possible to fetch keys + data out of these indexes for
> a specific timeframe (e.g., starting at a specific timestamp and
> ending at a specific timestamp).
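>
> For the single-entry-per-object variant, the two key layouts could
> look something like this (encodings are assumptions, not a design):
>
>     # Hypothetical key encodings for the two indexes. A prefix byte
>     # keeps the two key spaces separate within one omap object.
>
>     def by_object_key(bucket, obj, version):
>         # Primary index: lets a new write find (and replace) the
>         # previous entry for the same object + version.
>         return "1_%s/%s_%s" % (bucket, obj, version)
>
>     def by_time_key(ts_ns, bucket, obj, version):
>         # Secondary index: sorts chronologically, so a timeframe
>         # maps to a contiguous key range.
>         return "2_%020d_%s/%s_%s" % (ts_ns, bucket, obj, version)
>
>     # Overwrite flow: look up by_object_key; if present, delete the
>     # old by_time_key entry it points to, then write both keys anew.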
>
>  - Collection thread
>
> A collection thread will run periodically. It will take a lease on
> a single control object, which guarantees that it is the only one
> doing this work. It will then iterate over the shards, take a lease
> on each, and read all the info that was stored there up to a
> specific timestamp. This data will be formatted (as JSON) and sent
> to the backend using the S3 API. If there is too much data, we can
> flush it to the backend incrementally using multipart upload. Once
> the log object has been created, the temporary index will be
> trimmed up to that timestamp. A sketch of the whole pass follows.
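>
> Roughly (Python again, with boto3 for the S3 side; lock names,
> object names, part sizing, and folding the per-shard lease into the
> single exclusive lock below are all simplifying assumptions):
>
>     import boto3
>     import rados
>
>     PART_SIZE = 5 * 1024 * 1024  # S3 minimum part size (last part exempt)
>
>     def collect(ioctx, s3, target_bucket, cutoff, num_shards=32):
>         # Single-collector guarantee: exclusive lock on a control object.
>         ioctx.lock_exclusive("change_log.control", "collector", "c1",
>                              desc="change log collection", duration=300)
>         try:
>             log_key = "changes/%s.json" % cutoff
>             mpu = s3.create_multipart_upload(Bucket=target_bucket,
>                                              Key=log_key)
>             parts, buf = [], b""
>             seen = {s: [] for s in range(num_shards)}
>             for shard in range(num_shards):
>                 oid = "change_log.%d" % shard
>                 with rados.ReadOpCtx() as read_op:
>                     it, _ = ioctx.get_omap_vals(read_op, "", "", 100000)
>                     ioctx.operate_read_op(read_op, oid)
>                     for k, v in it:
>                         key = k if isinstance(k, str) else k.decode()
>                         if key >= cutoff:
>                             break  # keys sort by timestamp (see above)
>                         seen[shard].append(key)
>                         buf += v + b"\n"  # one JSON entry per line
>                 if len(buf) >= PART_SIZE:  # flush a full part
>                     r = s3.upload_part(Bucket=target_bucket, Key=log_key,
>                                        PartNumber=len(parts) + 1,
>                                        UploadId=mpu["UploadId"], Body=buf)
>                     parts.append({"ETag": r["ETag"],
>                                   "PartNumber": len(parts) + 1})
>                     buf = b""
>             if buf:  # final, possibly undersized, part
>                 r = s3.upload_part(Bucket=target_bucket, Key=log_key,
>                                    PartNumber=len(parts) + 1,
>                                    UploadId=mpu["UploadId"], Body=buf)
>                 parts.append({"ETag": r["ETag"],
>                               "PartNumber": len(parts) + 1})
>             s3.complete_multipart_upload(Bucket=target_bucket, Key=log_key,
>                                          UploadId=mpu["UploadId"],
>                                          MultipartUpload={"Parts": parts})
>             # Trim only after the log object exists in the backend.
>             for shard, keys in seen.items():
>                 if keys:
>                     with rados.WriteOpCtx() as write_op:
>                         ioctx.remove_omap_keys(write_op, tuple(keys))
>                         ioctx.operate_write_op(write_op,
>                                                "change_log.%d" % shard)
>         finally:
>             ioctx.unlock("change_log.control", "collector", "c1")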
>
>
> Any thoughts?
>
> Yehuda



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309


