This was discussed yesterday in the CDM. The need is for a log of all changes that happen on a specific rgw zone (and potentially in specific bucket[s]), so that this information can be used by external tools. One such example would be a backup tool that uses these logs to determine which objects changed and backs up everything new to a separate system (for redundancy).

The original thought was to generate logs that are compatible with S3 bucket logging. S3 logging works by enabling logs on a specific bucket and specifying a target bucket that the logs are uploaded to. The logs are written to an object that is generated from time to time; the logs themselves keep a list of the operations that happened on that specific bucket (and look pretty similar to the access logs of a web server). After examining these logs we weren't sure that this specific log format is really something we should pursue. We can still have a similar basic mechanism (logs that hold an aggregated list of changes and are uploaded to an object in a bucket), but we can drop the specific log format (we were thinking of json-encoding the data).

The proposal is as follows: we will create a sync module that handles all object modification operations. The module will be configured with a list of buckets and/or bucket prefixes for which we'd store info about newly created or modified objects. The configuration will also include an S3 endpoint, access keys, and a bucket name (or other path config) into which the logs will be stored. The logs will be stored in a new object that is created periodically.

Implementation details: whenever a sync module write operation is handled, we will store information about it in a temporary sharded omap index in the backing rados store.

- Temporary index

The index will keep information about all the recent object changes in the system. One question is whether we need to keep more than one entry for a single object that is overwritten within the time window. One option is to generate a new entry per write, but this is not going to be of much use (other than for auditing), as the overwritten data is lost at that point. In the simple case where we create one entry in the index per write, we can keep just a single index (keyed by monotonically increasing timestamp + object name + object version). If we only keep a single entry per object (+ object version), then we need to keep two indexes: one index by object + version, and a second index by timestamp (+ object + version), so that we can remove the old entry. It should be possible to fetch keys + data out of these indexes for a specific timeframe (e.g., starting at a specific timestamp and ending at a specific timestamp).

- Collection thread

A collection thread will run periodically. It will take a lease over a single control object, which guarantees that it is the only one doing this work. It will iterate over the shards, take a lease on each, and read all the info that was stored there up until a specific timestamp. This data will be formatted (json) and sent to the backend using the S3 api. If there is too much data we can flush it to the backend periodically using multipart upload. Once the log object is created, the temporary index will be trimmed up to that timestamp.

Some rough sketches of the configuration, the temporary index, and the collection pass follow below, to make this a bit more concrete.

Any thoughts?
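First, a sketch of what the module configuration and a json-encoded change record could look like. Nothing here is final; every field name below is a placeholder and open for discussion:

import json

# Hypothetical sync module configuration (all field names are placeholders):
config = {
    "buckets": ["media", "backups-*"],          # buckets and/or bucket prefixes to track
    "endpoint": "http://logsink.example.com",   # S3 endpoint the logs are uploaded to
    "access_key": "ACCESS_KEY",
    "secret_key": "SECRET_KEY",
    "target_bucket": "rgw-change-logs",         # bucket (or other path config) for the log objects
    "flush_interval": 300,                      # seconds between new log objects
}

# Hypothetical json-encoded change record, one per created/modified object:
record = {
    "timestamp": "2017-02-02T12:34:56.000Z",
    "bucket": "media",
    "object": "photos/img_0001.jpg",
    "version": "v7gJ1x",
    "op": "PUT",                                # PUT / DELETE / multipart complete / ...
    "size": 1048576,
    "etag": "d41d8cd98f00b204e9800998ecf8427e",
}
print(json.dumps(record))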
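Next, a minimal sketch of the single-entry-per-object variant of the temporary index. Plain dicts stand in for the sharded omap indexes in rados, and the key layout (and whether the version is folded into the lookup key) is just one possible encoding:

import hashlib

NUM_SHARDS = 32

# Plain dicts standing in for the per-shard omap indexes in rados.
# index_by_obj:  "<bucket>/<object>" -> current timestamp-index key
# index_by_time: "<timestamp>/<bucket>/<object>/<version>" -> change record
index_by_obj = [dict() for _ in range(NUM_SHARDS)]
index_by_time = [dict() for _ in range(NUM_SHARDS)]

def shard_of(bucket, obj):
    # Hash the object name so entries spread evenly across the shards.
    h = hashlib.md5(("%s/%s" % (bucket, obj)).encode()).hexdigest()
    return int(h, 16) % NUM_SHARDS

def record_write(timestamp, bucket, obj, version, record):
    shard = shard_of(bucket, obj)
    obj_key = "%s/%s" % (bucket, obj)      # the version could be part of this key as well
    time_key = "%020d/%s/%s/%s" % (timestamp, bucket, obj, version)

    # Single entry per object: if the object was already written within this
    # window, remove its old entry from the timestamp index first.
    old_time_key = index_by_obj[shard].get(obj_key)
    if old_time_key is not None:
        index_by_time[shard].pop(old_time_key, None)

    index_by_obj[shard][obj_key] = time_key
    index_by_time[shard][time_key] = record

def entries_up_to(shard, end_timestamp):
    # Fetch keys + data for a specific timeframe (here: everything up until
    # end_timestamp); sorted() stands in for an ordered omap key listing.
    cutoff = "%020d" % end_timestamp
    for key in sorted(index_by_time[shard]):
        if key >= cutoff:
            break
        yield key, index_by_time[shard][key]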
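And a rough sketch of one collection pass over the entries returned by the index above, using boto3 against the configured S3 endpoint. The lease handling is only a comment here, and a single put_object stands in for the multipart upload we'd switch to when there is too much data to buffer:

import json
import boto3

def collect_once(config, entries, end_timestamp):
    """Upload the collected change records as a new log object.

    `entries` is whatever the temporary index returned for the window
    ending at end_timestamp (see the index sketch above).
    """
    # A real implementation would first take a lease over a single control
    # object (and then over each shard) so only one collector does this work.
    s3 = boto3.client("s3",
                      endpoint_url=config["endpoint"],
                      aws_access_key_id=config["access_key"],
                      aws_secret_access_key=config["secret_key"])

    records = [record for _key, record in entries]
    if not records:
        return

    # json-encode the aggregated changes and upload them as a new log object.
    # If there is too much data to hold in memory, this would instead use
    # create_multipart_upload / upload_part / complete_multipart_upload and
    # flush parts to the backend periodically.
    body = "\n".join(json.dumps(r) for r in records)
    log_name = "changes-%d.json" % end_timestamp
    s3.put_object(Bucket=config["target_bucket"], Key=log_name, Body=body)

    # Only once the log object has been created should the caller trim the
    # temporary index up to end_timestamp.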
Yehuda