rgwscrub

Hi,

we had this discussion last week at Cephalocon about the consistency of
RGW objects. This is just a thought, not a finished design; it could
certainly be designed and implemented better.

Right now, we don't have any tool that checks the consistency of the
data in a bucket at the RGW level.

Ceph OSD scrub doesn't catch any errors related to RGW beyond disk
errors, OSD-related issues, and so on. RADOS/OSD has no context about
the data written from RGW at the logical level; it treats the data as
a binary stream of rados objects and does all of its validation at
that level.

Silent corruptions at the rados level, for example from incomplete PGs
or disk corruption (if it happens at the primary, scrub overwrites all
replicas with the wrong data), can make reads of an object fail or
serve wrong data.

We hit exactly this: an incomplete PG, caused by a cache-tier OSD node
coming back after a long time, left us with objects that had all
chunks missing except the head chunk. We had to use some offline tools
to walk the cluster and identify which RGW objects were damaged.

This led us to the need for a scrub-like tool at the RGW level.

One proposal is to implement an on-demand scrub mechanism per bucket
or per user. If a user is specified, scrub will scan all of that
user's buckets and report any problems encountered. This can be part
of the radosgw-admin command. We can add some optimizations on top of
this.
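For illustration, the interface could look something like this
(hypothetical syntax to make the idea concrete; these subcommands and
flags do not exist today):

    radosgw-admin bucket scrub --bucket=<bucket> [--deep]
    radosgw-admin user scrub --uid=<uid> [--deep]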

A second implementation could add scrub as a periodic activity in all
the RGWs. We can have policies around a bucket or a user to scrub all
the buckets once every 3-6 months. Because of the resources scrub
needs, we can scan some percentage of buckets/users every time scrub
is invoked.

This engine can be designed separately to take care of user policies
on when to schedule, what type of scrub to run, and so on.
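
One way such an engine could pick a stable fraction of buckets per
cycle without keeping extra state is to hash each bucket name into a
cycle slot. A minimal Python sketch (the 6-cycle split and the
function name are assumptions for illustration, not a design
decision):

    import hashlib

    def due_this_cycle(bucket_name, cycle, total_cycles=6):
        # With a monthly cycle counter and total_cycles=6, every
        # bucket gets scrubbed once in 6 months, roughly 1/6 of
        # them per month.
        h = int(hashlib.sha1(bucket_name.encode()).hexdigest(), 16)
        return h % total_cycles == cycle % total_cycles

    # Example: buckets due in month 14 of the schedule
    buckets = ['photos', 'logs', 'backups']
    due = [b for b in buckets if due_this_cycle(b, cycle=14)]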

Like OSD scrub, scrub can be done in two ways: a shallow scrub and a deep one.

A non-deep scrub, a.k.a. shallow scrub, will not read the data;
instead it does a rados stat on the object to check its existence,
and can compare a few parameters to check consistency. It will stat
all the chunks/multipart chunks by walking the manifest, and report
any object whose size differs from what the manifest records. PG deep
scrub at each PG guarantees the consistency among the replicas, so
checking every replica's version is not a requirement here.
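
To make the shallow scrub concrete, here is a minimal sketch using
the python-rados bindings. It assumes the chunk list has already been
extracted from the RGW manifest (e.g. from "radosgw-admin object
stat" output) as (rados oid, expected size) pairs; manifest parsing
and the data pool name are assumptions here, not settled details:

    import rados

    def shallow_scrub(ioctx, manifest):
        # Stat every chunk named in the manifest; report chunks that
        # are missing or whose size disagrees with the manifest.
        errors = []
        for oid, expected_size in manifest:
            try:
                size, _mtime = ioctx.stat(oid)
            except rados.ObjectNotFound:
                errors.append((oid, 'missing'))
                continue
            if size != expected_size:
                errors.append((oid, 'size %d, manifest says %d'
                               % (size, expected_size)))
        return errors

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('default.rgw.buckets.data')
        # hypothetical chunk list for one object
        manifest = [('mybucket_myobj.1', 4194304)]
        for oid, problem in shallow_scrub(ioctx, manifest):
            print(oid, problem)
    finally:
        cluster.shutdown()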

The second version, deep scrub, will read the data, compare the
md5sum with the ETag, and make sure the contents are intact. Data is
not read from all the replicas; that functionality is delegated to
PG deep scrub.
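
A matching deep-scrub sketch, under the same assumptions, for a
simple (non-multipart) object whose ETag is the plain MD5 of the
body; multipart ETags are an md5-of-part-md5s with a "-N" suffix and
would need per-part handling that is omitted here:

    import hashlib
    import rados

    READ_SIZE = 4 * 1024 * 1024  # read in 4 MiB pieces; arbitrary

    def deep_scrub(ioctx, chunk_oids, etag):
        # Read every chunk in manifest order, recompute the MD5 of
        # the whole body and compare it against the ETag RGW stored.
        md5 = hashlib.md5()
        for oid in chunk_oids:
            offset = 0
            while True:
                data = ioctx.read(oid, READ_SIZE, offset)
                if not data:
                    break
                md5.update(data)
                offset += len(data)
        return md5.hexdigest() == etag.strip('"')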

We can have different policies for running these tasks, as with OSD
scrub: shallow scrubs could run on all of a user's buckets once a
month, and deep scrub could be scheduled once every 3 months, or over
some percentage of buckets every 3 months.
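
For illustration, a per-user policy record along those lines might
look like this (the field names are made up for the example):

    # hypothetical per-user scrub policy, illustration only
    POLICY = {
        'uid': 'testuser',
        'shallow_interval_days': 30,   # shallow scrub once a month
        'deep_interval_days': 90,      # deep scrub once in 3 months
        'bucket_fraction_per_run': 0.25,
    }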

Please let us know your thoughts.

As for the current state: we have a few offline tools written in
Python to scrub all the buckets and check their consistency. We will
be sending them soon.

Thanks,
Varada.