I'm looking for some help in fixing a bucket index on a Luminous (12.2.8) cluster running on FileStore. First, some background on how I believe the bucket index became broken.

Last month a PG in our .rgw.buckets.index pool became inconsistent:

2018-12-11 09:12:17.743983 osd.1879 osd.1879 10.36.173.147:6820/60041 16 : cluster [ERR] 7.8e : soid 7:717333b6:::.dir.default.1110451812.43.2:head omap_digest 0x59e4f686 != omap_digest 0x37b99ba6 from shard 1879

We then attempted to repair the PG by running 'ceph pg repair 7.8e', but I have a feeling the primary copy must have been corrupt (later that day I learned about 'rados list-inconsistent-obj 7.8e -f json-pretty').
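For the record, this is the sort of check I should have run before kicking off the repair, to see which OSD's copy of the omap was the odd one out. It's only a rough sketch; the jq paths are based on the field names I believe the Luminous list-inconsistent-obj output uses, so they may need adjusting:

  # Dump the per-shard omap digests for the inconsistent object so the bad
  # copy can be identified before repairing. The .inconsistents/.shards
  # field names are assumed from the Luminous JSON output.
  rados list-inconsistent-obj 7.8e -f json-pretty |
    jq '.inconsistents[] | {object: .object.name, shards: [.shards[] | {osd, errors, omap_digest}]}'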
The repair resulted in an unfound object:

2018-12-11 09:32:02.651241 osd.1753 osd.1753 10.32.12.32:6820/3455358 13 : cluster [ERR] 7.8e push 7:717333b6:::.dir.default.1110451812.43.2:head v 767605'30158112 failed because local copy is 767605'30158924

A couple of hours later we started getting reports of 503s from multiple customers. Believing that the unfound object was the cause of the problem, we used the 'mark_unfound_lost revert' option to roll back to the previous version:

ceph pg 7.8e mark_unfound_lost revert

This fixed the cluster, but broke the bucket. Attempting to list the bucket contents results in:

[root@p3cephrgw007 ~]# radosgw-admin bucket list --bucket=backups.579
ERROR: store->list_objects(): (2) No such file or directory

This bucket appears to have been automatically sharded after we upgraded to Luminous, so we do have an old bucket instance available (but it's too old to be very helpful):

[root@p3cephrgw007 ~]# radosgw-admin metadata list bucket.instance | grep backups.579
    "backups.579:default.1110451812.43",
    "backups.579:default.28086735.566138",

Looking for all the shards based on the name only turns up the first two:

[root@p3cephrgw007 ~]# rados -p .rgw.buckets.index ls | grep "default.1110451812.43"
...
.dir.default.1110451812.43.0
...
.dir.default.1110451812.43.1
...

But the bucket metadata says there should be three:

[root@p3cephrgw007 ~]# radosgw-admin metadata get bucket.instance:backups.579:default.1110451812.43 | jq -r '.data.bucket_info.num_shards'
3

The log message above shows that .dir.default.1110451812.43.2 was the rados object with the slightly newer local copy, so the revert command we ran must have removed it instead of rolling it back to the previous version.
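For what it's worth, this is roughly how I've been gauging what is left in the surviving shards while I figure out next steps. It's just a quick sketch that assumes the shard objects follow the .dir.<marker>.<shard number> naming shown above:

  # Check which of the expected index shards still exist and how many
  # entries (omap keys) each one still holds.
  for shard in 0 1 2; do
      obj=".dir.default.1110451812.43.${shard}"
      if rados -p .rgw.buckets.index stat "${obj}" >/dev/null 2>&1; then
          echo "${obj}: $(rados -p .rgw.buckets.index listomapkeys "${obj}" | wc -l) omap keys"
      else
          echo "${obj}: missing"
      fi
  done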
This leaves me with some questions:

1. What would have been a better way of dealing with this problem when the whole cluster stopped working?
2. Is there a way to recreate the bucket index? I see a couple of options in the docs for fixing the bucket index (--fix) and for rebuilding the bucket index (--check-objects), but I don't see any explanation of how they go about doing that.
3. Will it attempt to scan all the objects in the cluster to determine which ones belong in this bucket index?
4. Will the missing shard be ignored, leaving the fixed bucket index missing a third of the objects?

Thanks,
Bryan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com