Hi Eric & Matt,

I'm working on this again, and was able to reproduce with a versioned
test bucket in v14.2.11. I put a test file "passwd", then deleted it,
then let the lc trim the versions. The exact lc and resulting bi list
are at: https://stikked.web.cern.ch/stikked/view/raw/cc748686

> an automated clean-up is non-trivial but feasible; it would have to
> take into account that an object with the same name as the previously
> deleted one was re-created in the versioned bucket

I've tried various things to remove this nameless entry, but didn't
succeed. (I tried adding and removing the same-named object, with and
without versioning, and with and without the lc enabled; in all cases
the unnamed entry remains.)

Do you have any suggestions on how to remove that entry? Maybe I need
to remove the omap key directly (which will be an interesting
challenge, given that the key starts with 0x80).

Also, it occurred to me that even if I'm able to clean up these
entries, and even with the fix for
https://tracker.ceph.com/issues/46456, we'll still have the problem
that "when the final instance of an object in a versioned bucket is
deleted, but for reasons we do not yet understand, the object was not
fully deleted from the bucket index". So we'll accumulate these zombie
entries, even though they'll now be reshardable. In other words, I was
thinking of resolving our issue by simply rcloning from our affected
bucket to a new bucket, but that second bug would leave us with a
large number of useless index entries.

Should we open a tracker for either of these things? (removing unnamed
entries, and removing the last index entry of an object)

Best Regards,

Dan

On Fri, Oct 2, 2020 at 10:02 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> Hi Eric,
>
> So yes, we're hit by this. We have around 1.6M entries in shard 0 with
> an empty key, e.g.:
>
>     {
>         "type": "olh",
>         "idx": "<80>1001_02/5f/025f8e0fc8234530d6ae7302adf682509f0f7fb68666391122e16d00bd7107e3/2018_11_14/2625203/3034777/metadata.gz",
>         "entry": {
>             "key": {
>                 "name": "",
>                 "instance": ""
>             },
>             "delete_marker": "false",
>             "epoch": 11,
>             "pending_log": [],
>             "tag": "uhzz6da13ovbr69hhlttdjqmwic4f2v8",
>             "exists": "false",
>             "pending_removal": "true"
>         }
>     },
>
> "exists" is false and "pending_removal" is true for all of them.
>
> Cheers, Dan
>
> On Thu, Oct 1, 2020 at 11:32 PM Eric Ivancich <ivancich@xxxxxxxxxx> wrote:
> >
> > Hi Dan,
> >
> > One way to tell would be to do a:
> >
> >     radosgw-admin bi list --bucket=<bucket>
> >
> > and see if any of the output lines contain (perhaps using `grep`):
> >
> >     "type": "olh",
> >
> > That would tell you if there were any versioned objects in the bucket.
> >
> > The "fix" we currently have only prevents this from happening in the
> > future. We currently do not have a "fix" that cleans up the bucket
> > index. Like I mentioned, an automated clean-up is non-trivial but
> > feasible; it would have to take into account that an object with the
> > same name as the previously deleted one was re-created in the
> > versioned bucket.
> >
> > I hope that's informative, even if it's not what you were hoping to hear.
> >
> > Eric
> > --
> > J. Eric Ivancich
> > he / him / his
> > Red Hat Storage
> > Ann Arbor, Michigan, USA
> >
> > On Oct 1, 2020, at 10:53 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >
> > Thanks Matt and Eric,
> >
> > Sorry for the basic question, but how can I, as a ceph operator, tell
> > if a bucket is versioned?
> >
> > And to fix the current situation, should I wait for the fix and then
> > reshard? (We want to reshard this bucket anyway, because listing
> > performance is way too slow for the user with 512 shards.)
> >
> > -- Dan
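For reference, Eric's bi list / grep check can be scripted roughly as
follows. This is only a sketch: <bucket> is a placeholder, /tmp/bi.json
is just a scratch file, and the grep -A4 offset is an assumption based
on the JSON formatting of the entry quoted near the top of this thread.

    # Dump the bucket index and look for versioned-object (olh) entries;
    # any hits mean the bucket has (or had) versioned objects.
    radosgw-admin bi list --bucket=<bucket> > /tmp/bi.json
    grep -c '"type": "olh"' /tmp/bi.json

    # Rough count of the nameless olh entries, assuming the "name" field
    # appears within a few lines of "type" as in the entry quoted above.
    grep -A4 '"type": "olh"' /tmp/bi.json | grep -c '"name": ""'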
> >
> > On Thu, Oct 1, 2020 at 4:36 PM Eric Ivancich <ivancich@xxxxxxxxxx> wrote:
> >
> > Hi Matt and Dan,
> >
> > I too suspect it's the issue Matt linked to. That bug only affects
> > versioned buckets, so I'm guessing your bucket is versioned, Dan.
> >
> > This bug is triggered when the final instance of an object in a
> > versioned bucket is deleted, but for reasons we do not yet understand,
> > the object was not fully deleted from the bucket index. And then a
> > reshard moves part of the object index to shard 0.
> >
> > Upgrading to a version that includes Casey's fix would mean this
> > situation is not re-created in the future.
> >
> > An automated clean-up is non-trivial but feasible. It would have to
> > take into account that an object with the same name as the previously
> > deleted one was re-created in the versioned bucket.
> >
> > Eric
> >
> > On Oct 1, 2020, at 8:46 AM, Matt Benjamin <mbenjami@xxxxxxxxxx> wrote:
> >
> > Hi Dan,
> >
> > Possibly you're reproducing https://tracker.ceph.com/issues/46456.
> >
> > That explains how the underlying issue works, but I don't remember
> > how a bucket exhibiting this is repaired.
> >
> > Eric?
> >
> > Matt
> >
> > On Thu, Oct 1, 2020 at 8:41 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >
> > Dear friends,
> >
> > Running 14.2.11, we have one particularly large bucket with a very
> > strange distribution of objects among the shards. The bucket has 512
> > shards, and most shards have ~75k entries, but shard 0 has 1.75M
> > entries:
> >
> >     # rados -p default.rgw.buckets.index listomapkeys \
> >         .dir.61c59385-085d-4caa-9070-63a3868dccb6.272652427.1.0 | wc -l
> >     1752085
> >
> >     # rados -p default.rgw.buckets.index listomapkeys \
> >         .dir.61c59385-085d-4caa-9070-63a3868dccb6.272652427.1.1 | wc -l
> >     78388
> >
> >     # rados -p default.rgw.buckets.index listomapkeys \
> >         .dir.61c59385-085d-4caa-9070-63a3868dccb6.272652427.1.2 | wc -l
> >     78764
> >
> > We had resharded this bucket (manually) from 32 up to 512 shards just
> > before upgrading from 12.2.12 to 14.2.11 a couple of weeks ago.
> >
> > Any idea why shard .0 is getting such an imbalance of entries?
> > Should we manually reshard this bucket again?
> >
> > Thanks!
> >
> > Dan
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
> > --
> > Matt Benjamin
> > Red Hat, Inc.
> > 315 West Huron Street, Suite 140A
> > Ann Arbor, Michigan 48103
> >
> > http://www.redhat.com/en/technologies/storage
> >
> > tel. 734-821-5101
> > fax. 734-769-8938
> > cel. 734-216-5309

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
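On the open question at the top of the thread, removing the <80>-prefixed
omap key directly: one possible approach, sketched below and untested, is
to write the raw key bytes to a file and pass that file to rados
rmomapkey, assuming the rados tool in this release supports the
--omap-key-file option. The shard object name and the key are copied from
the bi list output quoted earlier in the thread; removing index entries
behind RGW's back like this is exactly the kind of step Eric or Matt
should confirm before it is run against a production bucket.

    # Write the raw key to a file; the leading \x80 byte is the <80>
    # shown in the "idx" field, and the file must not end with a newline.
    printf '\x801001_02/5f/025f8e0fc8234530d6ae7302adf682509f0f7fb68666391122e16d00bd7107e3/2018_11_14/2625203/3034777/metadata.gz' > /tmp/badkey

    # Remove that single omap key from shard 0 of the affected bucket's
    # index (assumes this rados build supports --omap-key-file).
    rados -p default.rgw.buckets.index rmomapkey \
        .dir.61c59385-085d-4caa-9070-63a3868dccb6.272652427.1.0 \
        --omap-key-file /tmp/badkey

Running listomapkeys on the same shard object afterwards would confirm
whether the key is actually gone.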