Re: Not all Bucket Shards being used

Thank you for the information, Christian. When you reshard, the bucket id is updated (with more recent versions of Ceph, a generation number is incremented). The first bucket id matches the bucket marker, but after the first reshard the two diverge.

The bucket id is in the names of the currently used bucket index shards. You’re searching for the marker, which means you’re finding older bucket index shards.
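 
If you want to see the two values side by side, something like the following should do it (assuming jq is available on the host; the bucket name is just the one from the metadata dump you pasted below):

# radosgw-admin metadata get bucket:sql20 \
   |jq -r '.data.bucket | "marker:    \(.marker)\nbucket_id: \(.bucket_id)"'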

Change your commands to these:

# rados -p raum.rgw.buckets.index ls \
   |grep 3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1 \
   |sort -V

# rados -p raum.rgw.buckets.index ls \
   |grep 3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1 \
   |sort -V \
   |xargs -IOMAP sh -c \
       'rados -p raum.rgw.buckets.index listomapkeys OMAP | wc -l'
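 
If it's easier to spot an unused shard with the shard name and its key count on the same line, the same pipeline can feed a small loop instead of xargs (same idea, just a different presentation):

# rados -p raum.rgw.buckets.index ls \
   |grep 3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1 \
   |sort -V \
   |while read -r OMAP; do
        echo "$OMAP $(rados -p raum.rgw.buckets.index listomapkeys "$OMAP" | wc -l)"
    done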

When you refer to the “second zone”, what do you mean? Is this cluster using multisite? If and only if the answer is “no” is it safe to remove the old bucket index shards. Depending on the version of Ceph that was running when the reshard took place, they were either intentionally left behind (earlier behavior) or removed automatically (later behavior).
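 
For reference, and only if multisite is definitely not in play: the stale shards are the ones named after the old marker (…10610190.9 in the metadata you pasted). Treat the following as a sketch, not a recipe; list them first, eyeball the output, and only then run the removal pipeline:

# rados -p raum.rgw.buckets.index ls \
   |grep 3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9 \
   |sort -V

# rados -p raum.rgw.buckets.index ls \
   |grep 3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9 \
   |xargs -IOMAP rados -p raum.rgw.buckets.index rm OMAP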

Eric
(he/him)

> On Jul 25, 2023, at 6:32 AM, Christian Kugler <syphdias+ceph@xxxxxxxxx> wrote:
> 
> Hi Eric,
> 
>> 1. I recommend that you *not* issue another bucket reshard until you figure out what’s going on.
> 
> Thanks, noted!
> 
>> 2. Which version of Ceph are you using?
> 17.2.5
> I wanted to get the Cluster to Health OK before upgrading. I didn't
> see anything that led me to believe that an upgrade could fix the
> reshard issue.
> 
>> 3. Can you issue a `radosgw-admin metadata get bucket:<bucket-name>` so we can verify what the current marker is?
> 
> # radosgw-admin metadata get bucket:sql20
> {
>    "key": "bucket:sql20",
>    "ver": {
>        "tag": "_hGhtgzjcWY9rO9JP7YlWzt8",
>        "ver": 3
>    },
>    "mtime": "2023-07-12T15:56:55.226784Z",
>    "data": {
>        "bucket": {
>            "name": "sql20",
>            "marker": "3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9",
>            "bucket_id": "3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1",
>            "tenant": "",
>            "explicit_placement": {
>                "data_pool": "",
>                "data_extra_pool": "",
>                "index_pool": ""
>            }
>        },
>        "owner": "S3user",
>        "creation_time": "2023-04-26T09:22:01.681646Z",
>        "linked": "true",
>        "has_bucket_info": "false"
>    }
> }
> 
>> 4. After you resharded previously, did you get command-line output along the lines of:
>> 2023-07-24T13:33:50.867-0400 7f10359f2a80 1 execute INFO: reshard of bucket “<bucket-name>" completed successfully
> 
> I think so, at least for the second reshard. But I wouldn't bet my
> life on it. I fear I might have missed an error on the first one since
> I have done a radosgw-admin bucket reshard so often and never seen it
> fail.
> 
> Christian
> 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



