Re: failure resharding radosgw bucket

On Wed, 2022-11-23 at 12:57 -0500, Casey Bodley wrote:
> hi Jan,
> 
> On Wed, Nov 23, 2022 at 12:45 PM Jan Horstmann <J.Horstmann@xxxxxxxxxxx> wrote:
> > 
> > Hi list,
> > I am completely lost trying to reshard a radosgw bucket; the
> > reshard fails with the error:
> > 
> > process_single_logshard: Error during resharding bucket
> > 68ddc61c613a4e3096ca8c349ee37f56/snapshotnfs:(2) No such file or
> > directory
> > 
> > But let me start from the beginning. We are running a Ceph cluster,
> > version 15.2.17. Recently we received a health warning because of
> > "large omap objects", so I grepped through the logs to get more
> > information about the object and then mapped that to a radosgw
> > bucket instance ([1]).
> > I believe this should normally be handled by dynamic resharding of
> > the bucket, which has already brought this bucket to 23 shards ([2]).
> > For recent resharding attempts the radosgw is logging the error
> > mentioned at the beginning. I tried to reshard manually by following
> > the process in [3], but that consistently leads to the same error.
> > When running the reshard with debug options (--debug-rgw=20
> > --debug-ms=1) I can get some additional insight into where exactly
> > the failure occurs:
> > 
> > 2022-11-23T10:41:20.754+0000 7f58cf9d2080  1 --
> > 10.38.128.3:0/1221656497 -->
> > [v2:10.38.128.6:6880/44286,v1:10.38.128.6:6881/44286] --
> > osd_op(unknown.0.0:46 5.6 5:66924383:reshard::reshard.0000000005:head
> > [call rgw.reshard_get in=149b] snapc 0=[]
> > ondisk+read+known_if_redirected e44374) v8 -- 0x56092dd46a10 con
> > 0x56092dcfd7a0
> > 2022-11-23T10:41:20.754+0000 7f58bb889700  1 --
> > 10.38.128.3:0/1221656497 <== osd.210 v2:10.38.128.6:6880/44286 4 ====
> > osd_op_reply(46 reshard.0000000005 [call] v0'0 uv1180019 ondisk = -2
> > ((2) No such file or directory)) v8 ==== 162+0+0 (crc 0 0 0)
> > 0x7f58b00dc020 con 0x56092dcfd7a0
> > 
> > 
> > I am not sure how to interpret this or how to debug it any further.
> > Of course I can provide the full output if that helps.
> > 
> > Thanks and regards,
> > Jan
> > 
> > [1]
> > root@ceph-mon1:~# grep -r 'Large omap object found. Object'
> > /var/log/ceph/ceph.log
> > 2022-11-15T14:47:28.900679+0000 osd.47 (osd.47) 10890 : cluster [WRN]
> > Large omap object found. Object: 3:9660022b:::.dir.ee3fa6a3-4af3-4ac2-
> > 86c2-d2c374080b54.63073818.19.9:head PG: 3.d4400669 (3.29) Key count:
> > 336457 Size (bytes): 117560231
> > 2022-11-17T04:51:43.593811+0000 osd.50 (osd.50) 90 : cluster [WRN]
> > Large omap object found. Object: 3:0de49b75:::.dir.ee3fa6a3-4af3-4ac2-
> > 86c2-d2c374080b54.63073818.19.10:head PG: 3.aed927b0 (3.30) Key count:
> > 205346 Size (bytes): 71669614
> > 2022-11-18T02:55:07.182419+0000 osd.47 (osd.47) 10917 : cluster [WRN]
> > Large omap object found. Object: 3:9660022b:::.dir.ee3fa6a3-4af3-4ac2-
> > 86c2-d2c374080b54.63073818.19.9:head PG: 3.d4400669 (3.29) Key count:
> > 449776 Size (bytes): 157310435
> > 2022-11-19T09:56:47.630679+0000 osd.29 (osd.29) 114 : cluster [WRN]
> > Large omap object found. Object: 3:61ad76c5:::.dir.ee3fa6a3-4af3-4ac2-
> > 86c2-d2c374080b54.63073818.19.12:head PG: 3.a36eb586 (3.6) Key count:
> > 213843 Size (bytes): 74703544
> > 2022-11-20T13:04:39.979349+0000 osd.72 (osd.72) 83 : cluster [WRN]
> > Large omap object found. Object: 3:2b3227e7:::.dir.ee3fa6a3-4af3-4ac2-
> > 86c2-d2c374080b54.63073818.19.22:head PG: 3.e7e44cd4 (3.14) Key count:
> > 326676 Size (bytes): 114453145
> > 2022-11-21T02:53:32.410698+0000 osd.50 (osd.50) 151 : cluster [WRN]
> > Large omap object found. Object: 3:0de49b75:::.dir.ee3fa6a3-4af3-4ac2-
> > 86c2-d2c374080b54.63073818.19.10:head PG: 3.aed927b0 (3.30) Key count:
> > 216764 Size (bytes): 75674839
> > 2022-11-22T18:04:09.757825+0000 osd.47 (osd.47) 10964 : cluster [WRN]
> > Large omap object found. Object: 3:9660022b:::.dir.ee3fa6a3-4af3-4ac2-
> > 86c2-d2c374080b54.63073818.19.9:head PG: 3.d4400669 (3.29) Key count:
> > 449776 Size (bytes): 157310435
> > 2022-11-23T00:44:55.316254+0000 osd.29 (osd.29) 163 : cluster [WRN]
> > Large omap object found. Object: 3:61ad76c5:::.dir.ee3fa6a3-4af3-4ac2-
> > 86c2-d2c374080b54.63073818.19.12:head PG: 3.a36eb586 (3.6) Key count:
> > 213843 Size (bytes): 74703544
> > 2022-11-23T09:10:07.842425+0000 osd.55 (osd.55) 13968 : cluster [WRN]
> > Large omap object found. Object: 3:3fa378c9:::.dir.ee3fa6a3-4af3-4ac2-
> > 86c2-d2c374080b54.63073818.19.20:head PG: 3.931ec5fc (3.3c) Key count:
> > 219204 Size (bytes): 76509687
> > 2022-11-23T09:11:15.516973+0000 osd.72 (osd.72) 112 : cluster [WRN]
> > Large omap object found. Object: 3:2b3227e7:::.dir.ee3fa6a3-4af3-4ac2-
> > 86c2-d2c374080b54.63073818.19.22:head PG: 3.e7e44cd4 (3.14) Key count:
> > 326676 Size (bytes): 114453145
> > root@ceph-mon1:~# radosgw-admin metadata list "bucket.instance" | grep
> > ee3fa6a3-4af3-4ac2-86c2-d2c374080b54.63073818.19
> >     "68ddc61c613a4e3096ca8c349ee37f56/snapshotnfs:ee3fa6a3-4af3-4ac2-
> > 86c2-d2c374080b54.63073818.19",
> > 
> > [2]
> > root@ceph-mon1:~# radosgw-admin bucket stats --bucket
> > 68ddc61c613a4e3096ca8c349ee37f56/snapshotnfs
> > {
> >     "bucket": "snapshotnfs",
> >     "num_shards": 23,
> >     "tenant": "68ddc61c613a4e3096ca8c349ee37f56",
> >     "zonegroup": "bf22bf53-c135-450b-946f-97e16d1bc326",
> >     "placement_rule": "default-placement",
> >     "explicit_placement": {
> >         "data_pool": "",
> >         "data_extra_pool": "",
> >         "index_pool": ""
> >     },
> >     "id": "ee3fa6a3-4af3-4ac2-86c2-d2c374080b54.63073818.19",
> >     "marker": "ee3fa6a3-4af3-4ac2-86c2-d2c374080b54.63090893.15",
> >     "index_type": "Normal",
> >     "owner":
> > "68ddc61c613a4e3096ca8c349ee37f56$68ddc61c613a4e3096ca8c349ee37f56",
> >     "ver":
> > "0#205,1#32,2#78,3#41,4#25,5#23,6#30,7#94732,8#24,9#190897,10#93417,11
> > #128,12#91536,13#23,14#407,15#137262,16#24,17#32,18#104,19#63,20#94213
> > ,21#24,22#140543",
> >     "master_ver":
> > "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0,11#0,12#0,13#0,14#0,15#0
> > ,16#0,17#0,18#0,19#0,20#0,21#0,22#0",
> >     "mtime": "2022-11-14T07:55:28.287021Z",
> >     "creation_time": "2022-11-07T07:08:58.874542Z",
> >     "max_marker":
> > "0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#,11#,12#,13#,14#,15#,16#,17#,18#,19#
> > ,20#,21#,22#",
> >     "usage": {
> >         "rgw.main": {
> >             "size": 36246282736024,
> >             "size_actual": 36246329626624,
> >             "size_utilized": 36246282736024,
> >             "size_kb": 35396760485,
> >             "size_kb_actual": 35396806276,
> >             "size_kb_utilized": 35396760485,
> >             "num_objects": 1837484
> >         },
> >         "rgw.multimeta": {
> >             "size": 0,
> >             "size_actual": 0,
> >             "size_utilized": 24570,
> >             "size_kb": 0,
> >             "size_kb_actual": 0,
> >             "size_kb_utilized": 24,
> >             "num_objects": 910
> >         }
> >     },
> >     "bucket_quota": {
> >         "enabled": false,
> >         "check_on_raw": true,
> >         "max_size": -1,
> >         "max_size_kb": 0,
> >         "max_objects": -1
> >     }
> > }
> > 
> > [3]
> > https://docs.ceph.com/en/octopus/radosgw/dynamicresharding/
> > 
> > root@ceph-mon1:~# radosgw-admin reshard add --bucket
> > 68ddc61c613a4e3096ca8c349ee37f56/snapshotnfs --num-shards 29
> > root@ceph-mon1:~# radosgw-admin reshard list
> > [
> >     {
> >         "time": "2022-11-23T10:38:25.690183Z",
> >         "tenant": "",
> >         "bucket_name": "68ddc61c613a4e3096ca8c349ee37f56/snapshotnfs",
> 
> it doesn't look like the 'reshard add' command understands this
> "tenant/bucket" format you provided. you might try specifying the
> --tenant separately
> 

Thank you, that did the trick. After processing the reshard and deep
scrubbing the affected PGs, cluster health went back to OK.
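
For reference, with the tenant specified separately the reshard
commands look roughly like this (a sketch using the same bucket and
target shard count as in my earlier attempt, output omitted):

root@ceph-mon1:~# radosgw-admin reshard add \
    --tenant 68ddc61c613a4e3096ca8c349ee37f56 \
    --bucket snapshotnfs --num-shards 29
root@ceph-mon1:~# radosgw-admin reshard process
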
Now I am left to wonder why there was no dynamic resharding though.
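
To dig into that I will check the rgw resharding options; something
along these lines should show whether dynamic resharding is enabled
and at which per-shard object count a reshard gets scheduled (option
names as I understand them for Octopus; the client.rgw section may
need to match your actual rgw instance names):

root@ceph-mon1:~# ceph config get client.rgw rgw_dynamic_resharding
root@ceph-mon1:~# ceph config get client.rgw rgw_max_objs_per_shard
root@ceph-mon1:~# ceph config get client.rgw rgw_max_dynamic_shards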

> >         "bucket_id": "ee3fa6a3-4af3-4ac2-86c2-
> > d2c374080b54.63073818.19",
> >         "new_instance_id": "",
> >         "old_num_shards": 23,
> >         "new_num_shards": 29
> >     }
> > ]
> > root@ceph-mon1:~# radosgw-admin reshard process
> > 2022-11-23T10:41:20.758+0000 7f58cf9d2080  0 process_single_logshard:
> > Error during resharding bucket
> > 68ddc61c613a4e3096ca8c349ee37f56/snapshotnfs:(2) No such file or
> > directory
> > 
> > 
> > 
> > 
> > --
> > Jan Horstmann
> > Systems Developer | Infrastructure
> > _____
> > 
> > 
> > Mittwald CM Service GmbH & Co. KG
> > Königsberger Straße 4-6
> > 32339 Espelkamp
> > 
> > Tel.: 05772 / 293-900
> > Fax: 05772 / 293-333
> > 
> > j.horstmann@xxxxxxxxxxx
> > https://www.mittwald.de
> > 
> > Managing directors: Robert Meyer, Florian Jürgens
> > 
> > VAT ID: DE814773217, HRA 6640, AG Bad Oeynhausen
> > General partner: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad
> > Oeynhausen
> > 
> > Information about data processing in the course of our business
> > activities pursuant to Art. 13-14 GDPR is available at
> > www.mittwald.de/ds.
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> 


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



