Problem: radosgw-admin and rados hang when .rgw.root is incomplete

Hi,
Can anyone shed light on this please?
Our cluster crashed, and I have now managed to get everything back up
and running. The OSDs have nearly finished rebalancing, but I am seeing
issues with RGW.

2024-02-05T01:29:56.272+0000 7f7237e75f40 20 rados->read ofs=0 len=0
2024-02-05T01:29:56.276+0000 7f7237e75f40 20 rados_obj.operate() r=-2 bl.length=0
2024-02-05T01:29:56.276+0000 7f7237e75f40 20 realm
2024-02-05T01:29:56.276+0000 7f7237e75f40 20 rados->read ofs=0 len=0
2024-02-05T01:29:56.276+0000 7f7237e75f40 20 rados_obj.operate() r=-2 bl.length=0
2024-02-05T01:29:56.276+0000 7f7237e75f40  4 RGWPeriod::init failed to init realm  id  : (2) No such file or directory
2024-02-05T01:29:56.276+0000 7f7237e75f40 20 rados->read ofs=0 len=0
2024-02-05T01:29:56.276+0000 7f7237e75f40 20 rados_obj.operate() r=-2 bl.length=0
2024-02-05T01:29:56.276+0000 7f7237e75f40 20 rados->read ofs=0 len=0
2024-02-05T01:29:56.276+0000 7f7237e75f40 20 rados_obj.operate() r=0 bl.length=17
2024-02-05T01:29:56.276+0000 7f7237e75f40 20 rados->read ofs=0 len=0
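
(The r=-2 above is ENOENT, i.e. the realm/period objects cannot be read
from .rgw.root.) In case the detail helps, these are the checks I plan to
run to see what metadata is still readable. They are standard commands,
though I expect they may hang against the incomplete PGs just as
radosgw-admin does:

rados -p .rgw.root ls          # list whatever objects survive in the pool
radosgw-admin realm list       # check which realm/period/zonegroup
radosgw-admin period list      #   metadata is still readable
radosgw-admin zonegroup list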

The .rgw.root and .rgw.index pools are both marked incomplete; one PG from
each was restored from a bad disk. Both pools are now showing a status
of peering_blocked_by_history_les_bound.
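
From what I have read, peering_blocked_by_history_les_bound can sometimes
be cleared by letting the affected OSD ignore the last_epoch_started
history check while it peers. I have not tried this yet, and I understand
it carries a risk of data loss, but my understanding is the sequence would
be roughly the following (<pgid> and osd.N being placeholders for the
stuck PG and its primary, as reported by pg query):

ceph pg <pgid> query    # identify the primary and the peering blocker
ceph config set osd.N osd_find_best_info_ignore_history_les true
ceph osd down N         # force the PG to re-peer
ceph config set osd.N osd_find_best_info_ignore_history_les false   # revert immediately afterwards

I would welcome confirmation on whether that is safe in this situation.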

I do have some other PGs with important data that can be recovered from the
disks, but it is not essential that this is done straight away.  I need to
get RGW running so I can delete old data and free up some space to allow
backfilling to complete.
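
As a stop-gap while space is freed, I am also considering temporarily
raising the backfill threshold so the backfill_toofull PGs can make
progress, e.g.:

ceph osd set-backfillfull-ratio 0.92   # default is 0.90; to be reverted once space is freed
ceph df                                # to watch per-pool usage in the meantime
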
The version is 18.2.1, running under cephadm.
 data:
    pools:   19 pools, 801 pgs
    objects: 9.23M objects, 4.7 TiB
    usage:   9.8 TiB used, 2.7 TiB / 12 TiB avail
    pgs:     2.122% pgs not active
             435947/18456424 objects degraded (2.362%)
             559225/18456424 objects misplaced (3.030%)
             758 active+clean
             17  incomplete
             12  active+undersized+degraded+remapped+backfill_toofull
             12  active+remapped+backfill_toofull
             2   active+clean+scrubbing+deep
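
For reference, the incomplete PGs and the pools they map to can be listed
with the following (the pool number is the prefix of each PG id):

ceph pg ls incomplete
ceph osd lspools
ceph health detail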

If anyone can suggest a known way of recovering from this, your advice
would be appreciated.

Kind regards
Carl.