The full OSD is most likely the cause. You can temporarily raise the full
ratio to 0.97 or so, but you need to keep OSDs from filling up in the first
place. By default the cluster starts warning you at 85% (the nearfull ratio).
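A minimal sketch of that temporary change, assuming the default ratios (nearfull 0.85, backfillfull 0.90, full 0.95). `ratios_ordered` and `raise_full_ratios` are hypothetical helpers; only the `ceph osd set-nearfull-ratio`, `set-backfillfull-ratio` and `set-full-ratio` commands are standard. The ordering check is there because Ceph flags out-of-order ratios as a health warning.

```shell
#!/usr/bin/env bash
# Sketch: temporarily change the OSD fullness thresholds.
# Check the current values first with: ceph osd dump | grep ratio
# and per-OSD usage with:             ceph osd df tree

# Hypothetical helper: succeed only if nearfull < backfillfull < full <= 1.
ratios_ordered() {
  awk -v n="$1" -v b="$2" -v f="$3" \
    'BEGIN { exit !((n+0) < (b+0) && (b+0) < (f+0) && (f+0) <= 1) }'
}

# Hypothetical helper: apply the three ratios, refusing an out-of-order set.
raise_full_ratios() {
  local nearfull=$1 backfillfull=$2 full=$3
  if ! ratios_ordered "$nearfull" "$backfillfull" "$full"; then
    echo "refusing: need nearfull < backfillfull < full <= 1" >&2
    return 1
  fi
  ceph osd set-nearfull-ratio "$nearfull"
  ceph osd set-backfillfull-ratio "$backfillfull"
  ceph osd set-full-ratio "$full"
}
```

For example, `raise_full_ratios 0.85 0.90 0.97` should clear the full state as long as the OSD is below 97%; once backfill has moved data off it, `raise_full_ratios 0.85 0.90 0.95` restores the defaults.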
Quote from Rok Jaklič <rjaklic@xxxxxxxxx>:
Hi,
For some reason, radosgw has stopped working.
Cluster status:
[root@ctplmon1 ~]# ceph -v
ceph version 17.2.8 (f817ceb7f187defb1d021d6328fa833eb8e943b3) quincy (stable)
[root@ctplmon1 ~]# ceph -s
  cluster:
    id:     0a6e5422-ac75-4093-af20-528ee00cc847
    health: HEALTH_ERR
            6 OSD(s) experiencing slow operations in BlueStore
            2 backfillfull osd(s)
            1 full osd(s)
            1 nearfull osd(s)
            Low space hindering backfill (add storage if this doesn't resolve itself): 32 pgs backfill_toofull
            Degraded data redundancy: 835306/1285383707 objects degraded (0.065%), 6 pgs degraded, 5 pgs undersized
            76 pgs not deep-scrubbed in time
            45 pgs not scrubbed in time
            Full OSDs blocking recovery: 1 pg recovery_toofull
            9 pool(s) full
            9 daemons have recently crashed

  services:
    mon: 3 daemons, quorum ctplmon1,ctplmon3,ctplmon2 (age 36m)
    mgr: ctplmon1(active, since 65m)
    mds: 1/1 daemons up
    osd: 193 osds: 191 up (since 8m), 191 in (since 9m); 267 remapped pgs
    rgw: 2 daemons active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   10 pools, 793 pgs
    objects: 257.08M objects, 292 TiB
    usage:   614 TiB used, 386 TiB / 1000 TiB avail
    pgs:     835306/1285383707 objects degraded (0.065%)
             225512620/1285383707 objects misplaced (17.544%)
             525 active+clean
             230 active+remapped+backfilling
             32 active+remapped+backfill_toofull
             5 active+undersized+degraded+remapped+backfilling
             1 active+recovery_toofull+degraded

  io:
    recovery: 978 MiB/s, 825 objects/s
---
I don't know if it is related, but the cluster has been rebalancing for a few
days now, ever since I set the EC pool to use only hdd.
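Restricting the EC pool to hdd placement means all of its data has to fit on the hdd-class OSDs; if the cluster also has non-hdd OSDs, the cluster-wide 61% raw usage (614 of 1000 TiB) can hide how full the hdd OSDs actually are. A minimal sketch for checking this, assuming a release whose `ceph osd df` accepts a `class` filter; `check_hdd_pressure` is a hypothetical helper and the pool name passed to it is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: after moving an EC pool to hdd-only placement,
# show where its data may now land and which PGs are stuck on full OSDs.
check_hdd_pressure() {
  local pool=$1
  if [ -z "$pool" ]; then
    echo "usage: check_hdd_pressure <pool>" >&2
    return 2
  fi
  # CRUSH rule the pool currently uses (should be the hdd-only rule).
  ceph osd pool get "$pool" crush_rule
  # Per-OSD utilization, restricted to the hdd device class.
  ceph osd df tree class hdd
  # PGs whose backfill or recovery is blocked by full OSDs.
  ceph pg ls backfill_toofull recovery_toofull
}
```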
---
If I start rgw with debug logging enabled, I get something like this:
[root@ctplmon2 ~]# radosgw -c /etc/ceph/ceph.conf --setuser ceph --setgroup ceph \
    -n client.radosgw.moja.shramba.ctplmon2 -f -m 194.249.4.104:6789 \
    --debug-rgw=99/99
2024-12-21T23:21:59.898+0100 7f659e380640 -1 Initialization timeout, failed to initialize
In the log I get:
2024-12-21T23:16:59.898+0100 7f65a19257c0 0 deferred set uid:gid to 167:167 (ceph:ceph)
2024-12-21T23:16:59.898+0100 7f65a19257c0 0 ceph version 17.2.8 (f817ceb7f187defb1d021d6328fa833eb8e943b3) quincy (stable), process radosgw, pid 168935
2024-12-21T23:16:59.898+0100 7f65a19257c0 0 framework: beast
2024-12-21T23:16:59.898+0100 7f65a19257c0 0 framework conf key: port, val: 4444
2024-12-21T23:16:59.898+0100 7f65a19257c0 1 radosgw_Main not setting numa affinity
2024-12-21T23:16:59.901+0100 7f65a19257c0 1 rgw_d3n: rgw_d3n_l1_local_datacache_enabled=0
2024-12-21T23:16:59.901+0100 7f65a19257c0 1 D3N datacache enabled: 0
2024-12-21T23:16:59.901+0100 7f658dffb640 20 reqs_thread_entry: start
2024-12-21T23:16:59.901+0100 7f658d7fa640 10 entry start
2024-12-21T23:16:59.908+0100 7f65a19257c0 20 rgw main: rados->read ofs=0 len=0
2024-12-21T23:16:59.914+0100 7f65a19257c0 20 rgw main: rados_obj.operate() r=-2 bl.length=0
2024-12-21T23:16:59.914+0100 7f65a19257c0 20 rgw main: realm
2024-12-21T23:16:59.914+0100 7f65a19257c0 20 rgw main: rados->read ofs=0 len=0
2024-12-21T23:16:59.915+0100 7f65a19257c0 20 rgw main: rados_obj.operate() r=-2 bl.length=0
2024-12-21T23:16:59.915+0100 7f65a19257c0 4 rgw main: RGWPeriod::init failed to init realm id : (2) No such file or directory
2024-12-21T23:16:59.915+0100 7f65a19257c0 20 rgw main: rados->read ofs=0 len=0
2024-12-21T23:16:59.915+0100 7f65a19257c0 20 rgw main: rados_obj.operate() r=-2 bl.length=0
2024-12-21T23:16:59.915+0100 7f65a19257c0 20 rgw main: rados->read ofs=0 len=0
2024-12-21T23:16:59.917+0100 7f65a19257c0 20 rgw main: rados_obj.operate() r=0 bl.length=46
2024-12-21T23:16:59.917+0100 7f65a19257c0 20 rgw main: rados->read ofs=0 len=0
2024-12-21T23:16:59.945+0100 7f65a19257c0 20 rgw main: rados_obj.operate() r=0 bl.length=873
2024-12-21T23:16:59.945+0100 7f65a19257c0 20 rgw main: searching for the correct realm
2024-12-21T23:17:00.210+0100 7f65a19257c0 20 rgw main: RGWRados::pool_iterate: got zone_info.c2c70444-7a41-4acd-a0d0-9f87d324ec72
2024-12-21T23:17:00.210+0100 7f65a19257c0 20 rgw main: RGWRados::pool_iterate: got zonegroup_info.b1e0d55c-f7cb-4e73-b1cb-6cffa1fd6578
2024-12-21T23:17:00.210+0100 7f65a19257c0 20 rgw main: RGWRados::pool_iterate: got zone_names.default
2024-12-21T23:17:00.210+0100 7f65a19257c0 20 rgw main: RGWRados::pool_iterate: got zonegroups_names.default
2024-12-21T23:17:00.210+0100 7f65a19257c0 20 rgw main: rados->read ofs=0 len=0
2024-12-21T23:17:00.210+0100 7f65a19257c0 20 rgw main: rados_obj.operate() r=-2 bl.length=0
2024-12-21T23:17:00.210+0100 7f65a19257c0 20 rgw main: rados->read ofs=0 len=0
2024-12-21T23:17:00.211+0100 7f65a19257c0 20 rgw main: rados_obj.operate() r=0 bl.length=46
2024-12-21T23:17:00.211+0100 7f65a19257c0 20 rgw main: rados->read ofs=0 len=0
2024-12-21T23:17:00.212+0100 7f65a19257c0 20 rgw main: rados_obj.operate() r=0 bl.length=358
2024-12-21T23:17:00.212+0100 7f65a19257c0 20 rgw main: rados->read ofs=0 len=0
2024-12-21T23:17:00.213+0100 7f65a19257c0 20 rgw main: rados_obj.operate() r=-2 bl.length=0
2024-12-21T23:17:00.213+0100 7f65a19257c0 20 rgw main: rados->read ofs=0 len=0
2024-12-21T23:17:00.214+0100 7f65a19257c0 20 rgw main: rados_obj.operate() r=-2 bl.length=0
2024-12-21T23:17:00.214+0100 7f65a19257c0 20 rgw main: rados->read ofs=0 len=0
2024-12-21T23:17:00.215+0100 7f65a19257c0 20 rgw main: rados_obj.operate() r=0 bl.length=46
2024-12-21T23:17:00.284+0100 7f65a19257c0 20 rgw main: RGWRados::pool_iterate: got zone_info.c2c70444-7a41-4acd-a0d0-9f87d324ec72
2024-12-21T23:17:00.284+0100 7f65a19257c0 20 rgw main: RGWRados::pool_iterate: got zonegroup_info.b1e0d55c-f7cb-4e73-b1cb-6cffa1fd6578
2024-12-21T23:17:00.284+0100 7f65a19257c0 20 rgw main: RGWRados::pool_iterate: got zone_names.default
2024-12-21T23:17:00.284+0100 7f65a19257c0 20 rgw main: RGWRados::pool_iterate: got zonegroups_names.default
2024-12-21T23:17:00.284+0100 7f65a19257c0 20 rgw main: rados->read ofs=0 len=0
2024-12-21T23:17:00.285+0100 7f65a19257c0 20 rgw main: rados_obj.operate() r=0 bl.length=46
2024-12-21T23:17:00.285+0100 7f65a19257c0 20 rgw main: rados->read ofs=0 len=0
2024-12-21T23:17:00.286+0100 7f65a19257c0 20 rgw main: rados_obj.operate() r=0 bl.length=873
2024-12-21T23:17:00.286+0100 7f65a19257c0 20 rgw main: rados->read ofs=0 len=0
2024-12-21T23:17:00.287+0100 7f65a19257c0 20 rgw main: rados_obj.operate() r=0 bl.length=46
2024-12-21T23:17:00.287+0100 7f65a19257c0 20 rgw main: rados->read ofs=0 len=0
2024-12-21T23:17:00.293+0100 7f65a19257c0 20 rgw main: rados_obj.operate() r=0 bl.length=358
2024-12-21T23:17:00.293+0100 7f65a19257c0 20 rgw main: rados->read ofs=0 len=0
2024-12-21T23:17:00.295+0100 7f65a19257c0 20 rgw main: rados_obj.operate() r=-2 bl.length=0
2024-12-21T23:17:00.295+0100 7f65a19257c0 20 zone default found
2024-12-21T23:17:00.295+0100 7f65a19257c0 4 rgw main: Realm: ()
2024-12-21T23:17:00.295+0100 7f65a19257c0 4 rgw main: ZoneGroup: default (b1e0d55c-f7cb-4e73-b1cb-6cffa1fd6578)
2024-12-21T23:17:00.295+0100 7f65a19257c0 4 rgw main: Zone: default (c2c70444-7a41-4acd-a0d0-9f87d324ec72)
2024-12-21T23:17:00.295+0100 7f65a19257c0 10 cannot find current period zonegroup using local zonegroup configuration
2024-12-21T23:17:00.295+0100 7f65a19257c0 20 rgw main: zonegroup default
2024-12-21T23:17:00.295+0100 7f65a19257c0 20 rgw main: rados->read ofs=0 len=0
2024-12-21T23:17:00.296+0100 7f65a19257c0 20 rgw main: rados_obj.operate() r=-2 bl.length=0
2024-12-21T23:17:00.296+0100 7f65a19257c0 20 rgw main: rados->read ofs=0 len=0
2024-12-21T23:17:00.299+0100 7f65a19257c0 20 rgw main: rados_obj.operate() r=-2 bl.length=0
2024-12-21T23:17:00.299+0100 7f65a19257c0 20 rgw main: rados->read ofs=0 len=0
2024-12-21T23:17:00.303+0100 7f65a19257c0 20 rgw main: rados_obj.operate() r=-2 bl.length=0
2024-12-21T23:17:00.303+0100 7f65a19257c0 20 rgw main: started sync module instance, tier type =
2024-12-21T23:17:00.303+0100 7f65a19257c0 20 rgw main: started zone id=c2c70444-7a41-4acd-a0d0-9f87d324ec72 (name=default) with tier type =
2024-12-21T23:21:59.898+0100 7f659e380640 -1 Initialization timeout, failed to initialize
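The gap between the first log line (23:16:59.898) and the timeout (23:21:59.898) is exactly five minutes, which matches the default of the `rgw_init_timeout` option (300 seconds). That suggests radosgw is not crashing but stalling somewhere after "started zone", which would be consistent with, though not proof of, an operation waiting on one of the full pools. A small pure-bash sketch for measuring such intervals; `ts_to_ms` and `elapsed_ms` are hypothetical helpers and assume both timestamps fall on the same day with the same UTC offset.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: elapsed time between two Ceph log timestamps of the
# form YYYY-MM-DDTHH:MM:SS.mmm+ZZZZ. Assumes same day and same UTC offset.
ts_to_ms() {
  local t=${1#*T}      # drop the date:       HH:MM:SS.mmm+ZZZZ
  t=${t%[+-]*}         # drop the UTC offset: HH:MM:SS.mmm
  local hh=${t:0:2} mm=${t:3:2} ss=${t:6:2} ms=${t:9:3}
  # 10# forces base 10 so fields like "08" are not read as octal.
  echo $(( ((10#$hh * 60 + 10#$mm) * 60 + 10#$ss) * 1000 + 10#$ms ))
}

# Milliseconds from the first timestamp to the second.
elapsed_ms() {
  echo $(( $(ts_to_ms "$2") - $(ts_to_ms "$1") ))
}

elapsed_ms 2024-12-21T23:16:59.898+0100 2024-12-21T23:21:59.898+0100   # prints 300000
```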
---
Any ideas what might cause rgw to stop working?
Kind regards,
Rok
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx