Re: radosgw stopped working

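The full OSD is most likely the reason: with pools flagged as full, writes are blocked, and radosgw presumably hangs during startup waiting on them. You can temporarily raise the full ratio to 0.97 or so (the default is 0.95), but you need to prevent this from happening again. The cluster usually starts warning you at 85% (the nearfull ratio).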
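
As a rough sketch of what that looks like on the command line: the function names below are hypothetical wrappers made up for illustration, while the underlying subcommands (ceph osd dump, ceph osd df, ceph osd set-full-ratio) are standard. The value 0.95 for the restore step assumes the cluster was running with the stock default full ratio; check the current value first.

```shell
# Sketch: inspect and temporarily raise the full ratio.
# Function names are illustrative only; run on a node with admin credentials.

# Print the configured full_ratio, backfillfull_ratio and nearfull_ratio.
show_ratios() {
    ceph osd dump | grep -i ratio
}

# List per-OSD utilization so the full OSD can be identified (%USE column).
show_osd_usage() {
    ceph osd df
}

# Raise the full ratio; defaults to the 0.97 suggested above.
raise_full_ratio() {
    ceph osd set-full-ratio "${1:-0.97}"
}

# Put the full ratio back to 0.95 (assumed stock default) once space is freed.
restore_full_ratio() {
    ceph osd set-full-ratio 0.95
}
```

A typical sequence would be show_ratios, show_osd_usage, raise_full_ratio, then restore_full_ratio after the full OSD has drained. Raising the ratio only buys time; it does not free any space by itself.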

Quoting Rok Jaklič <rjaklic@xxxxxxxxx>:

Hi,

for some reason radosgw stopped working.

Cluster status:
[root@ctplmon1 ~]# ceph -v
ceph version 17.2.8 (f817ceb7f187defb1d021d6328fa833eb8e943b3) quincy (stable)
[root@ctplmon1 ~]# ceph -s
  cluster:
    id:     0a6e5422-ac75-4093-af20-528ee00cc847
    health: HEALTH_ERR
            6 OSD(s) experiencing slow operations in BlueStore
            2 backfillfull osd(s)
            1 full osd(s)
            1 nearfull osd(s)
            Low space hindering backfill (add storage if this doesn't resolve itself): 32 pgs backfill_toofull
            Degraded data redundancy: 835306/1285383707 objects degraded (0.065%), 6 pgs degraded, 5 pgs undersized
            76 pgs not deep-scrubbed in time
            45 pgs not scrubbed in time
            Full OSDs blocking recovery: 1 pg recovery_toofull
            9 pool(s) full
            9 daemons have recently crashed

  services:
    mon: 3 daemons, quorum ctplmon1,ctplmon3,ctplmon2 (age 36m)
    mgr: ctplmon1(active, since 65m)
    mds: 1/1 daemons up
    osd: 193 osds: 191 up (since 8m), 191 in (since 9m); 267 remapped pgs
    rgw: 2 daemons active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   10 pools, 793 pgs
    objects: 257.08M objects, 292 TiB
    usage:   614 TiB used, 386 TiB / 1000 TiB avail
    pgs:     835306/1285383707 objects degraded (0.065%)
             225512620/1285383707 objects misplaced (17.544%)
             525 active+clean
             230 active+remapped+backfilling
             32  active+remapped+backfill_toofull
             5   active+undersized+degraded+remapped+backfilling
             1   active+recovery_toofull+degraded

  io:
    recovery: 978 MiB/s, 825 objects/s

---

I do not know if it is related, but the cluster has been rebalancing for a few
days now, ever since I set the EC pool to use only HDDs.

---

If I start rgw with debug logging, I get something like this:
[root@ctplmon2 ~]# radosgw -c /etc/ceph/ceph.conf --setuser ceph --setgroup ceph -n client.radosgw.moja.shramba.ctplmon2 -f -m 194.249.4.104:6789 --debug-rgw=99/99
2024-12-21T23:21:59.898+0100 7f659e380640 -1 Initialization timeout, failed to initialize

In logs I get:
2024-12-21T23:16:59.898+0100 7f65a19257c0  0 deferred set uid:gid to
167:167 (ceph:ceph)
2024-12-21T23:16:59.898+0100 7f65a19257c0  0 ceph version 17.2.8
(f817ceb7f187defb1d021d6328fa833eb8e943b3) quincy (stable), process
radosgw, pid 168935
2024-12-21T23:16:59.898+0100 7f65a19257c0  0 framework: beast
2024-12-21T23:16:59.898+0100 7f65a19257c0  0 framework conf key: port, val:
4444
2024-12-21T23:16:59.898+0100 7f65a19257c0  1 radosgw_Main not setting numa
affinity
2024-12-21T23:16:59.901+0100 7f65a19257c0  1 rgw_d3n:
rgw_d3n_l1_local_datacache_enabled=0
2024-12-21T23:16:59.901+0100 7f65a19257c0  1 D3N datacache enabled: 0
2024-12-21T23:16:59.901+0100 7f658dffb640 20 reqs_thread_entry: start
2024-12-21T23:16:59.901+0100 7f658d7fa640 10 entry start
2024-12-21T23:16:59.908+0100 7f65a19257c0 20 rgw main: rados->read ofs=0
len=0
2024-12-21T23:16:59.914+0100 7f65a19257c0 20 rgw main: rados_obj.operate()
r=-2 bl.length=0
2024-12-21T23:16:59.914+0100 7f65a19257c0 20 rgw main: realm
2024-12-21T23:16:59.914+0100 7f65a19257c0 20 rgw main: rados->read ofs=0
len=0
2024-12-21T23:16:59.915+0100 7f65a19257c0 20 rgw main: rados_obj.operate()
r=-2 bl.length=0
2024-12-21T23:16:59.915+0100 7f65a19257c0  4 rgw main: RGWPeriod::init
failed to init realm  id  : (2) No such file or directory
2024-12-21T23:16:59.915+0100 7f65a19257c0 20 rgw main: rados->read ofs=0
len=0
2024-12-21T23:16:59.915+0100 7f65a19257c0 20 rgw main: rados_obj.operate()
r=-2 bl.length=0
2024-12-21T23:16:59.915+0100 7f65a19257c0 20 rgw main: rados->read ofs=0
len=0
2024-12-21T23:16:59.917+0100 7f65a19257c0 20 rgw main: rados_obj.operate()
r=0 bl.length=46
2024-12-21T23:16:59.917+0100 7f65a19257c0 20 rgw main: rados->read ofs=0
len=0
2024-12-21T23:16:59.945+0100 7f65a19257c0 20 rgw main: rados_obj.operate()
r=0 bl.length=873
2024-12-21T23:16:59.945+0100 7f65a19257c0 20 rgw main: searching for the
correct realm
2024-12-21T23:17:00.210+0100 7f65a19257c0 20 rgw main:
RGWRados::pool_iterate: got zone_info.c2c70444-7a41-4acd-a0d0-9f87d324ec72
2024-12-21T23:17:00.210+0100 7f65a19257c0 20 rgw main:
RGWRados::pool_iterate: got
zonegroup_info.b1e0d55c-f7cb-4e73-b1cb-6cffa1fd6578
2024-12-21T23:17:00.210+0100 7f65a19257c0 20 rgw main:
RGWRados::pool_iterate: got zone_names.default
2024-12-21T23:17:00.210+0100 7f65a19257c0 20 rgw main:
RGWRados::pool_iterate: got zonegroups_names.default
2024-12-21T23:17:00.210+0100 7f65a19257c0 20 rgw main: rados->read ofs=0
len=0
2024-12-21T23:17:00.210+0100 7f65a19257c0 20 rgw main: rados_obj.operate()
r=-2 bl.length=0
2024-12-21T23:17:00.210+0100 7f65a19257c0 20 rgw main: rados->read ofs=0
len=0
2024-12-21T23:17:00.211+0100 7f65a19257c0 20 rgw main: rados_obj.operate()
r=0 bl.length=46
2024-12-21T23:17:00.211+0100 7f65a19257c0 20 rgw main: rados->read ofs=0
len=0
2024-12-21T23:17:00.212+0100 7f65a19257c0 20 rgw main: rados_obj.operate()
r=0 bl.length=358
2024-12-21T23:17:00.212+0100 7f65a19257c0 20 rgw main: rados->read ofs=0
len=0
2024-12-21T23:17:00.213+0100 7f65a19257c0 20 rgw main: rados_obj.operate()
r=-2 bl.length=0
2024-12-21T23:17:00.213+0100 7f65a19257c0 20 rgw main: rados->read ofs=0
len=0
2024-12-21T23:17:00.214+0100 7f65a19257c0 20 rgw main: rados_obj.operate()
r=-2 bl.length=0
2024-12-21T23:17:00.214+0100 7f65a19257c0 20 rgw main: rados->read ofs=0
len=0
2024-12-21T23:17:00.215+0100 7f65a19257c0 20 rgw main: rados_obj.operate()
r=0 bl.length=46
2024-12-21T23:17:00.284+0100 7f65a19257c0 20 rgw main:
RGWRados::pool_iterate: got zone_info.c2c70444-7a41-4acd-a0d0-9f87d324ec72
2024-12-21T23:17:00.284+0100 7f65a19257c0 20 rgw main:
RGWRados::pool_iterate: got
zonegroup_info.b1e0d55c-f7cb-4e73-b1cb-6cffa1fd6578
2024-12-21T23:17:00.284+0100 7f65a19257c0 20 rgw main:
RGWRados::pool_iterate: got zone_names.default
2024-12-21T23:17:00.284+0100 7f65a19257c0 20 rgw main:
RGWRados::pool_iterate: got zonegroups_names.default
2024-12-21T23:17:00.284+0100 7f65a19257c0 20 rgw main: rados->read ofs=0
len=0
2024-12-21T23:17:00.285+0100 7f65a19257c0 20 rgw main: rados_obj.operate()
r=0 bl.length=46
2024-12-21T23:17:00.285+0100 7f65a19257c0 20 rgw main: rados->read ofs=0
len=0
2024-12-21T23:17:00.286+0100 7f65a19257c0 20 rgw main: rados_obj.operate()
r=0 bl.length=873
2024-12-21T23:17:00.286+0100 7f65a19257c0 20 rgw main: rados->read ofs=0
len=0
2024-12-21T23:17:00.287+0100 7f65a19257c0 20 rgw main: rados_obj.operate()
r=0 bl.length=46
2024-12-21T23:17:00.287+0100 7f65a19257c0 20 rgw main: rados->read ofs=0
len=0
2024-12-21T23:17:00.293+0100 7f65a19257c0 20 rgw main: rados_obj.operate()
r=0 bl.length=358
2024-12-21T23:17:00.293+0100 7f65a19257c0 20 rgw main: rados->read ofs=0
len=0
2024-12-21T23:17:00.295+0100 7f65a19257c0 20 rgw main: rados_obj.operate()
r=-2 bl.length=0
2024-12-21T23:17:00.295+0100 7f65a19257c0 20 zone default found
2024-12-21T23:17:00.295+0100 7f65a19257c0  4 rgw main: Realm:
           ()
2024-12-21T23:17:00.295+0100 7f65a19257c0  4 rgw main: ZoneGroup: default
           (b1e0d55c-f7cb-4e73-b1cb-6cffa1fd6578)
2024-12-21T23:17:00.295+0100 7f65a19257c0  4 rgw main: Zone:      default
           (c2c70444-7a41-4acd-a0d0-9f87d324ec72)
2024-12-21T23:17:00.295+0100 7f65a19257c0 10 cannot find current period
zonegroup using local zonegroup configuration
2024-12-21T23:17:00.295+0100 7f65a19257c0 20 rgw main: zonegroup default
2024-12-21T23:17:00.295+0100 7f65a19257c0 20 rgw main: rados->read ofs=0
len=0
2024-12-21T23:17:00.296+0100 7f65a19257c0 20 rgw main: rados_obj.operate()
r=-2 bl.length=0
2024-12-21T23:17:00.296+0100 7f65a19257c0 20 rgw main: rados->read ofs=0
len=0
2024-12-21T23:17:00.299+0100 7f65a19257c0 20 rgw main: rados_obj.operate()
r=-2 bl.length=0
2024-12-21T23:17:00.299+0100 7f65a19257c0 20 rgw main: rados->read ofs=0
len=0
2024-12-21T23:17:00.303+0100 7f65a19257c0 20 rgw main: rados_obj.operate()
r=-2 bl.length=0
2024-12-21T23:17:00.303+0100 7f65a19257c0 20 rgw main: started sync module
instance, tier type =
2024-12-21T23:17:00.303+0100 7f65a19257c0 20 rgw main: started zone
id=c2c70444-7a41-4acd-a0d0-9f87d324ec72 (name=default) with tier type =
2024-12-21T23:21:59.898+0100 7f659e380640 -1 Initialization timeout, failed
to initialize

---

Any ideas what might cause rgw to stop working?

Kind regards,
Rok
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

