Hi,
several threads with similar error messages all trace back to some
kind of pool or OSD issue. What is your current cluster status (ceph
-s)? Do you have any full OSDs? Those can cause this initialization
timeout, as can hitting the max_pg_per_osd limit. A few more cluster
details would help here.
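If it helps, here is a rough Python sketch for spotting nearly full OSDs
and OSDs with a high PG count in one pass. It assumes input shaped like
the output of `ceph osd df --format json` (a "nodes" list with "name",
"utilization" in percent, and "pgs"). The thresholds, the function name
flag_osds and the sample data are illustrative assumptions, not values
from this cluster.

```python
import json

# Assumed thresholds; adjust them to the cluster's actual settings.
# 85.0 mirrors a typical nearfull ratio and 250 a typical per-OSD PG
# limit, but neither value was taken from the cluster in this thread.
NEARFULL_PCT = 85.0
MAX_PGS_PER_OSD = 250


def flag_osds(osd_df, nearfull_pct=NEARFULL_PCT, max_pgs=MAX_PGS_PER_OSD):
    """Return (name, utilization, pgs, reasons) for each suspicious OSD."""
    flagged = []
    for node in osd_df.get("nodes", []):
        reasons = []
        if node.get("utilization", 0.0) >= nearfull_pct:
            reasons.append("nearfull")
        if node.get("pgs", 0) >= max_pgs:
            reasons.append("pg-limit")
        if reasons:
            flagged.append(
                (node.get("name"), node.get("utilization"), node.get("pgs"), reasons)
            )
    return flagged


# Hypothetical sample, shaped like `ceph osd df --format json` output.
sample = json.loads("""
{
  "nodes": [
    {"name": "osd.0", "utilization": 61.2, "pgs": 142},
    {"name": "osd.1", "utilization": 96.4, "pgs": 150},
    {"name": "osd.2", "utilization": 58.0, "pgs": 262}
  ]
}
""")

for name, util, pgs, reasons in flag_osds(sample):
    print(f"{name}: {util:.1f}% used, {pgs} PGs ({', '.join(reasons)})")
```

On a real cluster the sample would be replaced by the JSON printed by
`ceph osd df --format json`, loaded with json.loads.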
Thanks,
Eugen
Quoting "Ben.Zieglmeier" <Ben.Zieglmeier@xxxxxxxxxx>:
Hello,
We have an RGW cluster that was recently upgraded from 12.2.11 to
14.2.22. The upgrade went mostly fine, though now several of our
RGWs will not start. One RGW is working fine, the rest will not
initialize. They are in a crash loop. This cluster is part of a
multisite configuration and is currently not the master zone. The
current master zone is running 14.2.22. These are the only two zones in the
zonegroup. After turning debug up to 20, these are the log snippets
between each crash:
```
2023-07-20 14:29:56.371 7fd8dec40900 20 RGWRados::pool_iterate: got periods.1b6e1a93-98ba-4378-bc5c-d36cd5542f11.52
2023-07-20 14:29:56.371 7fd8dec40900 20 RGWRados::pool_iterate: got periods.1b6e1a93-98ba-4378-bc5c-d36cd5542f11.54
2023-07-20 14:29:56.371 7fd8dec40900 20 RGWRados::pool_iterate: got realms_names. <redacted>
2023-07-20 14:29:56.371 7fd8dec40900 20 RGWRados::pool_iterate: got <redacted>
2023-07-20 14:29:56.371 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.371 7fd8dec40900 20 rados_obj.operate() r=-2 bl.length=0
2023-07-20 14:29:56.371 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados_obj.operate() r=-2 bl.length=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados_obj.operate() r=-2 bl.length=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=46
2023-07-20 14:29:56.373 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=114
2023-07-20 14:29:56.373 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=46
2023-07-20 14:29:56.373 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.374 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=686
2023-07-20 14:29:56.374 7fd8dec40900 20 period zonegroup init ret 0
2023-07-20 14:29:56.374 7fd8dec40900 20 period zonegroup name <redacted>
2023-07-20 14:29:56.374 7fd8dec40900 20 using current period zonegroup <redacted>
2023-07-20 14:29:56.374 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.374 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=46
2023-07-20 14:29:56.374 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.375 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=903
2023-07-20 14:29:56.375 7fd8dec40900 10 Cannot find current period zone using local zone
2023-07-20 14:29:56.375 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.375 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=903
2023-07-20 14:29:56.375 7fd8dec40900 20 zone <redacted>
2023-07-20 14:29:56.375 7fd8dec40900 20 generating connection object for zone <redacted> id f10b465f-bf18-47d0-a51c-ca4f17118ee1
2023-07-20 14:34:56.198 7fd8cafe8700 -1 Initialization timeout, failed to initialize
```
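For reference, the time the daemon sat idle between the last debug line
and the timeout can be read off the two timestamps. A minimal Python
sketch, assuming the log format shown above (date and time as the first
two fields); the helper name stall_seconds is made up for illustration:

```python
from datetime import datetime

# Timestamp layout as it appears in the log excerpt above.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def stall_seconds(last_progress_line, timeout_line):
    """Seconds elapsed between two log lines, based on their timestamps."""

    def ts(line):
        # The timestamp is the first two whitespace-separated fields.
        date, time = line.split()[:2]
        return datetime.strptime(f"{date} {time}", FMT)

    return (ts(timeout_line) - ts(last_progress_line)).total_seconds()


# Beginnings of the last two log lines from the excerpt.
last_progress = "2023-07-20 14:29:56.375 7fd8dec40900 20 generating connection object"
timeout = "2023-07-20 14:34:56.198 7fd8cafe8700 -1 Initialization timeout,"

gap = stall_seconds(last_progress, timeout)
print(f"stalled for {gap:.3f} s before the timeout fired")
```

In this excerpt the gap comes out just under five minutes, with nothing
logged in between, so whatever the RGW is waiting on happens right after
the connection object for the zone is generated.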
I’ve checked all file permissions and filesystem free space, disabled
SELinux and firewalld, tried turning up the initialization timeout
to 600, and tried removing all non-essential config from ceph.conf.
All of these produce the same result. I would greatly appreciate any
other ideas or insight.
Thanks,
Ben
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx