Robin,

Thank you very much for your quick response. It was really helpful.
The issue has been successfully resolved.

The root cause: podman by default allows 2048 threads per container, while
(for some reason) our cluster had an 8K thread pool size configured for the
RGWs. Lowering this parameter brought the cluster back to a normal state.

Again, I appreciate your help.

Sincerely,
Vladimir

On Sat, Feb 10, 2024 at 1:24 PM Robin H. Johnson <robbat2@xxxxxxxxxx> wrote:
> On Sat, Feb 10, 2024 at 10:05:02AM -0500, Vladimir Sigunov wrote:
> > Hello Community!
> > I would appreciate any help/suggestions with the massive RGW outage we
> > are facing.
> > The cluster's overall status is acceptable (HEALTH_WARN because some
> > PGs have not been scrubbed in time), and the cluster is operational.
> > However, all RGWs fail to start with a core dump.
> > The only issue I see at the moment is the RGW GC queue (radosgw-admin
> > gc list), which contains 600K records.
> > I believe this could be the root cause of the issue. When I pause OSD
> > iops (ceph osd pause), all RGWs start with no issues.
> > There are no large OMAPs or any other warnings in ceph -s output.
> To get you going for the moment, how about disabling the GC threads in
> the RGW daemon, and then processing GC asynchronously?
> Add "rgw_enable_gc_threads=0" to ceph.conf.
> After that, to see why you get the dump, start up a separate RGW
> instance with debug logging enabled.
> --
> Robin Hugh Johnson
> Gentoo Linux: Dev, Infra Lead, Foundation President & Treasurer
> E-Mail   : robbat2@xxxxxxxxxx
> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
> GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
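
For anyone who hits the same mismatch, a rough sketch of the two knobs
involved (the values are illustrative, and whether containers.conf is
honoured depends on how your RGW containers are launched):

    # Option A: lower the RGW thread pool back below the container's pids limit
    ceph config set client.rgw rgw_thread_pool_size 512   # 512 is the shipped default
    # ...then restart the RGW daemons however they are deployed
    # (e.g. "ceph orch restart <rgw-service>" on cephadm clusters)

    # Option B: raise or remove podman's pids limit instead
    # /etc/containers/containers.conf
    [containers]
    pids_limit = 0   # 0 disables the limit in recent podman; the default is 2048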
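
And for the archives, a minimal sketch of the workaround Robin describes
(the option names are real; "client.rgw" is just the usual config target
and may differ in your setup):

    # disable the in-daemon GC threads, either in ceph.conf under the RGW
    # section or via the config database:
    ceph config set client.rgw rgw_enable_gc_threads false

    # drain the GC queue manually from a node with an admin keyring
    radosgw-admin gc list
    radosgw-admin gc process --include-all

    # raise RGW debug logging while investigating the core dump
    ceph config set client.rgw debug_rgw 20
    ceph config set client.rgw debug_ms 1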