Robin,

Thank you very much for your quick response. It was really helpful.
The issue has been successfully resolved.

The root cause: podman by default allows 2048 threads per container, while
(for some reason) our cluster had an 8K thread pool size configured for the
RGWs. Lowering this parameter brought the cluster back to a normal state.

Again, I appreciate your help.

Sincerely,
Vladimir

On Sat, Feb 10, 2024 at 1:24 PM Robin H. Johnson <robbat2@xxxxxxxxxx> wrote:
> On Sat, Feb 10, 2024 at 10:05:02AM -0500, Vladimir Sigunov wrote:
> > Hello Community!
> > I would appreciate any help/suggestions with the massive RGW outage we
> > are facing.
> > The cluster's overall status is acceptable (HEALTH_WARN because some
> > PGs have not been scrubbed in time), and the cluster is operational.
> > However, all RGWs fail to start with a core dump.
> > The only issue I see at the moment is the RGW GC queue (radosgw-admin
> > gc list), which contains 600K records.
> > I believe this could be the root cause of the issue. When I pause OSD
> > iops (ceph osd pause), all RGWs start with no issues.
> > There are no large OMAPs or any other warnings in ceph -s output.
> To get you going for the moment, how about disabling the GC threads in
> the RGW daemon, and then processing GC asynchronously?
> Add "rgw_enable_gc_threads=0" to ceph.conf.
> After that, to see why you get the dump, start up a separate RGW
> instance with debug logging enabled.
> --
> Robin Hugh Johnson
> Gentoo Linux: Dev, Infra Lead, Foundation President & Treasurer
> E-Mail   : robbat2@xxxxxxxxxx
> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
> GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
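
For anyone who hits the same mismatch, a rough sketch of the two knobs
involved (the values are illustrative, and whether containers.conf is
honoured depends on how your RGW containers are launched):

    # Option A: lower the RGW thread pool back below the container's pids limit
    ceph config set client.rgw rgw_thread_pool_size 512   # 512 is the shipped default
    # ...then restart the RGW daemons however they are deployed
    # (e.g. "ceph orch restart <rgw-service>" on cephadm clusters)

    # Option B: raise or remove podman's pids limit instead
    # /etc/containers/containers.conf
    [containers]
    pids_limit = 0   # 0 disables the limit in recent podman; the default is 2048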
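
And for the archives, a minimal sketch of the workaround Robin describes
(the option names are real; "client.rgw" is just the usual config target
and may differ in your setup):

    # disable the in-daemon GC threads, either in ceph.conf under the RGW
    # section or via the config database:
    ceph config set client.rgw rgw_enable_gc_threads false

    # drain the GC queue manually from a node with an admin keyring
    radosgw-admin gc list
    radosgw-admin gc process --include-all

    # raise RGW debug logging while investigating the core dump
    ceph config set client.rgw debug_rgw 20
    ceph config set client.rgw debug_ms 1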