All:

I think I found the problem - hence...

[root@c01 ceph]# ceph orch ls
NAME                       PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager               ?:9093,9094      1/1  2m ago     9d   count:1
crash                                        6/6  2m ago     9d   *
grafana                    ?:3000           1/1  2m ago     9d   count:1
mgr                                          2/2  2m ago     9d   count:2
mon                                          5/5  2m ago     9d   count:5
node-exporter              ?:9100           6/6  2m ago     9d   *
osd                                            2  2m ago     -    <unmanaged>
osd.all-available-devices                     16  2m ago     2d   *
prometheus                 ?:9095           1/1  2m ago     9d   count:1
rgw.obj0                   ?:80             1/6  2m ago     9d   c01;c02;c03;c04;c05;c06;count:6
rgw.obj01                  ?:80             5/6  2m ago     5d   c01;c02;c03;c04;c05;c06

To my untrained eye, it looks like rgw.obj0 is extra and unneeded. Does anyone know a way to prove this out and, if needed, remove it?

Thanks!

Ron Gage
Westland, MI

-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Thursday, February 17, 2022 2:32 AM
To: ceph-users@xxxxxxx
Subject: Re: Problem with Ceph daemons

Can you retry after resetting the systemd unit? The message "Start request repeated too quickly." should be cleared first, then start it again:

systemctl reset-failed ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service
systemctl start ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service

Then check the logs again. If there's still nothing in the rgw log, you'll need to check the (active) mgr daemon logs for anything suspicious, and also the syslog on that rgw host. Is the rest of the cluster healthy? Are the rgw daemons colocated with other services?

Zitat von Ron Gage <ron@xxxxxxxxxxx>:

> Adam:
>
> Not really…
>
> -- Unit ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service has begun starting up.
> Feb 16 15:01:03 c01 podman[426007]:
> Feb 16 15:01:04 c01 bash[426007]: 915d1e19fa0f213902c666371c8e825480e103f85172f3b15d1d5bf2427a87c9
> Feb 16 15:01:04 c01 conmon[426038]: debug 2022-02-16T20:01:04.303+0000 7f4f72ff6440  0 deferred set uid:gid to 167:167 (ceph:ceph)
> Feb 16 15:01:04 c01 conmon[426038]: debug 2022-02-16T20:01:04.303+0000 7f4f72ff6440  0 ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (st>
> Feb 16 15:01:04 c01 conmon[426038]: debug 2022-02-16T20:01:04.303+0000 7f4f72ff6440  0 framework: beast
> Feb 16 15:01:04 c01 conmon[426038]: debug 2022-02-16T20:01:04.303+0000 7f4f72ff6440  0 framework conf key: port, val: 80
> Feb 16 15:01:04 c01 conmon[426038]: debug 2022-02-16T20:01:04.303+0000 7f4f72ff6440  1 radosgw_Main not setting numa affinity
> Feb 16 15:01:04 c01 systemd[1]: Started Ceph rgw.obj0.c01.gpqshk for 35194656-893e-11ec-85c8-005056870dae.
> -- Subject: Unit ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service has finished start-up
> -- Defined-By: systemd
> -- Support: https://access.redhat.com/support
> --
> -- Unit ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service has finished starting up.
> --
> -- The start-up result is done.
>
> Feb 16 15:01:04 c01 systemd[1]: ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service: Main process exited, code=exited, status=98/n/a
> Feb 16 15:01:05 c01 systemd[1]: ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service: Failed with result 'exit-code'.
> -- Subject: Unit failed
> -- Defined-By: systemd
> -- Support: https://access.redhat.com/support
> --
> -- The unit ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service has entered the 'failed' state with result 'exit-code'.
> Feb 16 15:01:15 c01 systemd[1]: ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service: Service RestartSec=10s expired, scheduling restart.
> Feb 16 15:01:15 c01 systemd[1]: ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service: Scheduled restart job, restart counter is at 5.
> -- Subject: Automatic restarting of a unit has been scheduled
> -- Defined-By: systemd
> -- Support: https://access.redhat.com/support
> --
> -- Automatic restarting of the unit ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service has been scheduled, as the result for
> -- the configured Restart= setting for the unit.
>
> Feb 16 15:01:15 c01 systemd[1]: Stopped Ceph rgw.obj0.c01.gpqshk for 35194656-893e-11ec-85c8-005056870dae.
> -- Subject: Unit ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service has finished shutting down
> -- Defined-By: systemd
> -- Support: https://access.redhat.com/support
> --
> -- Unit ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service has finished shutting down.
>
> Feb 16 15:01:15 c01 systemd[1]: ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service: Start request repeated too quickly.
> Feb 16 15:01:15 c01 systemd[1]: ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service: Failed with result 'exit-code'.
> -- Subject: Unit failed
> -- Defined-By: systemd
> -- Support: https://access.redhat.com/support
> --
> -- The unit ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service has entered the 'failed' state with result 'exit-code'.
>
> Feb 16 15:01:15 c01 systemd[1]: Failed to start Ceph rgw.obj0.c01.gpqshk for 35194656-893e-11ec-85c8-005056870dae.
> -- Subject: Unit ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service has failed
> -- Defined-By: systemd
> -- Support: https://access.redhat.com/support
> --
> -- Unit ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service has failed.
> --
> -- The result is failed.
>
> Ron Gage
> Westland, MI
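A side note on the status=98 exit in the log above: on Linux, errno 98 is EADDRINUSE, and for radosgw an exit status of 98 usually means the beast frontend could not bind its configured port. This is an interpretation, not something stated in the thread, but with both rgw.obj0 and rgw.obj01 placed on the same six hosts and both publishing port 80, a bind conflict on c01 would explain the immediate exit. A quick check on the affected host, assuming podman as in the logs above:

# See which process already owns port 80 on c01:
ss -tlnp | grep ':80 '

# List the ceph containers running on this host and look for another
# radosgw (e.g. one deployed for rgw.obj01) already bound to the port:
podman ps --format '{{.Names}}' | grep rgw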
> From: Adam King <adking@xxxxxxxxxx>
> Sent: Wednesday, February 16, 2022 4:18 PM
> To: Ron Gage <ron@xxxxxxxxxxx>
> Cc: ceph-users <ceph-users@xxxxxxx>
> Subject: Re: Problem with Ceph daemons
>
> Is there anything useful in the rgw daemon's logs? (e.g. journalctl -xeu ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk)
>
> - Adam King
>
> On Wed, Feb 16, 2022 at 3:58 PM Ron Gage <ron@xxxxxxxxxxx> wrote:
>
> Hi everyone!
>
> Looks like I am having some problems with some of my ceph RGW daemons - they won't stay running.
>
> From 'cephadm ls':
>
>     {
>         "style": "cephadm:v1",
>         "name": "rgw.obj0.c01.gpqshk",
>         "fsid": "35194656-893e-11ec-85c8-005056870dae",
>         "systemd_unit": "ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk",
>         "enabled": true,
>         "state": "error",
>         "service_name": "rgw.obj0",
>         "ports": [
>             80
>         ],
>         "ip": null,
>         "deployed_by": [
>             "quay.io/ceph/ceph@sha256:c3a89afac4f9c83c716af57e08863f7010318538c7e2cd911458800097f7d97d",
>             "quay.io/ceph/ceph@sha256:a39107f8d3daab4d756eabd6ee1630d1bc7f31eaa76fff41a77fa32d0b903061"
>         ],
>         "rank": null,
>         "rank_generation": null,
>         "memory_request": null,
>         "memory_limit": null,
>         "container_id": null,
>         "container_image_name": "quay.io/ceph/ceph@sha256:a39107f8d3daab4d756eabd6ee1630d1bc7f31eaa76fff41a77fa32d0b903061",
>         "container_image_id": null,
>         "container_image_digests": null,
>         "version": null,
>         "started": null,
>         "created": "2022-02-09T01:00:53.411541Z",
>         "deployed": "2022-02-09T01:00:52.338515Z",
>         "configured": "2022-02-09T01:00:53.411541Z"
>     },
>
> That whole "state": "error" bit is concerning to me - and it is contributing to the cluster status of warning (showing 6 cephadm daemons down).
>
> Can I get a hint or two on how to fix this?
>
> Thanks!
>
> Ron Gage
> Westland, MI
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
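On the question at the top of the thread - proving that rgw.obj0 is redundant and removing it - here is a minimal sketch using cephadm orchestrator commands. The service names are taken from the ceph orch ls output above; whether rgw.obj0 is really unneeded (for example, not serving a separate realm or zone) is an assumption that should be verified before removing anything:

# Export both rgw service specs and compare placement, port and any
# realm/zone settings; two specs publishing port 80 on the same hosts
# will collide wherever both try to run.
ceph orch ls rgw --export

# Show all rgw daemons, which service each belongs to, and its state:
ceph orch ps --daemon_type rgw

# If rgw.obj0 is confirmed to be the duplicate, removing its spec lets
# cephadm tear down the daemons it deployed:
ceph orch rm rgw.obj0

Once the duplicate spec is gone, the failed unit on c01 should no longer be redeployed, and the remaining rgw.obj01 daemons should be able to bind port 80 normally.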