Re: Problem with Ceph daemons


 



Can you retry after resetting the systemd unit? The failed state behind "Start request repeated too quickly." has to be cleared first; then start it again:

systemctl reset-failed ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service
systemctl start ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service

Then check the logs again. If there's still nothing in the rgw log, you'll need to check the (active) mgr daemon's logs for anything suspicious, and also the syslog on that rgw host (a few example commands below). Is the rest of the cluster healthy? Are the rgw daemons colocated with other services?
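
Something along these lines, just as a sketch; the fsid, the active mgr name and the placeholder in angle brackets will differ in your cluster, and 'cephadm logs' has to be run on the host carrying that daemon:

ceph -s                                   # overall cluster health
ceph mgr stat                             # which mgr is currently active
cephadm logs --name mgr.<active-mgr>      # mgr daemon log on its host
journalctl -xe --no-pager | grep -i rgw   # syslog/journal on the rgw host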


Quoting Ron Gage <ron@xxxxxxxxxxx>:

Adam:



Not really….



-- Unit ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service has begun starting up.

Feb 16 15:01:03 c01 podman[426007]:

Feb 16 15:01:04 c01 bash[426007]: 915d1e19fa0f213902c666371c8e825480e103f85172f3b15d1d5bf2427a87c9

Feb 16 15:01:04 c01 conmon[426038]: debug 2022-02-16T20:01:04.303+0000 7f4f72ff6440 0 deferred set uid:gid to 167:167 (ceph:ceph)

Feb 16 15:01:04 c01 conmon[426038]: debug 2022-02-16T20:01:04.303+0000 7f4f72ff6440 0 ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (st>

Feb 16 15:01:04 c01 conmon[426038]: debug 2022-02-16T20:01:04.303+0000 7f4f72ff6440 0 framework: beast

Feb 16 15:01:04 c01 conmon[426038]: debug 2022-02-16T20:01:04.303+0000 7f4f72ff6440 0 framework conf key: port, val: 80

Feb 16 15:01:04 c01 conmon[426038]: debug 2022-02-16T20:01:04.303+0000 7f4f72ff6440 1 radosgw_Main not setting numa affinity

Feb 16 15:01:04 c01 systemd[1]: Started Ceph rgw.obj0.c01.gpqshk for 35194656-893e-11ec-85c8-005056870dae.

-- Subject: Unit ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service has finished start-up

-- Defined-By: systemd

-- Support: https://access.redhat.com/support

--

-- Unit ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service has finished starting up.

--

-- The start-up result is done.

Feb 16 15:01:04 c01 systemd[1]: ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service: Main process exited, code=exited, status=98/n/a

Feb 16 15:01:05 c01 systemd[1]: ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service: Failed with result 'exit-code'.

-- Subject: Unit failed

-- Defined-By: systemd

-- Support: https://access.redhat.com/support

--

-- The unit ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service has entered the 'failed' state with result 'exit-code'.

Feb 16 15:01:15 c01 systemd[1]: ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service: Service RestartSec=10s expired, scheduling restart.

Feb 16 15:01:15 c01 systemd[1]: ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service: Scheduled restart job, restart counter is at 5.

-- Subject: Automatic restarting of a unit has been scheduled

-- Defined-By: systemd

-- Support: https://access.redhat.com/support

--

-- Automatic restarting of the unit ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service has been scheduled, as the result for

-- the configured Restart= setting for the unit.

Feb 16 15:01:15 c01 systemd[1]: Stopped Ceph rgw.obj0.c01.gpqshk for 35194656-893e-11ec-85c8-005056870dae.

-- Subject: Unit ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service has finished shutting down

-- Defined-By: systemd

-- Support: https://access.redhat.com/support

--

-- Unit ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service has finished shutting down.

Feb 16 15:01:15 c01 systemd[1]: ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service: Start request repeated too quickly.

Feb 16 15:01:15 c01 systemd[1]: ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service: Failed with result 'exit-code'.

-- Subject: Unit failed

-- Defined-By: systemd

-- Support: https://access.redhat.com/support

--

-- The unit ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service has entered the 'failed' state with result 'exit-code'.

Feb 16 15:01:15 c01 systemd[1]: Failed to start Ceph rgw.obj0.c01.gpqshk for 35194656-893e-11ec-85c8-005056870dae.

-- Subject: Unit ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service has failed

-- Defined-By: systemd

-- Support: https://access.redhat.com/support

--

-- Unit ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk.service has failed.

--

-- The result is failed.



Ron Gage

Westland, MI



From: Adam King <adking@xxxxxxxxxx>
Sent: Wednesday, February 16, 2022 4:18 PM
To: Ron Gage <ron@xxxxxxxxxxx>
Cc: ceph-users <ceph-users@xxxxxxx>
Subject: Re:  Problem with Ceph daemons



Is there anything useful in the rgw daemon's logs? (e.g. journalctl -xeu ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk)



 - Adam King



On Wed, Feb 16, 2022 at 3:58 PM Ron Gage <ron@xxxxxxxxxxx> wrote:

Hi everyone!



Looks like I am having some problems with some of my ceph RGW daemons - they
won't stay running.



From 'cephadm ls':



{

        "style": "cephadm:v1",

        "name": "rgw.obj0.c01.gpqshk",

        "fsid": "35194656-893e-11ec-85c8-005056870dae",

        "systemd_unit":
"ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk
<mailto:ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk <mailto:ceph-35194656-893e-11ec-85c8-005056870dae@rgw.obj0.c01.gpqshk> > ",

        "enabled": true,

        "state": "error",

        "service_name": "rgw.obj0",

        "ports": [

            80

        ],

        "ip": null,

        "deployed_by": [


"quay.io/ceph/ceph@sha256:c3a89afac4f9c83c716af57e08863f7010318538c7e2cd9114 <http://quay.io/ceph/ceph@sha256:c3a89afac4f9c83c716af57e08863f7010318538c7e2cd911458800097f7d97d>
58800097f7d97d
<mailto:quay.io <mailto:quay.io> /ceph/ceph@sha256:c3a89afac4f9c83c716af57e08863f7010318538c7e
2cd911458800097f7d97d> ",


"quay.io/ceph/ceph@sha256:a39107f8d3daab4d756eabd6ee1630d1bc7f31eaa76fff41a7 <http://quay.io/ceph/ceph@sha256:a39107f8d3daab4d756eabd6ee1630d1bc7f31eaa76fff41a77fa32d0b903061>
7fa32d0b903061
<mailto:quay.io <mailto:quay.io> /ceph/ceph@sha256:a39107f8d3daab4d756eabd6ee1630d1bc7f31eaa76
fff41a77fa32d0b903061> "

        ],

        "rank": null,

        "rank_generation": null,

        "memory_request": null,

        "memory_limit": null,

        "container_id": null,

        "container_image_name":
"quay.io/ceph/ceph@sha256:a39107f8d3daab4d756eabd6ee1630d1bc7f31eaa76fff41a7 <http://quay.io/ceph/ceph@sha256:a39107f8d3daab4d756eabd6ee1630d1bc7f31eaa76fff41a77fa32d0b903061>
7fa32d0b903061
<mailto:quay.io <mailto:quay.io> /ceph/ceph@sha256:a39107f8d3daab4d756eabd6ee1630d1bc7f31eaa76
fff41a77fa32d0b903061> ",

        "container_image_id": null,

        "container_image_digests": null,

        "version": null,

        "started": null,

        "created": "2022-02-09T01:00:53.411541Z",

        "deployed": "2022-02-09T01:00:52.338515Z",

        "configured": "2022-02-09T01:00:53.411541Z"

    },



That whole "state": "error" bit is concerning to me - and it is contributing to the cluster's warning status (showing 6 cephadm daemons down).



Can I get a hint or two on how to fix this?


Thanks!



Ron Gage

Westland, MI







_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



