Re: "ceph orch restart mgr" creates manager daemon restart loop

Hi All,

Just for the record, I've reproduced this on Octopus (15.2.14).  After
running `ceph orch restart mgr`, I see:

Scheduled to restart mgr.node1.puuiwd on host 'node1'
Scheduled to restart mgr.node3.xambei on host 'node3'
Scheduled to restart mgr.node2.viomxk on host 'node2'

Then the mgr instances go into an endless restart loop.  If I run
`ceph config-key dump`, I see:

  ...
  "mgr/cephadm/host.node1": "{\"...
    \"scheduled_daemon_actions\": {\"mgr.node1.puuiwd\": \"restart\"}}"
  ...
  (and so on for other mgr instances on other nodes)
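
To check which hosts still carry leftover actions, something like the
following Python sketch could scan a saved dump.  It assumes that
`ceph config-key dump` prints a JSON object of key/value strings and that
`scheduled_daemon_actions` is a top-level field of each
`mgr/cephadm/host.<hostname>` record, as the excerpt above suggests; the
function name `pending_daemon_actions` is only illustrative.

```python
import json
import sys
from typing import Dict

# Key prefix as it appears in the config-key dump excerpt above.
HOST_KEY_PREFIX = "mgr/cephadm/host."


def pending_daemon_actions(dump: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Return {host: {daemon_name: action}} for every cephadm host record
    whose scheduled_daemon_actions map is non-empty."""
    pending: Dict[str, Dict[str, str]] = {}
    for key, value in dump.items():
        if not key.startswith(HOST_KEY_PREFIX):
            continue
        try:
            host_state = json.loads(value)
        except (TypeError, ValueError):
            continue  # value is not a JSON-encoded record
        if not isinstance(host_state, dict):
            continue
        # Assumed to be a top-level field, as in the excerpt above.
        actions = host_state.get("scheduled_daemon_actions") or {}
        if actions:
            pending[key[len(HOST_KEY_PREFIX):]] = actions
    return pending


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: ceph config-key dump > dump.json; python3 this_script.py dump.json
    with open(sys.argv[1]) as f:
        for host, actions in pending_daemon_actions(json.load(f)).items():
            for daemon, action in actions.items():
                print(f"{host}: {daemon} -> {action}")
```

An empty result would mean no host record has any pending action left.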

I assume the problem is that those scheduled_daemon_actions never go
away.  My workaround, somewhat based on Adam's comment below, was to
`ceph orch daemon rm` each mgr instance one after another, and let the
orchestrator automatically redeploy new instances with different random
IDs.  I hope this helps anyone else who hits this problem.
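
The workaround amounts to removing the mgr daemons one at a time and
letting cephadm redeploy each before touching the next.  A hedged Python
sketch of that sequence follows.  It only prints the commands by default
(pass `run_for_real` instead of `dry_run` to execute them), and it waits
for manual confirmation rather than assuming how long redeployment takes.
The helper names are hypothetical.

```python
import shlex
import subprocess
import sys
from typing import Callable, List, Sequence

Command = List[str]


def remove_mgrs_one_by_one(
    daemon_names: Sequence[str],
    run: Callable[[Command], None],
    settle: Callable[[], None],
) -> List[Command]:
    """Issue `ceph orch daemon rm <name>` for each mgr daemon in turn and
    call settle() after each one, so cephadm can redeploy a replacement
    before the next mgr is removed. Returns the commands issued."""
    issued: List[Command] = []
    for name in daemon_names:
        if not name.startswith("mgr."):
            raise ValueError(f"not a mgr daemon name: {name!r}")
        cmd = ["ceph", "orch", "daemon", "rm", name]
        run(cmd)
        issued.append(cmd)
        settle()
    return issued


def dry_run(cmd: Command) -> None:
    print(shlex.join(cmd))


def run_for_real(cmd: Command) -> None:
    subprocess.run(cmd, check=True)


def wait_for_operator() -> None:
    # Check `ceph orch ps` by hand before letting the next removal go ahead.
    input("Press Enter once the replacement mgr is running... ")


if __name__ == "__main__":
    # Prints the commands only; pass run=run_for_real to actually remove daemons.
    remove_mgrs_one_by_one(sys.argv[1:], run=dry_run, settle=wait_for_operator)
```

With the daemon names from the output above, that would be invoked as
`python3 rm_mgrs.py mgr.node1.puuiwd mgr.node3.xambei mgr.node2.viomxk`
(the script name is arbitrary).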

(I haven't tried Pacific or newer to see if it's still a problem there)

Regards,

Tim

On 11/26/21 4:25 AM, Roman Steinhart wrote:
> Hi Adam,
> 
> thanks for the suggestion. I did what you proposed and it worked.
> 
> On Tue, 23 Nov 2021 at 14:53, Adam King <adking@xxxxxxxxxx> wrote:
> 
>> One thing you could maybe try is, if you have a host available that
>> doesn't have a mgr, moving the mgr service to that host. I.e. if you have
>> the mgr thrashing happening and have a mgr on host1 and host2 but host3 has
>> no mgr you could run "ceph orch apply mgr host3". Then once it settles down
>> move the mgr service back to its original spot.
>>
>> On Tue, Nov 23, 2021 at 8:19 AM Roman Steinhart <roman@xxxxxxxxxxx> wrote:
>>
>>> Hi Adam,
>>>
>>> I'm on 15.2.15
>>> Yep, I remember seeing such a "scheduled" message for each of my managers
>>>
>>>
>>>
>>> On Tue, 23 Nov 2021 at 14:14, Adam King <adking@xxxxxxxxxx> wrote:
>>>
>>>> Hi Roman, what ceph version are you on? Also, when you ran the
>>>> restart command originally, did you get a message about scheduling the
>>>> restarts or no output?
>>>>
>>>>
>>>>
>>>> On Tue, Nov 23, 2021 at 6:04 AM Roman Steinhart <roman@xxxxxxxxxxx> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> while digging into another issue I had with the managers, I restarted
>>>>> them using "ceph orch restart mgr".
>>>>> After running that command, the main manager is now in a restart loop.
>>>>> The only way for me to stop this is running "ceph orch pause"; as soon as
>>>>> I do "ceph orch resume", the loop starts again.
>>>>>
>>>>> Does anyone have a suggestion for how I can remove that stuck "restart" job?
>>>>> "ceph orch cancel" does not work; it returns "This Orchestrator does not
>>>>> support `orch cancel`".
>>>>>
>>>>> While googling for this issue, I found I'm not the first one to hit it:
>>>>>
>>>>> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/IRC4ZPSSZYELWPU5D2FHKWJ2VU7IP3JG/
>>>>>
>>>>> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/WZUWZGVYPFII5HTUTW6OAQXIZ6VNT2E2/
>>>>>
>>>>> See the logs below:
>>>>> user@ceph1:~# journalctl -fu ceph-6d588189-f434-4cb1-8c60-6e48cbf43a2a@mgr.ceph1.service --since "1 day ago" -g "Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a"
>>>>> Nov 22 15:31:38 ceph1 systemd[1]: Stopping Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a...
>>>>> Nov 22 15:31:51 ceph1 systemd[1]: Stopped Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 15:31:51 ceph1 systemd[1]: Started Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 15:34:49 ceph1 systemd[1]: Stopping Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a...
>>>>> Nov 22 15:35:00 ceph1 systemd[1]: Stopped Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 15:35:00 ceph1 systemd[1]: Started Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 15:38:39 ceph1 systemd[1]: Stopping Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a...
>>>>> Nov 22 15:38:50 ceph1 systemd[1]: Stopped Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 15:38:50 ceph1 systemd[1]: Started Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 15:43:35 ceph1 systemd[1]: Stopping Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a...
>>>>> Nov 22 15:43:46 ceph1 systemd[1]: Stopped Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 15:43:46 ceph1 systemd[1]: Started Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 15:50:19 ceph1 systemd[1]: Stopping Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a...
>>>>> Nov 22 15:50:30 ceph1 systemd[1]: Stopped Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 15:50:30 ceph1 systemd[1]: Started Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 15:58:26 ceph1 systemd[1]: Stopping Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a...
>>>>> Nov 22 15:58:37 ceph1 systemd[1]: Stopped Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 15:58:37 ceph1 systemd[1]: Started Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 16:10:16 ceph1 systemd[1]: Stopping Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a...
>>>>> Nov 22 16:10:28 ceph1 systemd[1]: Stopped Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 16:10:28 ceph1 systemd[1]: Started Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 16:17:17 ceph1 systemd[1]: Stopping Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a...
>>>>> Nov 22 16:17:29 ceph1 systemd[1]: Stopped Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 16:17:29 ceph1 systemd[1]: Started Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 16:40:52 ceph1 systemd[1]: Stopping Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a...
>>>>> Nov 22 16:41:07 ceph1 systemd[1]: Stopped Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 16:41:07 ceph1 systemd[1]: Started Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 16:45:34 ceph1 systemd[1]: Stopping Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a...
>>>>> Nov 22 16:45:47 ceph1 systemd[1]: Stopped Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 16:45:47 ceph1 systemd[1]: Started Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 16:49:34 ceph1 systemd[1]: Stopping Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a...
>>>>> Nov 22 16:49:39 ceph1 systemd[1]: Stopped Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 16:49:39 ceph1 systemd[1]: Started Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 16:51:47 ceph1 systemd[1]: Stopping Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a...
>>>>> Nov 22 16:51:51 ceph1 systemd[1]: Stopped Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 16:51:51 ceph1 systemd[1]: Started Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 16:54:02 ceph1 systemd[1]: Stopping Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a...
>>>>> Nov 22 16:54:08 ceph1 systemd[1]: Stopped Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 16:54:08 ceph1 systemd[1]: Started Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 16:57:37 ceph1 systemd[1]: Stopping Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a...
>>>>> Nov 22 16:57:41 ceph1 systemd[1]: Stopped Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 16:57:41 ceph1 systemd[1]: Started Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 16:59:58 ceph1 systemd[1]: Stopping Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a...
>>>>> Nov 22 17:00:02 ceph1 systemd[1]: Stopped Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 17:00:02 ceph1 systemd[1]: Started Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 17:02:42 ceph1 systemd[1]: Stopping Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a...
>>>>> Nov 22 17:02:45 ceph1 systemd[1]: Stopped Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 17:02:45 ceph1 systemd[1]: Started Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 17:04:40 ceph1 systemd[1]: Stopping Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a...
>>>>> Nov 22 17:04:43 ceph1 systemd[1]: Stopped Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 17:04:43 ceph1 systemd[1]: Started Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 17:07:28 ceph1 systemd[1]: Stopping Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a...
>>>>> Nov 22 17:07:35 ceph1 systemd[1]: Stopped Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 17:07:35 ceph1 systemd[1]: Started Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 17:10:50 ceph1 systemd[1]: Stopping Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a...
>>>>> Nov 22 17:10:54 ceph1 systemd[1]: Stopped Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>> Nov 22 17:10:54 ceph1 systemd[1]: Started Ceph mgr.ceph1 for 6d588189-f434-4cb1-8c60-6e48cbf43a2a.
>>>>>
>>>>> Thanks in advance
>>>>> ~ Roman
>>>>> _______________________________________________
>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>>>
>>>>>
>>>
> 
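
Adam's suggestion above (move the mgr service to a host that has no mgr,
then move it back once things settle) can be written down as a pair of
commands.  The Python sketch below picks the first host without a mgr and
returns the move command plus a restore command.  `mgr_move_commands` is
a hypothetical helper, and `original_placement` is an assumption: it must
be whatever placement the mgr service used before (host list or count),
passed through unchanged.

```python
import shlex
from typing import List, Sequence

Command = List[str]


def mgr_move_commands(
    all_hosts: Sequence[str],
    mgr_hosts: Sequence[str],
    original_placement: str,
) -> List[Command]:
    """Return [move, restore] commands for temporarily relocating the
    mgr service to a host that currently runs no mgr."""
    busy = set(mgr_hosts)
    free = [h for h in all_hosts if h not in busy]
    if not free:
        raise ValueError("every host already runs a mgr; nowhere to move it")
    move = ["ceph", "orch", "apply", "mgr", free[0]]
    # Hypothetical: re-apply whatever placement the service had before.
    restore = ["ceph", "orch", "apply", "mgr", original_placement]
    return [move, restore]


if __name__ == "__main__":
    # Host names from Adam's example in the quoted message.
    for cmd in mgr_move_commands(["host1", "host2", "host3"], ["host1", "host2"], "host1 host2"):
        print(shlex.join(cmd))
```

The first command would be run while the mgr is thrashing, and the second
only after the restart loop has stopped.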


-- 
Tim Serong
Senior Clustering Engineer
SUSE
tserong@xxxxxxxx
