Re: Ceph orch commands non-responsive after mgr/mon reboots 16.2.9


 



I just wanted to follow up on this issue, as it corrected itself today.  A few weeks back I started a drain/remove on two hosts; after the rolling restart of the mgr/mon daemons on the cluster, it seems the ops queue either became locked or was overwhelmed with requests.  I had a degraded PG during the rolling reboot of the mons/mgrs, and that appears to have blocked the ceph orch, balancer, and autoscale-status CLI commands from returning.  I could see in the manager debug logs that the balancer was running and returning results from its internal scheduled process, but the CLI would hang indefinitely.  This morning the last degraded/offline PG recovered, and all commands are running again.
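
For anyone who runs into something similar, the sort of checks that surface the stuck PG and the mgr state are roughly the following (standard commands; daemon names will differ per cluster):

# Overall health and any degraded/undersized PGs
ceph -s
ceph health detail

# PGs stuck in a degraded or unclean state
ceph pg dump_stuck degraded
ceph pg dump_stuck unclean

# Which mgr is active and which modules are loaded
ceph mgr stat
ceph mgr module ls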

Moving forward, is there a way to view the ops queue, or to monitor whether the queue is getting full and starting to deprioritize CLI commands?
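
(The only thing I have found so far is the daemons' admin sockets, e.g. something along these lines, but I don't know whether that covers the queue that was actually backing up here, hence the question:)

# In-flight operations on a monitor, via its admin socket (run on the mon's host)
ceph daemon mon.<id> ops

# List what the active mgr's admin socket actually supports
ceph daemon mgr.<name> help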

Tim


On 7/22/22, 6:32 PM, "Tim Olow" <tim@xxxxxxxx> wrote:

    Howdy,
    
    I seem to be facing a problem on my 16.2.9 Ceph cluster.  After a staggered reboot of my 3 infra nodes, all ceph orch commands hang, much like in this previously reported issue [1].
    
    I have paused orch and rebuilt a manager by hand as outlined here [2], but the issue persists.  I am unable to scale services up or down, restart daemons, etc.
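
    (For reference, the pausing and the manager fail-over were just the standard commands, roughly:)

    # Pause / resume the cephadm orchestrator
    ceph orch pause
    ceph orch resume

    # List the mgr daemons the orchestrator knows about
    ceph orch ps --daemon-type mgr

    # Fail over to a standby mgr (optionally name the active one)
    ceph mgr fail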
    
    ceph orch ls --verbose
    <snip>
    [{'flags': 8,
      'help': 'List services known to orchestrator',
      'module': 'mgr',
      'perm': 'r',
      'sig': [argdesc(<class 'ceph_argparse.CephPrefix'>, req=True, name=prefix, n=1, numseen=0, prefix=orch),
              argdesc(<class 'ceph_argparse.CephPrefix'>, req=True, name=prefix, n=1, numseen=0, prefix=ls),
              argdesc(<class 'ceph_argparse.CephString'>, req=False, name=service_type, n=1, numseen=0),
              argdesc(<class 'ceph_argparse.CephString'>, req=False, name=service_name, n=1, numseen=0),
              argdesc(<class 'ceph_argparse.CephBool'>, req=False, name=export, n=1, numseen=0),
              argdesc(<class 'ceph_argparse.CephChoices'>, req=False, name=format, n=1, numseen=0, strings=plain|json|json-pretty|yaml|xml-pretty|xml),
              argdesc(<class 'ceph_argparse.CephBool'>, req=False, name=refresh, n=1, numseen=0)]}]
    Submitting command:  {'prefix': 'orch ls', 'target': ('mon-mgr', '')}
    submit {"prefix": "orch ls", "target": ["mon-mgr", ""]} to mon-mgr
    
    <hang>
    
    
    Debug output on the manager:
    
    debug 2022-07-22T23:27:12.509+0000 7fc180230700  0 log_channel(audit) log [DBG] : from='client.1084220 -' entity='client.admin' cmd=[{"prefix": "orch ls", "target": ["mon-mgr", ""]}]: dispatch
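
    (The dispatch shows up in the audit log, but nothing ever comes back.  In case it is useful to anyone, the cephadm module's logging can be turned up to see how far a command gets, e.g.:)

    # Raise cephadm's cluster-log level and watch it live
    ceph config set mgr mgr/cephadm/log_to_cluster_level debug
    ceph -W cephadm --watch-debug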
    
    I have collected a startup log of the manager and uploaded it for review [3].
    
    
    Many Thanks,
    
    Tim
    
    
    [1] https://www.spinics.net/lists/ceph-users/msg68398.html
    [2] https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon
    [3] https://pastebin.com/Dvb8sEbz
    
    

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



