Re: [nautilus] ceph tell hanging

I have now started iterating over all OSDs in the tree, and some of the OSDs
are completely unresponsive:

[18:27:18] black1.place6:~# for osd in $(ceph osd tree | grep osd. | awk '{ print $4 }'); do echo $osd;  ceph tell $osd injectargs '--osd-max-backfills 1'; done
osd.20
osd.56
osd.62
osd.63

^CTraceback (most recent call last):
  File "/usr/bin/ceph", line 1266, in <module>
    retval = main()
  File "/usr/bin/ceph", line 1182, in main
    prefix='get_command_descriptions')
  File "/usr/lib/python3/dist-packages/ceph_argparse.py", line 1459, in json_command
    inbuf, timeout, verbose)
  File "/usr/lib/python3/dist-packages/ceph_argparse.py", line 1329, in send_command_retry
    return send_command(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/ceph_argparse.py", line 1361, in send_command
    cluster.osd_command, osdid, cmd, inbuf, timeout=timeout)
  File "/usr/lib/python3/dist-packages/ceph_argparse.py", line 1311, in run_in_thread
    t.join(timeout=timeout)
  File "/usr/lib/python3.7/threading.py", line 1036, in join
    self._wait_for_tstate_lock(timeout=max(timeout, 0))
  File "/usr/lib/python3.7/threading.py", line 1048, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt
osd.64
osd.65
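A variant of the loop above that caps each call with coreutils timeout(1), so that one unresponsive OSD is reported and skipped instead of blocking everything after it. This is a sketch under assumptions: GNU coreutils timeout is installed, the OSD ids come from "ceph osd ls" (one numeric id per line) instead of parsing "ceph osd tree", and TELL_TIMEOUT and inject_all_osds are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: inject a runtime option into every OSD, one at a time, giving
# each `ceph tell` at most TELL_TIMEOUT seconds (hypothetical variable).
# Requires GNU coreutils timeout(1).

TELL_TIMEOUT=${TELL_TIMEOUT:-30}

inject_all_osds() {
    local args="$1" id rc
    # `ceph osd ls` prints one numeric OSD id per line.
    for id in $(ceph osd ls); do
        if timeout "$TELL_TIMEOUT" ceph tell "osd.$id" injectargs "$args" >/dev/null; then
            echo "osd.$id ok"
        else
            rc=$?
            # timeout(1) exits with status 124 when the time limit was reached.
            if [ "$rc" -eq 124 ]; then
                echo "osd.$id TIMEOUT"
            else
                echo "osd.$id failed (exit $rc)"
            fi
        fi
    done
}
```

Usage: inject_all_osds '--osd-max-backfills 1'. The OSDs reported as TIMEOUT are the ones worth looking at more closely.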

What's the best way to figure out why osd.63 does not react to the tell
command?
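One possible starting point, sketched under assumptions: locate the OSD's host with "ceph osd find 63", then query the daemon there through its local admin socket. That path does not use the network connection that ceph tell relies on, so it may still answer when tell hangs. status, dump_blocked_ops and dump_ops_in_flight are OSD admin-socket commands, but check "ceph daemon osd.63 help" on your version; osd_local_diag is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: run on the host that carries the OSD (see `ceph osd find <id>`).
# Queries the daemon through its local admin socket instead of `ceph tell`.
# osd_local_diag is a hypothetical helper name.

osd_local_diag() {
    local osd="osd.$1" cmd
    for cmd in status dump_blocked_ops dump_ops_in_flight; do
        echo "== $osd $cmd"
        # Bound each query so a wedged admin socket cannot hang the script.
        timeout 10 ceph daemon "$osd" "$cmd" || echo "(no answer from admin socket)"
    done
}
```

Usage: osd_local_diag 63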

Best regards,

Nico


Nico Schottelius <nico.schottelius@xxxxxxxxxxx> writes:

> Hello Stefan,
>
> Stefan Kooman <stefan@xxxxxx> writes:
>
>> Hi,
>>
>>> However, as soon as we issue either of the above tell commands, it just
>>> hangs. Furthermore, while ceph tell hangs, PGs also become stuck in
>>> "Activating" and "Peering" states.
>>>
>>> It seems to be related: as soon as we stop ceph tell (Ctrl-C it), the
>>> PGs are peered/active a few minutes later.
>>>
>>> We can also reproduce this problem with very busy OSDs that have been
>>> moved to another host; they do not react to the ceph tell commands either.
>>
>> Does this also happen when you issue an OSD-specific "tell", i.e. ceph
>> tell osd.13 injectargs '--osd-max-backfills 4'?
>>
>> Does this also happen when you loop over the OSDs one by one?
>
> It does hang for some of them, but if I "ping" / select specific OSDs,
> this does not happen.
>
>>> Has anyone seen this before, and/or do you have a hint on how to debug
>>> ceph tell, given that it is not a daemon of its own?
>>
>> IIRC I have seen this, but not in combination with PGs peering /
>> activating. Has the config change become effective on all OSDs? Verify
>> with ceph daemon osd.13 config get osd_max_backfills (for all OSDs).
>
> Just checked - most OSDs did not apply the new setting, setting it
> explicitly on them works however.
>
> Best regards,
>
> Nico
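Stefan's check can be run for every OSD on a host in one go. A minimal sketch, assuming the default admin socket location /var/run/ceph/<cluster>-osd.<id>.asok and GNU timeout; show_local_setting is a hypothetical helper name. Because admin sockets are local, it has to be run on each OSD host.

```shell
#!/usr/bin/env bash
# Sketch: print the running value of one option for every OSD whose admin
# socket lives on this host. show_local_setting is a hypothetical helper;
# the default socket directory /var/run/ceph is an assumption.

show_local_setting() {
    local key="$1" dir="${2:-/var/run/ceph}" sock
    for sock in "$dir"/*-osd.*.asok; do
        [ -e "$sock" ] || continue   # the glob matched nothing
        echo "== $(basename "$sock" .asok)"
        # --admin-daemon talks to the daemon over its local UNIX socket.
        timeout 10 ceph --admin-daemon "$sock" config get "$key" \
            || echo "(no answer from $sock)"
    done
}
```

Usage: show_local_setting osd_max_backfills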


--
Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


