Hi,

issue solved!
I stopped the active MGR service and waited until a standby MGR became
active. Then I started the (previously stopped) MGR service again in
order to have 2 standbys. A short command sketch of this failover is
appended below the quoted thread.

Thanks, Eugen

On 21.11.2019 at 15:23, Eugen Block wrote:
> Hi,
>
> check if the active MGR is hanging.
> I had this when testing pg_autoscaler: after some time every command
> would hang. Restarting the MGR helped for a short period of time, then
> I disabled pg_autoscaler. This is an upgraded cluster, currently on
> Nautilus.
>
> Regards,
> Eugen
>
>
> Quoting Thomas Schneider <74cmonty@xxxxxxxxx>:
>
>> Hi,
>> the command ceph osd df does not return any output.
>> Based on the strace output, there is a timeout.
>> [...]
>> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f53006b9000
>> brk(0x55c2579b6000) = 0x55c2579b6000
>> brk(0x55c2579d7000) = 0x55c2579d7000
>> brk(0x55c2579f9000) = 0x55c2579f9000
>> brk(0x55c257a1a000) = 0x55c257a1a000
>> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5300679000
>> brk(0x55c257a3b000) = 0x55c257a3b000
>> brk(0x55c257a5c000) = 0x55c257a5c000
>> brk(0x55c257a7d000) = 0x55c257a7d000
>> clone(child_stack=0x7f53095c1fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f53095c29d0, tls=0x7f53095c2700, child_tidptr=0x7f53095c29d0) = 3261669
>> futex(0x55c257489940, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x55c2576246e0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = -1 EAGAIN (Resource temporarily unavailable)
>> select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
>> select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=2000}) = 0 (Timeout)
>> select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=4000}) = 0 (Timeout)
>> select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=8000}) = 0 (Timeout)
>> select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=16000}) = 0 (Timeout)
>> select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=32000}) = 0 (Timeout)
>> select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=50000}) = 0 (Timeout)
>> select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=50000}) = 0 (Timeout)
>> select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=50000}) = 0 (Timeout)
>> select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=50000}) = 0 (Timeout)
>> select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=50000}) = 0 (Timeout)
>> select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=50000}^Cstrace: Process 3261645 detached
>> <detached ...>
>> Interrupted
>> Traceback (most recent call last):
>>   File "/usr/bin/ceph", line 1263, in <module>
>>     retval = main()
>>   File "/usr/bin/ceph", line 1194, in main
>>     verbose)
>>   File "/usr/bin/ceph", line 619, in new_style_command
>>     ret, outbuf, outs = do_command(parsed_args, target, cmdargs, sigdict, inbuf, verbose)
>>   File "/usr/bin/ceph", line 593, in do_command
>>     return ret, '', ''
>> UnboundLocalError: local variable 'ret' referenced before assignment
>>
>>
>> How can I fix this?
>> Do you need the full strace output to analyse this issue?
>>
>> This Ceph health status has been reported for hours and I cannot
>> identify any progress. Not sure if this is related to the issue with
>> ceph osd df, though.
>>
>> 2019-11-21 15:00:00.000262 mon.ld5505 [ERR] overall HEALTH_ERR 1
>> filesystem is degraded; 1 filesystem has a failed mds daemon; 1
>> filesystem is offline; insufficient standby MDS daemons available;
>> nodown,noout,noscrub,nodeep-scrub flag(s) set; 81 osds down; Reduced
>> data availability: 1366 pgs inactive, 241 pgs peering; Degraded data
>> redundancy: 6437/190964568 objects degraded (0.003%), 7 pgs degraded, 7
>> pgs undersized; 1 subtrees have overcommitted pool target_size_bytes; 1
>> subtrees have overcommitted pool target_size_ratio
>>
>> THX
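
For reference, the MGR failover described at the top of this mail boils
down to something like the following. This is only a minimal sketch: it
assumes systemd-managed daemons, and "mgr-a" is a placeholder for the
name of the hanging active MGR (check ceph -s or ceph mgr stat for the
real daemon names in your cluster).

  # show the currently active MGR and the number of standbys
  ceph mgr stat

  # stop the hanging active MGR; one of the standbys should take over
  systemctl stop ceph-mgr@mgr-a

  # wait until the "mgr:" line of the status output shows a new active daemon
  ceph -s

  # start the stopped daemon again so it rejoins as a standby
  systemctl start ceph-mgr@mgr-a

Alternatively, "ceph mgr fail mgr-a" asks the monitors to mark the
active MGR as failed and promote a standby without touching the systemd
unit; the hung daemon may still need a restart afterwards.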