Re: OSD thread leak during degraded cluster state

Our Ceph cluster (from Emperor through Hammer) has been through many
recoveries during host outages and network failures, and the thread
count never exceeded 10K. The thread leaks we experienced with
down+peering PGs (which lasted for several hours) were something we
saw for the first time. I don't see a reason to bump pid_max. It
looks like a leak (and of course I could give the leak more headroom
by bumping pid_max), but that wouldn't fix the underlying problem,
would it?
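
For reference, this is roughly how we sampled the per-OSD thread
counts quoted below; a minimal Python sketch that assumes the daemons
report "ceph-osd" in /proc/<pid>/comm:

#!/usr/bin/env python
# Count threads per ceph-osd process by walking /proc.
# Assumption: OSD daemons report "ceph-osd" in /proc/<pid>/comm.
import os

total = 0
for pid in filter(str.isdigit, os.listdir('/proc')):
    try:
        with open('/proc/%s/comm' % pid) as f:
            if f.read().strip() != 'ceph-osd':
                continue
        # Each entry under /proc/<pid>/task is one thread.
        nthreads = len(os.listdir('/proc/%s/task' % pid))
    except (IOError, OSError):
        continue  # process exited while we were looking
    print('osd pid %s: %d threads' % (pid, nthreads))
    total += nthreads
print('total ceph-osd threads: %d' % total)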

Kostis

On 15 September 2016 at 14:40, Wido den Hollander <wido@xxxxxxxx> wrote:
>
>> On 15 September 2016 at 13:27, Kostis Fardelas <dante1234@xxxxxxxxx> wrote:
>>
>>
>> Hello cephers,
>> being in a degraded cluster state with 6/162 OSDs down (Hammer
>> 0.94.7, 162 OSDs, 27 "fat" nodes, 1000s of clients), as the
>> cluster log below indicates:
>>
>> 2016-09-12 06:26:08.443152 mon.0 62.217.119.14:6789/0 217309 : cluster
>> [INF] pgmap v106027148: 28672 pgs: 2 down+remapped+peering, 25904
>> active+clean, 23 stale+down+peering, 1 active+recovery_wait+degraded,
>> 1 active+recovery_wait+undersized+degraded, 170 down+peering, 1
>> active+clean+scrubbing, 8
>> active+undersized+degraded+remapped+wait_backfill, 27
>> stale+active+undersized+degraded, 3 active+remapped+wait_backfill,
>> 2531 active+undersized+degraded, 1
>> active+recovering+undersized+degraded+remapped; 95835 GB data, 186 TB
>> used, 94341 GB / 278 TB avail; 11230 B/s rd, 164 kB/s wr, 42 op/s;
>> 3148226/69530815 objects degraded (4.528%); 59272/69530815 objects
>> misplaced (0.085%); 1/34756893 unfound (0.000%)
>>
>> we experienced extensive thread leaks on the remaining up+in OSDs,
>> which led to random crashes with Thread::create asserts:
>>
>> 2016-09-10 09:08:40.211713 7f8576bd6700 -1 common/Thread.cc: In
>> function 'void Thread::create(size_t)' thread 7f8576bd6700 time
>> 2016-09-10 09:08:40.199211
>> common/Thread.cc: 131: FAILED assert(ret == 0)
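
(The assert fires because Ceph's Thread::create wraps pthread_create
and asserts on its return code; once the kernel's task limit is
exhausted, pthread_create fails with EAGAIN. The same ceiling is easy
to reproduce from Python as a rough standalone demo, not Ceph code;
depending on RAM and ulimits you may hit another resource ceiling
before pid_max itself:)

import threading, time

threads = []
try:
    while True:
        t = threading.Thread(target=time.sleep, args=(600,))
        t.daemon = True   # let the interpreter exit afterwards
        t.start()
        threads.append(t)
except RuntimeError as e:   # "can't start new thread"
    print('thread creation failed after %d threads: %s'
          % (len(threads), e))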
>>
>> Thread counts under normal operations are ~6500 on all nodes, but
>> in this degraded state we reached as high as ~35000.
>>
>> Is this expected behaviour when you have down+peering OSDs?
>> Is it possible to mitigate this problem through Ceph configuration,
>> or is bumping the kernel's pid_max our only resort?
>>
>
> You should bump that setting. The default 32k is way too low during recovery.
>
> Set it to at least 512k or so.
>
> Wido
>
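
For anyone following along, Wido's suggestion comes down to a sysctl
write; a minimal Python sketch, equivalent to
"sysctl -w kernel.pid_max=524288":

# Raise kernel.pid_max to 512k as suggested above; needs root.
with open('/proc/sys/kernel/pid_max', 'w') as f:
    f.write('524288\n')
# To persist across reboots, add "kernel.pid_max = 524288"
# to /etc/sysctl.conf (or a drop-in under /etc/sysctl.d/).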
>> Regards,
>> Kostis


