Re: osd become unusable, blocked by xfsaild (?) and load > 5000

Benedikt Fraunhofer <fraunhofer@xxxxxxxxxx> · Tue, 8 Dec 2015 08:57:19 +0100

Hi Jan,

> Doesn't look near the limit currently (but I suppose you rebooted it in the meantime?).

The box these numbers came from has an uptime of 13 days,
so it's one of the boxes that survived yesterday's half-cluster-wide reboot.
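
To see how close a box is to the PID limit, one can compare the thread count with kernel.pid_max. Below is a minimal plain-sh sketch; the function names are hypothetical. It counts /proc/<pid>/task/<tid> entries, which gives the same figure as `ps axH | wc -l` minus the header line. So the 17548 lines quoted below would correspond to 17547 threads, a bit under 27% of a pid_max of 65535.

```shell
#!/bin/sh
# Hypothetical headroom check: compare the current thread count with pid_max.
# Counting /proc/<pid>/task/<tid> directories matches `ps axH | wc -l`
# minus the header line that ps prints.
count_threads() {
    # $1: optional proc root, defaults to /proc (lets us test on a fake tree)
    ct_root=${1:-/proc}
    ct_n=0
    for ct_task in "$ct_root"/[0-9]*/task/[0-9]*; do
        # Skip the unexpanded glob and threads that exited mid-scan
        [ -d "$ct_task" ] && ct_n=$((ct_n + 1))
    done
    echo "$ct_n"
}

# Share of the PID space in use, as an integer percentage (rounded down).
pid_usage_percent() {
    # $1: thread count, $2: pid_max
    echo $(( $1 * 100 / $2 ))
}

# On a live Linux box, print the current numbers.
if [ -r /proc/sys/kernel/pid_max ]; then
    threads=$(count_threads)
    pid_max=$(cat /proc/sys/kernel/pid_max)
    echo "threads: $threads, pid_max: $pid_max ($(pid_usage_percent "$threads" "$pid_max")%)"
fi
```

With the quoted figures, `pid_usage_percent 17547 65535` prints 26, i.e. plenty of headroom at the moment the snapshot was taken.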

> Did iostat say anything about the drives? (BTW, what are dm-1 and dm-6? Are those your data drives?) Were they really overloaded?

No, they didn't show any load or IOPS.
Basically, the whole box had nothing to do.

If I understand the load average correctly, it just counts threads
that are runnable or stuck in uninterruptible sleep (D state) but,
in this case, don't get any data to work with.
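
On Linux the load average is a damped average of the number of tasks that are runnable (R) or in uninterruptible sleep (D). Threads blocked in the kernel, for example waiting on a filesystem lock, can therefore push it up even while iostat shows idle disks. A minimal sketch for checking which state those threads are in follows; the function name is hypothetical, and it reads /proc directly so it does not depend on ps.

```shell
#!/bin/sh
# Sketch: count threads in the two states the Linux load average samples.
# R = running/runnable, D = uninterruptible sleep (typically I/O or a lock).
count_load_threads() {
    # $1: optional proc root, defaults to /proc (handy for testing)
    cl_root=${1:-/proc}
    cl_r=0
    cl_d=0
    for cl_stat in "$cl_root"/[0-9]*/task/[0-9]*/stat; do
        [ -r "$cl_stat" ] || continue
        # The state letter follows the comm field "(...)". Strip everything up
        # to the last ") " first, since comm itself may contain spaces or ")".
        cl_state=$(sed 's/.*) //' "$cl_stat" 2>/dev/null | cut -d' ' -f1)
        case $cl_state in
            R) cl_r=$((cl_r + 1)) ;;
            D) cl_d=$((cl_d + 1)) ;;
        esac
    done
    echo "R=$cl_r D=$cl_d total=$((cl_r + cl_d))"
}

# On a live Linux box, print the current counts.
if [ -d /proc/self/task ]; then
    count_load_threads
fi
```

If the total tracks the load figure and most of it is D, the load would come from blocked threads rather than from real work, which would fit the "box had nothing to do" observation.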

Thx

 Benedikt

2015-12-08 8:44 GMT+01:00 Jan Schermer <jan@xxxxxxxxxxx>:
>
> Jan
>
>
>> On 08 Dec 2015, at 08:41, Benedikt Fraunhofer <fraunhofer@xxxxxxxxxx> wrote:
>>
>> Hi Jan,
>>
>> we had pid_max at 65k, with
>> kernel.threads-max = 1030520
>> or
>> kernel.threads-max = 256832
>> (looks like it depends on the number of CPUs?)
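
On the CPU question: the kernel computes the default threads-max at boot from the amount of RAM, not from the CPU count, roughly total RAM / (8 × THREAD_SIZE). A rough sketch of that calculation follows. The helper names are hypothetical, and THREAD_SIZE (the per-thread kernel stack size) is an assumed value: 16 KiB on x86_64 since Linux 3.15, 8 KiB on older x86_64 kernels.

```shell
#!/bin/sh
# Sketch (hypothetical helper names): estimate the default kernel.threads-max.
# The kernel sizes it so that max_threads kernel stacks fill about 1/8 of RAM:
#   threads-max ~= total_ram_bytes / (8 * THREAD_SIZE)
# THREAD_SIZE is an assumption: 16 KiB on x86_64 since 3.15, 8 KiB before.
default_threads_max() {
    # $1: MemTotal in kB (as shown in /proc/meminfo)
    # $2: THREAD_SIZE in bytes, defaults to 16384
    echo $(( $1 * 1024 / (8 * ${2:-16384}) ))
}

# Every thread also needs a PID, so the usable ceiling is roughly
# the smaller of kernel.pid_max and kernel.threads-max.
thread_ceiling() {
    # $1: pid_max, $2: threads-max
    if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}

if [ -r /proc/meminfo ] && [ -r /proc/sys/kernel/threads-max ] &&
   [ -r /proc/sys/kernel/pid_max ]; then
    mem_kb=$(sed -n 's/^MemTotal: *\([0-9]*\) kB$/\1/p' /proc/meminfo)
    tmax=$(cat /proc/sys/kernel/threads-max)
    pmax=$(cat /proc/sys/kernel/pid_max)
    echo "estimated default threads-max: $(default_threads_max "$mem_kb")"
    echo "configured threads-max: $tmax"
    echo "effective thread ceiling: $(thread_ceiling "$pmax" "$tmax")"
fi
```

For example, `default_threads_max 32874496` prints 256832, so under the 16 KiB assumption the value quoted above would match a MemTotal of about 31 GiB. Either way, with pid_max at 65535 it is pid_max, not threads-max, that a box would run into first.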
>>
>> currently we have:
>>
>> root@ceph1-store209:~# sysctl -a | grep -e thread -e pid
>> kernel.cad_pid = 1
>> kernel.core_uses_pid = 0
>> kernel.ns_last_pid = 60298
>> kernel.pid_max = 65535
>> kernel.threads-max = 256832
>> vm.nr_pdflush_threads = 0
>> root@ceph1-store209:~# ps axH |wc -l
>> 17548
>>
>> we'll see how it behaves once puppet has come by and adjusted it.
>>
>> Thx!
>>
>> Benedikt
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com