Re: osd process threads stack up on osds failure

Gregory Farnum <gfarnum@xxxxxxxxxx> · Mon, 7 Dec 2015 07:08:31 -0800



On Mon, Dec 7, 2015 at 6:59 AM, Kostis Fardelas <dante1234@xxxxxxxxx> wrote:
> Hi cephers,
> after one OSD node (6 OSDs in total) crashed, we saw an increase of
> approximately 230-260 threads on each of the other OSD nodes. We have
> 26 OSD nodes with 6 OSDs per node, so this is approximately 40 extra
> threads per OSD. The crashed node rejoined the cluster after 15-20
> minutes.
>
> The only workaround I have found so far is to restart the OSDs of the
> cluster, but this is quite a heavy operation. Could you help me
> understand whether the behaviour described above is expected and
> what could be causing it? Does Ceph clean up OSD process threads
> appropriately?
>
> Extra info: all threads are in the sleeping state right now, and
> context switches have stabilized at pre-crash levels.

Can you describe exactly what you observed, with time intervals? E.g.,
did the OSDs get restarted after crashing, and how did the thread
counts relate to that? Did anything else happen in the cluster while
this was happening? How long did you wait before you began restarting
OSDs to reduce the thread counts?
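
One way to collect thread counts against time is sketched below. This is
not part of any Ceph tooling; it is a minimal sketch that assumes Linux
with /proc mounted and OSD daemons running under the process name
"ceph-osd". The function names and the sampling interval are
illustrative only.

```python
import os
import time


def parse_thread_count(status_text):
    """Return the integer on the 'Threads:' line of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no Threads: line found")


def osd_thread_counts(proc_root="/proc", comm_name="ceph-osd"):
    """Map pid -> thread count for each process whose comm is comm_name.

    comm_name="ceph-osd" is an assumption about how the daemons are named.
    """
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "comm")) as f:
                if f.read().strip() != comm_name:
                    continue
            with open(os.path.join(base, "status")) as f:
                counts[int(entry)] = parse_thread_count(f.read())
        except (OSError, ValueError):
            # The process exited mid-scan or is unreadable; skip it.
            continue
    return counts


def format_sample(counts, stamp):
    """Render one log line: timestamp, node total, then per-pid counts."""
    per_pid = " ".join("%d=%d" % (pid, n) for pid, n in sorted(counts.items()))
    return "%s total=%d %s" % (stamp, sum(counts.values()), per_pid)


def log_samples(samples, interval_seconds=60):
    """Print `samples` timestamped readings, one every interval_seconds."""
    for i in range(samples):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(format_sample(osd_thread_counts(), stamp), flush=True)
        if i + 1 < samples:
            time.sleep(interval_seconds)
```

Running log_samples(60) on each OSD node, for example, would give an
hour of per-minute readings that can be lined up against the time of the
crash, the time the node rejoined, and any OSD restarts.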
-Greg