On Mon, Dec 7, 2015 at 6:59 AM, Kostis Fardelas <dante1234@xxxxxxxxx> wrote:
> Hi cephers,
> after one OSD node crash (6 OSDs in total), we experienced an increase
> of approximately 230-260 threads for every other OSD node. We have 26
> OSD nodes with 6 OSDs per node, so this is approximately 40 threads
> per osd. The OSD node has joined the cluster after 15-20 minutes.
>
> The only workaround I have found so far is to restart the OSDs of the
> cluster, but this is a quite heavy operation. Could you help me
> understand if the behaviour described above is an expected one and
> what could be the reason for this? Does ceph cleanup appropriately osd
> processes threads?
>
> Extra info: all threads are in sleeping state right now and context
> switches have been stabilized at the pre-crash levels

Can you describe exactly what you observed, with time intervals? For
example: did the OSDs get restarted after crashing, and how did the
thread counts relate to that? Did anything else happen in the cluster
while this was happening? How long did you wait before you began
restarting OSDs to reduce the thread counts?
-Greg
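
One possible way to collect the timeline asked for above is to sample the per-process thread count on an OSD node at a fixed interval. The following is only a minimal sketch, assuming a Linux host where each OSD daemon appears with the process name `ceph-osd` and where `/proc/<pid>/status` exposes the usual `Name:` and `Threads:` fields. The function names (`parse_thread_count`, `process_name`, `osd_thread_counts`, `log_samples`) are illustrative and not part of Ceph.

```python
import os
import time


def parse_thread_count(status_text):
    """Return the integer from the 'Threads:' line of /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    return None  # field not present


def process_name(status_text):
    """Return the value of the 'Name:' line of /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            parts = line.split(None, 1)
            return parts[1].strip() if len(parts) > 1 else ""
    return None


def osd_thread_counts(proc_root="/proc", name="ceph-osd"):
    """Map pid -> thread count for every process whose name equals `name`.

    Assumption: OSD daemons run under the process name 'ceph-osd'.
    """
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a pid directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                text = f.read()
        except OSError:
            continue  # process exited between listdir() and open()
        if process_name(text) == name:
            threads = parse_thread_count(text)
            if threads is not None:
                counts[int(entry)] = threads
    return counts


def log_samples(samples=3, interval_s=60):
    """Print a timestamped total and per-pid thread count, `samples` times."""
    for i in range(samples):
        counts = osd_thread_counts()
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        print(stamp, "total:", sum(counts.values()), "per-pid:", counts)
        if i + 1 < samples:
            time.sleep(interval_s)
```

Calling `log_samples(samples=60, interval_s=60)` on one node would give an hour of one-minute samples, which can then be lined up against the time of the crash and the time the node rejoined the cluster.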