Re: One OSD fails (slow requests, high cpu, termination)

Hi,

> I just noticed a strange behavior on one OSD (and only one, other OSDs on the same server didn’t show that behavior) in a ceph-cluster (all 0.94.2 on Debian 7 with a self-made 4.1 Kernel).
> The OSD started to accumulate slow requests, and a restart didn’t help.
> 
> After a few seconds the log is filled with lines like these:
>   -91> 2015-07-20 21:55:03.537385 7f9e20ec3700  0 -- [<OwnIPv6>]:6814/1376041 >> [<OwnIPv6>]:0/2078381 pipe(0x5396f000 sd=16371 :6814 s=0 pgs=0 cs=0 l=1 c=0x538e7340).accept replacing existing (lossy) channel (new one lossy=1)
> (Full example after startup https://paste.ee/p/HfTlp )
> 
> With nearly 100% CPU usage.
> 
> After some time the slow requests accumulate, so I restart the OSD. If I wait longer, the OSD eventually terminates (longer version: https://paste.ee/p/XvD0o ).

A short follow-up: the problem disappeared after some restarts, so the cluster is healthy again and I’m mildly irritated.
But that also means I can’t provide any more debug information.
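
If the behavior comes back, one way to capture something useful would be to tally the "accept replacing existing (lossy) channel" lines per peer address, to see whether a single client is reconnecting in a loop. The following is only a minimal sketch in Python, assuming the log lines look like the one quoted above; the function name count_lossy_replacements and the regular expression are illustrative and not part of Ceph.

```python
import re
from collections import Counter

# Assumption: messenger lines have the shape quoted above, i.e.
#   ... -- <own addr> >> <peer addr> pipe(...).accept replacing existing (lossy) channel ...
# The first group captures the peer address that follows ">> ".
ACCEPT_RE = re.compile(
    r">> (\S+) pipe\(.*?\)\.accept replacing existing \(lossy\) channel"
)


def count_lossy_replacements(lines):
    """Return a Counter mapping peer address to the number of matching lines."""
    counts = Counter()
    for line in lines:
        match = ACCEPT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It could be fed an open log file, for example count_lossy_replacements(open(path)).most_common(10), where path points at the log of the affected OSD.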

greetings

Johannes