Re: One OSD misbehaving (spinning 100% CPU, delayed ops)

Hi Matthew,

Is anything unusual happening on the NIC side that could cause a problem? Packet drops? Incorrect jumbo frame settings causing fragmentation?

Have you checked the C-state settings on the box?

Are the energy-saving settings on this box configured differently from the other boxes?

Any unexpected wait time on some devices on the box?

Have you compared the kernel parameters on this box with those on the other boxes?
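
For the kernel parameter comparison, a minimal sketch in Python, assuming you have saved the output of `sysctl -a` from the misbehaving host and from a healthy host into two text files. The function names `parse_sysctl` and `diff_sysctl` are made up for illustration; nothing here is Ceph-specific.

```python
def parse_sysctl(text):
    """Parse `sysctl -a` style output ("key = value" per line) into a dict."""
    params = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank lines and anything that is not key = value
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params


def diff_sysctl(good_text, bad_text):
    """Return {key: (good_value, bad_value)} for every parameter that differs.

    A value of None means the key is missing on that host.
    """
    good = parse_sysctl(good_text)
    bad = parse_sysctl(bad_text)
    diffs = {}
    for key in sorted(set(good) | set(bad)):
        g, b = good.get(key), bad.get(key)
        if g != b:
            diffs[key] = (g, b)
    return diffs
```

Feed it the contents of the two dump files and print the result. Some keys differ between hosts by design (kernel.hostname, for example), so expect a little noise to filter out.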

Just in case
JC

On Nov 29, 2017, at 09:24, Matthew Vernon <mv3@xxxxxxxxxxxx> wrote:

Hi,

We have a 3,060-OSD Ceph cluster (running Jewel
10.2.7-0ubuntu0.16.04.1), and one OSD on one host keeps misbehaving - by
which I mean it keeps spinning ~100% CPU (cf ~5% for other OSDs on that
host), and having ops blocking on it for some time. It will then behave
for a bit, and then go back to doing this.

It's always the same OSD, and we've tried replacing the underlying disk.

The logs have lots of entries of the form

2017-11-29 17:18:51.097230 7fcc06919700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fcc29fec700' had timed out after 15
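
To see whether the timeouts cluster on a few threads, entries of this form can be tallied per thread. This is a rough Python sketch; the regular expression is modelled only on the single entry quoted here, so it is an assumption that other entries follow the same layout, and `count_timeouts` is a made-up helper name. The pattern allows any whitespace (including a line break from mail wrapping) between the parts of the message.

```python
import re
from collections import Counter

# Matches e.g.:
#   heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fcc29fec700' had timed out after 15
PATTERN = re.compile(
    r"heartbeat_map is_healthy\s+'(?P<pool>\S+) thread (?P<thread>0x[0-9a-f]+)'"
    r"\s+had timed out after (?P<secs>\d+)"
)


def count_timeouts(log_text):
    """Count heartbeat timeout entries per (thread pool, thread id)."""
    counts = Counter()
    for match in PATTERN.finditer(log_text):
        counts[(match.group("pool"), match.group("thread"))] += 1
    return counts
```

Calling `count_timeouts(open(path).read()).most_common(10)` on the OSD log (path being wherever that log lives on the host) lists the threads that time out most often.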

I've had a brief poke through the collectd metrics for this osd (and
comparing them with other OSDs on the same host) but other than showing
spikes in latency for that OSD (iostat et al show no issues with the
underlying disk) there's nothing obviously explanatory.

I tried ceph tell osd.2054 injectargs --osd-op-thread-timeout 90 (which
is what googling for the above message suggests), but that just said
"unchangeable", and didn't seem to make any difference.

Any ideas? Other metrics to consider? ...

Thanks,

Matthew


--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

