Re: Significantly increased CPU footprint on OSDs after Hammer -> Jewel upgrade, OSDs occasionally wrongly marked as down

On Wed, 26 Oct 2016, Trygve Vea wrote:
> Hi,
> 
> We have two Ceph clusters: one exposing pools for both RGW and RBD (OpenStack/KVM), and one for RBD only.
> 
> After upgrading both to Jewel, we have seen a significantly increased CPU footprint on the OSDs in the cluster that includes RGW.
> 
> This graph illustrates this: http://i.imgur.com/Z81LW5Y.png

That looks pretty significant!

This doesn't ring any bells; I don't think it's something we've seen.  Can 
you run 'perf top -p `pidof ceph-osd`' on one of the OSDs and grab a 
snapshot of the output?  It would be nice to compare against Hammer, but I 
expect you've long since upgraded all of the OSDs...
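
A minimal sketch of that capture (assuming a single ceph-osd process per
host; with several OSDs per host, substitute the specific daemon's PID):

  # live view of the hottest symbols in the OSD process
  perf top -p `pidof ceph-osd`

  # alternatively, record ~30 seconds with call graphs so the output
  # can be saved and shared
  perf record -g -p `pidof ceph-osd` -- sleep 30
  perf report --stdio | head -50

The recorded variant is just an optional addition to the one-liner above;
either way, a snapshot of the top entries should show where the extra CPU
time is going.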

sage


> 
> 
> I wonder if anyone else has seen this behaviour, and whether this is a symptom of a regression or something to be expected after moving from Hammer to Jewel.
> 
> I have also observed that an OSD will occasionally be marked as down, but will recover by itself.
> 
> This manifests itself in the OSD logs as a series of lines like this:
> 
> 2016-10-26 06:32:20.106602 7fa57a942700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fa575938700' had timed out after 15
> 
> Some slow requests may be observed:
> 
> 2016-10-26 06:32:35.899597 7fa5aa41b700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 30.905777 secs
> 2016-10-26 06:32:35.899605 7fa5aa41b700  0 log_channel(cluster) log [WRN] : slow request 30.905777 seconds old, received at 2016-10-26 06:32:04.993791: replica scrub(pg: 3.2e,from:0'0,to:27810'772752,epoch:28538,start:3:74000000::::head,end:3:7400039b::::0,chunky:1,deep:1,seed:4294967295,version:6) currently reached_pg
> 
> Some failing heartbeat_checks (usually involving only a single OSD):
> 
> 2016-10-26 06:32:39.323412 7fa56f92c700 -1 osd.19 28538 heartbeat_check: no reply from osd.15 since back 2016-10-26 06:32:19.017249 front 2016-10-26 06:32:19.017249 (cutoff 2016-10-26 06:32:19.323409)
> 
> 
> A bunch of these (with the remote address targeting different OSDs):
> 
> 2016-10-26 06:32:45.522391 7fa598ec0700  0 -- 169.254.169.254:6812/151031797 >> 169.254.169.255:6802/41700 pipe(0x7fa5ebba7400 sd=160 :6812 s=2 pgs=4298 cs=1 l=0 c=0x7fa5d7c26400).fault with nothing to send, going to standby
> 
> 2016-10-26 06:32:45.525524 7fa5a5158700  0 log_channel(cluster) log [WRN] : map e28540 wrongly marked me down
> 
> Followed by re-peering, and then everything is fine again.
> 
> 
> 
> I wonder if anyone else has been suffering from similar behaviour, and whether this is a bug (known or unknown).  One detail to keep in mind is that the OSDs for the RGW pools store replicas on different physical sites.  However, we have no reason to believe that saturation or high latency is a problem.
> 
> 
> 
> Regards
> -- 
> Trygve Vea
> Redpill Linpro AS
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


