Re: Significantly increased CPU footprint on OSDs after Hammer -> Jewel upgrade, OSDs occasionally wrongly marked as down

Sage Weil <sage@xxxxxxxxxxxx> · Wed, 26 Oct 2016 14:37:03 +0000 (UTC)

On Wed, 26 Oct 2016, Trygve Vea wrote:
> ----- On 26 Oct 2016 at 14:41, Sage Weil sage@xxxxxxxxxxxx wrote:
> > On Wed, 26 Oct 2016, Trygve Vea wrote:
> >> Hi,
> >> 
> >> We have two Ceph clusters: one exposing pools for both RGW and RBD
> >> (OpenStack/KVM), and one only for RBD.
> >> 
> >> After upgrading both to Jewel, we have seen a significantly increased CPU
> >> footprint on the OSDs that are part of the cluster which includes RGW.
> >> 
> >> This graph illustrates this: http://i.imgur.com/Z81LW5Y.png
> > 
> > That looks pretty significant!
> > 
> > This doesn't ring any bells--I don't think it's something we've seen.  Can
> > you do a 'perf top -p `pidof ceph-osd`' on one of the OSDs and grab a
> > snapshot of the output?  It would be nice to compare to hammer but I
> > expect you've long since upgraded all of the OSDs...
> 
> # perf record -p 18001
> ^C[ perf record: Woken up 57 times to write data ]
> [ perf record: Captured and wrote 18.239 MB perf.data (408850 samples) ]
> 
> 
> This is a screenshot of one of the osds during high utilization: http://i.imgur.com/031MyIJ.png

It looks like a ton of time is spent in std::string methods, and a lot more 
map<string,ghobject_t> than I would expect.  Can you run the following and 
send the output?

 perf record -p `pidof ceph-osd` -g
 perf report --stdio
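
For anyone repeating this on a host that runs several OSDs, a rough sketch
of the same capture follows. It assumes a single PID is passed in (pidof
prints every ceph-osd PID separated by spaces, while perf expects one PID or
a comma-separated list), and that perf's text report mode (--stdio) and the
--no-children option are available in the installed perf. The function
names, the 30 second default and the /tmp paths are made up for
illustration; they are not part of Ceph or perf.

```shell
#!/bin/sh
# Sketch only: function names, default duration and output paths are
# illustrative assumptions, not anything shipped with Ceph or perf.

# Sum the overhead percentages of all report lines that match a pattern.
# Reads the text produced by "perf report --stdio" on stdin.
sum_overhead() {
    pattern=$1
    awk -v pat="$pattern" '
        /^#/ { next }                      # skip the header comment lines
        $1 ~ /^[0-9.]+%$/ && $0 ~ pat {    # entry lines begin with a percentage
            sub(/%/, "", $1)
            total += $1
        }
        END { printf "%.2f\n", total }
    '
}

# Record call graphs from one OSD for a fixed time, then write a text report.
profile_osd() {
    osd_pid=$1
    duration=${2:-30}
    # "-- sleep N" limits the recording to N seconds.
    perf record -g -p "$osd_pid" -o /tmp/osd.perf.data -- sleep "$duration"
    # --no-children makes the first column the self overhead of each symbol.
    perf report --stdio --no-children -i /tmp/osd.perf.data > /tmp/osd.perf.txt
}
```

A possible use would be `profile_osd 18001 60` followed by
`sum_overhead 'std::string' < /tmp/osd.perf.txt`, which gives a rough total
for the std::string share of the samples. Call-chain lines in the report are
ignored by the filter because they do not begin with a bare percentage.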

> Link to download binary format sent directly to you.
> 
> 
> Your expectation about upgrades is correct.  We actually had some 
> problems performing the upgrade, so we ended up re-initializing the OSDs 
> as empty and backfilling them into Jewel.  When we first started them on Jewel, 
> they ended up blocking

Hrm, this is a new one for me too.  They've all been upgraded now?  It 
would be nice to see a log or backtrace to see why they got stuck.

Thanks!
sage

> I want to add that the resource usage isn't flat - this is a day graph 
> of one of the osd servers: http://i.imgur.com/MLfoVgE.png
> 
> 
> 
> Regards
> -- 
> Trygve Vea
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com