Re: [ceph-users] ceph-mon using 100% CPU after upgrade to 14.2.5

Sasha,

I was able to get past it by restarting the ceph-mon processes each time one got stuck, but that's not a very good solution for a production cluster.

Right now I'm trying to narrow down what is causing the problem.  Rebuilding the OSDs with BlueStore doesn't seem to be enough.  I believe it could be related to us using the extra space on the journal device as an SSD-based OSD.  During the conversion process I'm removing this SSD-based OSD (since with BlueStore the omap data ends up on the SSD anyway), and I suspect that removal might be what triggers the problem.

Bryan

On Dec 14, 2019, at 10:27 AM, Sasha Litvak <alexander.v.litvak@xxxxxxxxx> wrote:

Bryan,

Were you able to resolve this?  If yes, can you please share with the list?

On Fri, Dec 13, 2019 at 10:08 AM Bryan Stillwell <bstillwell@xxxxxxxxxxx> wrote:
Adding the dev list since it seems like a bug in 14.2.5.

I was able to capture the output from perf top:

  21.58%  libceph-common.so.0               [.] ceph::buffer::v14_2_0::list::append
  20.90%  libstdc++.so.6.0.19               [.] std::getline<char, std::char_traits<char>, std::allocator<char> >
  13.25%  libceph-common.so.0               [.] ceph::buffer::v14_2_0::list::append
  10.11%  libstdc++.so.6.0.19               [.] std::istream::sentry::sentry
   8.94%  libstdc++.so.6.0.19               [.] std::basic_ios<char, std::char_traits<char> >::clear
   3.24%  libceph-common.so.0               [.] ceph::buffer::v14_2_0::ptr::unused_tail_length
   1.69%  libceph-common.so.0               [.] std::getline<char, std::char_traits<char>, std::allocator<char> >@plt
   1.63%  libstdc++.so.6.0.19               [.] std::istream::sentry::sentry@plt
   1.21%  [kernel]                          [k] __do_softirq
   0.77%  libpython2.7.so.1.0               [.] PyEval_EvalFrameEx
   0.55%  [kernel]                          [k] _raw_spin_unlock_irqrestore
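
To make the capture above easier to read at a glance, here is a minimal Python sketch that adds up the overhead per shared object. The regular expression and the function name overhead_by_object are my own and are not part of perf or Ceph; the sample text is simply the perf top output pasted from above.

```python
import re
from collections import defaultdict

# perf top sample copied verbatim from the capture above.
PERF_TOP = """\
  21.58%  libceph-common.so.0               [.] ceph::buffer::v14_2_0::list::append
  20.90%  libstdc++.so.6.0.19               [.] std::getline<char, std::char_traits<char>, std::allocator<char> >
  13.25%  libceph-common.so.0               [.] ceph::buffer::v14_2_0::list::append
  10.11%  libstdc++.so.6.0.19               [.] std::istream::sentry::sentry
   8.94%  libstdc++.so.6.0.19               [.] std::basic_ios<char, std::char_traits<char> >::clear
   3.24%  libceph-common.so.0               [.] ceph::buffer::v14_2_0::ptr::unused_tail_length
   1.69%  libceph-common.so.0               [.] std::getline<char, std::char_traits<char>, std::allocator<char> >@plt
   1.63%  libstdc++.so.6.0.19               [.] std::istream::sentry::sentry@plt
   1.21%  [kernel]                          [k] __do_softirq
   0.77%  libpython2.7.so.1.0               [.] PyEval_EvalFrameEx
   0.55%  [kernel]                          [k] _raw_spin_unlock_irqrestore
"""

# Assumed line layout: "<pct>%  <object>  [<type>] <symbol>".
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+\[(.)\]\s+(.*)$")


def overhead_by_object(text):
    """Sum the sampled overhead percentages per shared object."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            totals[match.group(2)] += float(match.group(1))
    # Round to two decimals to hide floating-point noise from the additions.
    return {obj: round(pct, 2) for obj, pct in totals.items()}


totals = overhead_by_object(PERF_TOP)
for obj, pct in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{pct:6.2f}%  {obj}")
```

On this sample, libstdc++ and libceph-common together account for roughly 81% of the samples (about 41.6% and 39.8% respectively), nearly all of it in std::getline, the istream helpers, and buffer list append. My guess, and it is only a guess, is that the mon is spending its time reading some stream line by line and appending the result to a bufferlist.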

I increased mon debugging to 20 and nothing stuck out to me.

Bryan

> On Dec 12, 2019, at 4:46 PM, Bryan Stillwell <bstillwell@xxxxxxxxxxx> wrote:
> 
> On our test cluster after upgrading to 14.2.5 I'm having problems with the mons pegging a CPU core while moving data around.  I'm currently converting the OSDs from FileStore to BlueStore by marking the OSDs out in multiple nodes, destroying the OSDs, and then recreating them with ceph-volume lvm batch.  This seems to get the ceph-mon process into a state where it pegs a CPU core on one of the mons:
> 
> 1764450 ceph      20   0 4802412   2.1g  16980 S 100.0 28.1   4:54.72 ceph-mon
> 
> Has anyone else run into this with 14.2.5 yet?  I didn't see this problem while the cluster was running 14.2.4.
> 
> Thanks,
> Bryan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
