Re: Nautilus 14.2.19 mon 100% CPU

On Fri, Apr 9, 2021 at 2:04 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> On Fri, Apr 9, 2021 at 9:37 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >
> > On Fri, Apr 9, 2021 at 8:39 PM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> > >
> > > On Fri, Apr 9, 2021 at 11:49 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > > >
> > > > Thanks. I didn't see anything ultra obvious to me.
> > > >
> > > > But I did notice the nearfull warnings so I wonder if this cluster is
> > > > churning through osdmaps? Did you see a large increase in inbound or
> > > > outbound network traffic on this mon following the upgrade?
> > > > Totally speculating here, but maybe there is an issue where you have
> > > > some old clients, which can't decode an incremental osdmap from a
> > > > nautilus mon, so the single mon is busy serving up these maps to the
> > > > clients.
> > > >
> > > > Does the mon load decrease if you stop the osdmap churn, e.g. by
> > > > setting norebalance if that is indeed ongoing?
> > > >
> > > > Could you also share a minute of debug_ms = 1 logs from the mon while its CPU is busy?
> > >
> > > Here are the new logs with the debug_ms=1 for a bit.
> > > https://owncloud.leblancnet.us/owncloud/index.php/s/1hvtJo3s2oLPpWn
> >
> > One strange thing in this log is that there is one hammer client asking
> > for nearly a million incremental osdmaps, seemingly every 30s:
> >
> >     client.131831153 at 172.16.212.55 is asking for incrementals from
> > 1170448..1987355 (see [1])
> >
> > Can you try to evict/kill/block that client and see if your mon load drops?
> >
>
> Before you respond, just noting here for the record that I think there's a
> possible issue with OSDMonitor::get_removed_snaps_range and clients
> like this.
>
>     https://github.com/ceph/ceph/blob/v14.2.19/src/mon/OSDMonitor.cc#L4193
>
> Called by send_incremental:
>
>     https://github.com/ceph/ceph/blob/v14.2.19/src/mon/OSDMonitor.cc#L4152
>
> When building the incremental, it will search the mon's rocksdb for
> removed snaps across those ~million missing maps.
>
> That feature seems to have been removed from octopus onward.

I evicted that client, but the mon's CPU usage hasn't gone down
significantly. There may be other clients also causing the issue. Is
the `osdmap=1170448` part of the log line what shows which OSDmaps it's
trying to get? I can look for others in the logs and evict them as
well.
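
To look for others, a minimal sketch like the one below could scan the debug_ms=1 log for clients whose requested osdmap epoch is far behind the current one. It assumes, without confirmation, that each relevant line contains both a `client.<id>` token and an `osdmap=<epoch>` token, and that the current epoch (1987355, from the range quoted above) is passed in by hand. The function name, the `min_lag` threshold, and the sample lines are hypothetical and are not real mon output.

```python
import re
from collections import defaultdict

# Assumed line shape: "client.<id>" appears before "osdmap=<epoch>" on the same line.
LINE_RE = re.compile(r"(client\.\d+)\b.*?\bosdmap=(\d+)")


def lagging_clients(lines, current_epoch, min_lag=1000):
    """Map each lagging client to (oldest requested epoch, number of requests).

    A client counts as lagging when its oldest requested epoch is at least
    min_lag epochs behind current_epoch. min_lag is an arbitrary threshold.
    """
    oldest = {}
    counts = defaultdict(int)
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # line does not mention both a client and an osdmap epoch
        client, epoch = match.group(1), int(match.group(2))
        counts[client] += 1
        oldest[client] = min(epoch, oldest.get(client, epoch))
    return {
        client: (epoch, counts[client])
        for client, epoch in oldest.items()
        if current_epoch - epoch >= min_lag
    }


# Made-up lines in the assumed shape, for illustration only.
sample = [
    "client.131831153 mon_subscribe({osdmap=1170448})",
    "client.131831153 mon_subscribe({osdmap=1170448})",
    "client.42 mon_subscribe({osdmap=1987350})",
    "a line with no subscription in it",
]

for client, (epoch, seen) in sorted(lagging_clients(sample, 1987355).items()):
    print(client, "starts at epoch", epoch, "seen", seen, "times")
```

If the real log lines are shaped differently, only the regular expression should need changing; the per-client bookkeeping stays the same.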

Maybe if that code path isn't needed in Nautilus it can be removed in
the next point release?

Thank you,
Robert LeBlanc
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


