Re: Nautilus 14.2.19 mon 100% CPU

On Fri, Apr 9, 2021 at 4:04 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> Here's what you should look for, with debug_mon=10. It shows clearly
> that it takes the mon 23 seconds to run through
> get_removed_snaps_range.
> So if this is happening every 30s, it explains at least part of why
> this mon is busy.
>
> 2021-04-09 17:07:27.238 7f9fc83e4700 10 mon.sun-storemon01@0(leader)
> e45 handle_subscribe
> mon_subscribe({mdsmap=3914079+,monmap=0+,osdmap=1170448})
> 2021-04-09 17:07:27.238 7f9fc83e4700 10
> mon.sun-storemon01@0(leader).osd e1987355 check_osdmap_sub
> 0x55e2e2133de0 next 1170448 (onetime)
> 2021-04-09 17:07:27.238 7f9fc83e4700  5
> mon.sun-storemon01@0(leader).osd e1987355 send_incremental
> [1170448..1987355] to client.131831153
> 2021-04-09 17:07:28.590 7f9fc83e4700 10
> mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 0
> [1~3]
> 2021-04-09 17:07:29.898 7f9fc83e4700 10
> mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 5 []
> 2021-04-09 17:07:31.258 7f9fc83e4700 10
> mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 6 []
> 2021-04-09 17:07:32.562 7f9fc83e4700 10
> mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 20
> []
> 2021-04-09 17:07:33.866 7f9fc83e4700 10
> mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 21
> []
> 2021-04-09 17:07:35.162 7f9fc83e4700 10
> mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 22
> []
> 2021-04-09 17:07:36.470 7f9fc83e4700 10
> mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 23
> []
> 2021-04-09 17:07:37.778 7f9fc83e4700 10
> mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 24
> []
> 2021-04-09 17:07:39.090 7f9fc83e4700 10
> mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 25
> []
> 2021-04-09 17:07:40.398 7f9fc83e4700 10
> mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 26
> []
> 2021-04-09 17:07:41.706 7f9fc83e4700 10
> mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 27
> []
> 2021-04-09 17:07:43.006 7f9fc83e4700 10
> mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 28
> []
> 2021-04-09 17:07:44.322 7f9fc83e4700 10
> mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 29
> []
> 2021-04-09 17:07:45.630 7f9fc83e4700 10
> mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 30
> []
> 2021-04-09 17:07:46.938 7f9fc83e4700 10
> mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 31
> []
> 2021-04-09 17:07:48.246 7f9fc83e4700 10
> mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 32
> []
> 2021-04-09 17:07:49.562 7f9fc83e4700 10
> mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 34
> []
> 2021-04-09 17:07:50.862 7f9fc83e4700 10
> mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 35
> []
> 2021-04-09 17:07:50.862 7f9fc83e4700 20
> mon.sun-storemon01@0(leader).osd e1987355 send_incremental starting
> with base full 1986745 664086 bytes
> 2021-04-09 17:07:50.862 7f9fc83e4700 10
> mon.sun-storemon01@0(leader).osd e1987355 build_incremental
> [1986746..1986785] with features 107b84a842aca
>
> So have a look for that client again or other similar traces.

So, even though I blacklisted the client and we remounted the file
system on it, that wasn't enough to stop it from making the same bad
requests. We also found another node that had two sessions to the same
mount point. We rebooted both nodes, and the CPU is now back to a
reasonable 4-6% and the cluster is running at full performance again.
I've added both MONs back in so that all 3 mons are in the system, and
there are no more elections. Thank you for helping us track down the
bad clients out of over 2,000 clients.
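
For anyone searching a mon log for the same pattern, here is a rough
sketch of how the clients could be picked out. It only looks for the
"send_incremental [first..last] to client.N" lines shown in the trace
above and reports clients asking for a very large range of osdmap
epochs. The function name, the min_epochs threshold, and the assumption
that each log entry sits on one line (unlike the wrapped copy in this
mail) are mine, not anything from Ceph itself.

```python
import re
from collections import defaultdict

# Matches e.g. "send_incremental [1170448..1987355] to client.131831153"
SEND_RE = re.compile(r"send_incremental \[(\d+)\.\.(\d+)\] to (client\.\d+)")


def find_lagging_clients(lines, min_epochs=100000):
    """Return {client: largest epoch span seen} for spans >= min_epochs.

    `lines` is any iterable of log lines (an open file works).
    `min_epochs` is an arbitrary illustrative threshold, not a Ceph setting.
    """
    worst = defaultdict(int)
    for line in lines:
        m = SEND_RE.search(line)
        if m is None:
            continue
        first, last, client = int(m.group(1)), int(m.group(2)), m.group(3)
        span = last - first + 1  # number of epochs in [first..last]
        if span >= min_epochs:
            worst[client] = max(worst[client], span)
    return dict(worst)
```

Passing an open mon log file (collected with debug_mon=10) to
find_lagging_clients() should give a short list of client ids to check
against the session list.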

> > Maybe if that code path isn't needed in Nautilus it can be removed in
> > the next point release?
>
> I think there were other major changes in this area that might make
> such a backport difficult. And we should expect nautilus to be nearing
> its end...

But ... we just got to Nautilus... :)

Thank you,
Robert LeBlanc


