On Fri, Apr 9, 2021 at 4:04 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> Here's what you should look for, with debug_mon=10. It shows clearly
> that it takes the mon 23 seconds to run through get_removed_snaps_range.
> So if this is happening every 30s, it explains at least part of why
> this mon is busy.
>
> 2021-04-09 17:07:27.238 7f9fc83e4700 10 mon.sun-storemon01@0(leader) e45 handle_subscribe mon_subscribe({mdsmap=3914079+,monmap=0+,osdmap=1170448})
> 2021-04-09 17:07:27.238 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 check_osdmap_sub 0x55e2e2133de0 next 1170448 (onetime)
> 2021-04-09 17:07:27.238 7f9fc83e4700 5 mon.sun-storemon01@0(leader).osd e1987355 send_incremental [1170448..1987355] to client.131831153
> 2021-04-09 17:07:28.590 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 0 [1~3]
> 2021-04-09 17:07:29.898 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 5 []
> 2021-04-09 17:07:31.258 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 6 []
> 2021-04-09 17:07:32.562 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 20 []
> 2021-04-09 17:07:33.866 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 21 []
> 2021-04-09 17:07:35.162 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 22 []
> 2021-04-09 17:07:36.470 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 23 []
> 2021-04-09 17:07:37.778 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 24 []
> 2021-04-09 17:07:39.090 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 25 []
> 2021-04-09 17:07:40.398 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 26 []
> 2021-04-09 17:07:41.706 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 27 []
> 2021-04-09 17:07:43.006 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 28 []
> 2021-04-09 17:07:44.322 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 29 []
> 2021-04-09 17:07:45.630 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 30 []
> 2021-04-09 17:07:46.938 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 31 []
> 2021-04-09 17:07:48.246 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 32 []
> 2021-04-09 17:07:49.562 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 34 []
> 2021-04-09 17:07:50.862 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 35 []
> 2021-04-09 17:07:50.862 7f9fc83e4700 20 mon.sun-storemon01@0(leader).osd e1987355 send_incremental starting with base full 1986745 664086 bytes
> 2021-04-09 17:07:50.862 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 build_incremental [1986746..1986785] with features 107b84a842aca
>
> So have a look for that client again or other similar traces.

So, even though I blacklisted the client and we remounted the file system on it, that wasn't enough to stop the same bad requests. We found another node that had two sessions to the same mount point.
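For anyone chasing something similar, here's a minimal sketch of how you might spot hosts holding more than one session. It assumes "ceph tell mds.<name> session ls" prints a JSON array and that each session's client_metadata carries a "hostname"; field names vary by client type and Ceph release, so treat it as illustrative rather than exact:

#!/usr/bin/env python3
# Rough sketch: find hosts holding more than one CephFS session.
# Assumes `ceph tell mds.<name> session ls` prints a JSON array and that
# each session's client_metadata includes a "hostname" -- adjust the
# field names for your client type and release.
import json
import subprocess
from collections import Counter

MDS = "mds.0"  # hypothetical name; substitute your active MDS

sessions = json.loads(
    subprocess.check_output(["ceph", "tell", MDS, "session", "ls"]))
hosts = Counter(
    s.get("client_metadata", {}).get("hostname", "unknown")
    for s in sessions)
for host, count in hosts.most_common():
    if count > 1:
        print(f"{host}: {count} sessions")

Any host that shows up with two or more sessions is worth a closer look for a stale or duplicate mount.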
We rebooted both nodes, the CPU is now back to a reasonable 4-6%, and the cluster is running at full performance again. I've added both MONs back in, so all 3 mons are in the system and there are no more elections. Thank you for helping us track down the bad clients out of more than 2,000.

> > Maybe if that code path isn't needed in Nautilus it can be removed in
> > the next point release?
>
> I think there were other major changes in this area that might make
> such a backport difficult. And we should expect nautilus to be nearing
> its end...

But ... we just got to Nautilus... :)

Thank you,
Robert LeBlanc

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx