Could you create a tracker for this and attach an osdmap, as well as some recent balancer output (perhaps at a higher debug level, if possible)?

There are some improvements awaiting backport to nautilus for the C++/python interface, just FYI [0].

You might also look at gathering output using something like [1] to try to narrow down further what is causing the high CPU consumption.

[0] https://github.com/ceph/ceph/pull/34356
[1] https://github.com/markhpc/gdbpmp
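
For the osdmap and the extra balancer logging, something along these lines should do it - the level and file name are just examples, and if I remember right the mgr[balancer] lines in the log below are gated by debug_mgr:

  ceph config set mgr debug_mgr 4/5     # or higher (e.g. 20) while the balancer does a few passes
  ceph osd getmap -o osdmap.bin         # current osdmap to attach to the tracker
  ceph config rm mgr debug_mgr          # revert to the default level afterwards

And a rough sketch of what a gdbpmp run against the active mgr might look like (check the gdbpmp README for the exact flags; it needs a gdb built with python support):

  ./gdbpmp.py -p $(pidof ceph-mgr) -n 1000 -o ceph-mgr.gdbpmp   # sample the running mgr for a while
  ./gdbpmp.py -i ceph-mgr.gdbpmp -t 0.5                         # print the collected call tree

Note that gdbpmp briefly stops the process each time it samples, so expect the mgr to be even less responsive while it runs.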
On Fri, May 8, 2020 at 1:10 AM Andras Pataki <apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi everyone,
>
> After some investigation, it looks like on our large cluster ceph-mgr is not able to keep up with the status updates from about 3500 OSDs. By default OSDs send updates to ceph-mgr every 5 seconds, which in our case works out to about 700 messages/s to ceph-mgr. It looks from gdb traces that ceph-mgr runs some python code for each of them - so 700 python snippets/s might be too much. Increasing mgr_stats_period to 15 seconds reduces the load and makes ceph-mgr responsive again. Unfortunately this isn't sustainable, since if we were to expand the cluster we'd need to reduce the update frequency from the OSDs even further.
>
> I also checked our other clusters, and they have proportionately lower load on ceph-mgr, in line with their OSD counts.
>
> Any thoughts about the scalability of ceph-mgr to a large number of OSDs? We recently upgraded this cluster from Mimic, where we didn't see this issue.
>
> Andras
>
> On 5/1/20 8:48 AM, Andras Pataki wrote:
> > Also just a follow-up on the misbehavior of ceph-mgr. It looks like the upmap balancer is not acting reasonably either. It is trying to create upmap entries every minute or so, and claims to be successful, but they never show up in the OSD map. Setting the logging to 'debug', I see upmap entries created such as:
> >
> > 2020-05-01 08:43:07.909 7fffca074700  4 mgr[balancer] ceph osd pg-upmap-items 9.60c4 mappings [{'to': 3313L, 'from': 3371L}]
> > 2020-05-01 08:43:07.909 7fffca074700  4 mgr[balancer] ceph osd pg-upmap-items 9.632b mappings [{'to': 2187L, 'from': 1477L}]
> > 2020-05-01 08:43:07.909 7fffca074700  4 mgr[balancer] ceph osd pg-upmap-items 9.6b9c mappings [{'to': 3315L, 'from': 3371L}]
> > 2020-05-01 08:43:07.909 7fffca074700  4 mgr[balancer] ceph osd pg-upmap-items 9.6bf6 mappings [{'to': 1581L, 'from': 1477L}]
> > 2020-05-01 08:43:07.909 7fffca074700  4 mgr[balancer] ceph osd pg-upmap-items 9.7da4 mappings [{'to': 2419L, 'from': 2537L}]
> > ...
> > 2020-05-01 08:43:07.909 7fffca074700 20 mgr[balancer] commands
> > [<mgr_module.CommandResult object at 0x7fffcc990550>, <mgr_module.CommandResult object at 0x7fffcc990fd0>,
> > <mgr_module.CommandResult object at 0x7fffcc9907d0>, <mgr_module.CommandResult object at 0x7fffcc990650>,
> > <mgr_module.CommandResult object at 0x7fffcc990610>, <mgr_module.CommandResult object at 0x7fffcc990f50>,
> > <mgr_module.CommandResult object at 0x7fffcc990bd0>, <mgr_module.CommandResult object at 0x7fffcc990d90>,
> > <mgr_module.CommandResult object at 0x7fffcc990ad0>, <mgr_module.CommandResult object at 0x7fffcc990410>,
> > <mgr_module.CommandResult object at 0x7fffbed241d0>, <mgr_module.CommandResult object at 0x7fff6a6caf90>,
> > <mgr_module.CommandResult object at 0x7fffbed242d0>, <mgr_module.CommandResult object at 0x7fffbed24d90>,
> > <mgr_module.CommandResult object at 0x7fffbed24d50>, <mgr_module.CommandResult object at 0x7fffbed24550>,
> > <mgr_module.CommandResult object at 0x7fffbed245d0>, <mgr_module.CommandResult object at 0x7fffbed24510>,
> > <mgr_module.CommandResult object at 0x7fffbed24690>, <mgr_module.CommandResult object at 0x7fffbed24990>]
> > ...
> > 2020-05-01 08:43:16.733 7fffca074700 20 mgr[balancer] done
> > ...
> >
> > But these mappings do not show up in the osd dump. A minute later the balancer tries again and comes up with a set of very similar mappings (same from and to OSDs, slightly different PG numbers), and it keeps going like that every minute without making any progress (the set of upmap entries stays the same and does not grow).
> >
> > Andras
> >
> >
> > On 5/1/20 8:12 AM, Andras Pataki wrote:
> >> I'm wondering if anyone still sees issues with ceph-mgr using a lot of CPU and becoming unresponsive, even in recent Nautilus releases. We recently upgraded our largest cluster, which has about 3500 OSDs, from Mimic to Nautilus (14.2.8). Now ceph-mgr is constantly at 100-200% CPU (1-2 cores) and becomes unresponsive after a few minutes. The finisher-Mgr queue length grows (I've seen it at over 100k) - similar symptoms to those many have seen with earlier Nautilus releases. This is what it looks like after an hour of running:
> >>
> >>     "finisher-Mgr": {
> >>         "queue_len": 66078,
> >>         "complete_latency": {
> >>             "avgcount": 21,
> >>             "sum": 2098.408767721,
> >>             "avgtime": 99.924227034
> >>         }
> >>     },
> >>
> >> We have a pretty vanilla manager config; only the balancer is enabled, in upmap mode. Here are the enabled modules:
> >>
> >>     "always_on_modules": [
> >>         "balancer",
> >>         "crash",
> >>         "devicehealth",
> >>         "orchestrator_cli",
> >>         "progress",
> >>         "rbd_support",
> >>         "status",
> >>         "volumes"
> >>     ],
> >>     "enabled_modules": [
> >>         "restful"
> >>     ],
> >>
> >> Any ideas or outstanding issues in this area?
> >>
> >> Andras
> >>
> >
>

--
Cheers,
Brad
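
P.S. A couple of quick checks that might help correlate the balancer log with what actually lands in the cluster - run them on the active mgr host and substitute your own mgr daemon name, these are just illustrative:

  ceph osd dump | grep upmap | wc -l                                   # how many upmap entries are actually in the osdmap
  ceph osd dump | grep 9.60c4                                          # look for one specific PG the balancer claimed to remap
  ceph daemon mgr.$(hostname -s) perf dump | grep -A 6 finisher-Mgr    # watch whether the finisher queue keeps growing

If the pg-upmap-items commands are being queued but never completing, the upmap count should stay flat while the finisher queue length climbs, which would line up with what Andras describes above.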