If an upmap is not stored, it means that OSDMap::check_pg_upmaps is deciding
that those upmaps are invalid for some reason. Additional debugging can help
sort out why. (Maybe you have a complex crush tree and the balancer is
creating invalid upmaps).

-- dan

On Fri, May 1, 2020 at 2:48 PM Andras Pataki
<apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Also just a follow-up on the misbehavior of ceph-mgr. It looks like the
> upmap balancer is not acting reasonably either. It is trying to create
> upmap entries every minute or so - and claims to be successful, but they
> never show up in the OSD map. Setting the logging to 'debug', I see
> upmap entries created such as:
>
> 2020-05-01 08:43:07.909 7fffca074700  4 mgr[balancer] ceph osd pg-upmap-items 9.60c4 mappings [{'to': 3313L, 'from': 3371L}]
> 2020-05-01 08:43:07.909 7fffca074700  4 mgr[balancer] ceph osd pg-upmap-items 9.632b mappings [{'to': 2187L, 'from': 1477L}]
> 2020-05-01 08:43:07.909 7fffca074700  4 mgr[balancer] ceph osd pg-upmap-items 9.6b9c mappings [{'to': 3315L, 'from': 3371L}]
> 2020-05-01 08:43:07.909 7fffca074700  4 mgr[balancer] ceph osd pg-upmap-items 9.6bf6 mappings [{'to': 1581L, 'from': 1477L}]
> 2020-05-01 08:43:07.909 7fffca074700  4 mgr[balancer] ceph osd pg-upmap-items 9.7da4 mappings [{'to': 2419L, 'from': 2537L}]
> ...
> 2020-05-01 08:43:07.909 7fffca074700 20 mgr[balancer] commands
> [<mgr_module.CommandResult object at 0x7fffcc990550>,
>  <mgr_module.CommandResult object at 0x7fffcc990fd0>,
>  <mgr_module.CommandResult object at 0x7fffcc9907d0>,
>  <mgr_module.CommandResult object at 0x7fffcc990650>,
>  <mgr_module.CommandResult object at 0x7fffcc990610>,
>  <mgr_module.CommandResult object at 0x7fffcc990f50>,
>  <mgr_module.CommandResult object at 0x7fffcc990bd0>,
>  <mgr_module.CommandResult object at 0x7fffcc990d90>,
>  <mgr_module.CommandResult object at 0x7fffcc990ad0>,
>  <mgr_module.CommandResult object at 0x7fffcc990410>,
>  <mgr_module.CommandResult object at 0x7fffbed241d0>,
>  <mgr_module.CommandResult object at 0x7fff6a6caf90>,
>  <mgr_module.CommandResult object at 0x7fffbed242d0>,
>  <mgr_module.CommandResult object at 0x7fffbed24d90>,
>  <mgr_module.CommandResult object at 0x7fffbed24d50>,
>  <mgr_module.CommandResult object at 0x7fffbed24550>,
>  <mgr_module.CommandResult object at 0x7fffbed245d0>,
>  <mgr_module.CommandResult object at 0x7fffbed24510>,
>  <mgr_module.CommandResult object at 0x7fffbed24690>,
>  <mgr_module.CommandResult object at 0x7fffbed24990>]
> ...
> 2020-05-01 08:43:16.733 7fffca074700 20 mgr[balancer] done
> ...
>
> but these mappings do not show up in the osd dump. And a minute later,
> the balancer tries again and comes up with a set of very similar
> mappings (same from and to OSDs, slightly different PG numbers) - and
> keeps going like that every minute without any progress (the set of
> upmap entries stays the same, does not increase).
>
> Andras
>
>
> On 5/1/20 8:12 AM, Andras Pataki wrote:
> > I'm wondering if anyone still sees issues with ceph-mgr using CPU and
> > being unresponsive even in recent Nautilus releases. We upgraded our
> > largest cluster from Mimic to Nautilus (14.2.8) recently - it has
> > about 3500 OSDs. Now ceph-mgr is constantly at 100-200% CPU (1-2
> > cores), and becomes unresponsive after a few minutes. The
> > finisher-Mgr queue length grows (I've seen it at over 100k) - similar
> > symptoms as seen with earlier Nautilus releases by many.
> > This is what it looks like after an hour of running:
> >
> >     "finisher-Mgr": {
> >         "queue_len": 66078,
> >         "complete_latency": {
> >             "avgcount": 21,
> >             "sum": 2098.408767721,
> >             "avgtime": 99.924227034
> >         }
> >     },
> >
> > We have a pretty vanilla manager config; only the balancer is enabled
> > in upmap mode. Here are the enabled modules:
> >
> >     "always_on_modules": [
> >         "balancer",
> >         "crash",
> >         "devicehealth",
> >         "orchestrator_cli",
> >         "progress",
> >         "rbd_support",
> >         "status",
> >         "volumes"
> >     ],
> >     "enabled_modules": [
> >         "restful"
> >     ],
> >
> > Any ideas or outstanding issues in this area?
> >
> > Andras
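
The counters and module list quoted above can be re-checked at any time from the mgr side. The commands below are a minimal sketch, assuming the active mgr daemon is named mgr.x (a placeholder, not taken from the thread) and that jq is installed:

    # Pull the finisher-Mgr counters from the running mgr's admin socket
    # (run on the host where mgr.x lives; "mgr.x" is a placeholder name):
    ceph daemon mgr.x perf dump | jq '."finisher-Mgr"'

    # What the balancer thinks it is doing, and which modules are loaded
    # (the always_on_modules/enabled_modules output above comes from module ls):
    ceph balancer status
    ceph mgr module ls | jq '{always_on_modules, enabled_modules}'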
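
To follow up on Dan's suggestion at the top of the thread, one of the balancer's proposed mappings can be applied by hand to see whether it survives into the osdmap. The sketch below reuses PG 9.60c4 and OSDs 3371/3313 from the log above; the debug settings are an assumption about where check_pg_upmaps logs its decision (the OSDMap code logs under the osd subsystem), so double-check against your version before relying on them:

    # Propose the same mapping the balancer tried (remap PG 9.60c4 from
    # osd.3371 to osd.3313) and check whether it is actually stored:
    ceph osd pg-upmap-items 9.60c4 3371 3313
    ceph osd dump | grep 'pg_upmap_items 9.60c4'

    # If the entry silently disappears, raise mon-side debugging, retry,
    # and look for check_pg_upmaps messages in the active mon's log:
    ceph config set mon debug_mon 10
    ceph config set mon debug_osd 10
    # ... retry the pg-upmap-items command, inspect the mon log ...
    ceph config rm mon debug_mon
    ceph config rm mon debug_osd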