This could be the pg autoscaler, since you added new OSDs. You can run "ceph osd pool ls detail" and check the pg_num and pg_num_target numbers (iirc) to confirm.
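
For example, something like this (a sketch from memory, so treat the exact fields, and the pool name/numbers shown, as illustrative rather than output from your cluster):

$ ceph osd pool autoscale-status
$ ceph osd pool ls detail
# illustrative only: a pool whose pg count is still being changed shows
# pg_num/pgp_num lagging behind the *_target values, something like:
pool 33 'mypool' erasure ... pg_num 1024 pgp_num 989 pg_num_target 2048 pgp_num_target 2048 autoscale_mode on ...

If pg_num/pgp_num are still chasing their targets, Ceph steps pgp_num up a little at a time (throttled, iirc, by target_max_misplaced_ratio, default 5%), and each step remaps another batch of PGs, which would match the pattern of "misplaced" dropping to ~5% and then jumping back up.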

On Sun, Feb 12, 2023 at 20:24 Chris Dunlop <chris@xxxxxxxxxxxx> wrote:
> Hi,
>
> ceph-16.2.9
>
> I've added some new osds - some added to existing hosts and some on
> newly-commissioned hosts. The new osds were added to the data side of an
> existing EC 8.3 pool.
>
> I've been waiting for the system to finish remapping / backfilling for
> some time. Originally the number of remapped pgs and objects misplaced
> was steadily decreasing. However, in the last few days both remapped pgs
> and objects misplaced have increased several times. Both then steadily
> decrease before jumping up again. See below for some details on this.
>
> I'm guessing these are associated stats, i.e. the number of objects
> misplaced reflects the objects in the remapped pgs, so it's not
> surprising they're moving in concert.
>
> I thought at first that this cycling might be due to the balancer (which
> was enabled), but the cycling is continuing after the balancer was
> disabled, e.g.:
>
> $ ts; ceph balancer status
> 2023-02-09 12:39:14
> {
>     "active": true,
>     "last_optimize_duration": "0:00:00.000313",
>     "last_optimize_started": "Thu Feb 9 01:38:55 2023",
>     "mode": "upmap",
>     "optimize_result": "Too many objects (0.078537 > 0.050000) are misplaced; try again later",
>     "plans": []
> }
> $ ts; ceph balancer off
> 2023-02-09 13:03:27 ceph balancer off
>
> Four days later, the balancer is still inactive, and "last_optimize_started"
> indicates it hasn't run since before it was turned off:
>
> $ ts; ceph balancer status
> 2023-02-13 10:18:18
> {
>     "active": false,
>     "last_optimize_duration": "0:00:00.000420",
>     "last_optimize_started": "Thu Feb 9 02:02:56 2023",
>     "mode": "upmap",
>     "optimize_result": "Too many objects (0.078101 > 0.050000) are misplaced; try again later",
>     "plans": []
> }
>
> Since the balancer was turned off the cycling continues: "misplaced" gets
> down to 5% and then rapidly increases to 9%, in conjunction with
> "starting backfill" messages, before starting to slowly decrease once
> again. See below for a log extract.
>
> Is this "sawtooth" pattern of remapped pgs and misplaced objects a normal
> consequence of adding OSDs?
>
> Will it eventually settle down by itself, or do I need to "do something"?
>
> Is there some way of telling how much work remains until it all settles
> down?
>
> Cheers,
>
> Chris
>
> ------------------------------------------------------------------------------
> #
> # General health
> # scrubs disabled until added OSDs complete backfilling etc.
> #
> $ ceph -s
>   cluster:
>     id:     c6618970-0ce0-4cb2-bc9a-dd5f29b62e24
>     health: HEALTH_WARN
>             noout,noscrub,nodeep-scrub flag(s) set
>             5945 pgs not deep-scrubbed in time
>             5945 pgs not scrubbed in time
>             5 slow ops, oldest one blocked for 464 sec, mon.k2 has slow ops
>
>   services:
>     mon: 5 daemons, quorum k2,b2,b4,k1,b5 (age 5d)
>     mgr: b4(active, since 3M), standbys: b5, b2
>     osd: 82 osds: 82 up (since 2w), 82 in (since 2w); 12 remapped pgs
>          flags noout,noscrub,nodeep-scrub
>
>   data:
>     pools:   16 pools, 5945 pgs
>     objects: 70.35M objects, 88 TiB
>     usage:   186 TiB used, 242 TiB / 428 TiB avail
>     pgs:     50050192/700705500 objects misplaced (7.143%)
>              5933 active+clean
>              10   active+remapped+backfill_wait
>              2    active+remapped+backfilling
>
>   io:
>     client:   2.2 MiB/s rd, 1.8 MiB/s wr, 395 op/s rd, 122 op/s wr
>     recovery: 24 MiB/s, 24 objects/s
>
> #
> # remapped pgs steadily decreasing, then started periodically increasing
> # (extract from periodic "ceph -s")
> #
> 2023-02-06 00:00:00  osd: 82 osds: 82 up (since 7d), 82 in (since 6d); 26 remapped pgs
> 2023-02-06 04:00:00  osd: 82 osds: 82 up (since 7d), 82 in (since 6d); 25 remapped pgs
> 2023-02-06 06:00:00  osd: 82 osds: 82 up (since 7d), 82 in (since 6d); 24 remapped pgs
> 2023-02-06 09:00:00  osd: 82 osds: 82 up (since 7d), 82 in (since 6d); 23 remapped pgs
> 2023-02-06 12:30:00  osd: 82 osds: 82 up (since 7d), 82 in (since 7d); 22 remapped pgs
> 2023-02-06 14:00:00  osd: 82 osds: 82 up (since 7d), 82 in (since 7d); 21 remapped pgs
> 2023-02-06 14:30:00  osd: 82 osds: 82 up (since 7d), 82 in (since 7d); 20 remapped pgs
> 2023-02-06 20:30:00  osd: 82 osds: 82 up (since 7d), 82 in (since 7d); 19 remapped pgs
> 2023-02-07 00:00:00  osd: 82 osds: 82 up (since 8d), 82 in (since 7d); 18 remapped pgs
> 2023-02-07 01:00:00  osd: 82 osds: 82 up (since 8d), 82 in (since 7d); 17 remapped pgs
> 2023-02-07 11:00:00  osd: 82 osds: 82 up (since 8d), 82 in (since 8d); 15 remapped pgs
> 2023-02-07 16:30:00  osd: 82 osds: 82 up (since 8d), 82 in (since 8d); 13 remapped pgs
> 2023-02-07 22:30:00  osd: 82 osds: 82 up (since 9d), 82 in (since 8d); 11 remapped pgs
> 2023-02-08 15:30:00  osd: 82 osds: 82 up (since 9d), 82 in (since 9d); 9 remapped pgs
> 2023-02-08 21:30:00  osd: 82 osds: 82 up (since 10d), 82 in (since 9d); 15 remapped pgs   <<< increase
> 2023-02-09 07:30:00  osd: 82 osds: 82 up (since 10d), 82 in (since 9d); 13 remapped pgs
> 2023-02-09 13:03:27  ceph balancer off
> 2023-02-09 22:30:00  osd: 82 osds: 82 up (since 11d), 82 in (since 10d); 11 remapped pgs
> 2023-02-10 14:00:00  osd: 82 osds: 82 up (since 11d), 82 in (since 11d); 9 remapped pgs
> 2023-02-10 19:00:00  osd: 82 osds: 82 up (since 11d), 82 in (since 11d); 15 remapped pgs   <<< increase
> 2023-02-11 04:30:00  osd: 82 osds: 82 up (since 12d), 82 in (since 12d); 13 remapped pgs
> 2023-02-11 19:00:00  osd: 82 osds: 82 up (since 12d), 82 in (since 12d); 11 remapped pgs
> 2023-02-12 12:00:00  osd: 82 osds: 82 up (since 13d), 82 in (since 13d); 9 remapped pgs
> 2023-02-12 18:00:00  osd: 82 osds: 82 up (since 13d), 82 in (since 13d); 15 remapped pgs   <<< increase
> 2023-02-12 20:30:00  osd: 82 osds: 82 up (since 13d), 82 in (since 13d); 14 remapped pgs
> 2023-02-13 04:00:00  osd: 82 osds: 82 up (since 2w), 82 in (since 13d); 12 remapped pgs
>
> #
> # objects misplaced increases when the remapped pgs increases
> #
> 2023-02-06 00:00:00  pgs: 62898818/685798373 objects misplaced (9.172%)
> 2023-02-06 04:00:00  pgs: 61995946/686058837 objects misplaced (9.037%)
> 2023-02-06 06:00:00  pgs: 61090940/686122303 objects misplaced (8.904%)
> 2023-02-06 09:00:00  pgs: 59285766/686306445 objects misplaced (8.638%)
> 2023-02-06 12:30:00  pgs: 57626520/686387322 objects misplaced (8.396%)
> 2023-02-06 14:00:00  pgs: 56813753/686388870 objects misplaced (8.277%)
> 2023-02-06 14:30:00  pgs: 56578343/686389185 objects misplaced (8.243%)
> 2023-02-06 20:30:00  pgs: 54255700/686969448 objects misplaced (7.898%)
> 2023-02-07 00:00:00  pgs: 53089526/687266865 objects misplaced (7.725%)
> 2023-02-07 01:00:00  pgs: 52810518/687391707 objects misplaced (7.683%)
> 2023-02-07 11:00:00  pgs: 50169973/688319189 objects misplaced (7.289%)
> 2023-02-07 16:30:00  pgs: 49312642/689410730 objects misplaced (7.153%)
> 2023-02-07 22:30:00  pgs: 48244798/689771838 objects misplaced (6.994%)
> 2023-02-08 15:30:00  pgs: 38058106/691166947 objects misplaced (5.506%)
> 2023-02-08 21:30:00  pgs: 62851681/691618387 objects misplaced (9.088%)   <<< increase
> 2023-02-09 07:30:00  pgs: 57693559/693214621 objects misplaced (8.323%)
> 2023-02-09 13:03:27  ceph balancer off
> 2023-02-09 22:30:00  pgs: 48248474/694618590 objects misplaced (6.946%)
> 2023-02-10 14:00:00  pgs: 37872501/696136659 objects misplaced (5.440%)
> 2023-02-10 19:00:00  pgs: 64062339/696645992 objects misplaced (9.196%)   <<< increase
> 2023-02-11 04:30:00  pgs: 52183052/699438360 objects misplaced (7.461%)
> 2023-02-11 19:00:00  pgs: 49167191/699631520 objects misplaced (7.028%)
> 2023-02-12 12:00:00  pgs: 39485735/700248360 objects misplaced (5.639%)
> 2023-02-12 18:00:00  pgs: 63540276/700256808 objects misplaced (9.074%)   <<< increase
> 2023-02-12 20:30:00  pgs: 61766426/700291865 objects misplaced (8.820%)
> 2023-02-13 04:00:00  pgs: 57622457/700365805 objects misplaced (8.227%)
>
> --- annotated log extract ---
> ##
> ## "misplaced" decreasing...
> ##
> Feb 12 17:50:20 b2 ceph-c6618970-0ce0-4cb2-bc9a-dd5f29b62e24-mon-b2[1613051]: cluster 2023-02-12T06:50:18.782191+0000 mgr.b4 (mgr.17297882) 4129637 : cluster [DBG] pgmap v4094135: 5945 pgs: 3 active+remapped+backfilling, 6 active+remapped+backfill_wait, 5936 active+clean; 88 TiB data, 185 TiB used, 243 TiB / 428 TiB avail; 1.5 MiB/s rd, 4.1 MiB/s wr, 178 op/s; 35014018/700256808 objects misplaced (5.000%); 34 MiB/s, 34 objects/s recovering
> Feb 12 17:50:27 b2 ceph-c6618970-0ce0-4cb2-bc9a-dd5f29b62e24-mon-b2[1613051]: cluster 2023-02-12T06:50:26.833542+0000 mgr.b4 (mgr.17297882) 4129641 : cluster [DBG] pgmap v4094141: 5945 pgs: 3 active+remapped+backfilling, 6 active+remapped+backfill_wait, 5936 active+clean; 88 TiB data, 185 TiB used, 243 TiB / 428 TiB avail; 2.7 MiB/s rd, 4.9 MiB/s wr, 226 op/s; 35012325/700256808 objects misplaced (5.000%); 37 MiB/s, 36 objects/s recovering
> Feb 12 17:50:30 b2 ceph-c6618970-0ce0-4cb2-bc9a-dd5f29b62e24-mon-b2[1613051]: cluster 2023-02-12T06:50:28.848020+0000 mgr.b4 (mgr.17297882) 4129642 : cluster [DBG] pgmap v4094143: 5945 pgs: 3 active+remapped+backfilling, 6 active+remapped+backfill_wait, 5936 active+clean; 88 TiB data, 185 TiB used, 243 TiB / 428 TiB avail; 2.0 MiB/s rd, 4.9 MiB/s wr, 436 op/s; 35011830/700256808 objects misplaced (5.000%); 36 MiB/s, 35 objects/s recovering
> ##
> ## shortly after, "misplaced" has increased, followed by the "starting backfill" lines
> ## I'm not sure if the line ordering here is a quirk of journalctl; maybe the "starting backfill" lines came first, which would make some sense
> ##
> Feb 12 17:50:32 b2 ceph-c6618970-0ce0-4cb2-bc9a-dd5f29b62e24-mon-b2[1613051]: cluster 2023-02-12T06:50:30.860404+0000 mgr.b4 (mgr.17297882) 4129643 : cluster [DBG] pgmap v4094145: 5945 pgs: 3 active+remapped+backfilling, 8 active+remapped+backfill_wait, 5934 active+clean; 88 TiB data, 185 TiB used, 243 TiB / 428 TiB avail; 419 KiB/s rd, 1.2 MiB/s wr, 388 op/s; 44579289/700256808 objects misplaced (6.366%); 31 MiB/s, 30 objects/s recovering
> Feb 12 17:50:32 b2 ceph-c6618970-0ce0-4cb2-bc9a-dd5f29b62e24-mon-b2[1613051]: cluster 2023-02-12T06:50:31.031822+0000 osd.64 (osd.64) 362 : cluster [DBG] 33.2cs0 starting backfill to osd.25(4) from (0'0,0'0] MAX to 639282'24881196
> Feb 12 17:50:32 b2 ceph-c6618970-0ce0-4cb2-bc9a-dd5f29b62e24-mon-b2[1613051]: cluster 2023-02-12T06:50:31.031951+0000 osd.64 (osd.64) 363 : cluster [DBG] 33.6cs0 starting backfill to osd.25(4) from (0'0,0'0] MAX to 639282'24593871
> Feb 12 17:50:32 b2 ceph-c6618970-0ce0-4cb2-bc9a-dd5f29b62e24-mon-b2[1613051]: cluster 2023-02-12T06:50:31.044285+0000 osd.64 (osd.64) 364 : cluster [DBG] 33.6cs0 starting backfill to osd.51(5) from (0'0,0'0] MAX to 639282'24593871
> Feb 12 17:50:32 b2 ceph-c6618970-0ce0-4cb2-bc9a-dd5f29b62e24-mon-b2[1613051]: cluster 2023-02-12T06:50:31.044655+0000 osd.64 (osd.64) 365 : cluster [DBG] 33.2cs0 starting backfill to osd.51(5) from (0'0,0'0] MAX to 639282'24881196
> Feb 12 17:50:32 b2 ceph-c6618970-0ce0-4cb2-bc9a-dd5f29b62e24-mon-b2[1613051]: cluster 2023-02-12T06:50:31.057888+0000 osd.64 (osd.64) 366 : cluster [DBG] 33.6cs0 starting backfill to osd.63(7) from (0'0,0'0] MAX to 639282'24593871
> Feb 12 17:50:32 b2 ceph-c6618970-0ce0-4cb2-bc9a-dd5f29b62e24-mon-b2[1613051]: cluster 2023-02-12T06:50:31.058039+0000 osd.64 (osd.64) 367 : cluster [DBG] 33.2cs0 starting backfill to osd.63(7) from (0'0,0'0] MAX to 639282'24881196
> Feb 12 17:50:32 b2 ceph-c6618970-0ce0-4cb2-bc9a-dd5f29b62e24-mon-b2[1613051]: cluster 2023-02-12T06:50:31.077517+0000 osd.64 (osd.64) 368 : cluster [DBG] 33.6cs0 starting backfill to osd.68(2) from (0'0,0'0] MAX to 639282'24593871
> ## (more "starting backfill" lines elided here)
> ##
> ## "misplaced" continues to increase rapidly to 9%...
> ##
> Feb 12 17:50:34 b2 ceph-c6618970-0ce0-4cb2-bc9a-dd5f29b62e24-mon-b2[1613051]: cluster 2023-02-12T06:50:32.871242+0000 mgr.b4 (mgr.17297882) 4129644 : cluster [DBG] pgmap v4094148: 5945 pgs: 2 activating+remapped, 3 active+remapped+backfilling, 8 active+remapped+backfill_wait, 5932 active+clean; 88 TiB data, 185 TiB used, 243 TiB / 428 TiB avail; 424 KiB/s rd, 1.2 MiB/s wr, 440 op/s; 54133989/700256808 objects misplaced (7.731%); 31 MiB/s, 31 objects/s recovering
> Feb 12 17:50:36 b2 ceph-c6618970-0ce0-4cb2-bc9a-dd5f29b62e24-mon-b2[1613051]: cluster 2023-02-12T06:50:34.886503+0000 mgr.b4 (mgr.17297882) 4129645 : cluster [DBG] pgmap v4094149: 5945 pgs: 2 activating+remapped, 3 active+remapped+backfilling, 8 active+remapped+backfill_wait, 5932 active+clean; 88 TiB data, 185 TiB used, 243 TiB / 428 TiB avail; 442 KiB/s rd, 1.1 MiB/s wr, 483 op/s; 54133670/700256808 objects misplaced (7.731%); 41 MiB/s, 41 objects/s recovering
> Feb 12 17:50:37 b2 ceph-c6618970-0ce0-4cb2-bc9a-dd5f29b62e24-mon-b2[1613051]: cluster 2023-02-12T06:50:36.901531+0000 mgr.b4 (mgr.17297882) 4129646 : cluster [DBG] pgmap v4094150: 5945 pgs: 4 activating+remapped, 3 active+remapped+backfilling, 8 active+remapped+backfill_wait, 5930 active+clean; 88 TiB data, 185 TiB used, 243 TiB / 428 TiB avail; 281 KiB/s rd, 902 KiB/s wr, 324 op/s; 63666090/700256808 objects misplaced (9.092%); 17 MiB/s, 17 objects/s recovering
> ##
> ## then starts to fall again...
> ##
> Feb 12 17:50:42 b2 ceph-c6618970-0ce0-4cb2-bc9a-dd5f29b62e24-mon-b2[1613051]: cluster 2023-02-12T06:50:40.930869+0000 mgr.b4 (mgr.17297882) 4129648 : cluster [DBG] pgmap v4094152: 5945 pgs: 4 activating+remapped, 3 active+remapped+backfilling, 8 active+remapped+backfill_wait, 5930 active+clean; 88 TiB data, 185 TiB used, 243 TiB / 428 TiB avail; 407 KiB/s rd, 1.4 MiB/s wr, 479 op/s; 63665652/700256808 objects misplaced (9.092%); 28 MiB/s, 28 objects/s recovering
> Feb 12 17:50:43 b2 ceph-c6618970-0ce0-4cb2-bc9a-dd5f29b62e24-mon-b2[1613051]: cluster 2023-02-12T06:50:42.948421+0000 mgr.b4 (mgr.17297882) 4129649 : cluster [DBG] pgmap v4094153: 5945 pgs: 3 active+remapped+backfilling, 12 active+remapped+backfill_wait, 5930 active+clean; 88 TiB data, 185 TiB used, 243 TiB / 428 TiB avail; 455 KiB/s rd, 1.5 MiB/s wr, 542 op/s; 63665652/700256808 objects misplaced (9.092%); 27 MiB/s, 26 objects/s recovering
> Feb 12 17:50:46 b2 ceph-c6618970-0ce0-4cb2-bc9a-dd5f29b62e24-mon-b2[1613051]: cluster 2023-02-12T06:50:44.961474+0000 mgr.b4 (mgr.17297882) 4129650 : cluster [DBG] pgmap v4094154: 5945 pgs: 3 active+remapped+backfilling, 12 active+remapped+backfill_wait, 5930 active+clean; 88 TiB data, 185 TiB used, 243 TiB / 428 TiB avail; 436 KiB/s rd, 1.4 MiB/s wr, 518 op/s; 63664579/700256808 objects misplaced (9.092%); 37 MiB/s, 37 objects/s recovering
> Feb 12 17:50:50 b2 ceph-c6618970-0ce0-4cb2-bc9a-dd5f29b62e24-mon-b2[1613051]: cluster 2023-02-12T06:50:48.991308+0000 mgr.b4 (mgr.17297882) 4129652 : cluster [DBG] pgmap v4094156: 5945 pgs: 3 active+remapped+backfilling, 12 active+remapped+backfill_wait, 5930 active+clean; 88 TiB data, 185 TiB used, 243 TiB / 428 TiB avail; 513 KiB/s rd, 1.6 MiB/s wr, 534 op/s; 63664502/700256808 objects misplaced (9.092%); 33 MiB/s, 33 objects/s recovering
>
> ------------------------------------------------------------------------------
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx