Hi Eugen: déjà vu again? I think the way the autoscaler code in the MGRs
interferes with operations is extremely confusing. Could this be the same
issue somebody else and I had a while ago? Even though the autoscaler is
disabled, parts of it in the MGR still interfere. One of the essential
config options was target_max_misplaced_ratio, which needs to be set to 1
if you want all PGs to be created regardless of how many objects are
misplaced. The thread was
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/WST6K5A4UQGGISBFGJEZS4HFL2VVWW32

In addition, PG splitting will stop while recovery IO is going on (i.e.
while some objects are degraded).
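From memory, and untested here (double-check the option name and scope on
your release), the config part amounts to something like this, using the
pool name cfs_data from the thread below:

  # let the mgr raise pgp_num in one go, regardless of misplaced objects
  ceph config set mgr target_max_misplaced_ratio 1

  # watch whether pg_num/pgp_num now converge towards 2048
  ceph osd pool ls detail | grep cfs_data

If pgp_num still lags behind once recovery has finished, explicitly
re-setting the values should do no harm, since 2048 is already the pool's
pg_num_target/pgp_num_target (a guess based on the pool dump below, not
something verified here):

  ceph osd pool set cfs_data pg_num 2048
  ceph osd pool set cfs_data pgp_num 2048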
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Martin Buss <mbuss7004@xxxxxxxxx>
Sent: 14 December 2022 19:32
To: ceph-users@xxxxxxx
Subject: Re: New pool created with 2048 pg_num not executed

will do, that will take another day or so.

Can this have anything to do with osd_pg_bits, which defaults to 6? Some
operators seem to be working with 8 or 11. Can you explain what this
option means? I could not quite understand it from the documentation.

Thanks!

On 14.12.22 16:11, Eugen Block wrote:
> Then I'd suggest to wait until the backfilling is done and then report
> back if the PGs are still not created. I don't have information about
> the ML admin, sorry.
>
> Zitat von Martin Buss <mbuss7004@xxxxxxxxx>:
>
>> that cephfs_data has been autoscaling while filling, the mismatched
>> numbers are a result of that autoscaling
>>
>> the cluster status is WARN as there is still some old stuff
>> backfilling on cephfs_data
>>
>> The issue is the newly created pool 9 cfs_data, which is stuck at 1152
>> pg_num
>>
>> ps: can you help me to get in touch with the list admin so I can get
>> that post including private info deleted
>>
>> On 14.12.22 15:41, Eugen Block wrote:
>>> I'm wondering why the cephfs_data pool has mismatching pg_num and
>>> pgp_num:
>>>
>>>> pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 0
>>>> object_hash rjenkins pg_num 187 pgp_num 59 autoscale_mode off
>>>
>>> Does disabling the autoscaler leave it like that when you disable it
>>> in the middle of scaling? What is the current 'ceph status'?
>>>
>>> Zitat von Martin Buss <mbuss7004@xxxxxxxxx>:
>>>
>>>> Hi Eugen,
>>>>
>>>> thanks, sure, below:
>>>>
>>>> pg_num stuck at 1152 and pgp_num stuck at 1024
>>>>
>>>> Regards,
>>>>
>>>> Martin
>>>>
>>>> ceph config set global mon_max_pg_per_osd 400
>>>>
>>>> ceph osd pool create cfs_data 2048 2048 --pg_num_min 2048
>>>> pool 'cfs_data' created
>>>>
>>>> pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 0
>>>> object_hash rjenkins pg_num 187 pgp_num 59 autoscale_mode off
>>>> last_change 3099 lfor 0/3089/3096 flags hashpspool,bulk stripe_width
>>>> 0 target_size_ratio 1 application cephfs
>>>> pool 2 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0
>>>> object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode off
>>>> last_change 2942 lfor 0/0/123 flags hashpspool stripe_width 0
>>>> pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application
>>>> cephfs
>>>> pool 3 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash
>>>> rjenkins pg_num 1 pgp_num 1 autoscale_mode off last_change 2943
>>>> flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1
>>>> application mgr
>>>> pool 9 'cfs_data' replicated size 3 min_size 2 crush_rule 0
>>>> object_hash rjenkins pg_num 1152 pgp_num 1024 pg_num_target 2048
>>>> pgp_num_target 2048 autoscale_mode off last_change 3198 lfor
>>>> 0/0/3198 flags hashpspool stripe_width 0 pg_num_min 2048
>>>>
>>>> On 14.12.22 15:10, Eugen Block wrote:
>>>>> Hi,
>>>>>
>>>>> are there already existing pools in the cluster? Can you share your
>>>>> 'ceph osd df tree' as well as 'ceph osd pool ls detail'? It sounds
>>>>> like ceph is trying to stay within the limit of mon_max_pg_per_osd
>>>>> (default 250).
>>>>>
>>>>> Regards,
>>>>> Eugen
>>>>>
>>>>> Zitat von Martin Buss <mbuss7004@xxxxxxxxx>:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> on quincy, I created a new pool
>>>>>>
>>>>>> ceph osd pool create cfs_data 2048 2048
>>>>>>
>>>>>> 6 hosts, 71 osds
>>>>>>
>>>>>> autoscaler is off; I find it kind of strange that the pool is
>>>>>> created with pg_num 1152 and pgp_num 1024, mentioning the 2048 as
>>>>>> the new target. I cannot manage to actually make this pool contain
>>>>>> 2048 pg_num and 2048 pgp_num.
>>>>>>
>>>>>> What config option am I missing that does not allow me to grow the
>>>>>> pool to 2048? Although I specified that pg_num and pgp_num be the
>>>>>> same, they are not.
>>>>>>
>>>>>> Please some help and guidance.
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Martin
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx