OK, great. We'll keep tabs on it for now then and try again once we're
fully rebalanced.

Mac Wynkoop, Senior Datacenter Engineer
NetDepot.com: Cloud Servers; Delivered
Houston | Atlanta | NYC | Colorado Springs
1-844-25-CLOUD Ext 806


On Thu, Oct 8, 2020 at 2:08 AM Eugen Block <eblock@xxxxxx> wrote:

> Yes, after your cluster has recovered you'll be able to increase
> pgp_num. Or your change will be applied automatically since you
> already set it; I'm not sure, but you'll see.
>
>
> Zitat von Mac Wynkoop <mwynkoop@xxxxxxxxxxxx>:
>
> > Well, backfilling sure, but will it allow me to actually change the
> > pgp_num as more space frees up? Because the issue is that I cannot
> > modify that value.
> >
> > Thanks,
> > Mac Wynkoop, Senior Datacenter Engineer
> > NetDepot.com: Cloud Servers; Delivered
> > Houston | Atlanta | NYC | Colorado Springs
> > 1-844-25-CLOUD Ext 806
> >
> > On Wed, Oct 7, 2020 at 1:50 PM Eugen Block <eblock@xxxxxx> wrote:
> >
> >> Yes, I think that's exactly the reason. As soon as the cluster has
> >> more space the backfill will continue.
> >>
> >>
> >> Zitat von Mac Wynkoop <mwynkoop@xxxxxxxxxxxx>:
> >>
> >> > The cluster is currently in a warn state; here's the scrubbed
> >> > output of ceph -s:
> >> >
> >> >   cluster:
> >> >     id:     *redacted*
> >> >     health: HEALTH_WARN
> >> >             noscrub,nodeep-scrub flag(s) set
> >> >             22 nearfull osd(s)
> >> >             2 pool(s) nearfull
> >> >             Low space hindering backfill (add storage if this
> >> >             doesn't resolve itself): 277 pgs backfill_toofull
> >> >             Degraded data redundancy: 32652738/3651947772 objects
> >> >             degraded (0.894%), 281 pgs degraded, 341 pgs undersized
> >> >             1214 pgs not deep-scrubbed in time
> >> >             2647 pgs not scrubbed in time
> >> >             2 daemons have recently crashed
> >> >
> >> >   services:
> >> >     mon:         5 daemons, *redacted* (age 44h)
> >> >     mgr:         *redacted*
> >> >     osd:         162 osds: 162 up (since 44h), 162 in (since 4d);
> >> >                  971 remapped pgs
> >> >                  flags noscrub,nodeep-scrub
> >> >     rgw:         3 daemons active *redacted*
> >> >     tcmu-runner: 18 daemons active *redacted*
> >> >
> >> >   data:
> >> >     pools:   10 pools, 2648 pgs
> >> >     objects: 409.56M objects, 738 TiB
> >> >     usage:   1.3 PiB used, 580 TiB / 1.8 PiB avail
> >> >     pgs:     32652738/3651947772 objects degraded (0.894%)
> >> >              517370913/3651947772 objects misplaced (14.167%)
> >> >              1677 active+clean
> >> >              477  active+remapped+backfill_wait
> >> >              100  active+remapped+backfill_wait+backfill_toofull
> >> >              80   active+undersized+degraded+remapped+backfill_wait
> >> >              60   active+undersized+degraded+remapped+backfill_wait+backfill_toofull
> >> >              42   active+undersized+degraded+remapped+backfill_toofull
> >> >              33   active+undersized+degraded+remapped+backfilling
> >> >              25   active+remapped+backfilling
> >> >              25   active+remapped+backfill_toofull
> >> >              24   active+undersized+remapped+backfilling
> >> >              23   active+forced_recovery+undersized+degraded+remapped+backfill_wait
> >> >              19   active+forced_recovery+undersized+degraded+remapped+backfill_wait+backfill_toofull
> >> >              15   active+undersized+remapped+backfill_wait
> >> >              14   active+undersized+remapped+backfill_wait+backfill_toofull
> >> >              12   active+forced_recovery+undersized+degraded+remapped+backfill_toofull
> >> >              12   active+forced_recovery+undersized+degraded+remapped+backfilling
> >> >              5    active+undersized+remapped+backfill_toofull
> >> >              3    active+remapped
> >> >              1    active+undersized+remapped
> >> >              1    active+forced_recovery+undersized+remapped+backfilling
> >> >
> >> >   io:
> >> >     client:   287 MiB/s rd, 40 MiB/s wr, 1.94k op/s rd, 165 op/s wr
> >> >     recovery: 425 MiB/s, 225 objects/s
> >> >
> >> > Now, as you can see, we do have a lot of backfill operations going
> >> > on at the moment. Does that actually prevent Ceph from modifying
> >> > the pgp_num value of a pool?
> >> >
> >> > Thanks,
> >> > Mac Wynkoop
> >> >
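With 22 nearfull OSDs, 277 PGs in backfill_toofull, and roughly 14% of
objects misplaced, backfill here is throttled by OSD fullness rather than
by anything pool-specific. A minimal sketch of how one might confirm that
from the CLI (standard ceph commands; the grep filters are only
illustrative):

    # Cluster-wide fullness thresholds recorded in the OSD map
    # (nearfull_ratio / backfillfull_ratio / full_ratio).
    ceph osd dump | grep ratio

    # Per-OSD utilisation, to spot the OSDs sitting above backfillfull.
    ceph osd df

    # Count the PGs currently stuck because their target OSDs are too full.
    ceph pg dump pgs_brief | grep -c backfill_toofull
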
> >> > On Wed, Oct 7, 2020 at 8:57 AM Eugen Block <eblock@xxxxxx> wrote:
> >> >
> >> >> What is the current cluster status, is it healthy? Maybe
> >> >> increasing pg_num would hit the limit of mon_max_pg_per_osd?
> >> >> Can you share 'ceph -s' output?
> >> >>
> >> >>
> >> >> Zitat von Mac Wynkoop <mwynkoop@xxxxxxxxxxxx>:
> >> >>
> >> >> > Right, both Norman and I set the pg_num before the pgp_num. For
> >> >> > example, here are my current pool settings:
> >> >> >
> >> >> > "pool 40 '*redacted*.rgw.buckets.data' erasure size 9 min_size 7
> >> >> > crush_rule 2 object_hash rjenkins pg_num 2048 pgp_num 1024
> >> >> > pgp_num_target 2048 last_change 8458830 lfor 0/0/8445757 flags
> >> >> > hashpspool,ec_overwrites,nodelete,backfillfull stripe_width 24576
> >> >> > fast_read 1 application rgw"
> >> >> >
> >> >> > So, when I set:
> >> >> >
> >> >> >     ceph osd pool set hou-ec-1.rgw.buckets.data pgp_num 2048
> >> >> >
> >> >> > it returns:
> >> >> >
> >> >> >     set pool 40 pgp_num to 2048
> >> >> >
> >> >> > But upon checking the pool details again:
> >> >> >
> >> >> > "pool 40 '*redacted*.rgw.buckets.data' erasure size 9 min_size 7
> >> >> > crush_rule 2 object_hash rjenkins pg_num 2048 pgp_num 1024
> >> >> > pgp_num_target 2048 last_change 8458870 lfor 0/0/8445757 flags
> >> >> > hashpspool,ec_overwrites,nodelete,backfillfull stripe_width 24576
> >> >> > fast_read 1 application rgw"
> >> >> >
> >> >> > the pgp_num value has not increased. Am I just doing something
> >> >> > totally wrong?
> >> >> >
> >> >> > Thanks,
> >> >> > Mac Wynkoop
> >> >> >
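The pool dump above already shows what is happening: pgp_num_target is
2048 while the effective pgp_num is still 1024. On Nautilus and later,
"ceph osd pool set ... pgp_num" only records the target; the mgr then
walks the effective value up in small steps, and it only takes a step
while the fraction of misplaced objects stays below
target_max_misplaced_ratio (0.05 by default). With about 14% of objects
misplaced in the ceph -s above, no steps will be taken until backfill
catches up. A short sketch of how one might verify this, reusing the
pool name from the command above (the second command assumes the
centralized config database is in use):

    # Effective value vs. the requested target.
    ceph osd pool get hou-ec-1.rgw.buckets.data pgp_num
    ceph osd pool ls detail | grep 'rgw.buckets.data'

    # The throttle that gates how quickly pgp_num may follow pgp_num_target.
    ceph config get mgr target_max_misplaced_ratio
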
> >> >> > On Tue, Oct 6, 2020 at 2:32 PM Marc Roos
> >> >> > <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
> >> >> >
> >> >> >> pg_num and pgp_num need to be the same, not?
> >> >> >>
> >> >> >> 3.5.1. Set the Number of PGs
> >> >> >>
> >> >> >> To set the number of placement groups in a pool, you must
> >> >> >> specify the number of placement groups at the time you create
> >> >> >> the pool. See Create a Pool for details. Once you set placement
> >> >> >> groups for a pool, you can increase the number of placement
> >> >> >> groups (but you cannot decrease the number of placement
> >> >> >> groups). To increase the number of placement groups, execute
> >> >> >> the following:
> >> >> >>
> >> >> >>     ceph osd pool set {pool-name} pg_num {pg_num}
> >> >> >>
> >> >> >> Once you increase the number of placement groups, you must also
> >> >> >> increase the number of placement groups for placement (pgp_num)
> >> >> >> before your cluster will rebalance. The pgp_num should be equal
> >> >> >> to the pg_num. To increase the number of placement groups for
> >> >> >> placement, execute the following:
> >> >> >>
> >> >> >>     ceph osd pool set {pool-name} pgp_num {pgp_num}
> >> >> >>
> >> >> >> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/storage_strategies_guide/placement_groups_pgs
> >> >> >>
> >> >> >> -----Original Message-----
> >> >> >> To: norman
> >> >> >> Cc: ceph-users
> >> >> >> Subject: Re: pool pgp_num not updated
> >> >> >>
> >> >> >> Hi everyone,
> >> >> >>
> >> >> >> I'm seeing a similar issue here. Any ideas on this?
> >> >> >> Mac Wynkoop,
> >> >> >>
> >> >> >> On Sun, Sep 6, 2020 at 11:09 PM norman <norman.kern@xxxxxxx>
> >> >> >> wrote:
> >> >> >>
> >> >> >> > Hi guys,
> >> >> >> >
> >> >> >> > When I updated the pg_num of a pool, I found it didn't work
> >> >> >> > (no rebalance happened). Does anyone know the reason? The
> >> >> >> > pools' info:
> >> >> >> >
> >> >> >> > pool 21 'openstack-volumes-rs' replicated size 3 min_size 2
> >> >> >> > crush_rule 21 object_hash rjenkins pg_num 1024 pgp_num 512
> >> >> >> > pgp_num_target 1024 autoscale_mode warn last_change 85103
> >> >> >> > lfor 82044/82044/82044 flags
> >> >> >> > hashpspool,nodelete,selfmanaged_snaps stripe_width 0
> >> >> >> > application rbd removed_snaps
> >> >> >> > [1~1e6,1e8~300,4e9~18,502~3f,542~11,554~1a,56f~1d7]
> >> >> >> > pool 22 'openstack-vms-rs' replicated size 3 min_size 2
> >> >> >> > crush_rule 22 object_hash rjenkins pg_num 512 pgp_num 512
> >> >> >> > pg_num_target 256 pgp_num_target 256 autoscale_mode warn
> >> >> >> > last_change 84769 lfor 0/0/55294 flags
> >> >> >> > hashpspool,nodelete,selfmanaged_snaps stripe_width
> >> >> >> > application rbd
> >> >> >> >
> >> >> >> > The pgp_num_target is set, but pgp_num is not.
> >> >> >> >
> >> >> >> > I scaled out new OSDs and the cluster was backfilling before
> >> >> >> > I set the value; is that the reason?
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
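A minimal end-to-end sketch of the split procedure the Red Hat excerpt
above describes, using a hypothetical pool name data-pool (not one of the
pools in this thread). On Nautilus and later both commands record targets
and the cluster applies them gradually, so the second step only finishes
once there is room to backfill:

    # 1. Raise pg_num first; the new PGs are created by splitting
    #    existing ones.
    ceph osd pool set data-pool pg_num 2048

    # 2. Then raise pgp_num so objects are actually remapped onto the
    #    new PGs.
    ceph osd pool set data-pool pgp_num 2048

    # 3. Watch the effective values converge on the *_target values as
    #    backfill completes and the misplaced count drains.
    ceph osd pool ls detail | grep data-pool
    ceph -s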