Re: pool pgp_num not updated

Yes, I think that’s exactly the reason. As soon as the cluster has more space the backfill will continue.
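
As a rough sketch (standard commands only, <pool> is a placeholder for the affected pool), the progress can be followed by comparing the effective pgp_num with the recorded pgp_num_target, and by checking the threshold that holds the split back while objects are misplaced:

  # effective value vs. the recorded target
  ceph osd pool get <pool> pgp_num
  ceph osd pool ls detail | grep <pool>

  # the mgr only advances pgp_num while the misplaced ratio shown in
  # 'ceph -s' stays below this threshold (default 0.05)
  ceph config get mgr target_max_misplaced_ratio

Once the misplaced percentage drops, pgp_num should step up towards pgp_num_target on its own.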


Quoting Mac Wynkoop <mwynkoop@xxxxxxxxxxxx>:

The cluster is currently in a warn state; here's the scrubbed output of
ceph -s:

  cluster:
    id:     *redacted*
    health: HEALTH_WARN
            noscrub,nodeep-scrub flag(s) set
            22 nearfull osd(s)
            2 pool(s) nearfull
            Low space hindering backfill (add storage if this doesn't resolve itself): 277 pgs backfill_toofull
            Degraded data redundancy: 32652738/3651947772 objects degraded (0.894%), 281 pgs degraded, 341 pgs undersized
            1214 pgs not deep-scrubbed in time
            2647 pgs not scrubbed in time
            2 daemons have recently crashed

  services:
    mon:         5 daemons, *redacted* (age 44h)
    mgr:         *redacted*
    osd:         162 osds: 162 up (since 44h), 162 in (since 4d); 971 remapped pgs
                 flags noscrub,nodeep-scrub
    rgw:         3 daemons active *redacted*
    tcmu-runner: 18 daemons active *redacted*

  data:
    pools:   10 pools, 2648 pgs
    objects: 409.56M objects, 738 TiB
    usage:   1.3 PiB used, 580 TiB / 1.8 PiB avail
    pgs:     32652738/3651947772 objects degraded (0.894%)
             517370913/3651947772 objects misplaced (14.167%)
             1677 active+clean
             477  active+remapped+backfill_wait
             100  active+remapped+backfill_wait+backfill_toofull
             80   active+undersized+degraded+remapped+backfill_wait
             60   active+undersized+degraded+remapped+backfill_wait+backfill_toofull
             42   active+undersized+degraded+remapped+backfill_toofull
             33   active+undersized+degraded+remapped+backfilling
             25   active+remapped+backfilling
             25   active+remapped+backfill_toofull
             24   active+undersized+remapped+backfilling
             23   active+forced_recovery+undersized+degraded+remapped+backfill_wait
             19   active+forced_recovery+undersized+degraded+remapped+backfill_wait+backfill_toofull
             15   active+undersized+remapped+backfill_wait
             14   active+undersized+remapped+backfill_wait+backfill_toofull
             12   active+forced_recovery+undersized+degraded+remapped+backfill_toofull
             12   active+forced_recovery+undersized+degraded+remapped+backfilling
             5    active+undersized+remapped+backfill_toofull
             3    active+remapped
             1    active+undersized+remapped
             1    active+forced_recovery+undersized+remapped+backfilling

  io:
    client:   287 MiB/s rd, 40 MiB/s wr, 1.94k op/s rd, 165 op/s wr
    recovery: 425 MiB/s, 225 objects/s
Now as you can see, we do have a lot of backfill operations going on at the
moment. Does that actually prevent Ceph from modifying the pgp_num value of
a pool?

Thanks,
Mac Wynkoop
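
For reference, a minimal sketch of how to quantify the space pressure behind those backfill_toofull PGs (generic commands, nothing specific to this cluster assumed):

  # per-OSD utilisation; the PGS column also shows placement groups per OSD
  ceph osd df tree

  # the full / backfillfull / nearfull ratios currently in effect
  ceph osd dump | grep ratio

  # number of PGs currently blocked on space
  ceph pg dump pgs_brief 2>/dev/null | grep -c backfill_toofull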



On Wed, Oct 7, 2020 at 8:57 AM Eugen Block <eblock@xxxxxx> wrote:

What is the current cluster status? Is it healthy? Maybe increasing
pg_num would hit the limit of mon_max_pg_per_osd? Can you share the
'ceph -s' output?
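
As a sketch, that limit can be compared against the actual per-OSD PG count like this (stock option and column names, nothing cluster-specific assumed):

  # configured cap on placement groups per OSD
  ceph config get mon mon_max_pg_per_osd

  # the PGS column in this output is the actual count per OSD
  ceph osd df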


Quoting Mac Wynkoop <mwynkoop@xxxxxxxxxxxx>:

> Right, both Norman and I set the pg_num before the pgp_num. For example,
> here are my current pool settings:
>
> "pool 40 '*redacted*.rgw.buckets.data' erasure size 9 min_size 7
> crush_rule 2 object_hash rjenkins pg_num 2048 pgp_num 1024 pgp_num_target
> 2048 last_change 8458830 lfor 0/0/8445757 flags
> hashpspool,ec_overwrites,nodelete,backfillfull stripe_width 24576 fast_read
> 1 application rgw"
>
> So, when I set:
>
> "ceph osd pool set hou-ec-1.rgw.buckets.data pgp_num 2048"
>
> it returns:
>
> "set pool 40 pgp_num to 2048"
>
> But upon checking the pool details again:
>
> "pool 40 '*redacted*.rgw.buckets.data' erasure size 9 min_size 7
> crush_rule 2 object_hash rjenkins pg_num 2048 pgp_num 1024 pgp_num_target
> 2048 last_change 8458870 lfor 0/0/8445757 flags
> hashpspool,ec_overwrites,nodelete,backfillfull stripe_width 24576 fast_read
> 1 application rgw"
>
> the pgp_num value has not increased. Am I just doing something
> totally wrong?
>
> Thanks,
> Mac Wynkoop
>
>
>
>
> On Tue, Oct 6, 2020 at 2:32 PM Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
>
>> pg_num and pgp_num need to be the same, not?
>>
>> 3.5.1. Set the Number of PGs
>>
>> To set the number of placement groups in a pool, you must specify the
>> number of placement groups at the time you create the pool. See Create a
>> Pool for details. Once you set placement groups for a pool, you can
>> increase the number of placement groups (but you cannot decrease the
>> number of placement groups). To increase the number of placement groups,
>> execute the following:
>>
>> ceph osd pool set {pool-name} pg_num {pg_num}
>>
>> Once you increase the number of placement groups, you must also increase
>> the number of placement groups for placement (pgp_num) before your
>> cluster will rebalance. The pgp_num should be equal to the pg_num. To
>> increase the number of placement groups for placement, execute the
>> following:
>>
>> ceph osd pool set {pool-name} pgp_num {pgp_num}
>>
>>
>>
>> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/storage_strategies_guide/placement_groups_pgs
>>
>> -----Original Message-----
>> To: norman
>> Cc: ceph-users
>> Subject:  Re: pool pgp_num not updated
>>
>> Hi everyone,
>>
>> I'm seeing a similar issue here. Any ideas on this?
>> Mac Wynkoop,
>>
>>
>>
>> On Sun, Sep 6, 2020 at 11:09 PM norman <norman.kern@xxxxxxx> wrote:
>>
>> > Hi guys,
>> >
>> > When I update the pg_num of a pool, I found it did not work (no
>> > rebalancing happened); does anyone know the reason? Pool's info:
>> >
>> > pool 21 'openstack-volumes-rs' replicated size 3 min_size 2 crush_rule
>> > 21 object_hash rjenkins pg_num 1024 pgp_num 512 pgp_num_target 1024
>> > autoscale_mode warn last_change 85103 lfor 82044/82044/82044 flags
>> > hashpspool,nodelete,selfmanaged_snaps stripe_width 0 application rbd
>> >          removed_snaps
>> > [1~1e6,1e8~300,4e9~18,502~3f,542~11,554~1a,56f~1d7]
>> > pool 22 'openstack-vms-rs' replicated size 3 min_size 2 crush_rule 22
>> > object_hash rjenkins pg_num 512 pgp_num 512 pg_num_target 256
>> > pgp_num_target 256 autoscale_mode warn last_change 84769 lfor
>> > 0/0/55294 flags hashpspool,nodelete,selfmanaged_snaps stripe_width 0
>> > application rbd
>> >
>> > The pgp_num_target is set, but pgp_num is not.
>> >
>> > I had scaled out new OSDs and the cluster was still backfilling before
>> > setting the value; is that the reason?



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx