Re: Unbalanced Cluster

Hi Josh,

We do have an old pool that is empty, so there are 4611 empty PGs, but the 
rest seem fairly close:

# ceph pg ls|awk '{print $7/1024/1024/10}'|cut -d "." -f 1|sed -e 's/$/0/'|sort -n|uniq -c
    4611 00
       1 1170
       8 1180
      10 1190
      28 1200
      51 1210
      54 1220
      52 1230
      32 1240
      13 1250
       7 1260
Hmm, that's interesting: adding up the first column (excluding the 4611 
empty PGs) gives 256, but there are 512 PGs in the main data pool.
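
To narrow that down, here's a rough sketch (untested here, and assuming the 
byte count is still in column 7 of "ceph pg ls" as in the command above) 
that groups the same output by pool ID, i.e. the part of the PG name 
before the dot:

# ceph pg ls | awk 'NR>1 && $1 ~ /^[0-9]+\./ {
      split($1, id, ".");            # pool ID is the prefix of the PG name
      cnt[id[1]]++; sum[id[1]] += $7  # count PGs and total bytes per pool
  } END {
      for (p in cnt)
          printf "pool %s: %d PGs, %.1f GiB total\n", p, cnt[p], sum[p]/1024/1024/1024
  }'

That should show which pool the "missing" 256 PGs actually belong to.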

Here are our pool settings:

pool 3 'fsmeta' replicated size 3 min_size 1 crush_rule 0 object_hash 
rjenkins pg_num 256 pgp_num 256 autoscale_mode warn last_change 35490 
flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 
recovery_priority 5 application cephfs
pool 4 'fsdata' erasure size 5 min_size 4 crush_rule 1 object_hash 
rjenkins pg_num 4096 pgp_num 4096 autoscale_mode warn last_change 35490 
lfor 0/0/4742 flags hashpspool,ec_overwrites stripe_width 12288 
application cephfs
pool 6 'fsdatak7m2' erasure size 9 min_size 8 crush_rule 3 object_hash 
rjenkins pg_num 512 pgp_num 512 autoscale_mode warn last_change 35490 
flags hashpspool,ec_overwrites stripe_width 28672 application cephfs

The fsdata pool was originally created with a very conservative erasure-coding 
profile that wasted too much space, so fsdatak7m2 was created and everything 
was migrated over to it.  That's why there are at least 4096 PGs with 0 bytes.
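
Just to double-check my own assumption that all of the zero-byte PGs really 
are in the old fsdata pool, something along these lines should confirm it -- 
again only a sketch, assuming bytes are in column 7 as above:

# ceph pg ls-by-pool fsdata | awk 'NR>1 && $7 == 0 {n++} END {print n, "empty PGs in fsdata"}'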

-Dave

On 2022-05-04 2:08 p.m., Josh Baergen wrote:
>
> Hi Dave,
>
>> This cluster was upgraded from 13.x to 14.2.9 some time ago.  The entire
>> cluster was installed at the 13.x time and was upgraded together so all
>> OSDs should have the same formatting etc.
> OK, thanks, that should rule out a difference in bluestore
> min_alloc_size, for example.
>
>> Below is pasted the ceph osd df tree output.
> It looks like there is some pretty significant skew in terms of the
> amount of bytes per active PG. If you issue "ceph pg ls", are you able
> to find any PGs with a significantly higher byte count?
>
> Josh
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



