I'm in the middle of increasing the PG count for one of our pools in small increments, waiting for each step to complete, rinse and repeat. I'm doing it this way so I can control when all this activity happens and keep it away from the busier production traffic times.

I'm expecting some imbalance as PGs get created on already unbalanced OSDs, but our monitoring picked up something today that I'm not really understanding. Our total utilization is just over 50%, and about 96% of our total data is in this one pool. Because there aren't enough PGs, each one holds quite a lot of data, and since they aren't evenly spread across the OSDs there's a bit of imbalance. That's all cool and to be expected, and it's the reason for increasing the PG count in the first place. However, as some PGs split, the new PGs are sometimes created on OSDs that already hold a disproportionate amount of data. Again, not totally unexpected.

Our monitoring flagged the usage of this pool at >85% today as I neared the end of another PG count increase. What I'm not understanding is how this value is determined. I've read other posts, and the calculations they suggest don't produce the number shown in my %USED column. I suspect it's somehow related to the MAX AVAIL value (which I believe is somewhat indirectly derived from the amount available on the individual OSDs), but none of the posts I read mention it in their calculations, and I've been unable to build a formula from any of the values I have that ends up at the %USED value I see.

For the record, my current total utilization based on a 'ceph osd df' looks like this:

           SIZE    USE     AVAIL   %USE
    TOTAL  39507G  19931G  17568G  50.45

My most utilized OSD (currently in the process of having some data moved off it) is 81.58% used, with 188G available and a variance of 1.62.
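For what it's worth, one candidate formula that does reproduce the pool figures in the 'ceph df' output below is %USED = USED / (USED + MAX AVAIL), i.e. stored data over stored-plus-projected-available. This is a sketch of a hypothesis, assuming Luminous-era 'ceph df' semantics where MAX AVAIL is already scaled down by replication and by the fullest OSD in the rule, not a confirmed reading of the Ceph source:

```python
# Hypothesis: pool %USED in 'ceph df' = USED / (USED + MAX AVAIL).
# MAX AVAIL is assumed to already account for replica count and for
# the most-full OSD, which is why a single hot OSD can drag it down
# and push %USED up even when raw cluster usage is only ~50%.

def pool_pct_used(used_g: float, max_avail_g: float) -> float:
    """Percent used as stored data over (stored + projected available)."""
    return 100.0 * used_g / (used_g + max_avail_g)

# Pool numbers from the 'ceph df' output below:
print(round(pool_pct_used(9552, 1548), 2))  # 86.05
```

If that hypothesis holds, it would also explain the suspicion further down: rebalancing the over-utilized OSDs should raise MAX AVAIL and therefore lower %USED, even with no data deleted.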
A cut-down output of 'ceph df' looks like this:

    GLOBAL:
        SIZE    AVAIL   RAW USED   %RAW USED
        39507G  17569G  19930G     50.45
    POOLS:
        NAME                      ID  USED   %USED  MAX AVAIL  OBJECTS
        default.rgw.buckets.data  30  9552G  86.05  1548G      36285066

I suspect that as I bring the utilization of my over-utilized OSDs down, this %USED value will drop. But I'd just love to fully understand how this value is calculated.

Thanks,
Mark J
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx