Re: The confusing output of ceph df command

Igor,

I think I misunderstood the USED output. It shows the allocated size, which is not always equal to 1.5*STORED.

For example, when writing a 4K file, BlueStore may allocate 64K, which appears to use more space, but if you write another 4K it can reuse the same blob. (I will validate this guess.)

So ceph df may not reflect how many new files we can still store.

I'm reading the Ceph code and will reply to the thread when I have figured out the correct meaning.

Thanks,

Norman

On 11/9/2020 7:40 AM, Igor Fedotov wrote:
Norman,

> default-fs-data0    9    374 TiB    1.48G    939 TiB    74.71    212 TiB

Given the above numbers, the 'default-fs-data0' pool has an average object size of around 256K (374 TiB / 1.48G objects). Are you sure that the absolute majority of objects in this pool are 4M?


I'm wondering what the df report for the 'good' cluster looks like.


Additionally (given that default-fs-data0 holds most of the cluster's data), you might want to estimate allocation losses by inspecting performance counters: bluestore_stored vs. bluestore_allocated.

Summing the delta between them over all (HDD?) OSDs gives you the total loss. A simpler way to do the same is to read the deltas from 2-3 OSDs and just multiply the average delta by the number of OSDs. This is less precise but statistically should be good enough...
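
Something along these lines (an untested sketch; run it on each OSD host and adjust the hypothetical OSD ids to the ones local to that host) could sum the deltas for you:

import json
import subprocess

# Hypothetical OSD ids local to this host -- adjust per host.
osd_ids = [0, 1, 2]

total_delta = 0
for osd_id in osd_ids:
    # Query the OSD's admin socket; perf counters come back as JSON,
    # with the BlueStore counters under the "bluestore" section.
    out = subprocess.check_output(
        ["ceph", "daemon", f"osd.{osd_id}", "perf", "dump"])
    bluestore = json.loads(out)["bluestore"]
    # Allocation loss on this OSD: physically allocated space minus user data stored.
    delta = bluestore["bluestore_allocated"] - bluestore["bluestore_stored"]
    print(f"osd.{osd_id}: allocated - stored = {delta / 2**30:.1f} GiB")
    total_delta += delta

print(f"total allocation loss across these OSDs: {total_delta / 2**40:.2f} TiB")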


Thanks,

Igor


On 9/10/2020 5:10 AM, norman wrote:
Igor,

Thanks for your reply. The object size is 4M and there are almost no overwrites in the pool, so why did the space loss happen in the pool?

I have another cluster with the same config whose USED is almost equal to 1.5*STORED. The difference between them is:

The cluster has different OSD sizes (12T and 8T).

Norman
On 9/9/2020 7:17 PM, Igor Fedotov wrote:
Hi Norman,

I don't pretend to know the exact root cause, but IMO one working hypothesis might be as follows:

Presuming spinners as the backing devices for your OSDs, and hence a 64K allocation unit (the bluestore min_alloc_size_hdd param):

1) 1.48G user objects result in 1.48G * 6 = 8.88G EC shards.

2) Shards tend to be unaligned with the 64K allocation unit, which might result in an average loss of 32K per shard.

3) Hence the total loss due to allocation overhead can be estimated at 32K * 8.88G = 284T, which looks close enough to your numbers for default-fs-data0:

939 TiB - (374 TiB / 4 * 6) = 939 TiB - 561 TiB = 378 TiB of space loss.
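
Spelling the estimate out as a rough calculation (treating the object count's 'G' as 2^30 so it lines up with the 284T figure above; with decimal units it lands around 265 TiB, the same order of magnitude):

TiB = 2**40
KiB = 2**10
G = 2**30  # assumption: binary multiple, to match the 284T estimate

k, m = 4, 2                          # EC profile 4+2
stored = 374 * TiB                   # STORED for default-fs-data0
objects = 1.48 * G                   # OBJECTS for default-fs-data0
used = 939 * TiB                     # USED reported by ceph df
alloc_unit = 64 * KiB                # bluestore min_alloc_size_hdd

shards = objects * (k + m)                 # ~8.88G EC shards
estimated_loss = shards * alloc_unit / 2   # ~32K wasted per unaligned shard

expected_used = stored / k * (k + m)       # 561 TiB with no allocation overhead
observed_loss = used - expected_used       # 378 TiB

print(f"estimated allocation loss: {estimated_loss / TiB:.0f} TiB")  # ~284 TiB
print(f"observed loss:             {observed_loss / TiB:.0f} TiB")   # ~378 TiB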


An additional issue which might result in space loss is the space amplification caused by partial unaligned overwrites to objects in an EC pool. See my post "Root cause analysis for space overhead with erasure coded pools." to the dev@xxxxxxx mailing list on Jan 23.


Migrating to a 4K min alloc size seems to be the only known way to fix (or rather work around) these issues. The upcoming Pacific release is going to bring the default down to 4K (for new OSD deployments), along with some additional changes to smooth the corresponding negative performance impacts.
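
If you decide to go that route before Pacific, a minimal (untested) sketch of the central config change might look like the following; note that the value only applies to OSDs created after the change, so existing OSDs would have to be redeployed one by one:

import subprocess

# Ask the monitors to hand out a 4K allocation unit to *future* HDD OSD deployments.
# Existing OSDs keep the min_alloc_size they were built with at mkfs time.
subprocess.run(
    ["ceph", "config", "set", "osd", "bluestore_min_alloc_size_hdd", "4096"],
    check=True)

# Confirm what newly created OSDs will pick up.
subprocess.run(
    ["ceph", "config", "get", "osd", "bluestore_min_alloc_size_hdd"],
    check=True)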


Hope this helps.

Igor



On 9/9/2020 2:30 AM, norman kern wrote:
Hi,

I have changed most of the pools in my cluster from 3-replica to EC 4+2. When I use the ceph df command to show the used capacity of the cluster:

RAW STORAGE:
    CLASS         SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd           1.8 PiB     788 TiB     1.0 PiB     1.0 PiB          57.22
    ssd           7.9 TiB     4.6 TiB     181 GiB     3.2 TiB          41.15
    ssd-cache     5.2 TiB     5.2 TiB      67 GiB      73 GiB           1.36
    TOTAL         1.8 PiB     798 TiB     1.0 PiB     1.0 PiB          56.99

POOLS:
    POOL                               ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    default-oss.rgw.control             1        0 B            8        0 B          0       1.3 TiB
    default-oss.rgw.meta                2     22 KiB           97     3.9 MiB        0       1.3 TiB
    default-oss.rgw.log                 3    525 KiB          223     621 KiB        0       1.3 TiB
    default-oss.rgw.buckets.index       4     33 MiB           34      33 MiB        0       1.3 TiB
    default-oss.rgw.buckets.non-ec      5    1.6 MiB           48     3.8 MiB        0       1.3 TiB
    .rgw.root                           6    3.8 KiB           16     720 KiB        0       1.3 TiB
    default-oss.rgw.buckets.data        7    274 GiB      185.39k     450 GiB     0.14       212 TiB
    default-fs-metadata                 8    488 GiB      153.10M     490 GiB    10.65       1.3 TiB
    default-fs-data0                    9    374 TiB        1.48G     939 TiB    74.71       212 TiB

   ...

USED = 3 * STORED is exactly right for the 3-replica pools, but for the EC 4+2 pool (default-fs-data0) USED is not equal to 1.5 * STORED. Why? :(
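
The 1.5 factor I expected is just the pool's redundancy overhead, ignoring any allocation effects:

# Raw-space multipliers I expected (ignoring any allocation overhead):
replica_factor = 3            # 3-replica pools: USED ≈ 3 * STORED, which ceph df confirms
k, m = 4, 2                   # EC 4+2 profile
ec_factor = (k + m) / k       # = 1.5, so USED should be ≈ 1.5 * STORED
print(ec_factor)              # 1.5, yet default-fs-data0 shows 939 / 374 ≈ 2.5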


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



