Norman,
>default-fs-data0 9 374 TiB 1.48G 939 TiB 74.71 212 TiB
Given the above numbers, the 'default-fs-data0' pool has an average
object size of around 256K (374 TiB / 1.48G objects). Are you sure that
the vast majority of the objects in this pool are 4M?
What does the df report look like for the 'good' cluster?
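For reference, the average-object-size arithmetic can be reproduced with
a short Python sketch. It assumes the 'G' in the OBJECTS column is
decimal (10^9) and that TiB is binary; the function name is made up for
illustration.

```python
TIB = 2**40  # bytes in one TiB


def average_object_size(stored_bytes: float, objects: float) -> float:
    """Return the mean object size in bytes for a pool."""
    return stored_bytes / objects


# Figures from the default-fs-data0 line of 'ceph df'.
# Assumption: the 'G' in the OBJECTS column means 10^9.
avg = average_object_size(374 * TIB, 1.48e9)
print(f"average object size: {avg / 1024:.0f} KiB")
```

Under these assumptions the result lands well below 1 MiB, which is
hard to reconcile with a pool made up mostly of 4M objects.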
Additionally (given that default-fs-data0 holds most of the data for
the cluster) you might want to estimate allocation losses by inspecting
the performance counters: bluestore_stored vs. bluestore_allocated.
Summing the delta between them over all (hdd?) OSDs should give you the
total loss. A simpler way to do the same is to take the deltas from 2-3
OSDs and just multiply the average delta by the number of OSDs. This is
less precise but statistically should be good enough...
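A minimal Python sketch of that sampling approach follows. It assumes
the counters appear under a "bluestore" section in the JSON printed by
"ceph daemon osd.N perf dump" and that the script runs on the host that
owns the OSD's admin socket. The helper names, OSD ids and OSD count are
hypothetical.

```python
import json
import subprocess
from statistics import mean


def alloc_delta(perf_dump: dict) -> int:
    """Bytes allocated beyond what is actually stored, for one OSD."""
    bs = perf_dump["bluestore"]  # assumed section name in the perf dump
    return bs["bluestore_allocated"] - bs["bluestore_stored"]


def read_perf_dump(osd_id: int) -> dict:
    """Fetch perf counters through the OSD admin socket (local OSDs only)."""
    out = subprocess.check_output(
        ["ceph", "daemon", f"osd.{osd_id}", "perf", "dump"]
    )
    return json.loads(out)


def estimate_total_loss(sample_dumps: list, total_osds: int) -> float:
    """Extrapolate the average delta of a few sampled OSDs to all OSDs."""
    return mean(alloc_delta(d) for d in sample_dumps) * total_osds


# Example with hypothetical OSD ids and OSD count:
# dumps = [read_perf_dump(i) for i in (0, 1, 2)]
# print(estimate_total_loss(dumps, total_osds=200) / 2**40, "TiB")
```

Sampling only works well if the sampled OSDs are representative, so it
is probably worth picking them from different hosts and device sizes.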
Thanks,
Igor
On 9/10/2020 5:10 AM, norman wrote:
Igor,
Thanks for your reply. The object size is 4M and there are almost no
overwrites in the pool, so why did the space loss happen?
I have another cluster with the same config; its USED is almost equal
to 1.5*STORED. The difference between them is that the cluster has
different OSD sizes (12T and 8T).
Norman
On 9/9/2020 7:17 PM, Igor Fedotov wrote:
Hi Norman,
I don't claim to know the exact root cause, but IMO one working
hypothesis might be as follows:
Presuming spinners as backing devices for your OSDs and hence a 64K
allocation unit (bluestore_min_alloc_size_hdd param).
1) 1.48G user objects result in 1.48G * 6 = 8.88G EC shards.
2) Shards tend to be unaligned with the 64K allocation unit, which
might result in an average loss of 32K per shard.
3) Hence the total loss due to allocation overhead can be estimated at
32K * 8.88G = 284T, which looks close enough to your numbers for
default-fs-data0:
939 TiB - (374 TiB / 4 * 6) = 378 TiB of space loss.
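The estimate in steps 1) to 3) can be sketched in Python as below. This
is only an illustration: it takes 'G' as 10^9, treats K and TiB as
binary units, and assumes each shard wastes half an allocation unit on
average. The function name is hypothetical.

```python
K = 1024
TIB = 2**40


def estimated_alloc_loss(objects: float, shards_per_object: int,
                         min_alloc_size: int) -> float:
    """Estimate bytes lost to allocation rounding.

    Assumes every shard ends at a uniformly random offset inside an
    allocation unit, so on average half a unit is wasted per shard.
    """
    shards = objects * shards_per_object
    return shards * (min_alloc_size / 2)


# EC 4+2 gives 6 shards per object; 'G' is taken as 10^9 (assumption).
loss_64k = estimated_alloc_loss(1.48e9, 6, 64 * K)
observed_loss_tib = 378  # 939 TiB - (374 TiB / 4 * 6), from the figures above
print(f"estimated: {loss_64k / TIB:.0f} TiB, observed: {observed_loss_tib} TiB")
```

With a decimal 'G' the estimate comes out somewhat below the 284T
figure, which multiplies the prefixes directly, but it stays in the
same ballpark as the observed loss. Passing 4 * K as min_alloc_size
shows how much a smaller allocation unit would shrink the estimate.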
An additional issue which might result in space loss is the space
amplification caused by partial unaligned overwrites to objects in an
EC pool. See my post "Root cause analysis for space overhead with
erasure coded pools." to the dev@xxxxxxx mailing list on Jan 23.
Migrating to a 4K min alloc size seems to be the only known way to fix
(or rather work around) these issues. The upcoming Pacific release is
going to bring downsizing to 4K (for new OSD deployments) along with
some additional changes to smooth the corresponding negative
performance impact.
Hope this helps.
Igor
On 9/9/2020 2:30 AM, norman kern wrote:
Hi,
I have changed most of the pools from 3-replica to EC 4+2 in my
cluster. When I use the ceph df command to show the used capacity of
the cluster:
RAW STORAGE:
    CLASS         SIZE        AVAIL       USED        RAW USED    %RAW USED
    hdd           1.8 PiB     788 TiB     1.0 PiB     1.0 PiB     57.22
    ssd           7.9 TiB     4.6 TiB     181 GiB     3.2 TiB     41.15
    ssd-cache     5.2 TiB     5.2 TiB     67 GiB      73 GiB      1.36
    TOTAL         1.8 PiB     798 TiB     1.0 PiB     1.0 PiB     56.99
POOLS:
    POOL                              ID    STORED      OBJECTS     USED        %USED    MAX AVAIL
    default-oss.rgw.control           1     0 B         8           0 B         0        1.3 TiB
    default-oss.rgw.meta              2     22 KiB      97          3.9 MiB     0        1.3 TiB
    default-oss.rgw.log               3     525 KiB     223         621 KiB     0        1.3 TiB
    default-oss.rgw.buckets.index     4     33 MiB      34          33 MiB      0        1.3 TiB
    default-oss.rgw.buckets.non-ec    5     1.6 MiB     48          3.8 MiB     0        1.3 TiB
    .rgw.root                         6     3.8 KiB     16          720 KiB     0        1.3 TiB
    default-oss.rgw.buckets.data      7     274 GiB     185.39k     450 GiB     0.14     212 TiB
    default-fs-metadata               8     488 GiB     153.10M     490 GiB     10.65    1.3 TiB
    default-fs-data0                  9     374 TiB     1.48G       939 TiB     74.71    212 TiB
...
USED = 3 * STORED in 3-replica mode is completely right, but for the
EC 4+2 pool (default-fs-data0) USED is not equal to 1.5 * STORED.
Why? :(
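The expectation behind this question can be written down as a small
Python sketch. It only encodes the nominal raw-space multipliers (n for
an n-replica pool, (k+m)/k for EC k+m) and compares them with the
default-fs-data0 line; the function names are made up for illustration
and no allocation overhead is modelled.

```python
def raw_multiplier_replicated(size: int) -> float:
    """Nominal USED/STORED ratio for a replicated pool with 'size' copies."""
    return float(size)


def raw_multiplier_ec(k: int, m: int) -> float:
    """Nominal USED/STORED ratio for an erasure-coded k+m pool."""
    return (k + m) / k


# default-fs-data0 figures from 'ceph df', in TiB.
stored_tib, used_tib = 374, 939
expected_used_tib = stored_tib * raw_multiplier_ec(4, 2)
print(f"nominal multiplier:  {raw_multiplier_ec(4, 2):.2f}")
print(f"observed multiplier: {used_tib / stored_tib:.2f}")
print(f"unexplained usage:   {used_tib - expected_used_tib:.0f} TiB")
```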
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx