Hi,
I've configured an erasure coded pool (3+2) in our Ceph lab environment (ceph version 14.2.4), and I'm trying to verify the behaviour of bluestore_min_alloc_size.
Our OSDs are HDDs, so by default the min_alloc_size is set to 64KB.
ceph daemon osd.X config show | grep bluestore_min_alloc_size_hdd
"bluestore_min_alloc_size_hdd": "65536",
According to the documentation, the unwritten area in each chunk is zero-filled when it is written to the raw partition, which can lead to space amplification when writing small objects.
In other words, a 4KB object stored in my cluster should theoretically consume 64KB * 5 (k+m) = 320KB, or, put simply, 64KB per chunk.
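To spell out the arithmetic I'm assuming (this is just my own back-of-the-envelope sketch, not anything taken from Ceph's code; the function name and parameters are mine), with every shard's chunk rounded up to min_alloc_size:

```python
import math

# Rough sketch (my own arithmetic, not a Ceph API) of the expected
# on-disk allocation for one object in a k+m EC pool, assuming each
# shard's chunk is rounded up to min_alloc_size by BlueStore.

def expected_allocation(object_size, k, m, stripe_unit, min_alloc_size):
    """Bytes allocated across all k+m shards for one object."""
    stripe_width = k * stripe_unit              # data bytes per full stripe
    stripes = max(1, math.ceil(object_size / stripe_width))
    chunk_size = stripes * stripe_unit          # logical size of each shard
    per_shard = math.ceil(chunk_size / min_alloc_size) * min_alloc_size
    return per_shard * (k + m)

KiB = 1024
alloc = expected_allocation(4 * KiB, k=3, m=2,
                            stripe_unit=4 * KiB,
                            min_alloc_size=64 * KiB)
print(alloc, alloc // KiB)  # 327680 bytes = 320 KiB
```

Note that chunk_size comes out as the stripe unit (4KB) here, which would explain the logical size I see per shard, while the allocated size per shard rounds up to 64KB.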
To test this, I uploaded a 4KB object, and used the ceph-objectstore-tool to output the size of the object on one of the OSDs:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-20 --pgid 15.93s2 b458a7bf-0643-4c04-bccc-f7f8feb0bd20.4889853.3_ceph.txt dump | jq '.stat'
{
"size": 4096,
"blksize": 4096,
"blocks": 1,
"nlink": 1
}
I was expecting size to be 64KB, but perhaps it doesn't take the zero-filled area into account? Note that in this case size = 4K because that is the stripe unit size specified in my erasure coding profile.
Is there any other way of querying the object to verify that each chunk is using 64K, or that the object size in total is using 320KB?
Obviously, if I only have one object in the pool, then I can use "rados df", but as soon as I add more objects of different sizes, I lose this ability.
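The best I can come up with is to predict the pool-wide total myself from the list of object sizes and compare that against "rados df" USED. This is a sketch under the same assumptions as above (shard_allocation is my own helper, not a Ceph call, and the object sizes are just examples):

```python
import math

def shard_allocation(object_size, k, m, stripe_unit, min_alloc):
    """Bytes allocated across all k+m shards for one object,
    assuming each shard's chunk is rounded up to min_alloc."""
    stripes = max(1, math.ceil(object_size / (k * stripe_unit)))
    per_shard = math.ceil(stripes * stripe_unit / min_alloc) * min_alloc
    return per_shard * (k + m)

KiB = 1024
sizes = [4 * KiB, 10 * KiB, 100 * KiB]  # hypothetical mix of object sizes
total = sum(shard_allocation(s, 3, 2, 4 * KiB, 64 * KiB) for s in sizes)
print(total // KiB, "KiB")  # 960 KiB (each object still rounds to 320 KiB)
```

But that only verifies the aggregate, not that each individual chunk on disk is actually consuming 64KB, which is what I'd really like to confirm.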
rados df
POOL_NAME  USED     OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS  RD   WR_OPS  WR     USED COMPR  UNDER COMPR
ec32       320 KiB  1        0       5       0                   0        1         0       0 B  1       4 KiB  0 B         0 B
Thanks and regards,
James.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com