On Mon, 15 Apr 2024 at 13:09, Mitsumasa KONDO <kondo.mitsumasa@xxxxxxxxx> wrote:
> Hi Menguy-san,
>
> Thank you for your reply. Users who do large IO on tiny volumes are a
> nuisance to cloud providers.
>
> I checked my Ceph cluster, which has 40 SSDs. Each OSD on a 1TB SSD holds
> about 50 placement groups, so each PG covers approximately 20GB of space.
> If we create a small 8GB volume, I had a feeling it wouldn't be
> distributed well, but it turns out to be distributed well.

RBD images get split into 2 MiB or 4 MiB pieces when stored in Ceph, so an
8 GiB RBD image becomes 4096 or 2048 separate objects that end up
"randomly" on the PGs of the pool it lives in. That means that if you read
or write the whole RBD image from start to end, you spread the load across
all the OSDs.

I think it works something like this: you ask librbd for an 8 GiB image
named "myimage", and underneath it creates myimage.0, myimage.1, myimage.2
and so on. The PG placement depends on the object name, which of course
differs for all the pieces, so they end up on different PGs, thereby
spreading the load.

If Ceph did not do this, you could never make an RBD image larger than the
smallest free space on any of the pool's OSDs, and it would also mean that
the RBD client talked to the same single OSD for everything, which would
not be a good way to use a cluster's resources evenly.

-- 
May the most significant bit of your life be positive.
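To put rough numbers on the splitting, here is a minimal sketch (plain Python, no Ceph libraries) assuming the default 4 MiB object size. The rbd_data.<id> prefix shown is a made-up example of the naming that newer format-2 images use; older images use the <imagename>.<n> style mentioned above.

    # Minimal sketch: how an RBD image is split into backing RADOS objects.
    # Assumes the default 4 MiB object size; the image-id prefix below is
    # purely hypothetical, for illustration only.

    IMAGE_SIZE = 8 * 1024**3      # 8 GiB volume
    OBJECT_SIZE = 4 * 1024**2     # default RBD object size (2^22 bytes)

    num_objects = IMAGE_SIZE // OBJECT_SIZE
    print(f"{num_objects} backing objects")       # -> 2048

    # Each data object gets its own name (rbd_data.<image id>.<index> in
    # zero-padded hex for format-2 images); each distinct name hashes to
    # its own PG, which is what spreads the image across OSDs.
    prefix = "rbd_data.abcdef0123"                # hypothetical image id
    for idx in (0, 1, num_objects - 1):
        print(f"{prefix}.{idx:016x}")

Feeding a few of those object names (with your real pool name and image id) to "ceph osd map <poolname> <objectname>" should show them landing on different PGs and different OSD sets, which is the spreading described above.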