RBD backend might be even worse, depending on how large a dataset you try. One 4KB write can end up allocating a whole 4MB object, and depending on how well hole-punching and fallocate work on your system you could in theory end up with >1000x amplification if you always hit a different 4MB chunk (but that's not realistic). Is that right?

Jan

> On 26 Feb 2016, at 22:05, Josh Durgin <jdurgin@xxxxxxxxxx> wrote:
>
> On 02/24/2016 07:10 PM, Christian Balzer wrote:
>> 10 second rados bench with 4KB blocks, 219MB written in total.
>> NAND writes per SSD: 41 * 32MB = 1312MB.
>> 10496MB total written to all SSDs.
>> Amplification: 48!!!
>>
>> Le ouch.
>> In my use case, with rbd cache on all VMs, I expect writes to be rather
>> large for the most part and not like this extreme example.
>> But as I wrote the last time I did this kind of testing, this is an area
>> where caveat emptor most definitely applies when planning and buying SSDs.
>> And where the Ceph code could probably do with some attention.
>
> In this case it's likely rados bench using tiny objects that's
> causing the massive overhead. rados bench does each write to a new
> object, which ends up as a new file beneath the OSD, with its own
> xattrs too. For 4KB writes, that's a ton of overhead.
>
> fio with the rbd backend will give you a more realistic picture.
> In jewel there will be --max-objects and --object-size options for
> rados bench to get closer to an rbd-like workload as well.
>
> Josh
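
[Editor's note: a quick back-of-the-envelope check of the figures quoted above. This is a minimal sketch, not part of the thread; the only assumption beyond the quoted numbers is that 10496MB total over 1312MB per SSD implies 8 SSDs.]

    # Write amplification from the figures Christian quotes above.
    payload_mb = 219                  # data written by the 10s rados bench run
    nand_per_ssd_mb = 41 * 32         # 41 erase blocks of 32MB = 1312MB per SSD
    num_ssds = 8                      # 8 * 1312MB = 10496MB total, as quoted
    total_nand_mb = nand_per_ssd_mb * num_ssds

    print(total_nand_mb / float(payload_mb))   # ~47.9, i.e. "Amplification: 48"

    # Theoretical worst case from the top of this mail: a single 4KB write
    # that allocates a full 4MB RADOS object is 4MB / 4KB = 1024x by itself.
    print((4 * 1024) / 4.0)                    # 1024.0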
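
[Editor's note: a rough sketch of the fio-with-rbd-backend run Josh suggests, assuming fio was built with librbd support. The pool, image, and client names below are placeholders, not values from the thread; the image must exist before the run.]

    # Drive a 4KB random-write workload through fio's rbd ioengine.
    import subprocess

    cmd = [
        "fio",
        "--name=rbd-4k-randwrite",
        "--ioengine=rbd",          # fio's librbd backend
        "--clientname=admin",      # cephx user (placeholder)
        "--pool=rbd",              # placeholder pool name
        "--rbdname=bench-image",   # placeholder image name
        "--rw=randwrite",
        "--bs=4k",
        "--iodepth=32",
        "--direct=1",
        "--runtime=10",
        "--time_based",
    ]
    subprocess.check_call(cmd)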