On Tue, 14 Jun 2016, Igor Fedotov wrote:
> This result are for compression = none and write block size limited to
> 4K.

I've been thinking more about this and I'm wondering if we should revisit
the choice to use a min_alloc_size of 4K on flash. If it's 4K, then a 4K
write means

 - 4K write (to newly allocated block)
 - bdev flush
 - kv commit (4k-ish?)
 - bdev flush

which puts a two-write lower bound on latency. If we have a min_alloc_size
of 8K or 16K, then a 4K write is

 - kv commit (4K + 4k-ish)
 - bdev flush
 - [async] 4k write

Fewer bdev flushes, and only marginally more writes to the device. I
guess the question is whether write-amp is really that important for a 4k
workload?

The upside of a larger min_alloc_size is that the worst-case metadata
(onode) size is 1/2 or 1/4 of what it is at 4K. The sequential read cost
of a previously random-written object will also be better (fewer IOs).

There is probably a case where a 4k min_alloc_size is the right choice, but
it feels like we're optimizing for write-amp to the detriment of other,
more important things. For example, even after we improve the onode
encoding, it may be that the larger metadata results in more write-amp
than the WAL for the 4k writes does.

sage
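
The two write paths listed above can be tallied in a rough sketch. This is only a back-of-the-envelope model of the steps in this message, not BlueStore code: the names WriteCost and cost_of_small_write are hypothetical, the kv commit overhead is assumed to be exactly 4096 bytes (the "4k-ish" figure), and the rule "writes smaller than min_alloc_size take the deferred path" is a simplification.

```python
from dataclasses import dataclass


@dataclass
class WriteCost:
    sync_writes: int   # device writes that must finish before the ack
    flushes: int       # bdev flushes on the commit path
    sync_bytes: int    # bytes written before the ack
    async_bytes: int   # bytes written after the ack


def cost_of_small_write(write_size, min_alloc_size, kv_overhead=4096):
    """Tally the steps from the message for one write of write_size bytes.

    kv_overhead is an assumed stand-in for the "4k-ish" kv commit size.
    """
    if write_size >= min_alloc_size:
        # Direct path: data write, flush, kv commit, flush.
        return WriteCost(sync_writes=2, flushes=2,
                         sync_bytes=write_size + kv_overhead,
                         async_bytes=0)
    # Deferred path: the data rides along in the kv commit, one flush,
    # then the data is written to its final location asynchronously.
    return WriteCost(sync_writes=1, flushes=1,
                     sync_bytes=write_size + kv_overhead,
                     async_bytes=write_size)


for min_alloc in (4096, 8192, 16384):
    print(min_alloc, cost_of_small_write(4096, min_alloc))
```

Under these assumptions a 4K write costs two flushes and 8K of device writes with a 4K min_alloc_size, versus one flush and 12K of device writes (4K of it off the latency path) with 8K or 16K.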