I think this is great. When we were trying to optimize the WAL, we set the write_buffer and memtable very aggressively, which causes read amplification. I was worried about that, but now we can have separate column families:

  Write-optimized for the big stuff (WAL and overlay): try to minimize write amplification (WA).
  Read-optimized for the small stuff (onode, omap): try to minimize read amplification (RA).

We can also have a different cache policy per column family, which would help prevent the WAL and overlay records from flushing the onodes out of the cache:

  For WAL: NO CACHE
  For onode: MAX_CACHE
  For overlay: medium? No?

But since transaction TPS is the main bottleneck now, maybe we can delay this for a little while?

-----Original Message-----
From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Sage Weil
Sent: Wednesday, April 22, 2015 4:56 AM
To: ceph-devel@xxxxxxxxxxxxxxx
Subject: newstore and rocksdb column families

Dhruba (rocksdb dev) asked if column families might be a good fit for controlling the WAL behavior. I'm not certain it specifically addresses the WAL behavior, but it creates a bunch of opportunities for segregating the overlay and/or wal records out from the regular metadata (onodes, omap).

The short version is that each column family has its own memtable and sstable files, but everything shares the same WAL, so you still get the atomicity.

I suspect this would be most helpful for the overlay records, where we'll have reasonably large key/value pairs with medium to long lifespans. I'm not sure how helpful it will be with our wal records since if they make it out of the log at all we are already losing. :/

Anyway, something to consider!
https://github.com/facebook/rocksdb/wiki/Column-Families

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
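
To make the "separate memtables, shared WAL" point concrete, here is a toy model in Python. It is only a sketch of the idea and is not the RocksDB API: the class names (ColumnFamily, ToyDB), the cache_policy labels, and the buffer sizes are all made up for illustration. Each family keeps its own memtable and tuning knobs, while a batch that touches several families is logged as a single WAL record, so replay either restores all of it or none of it.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnFamily:
    """Toy stand-in for one column family: private memtable plus tuning knobs."""
    name: str
    write_buffer_size: int  # illustrative value only, in bytes
    cache_policy: str       # hypothetical label: "none", "medium" or "max"
    memtable: dict = field(default_factory=dict)


class ToyDB:
    """All column families share one WAL; each keeps a private memtable."""

    def __init__(self, families):
        self.families = {cf.name: cf for cf in families}
        self.wal = []  # shared log: one entry per committed batch

    def write_batch(self, batch):
        """batch is a list of (family_name, key, value) tuples."""
        # Validate first so an unknown family rejects the whole batch.
        for cf_name, _, _ in batch:
            if cf_name not in self.families:
                raise KeyError(cf_name)
        # One WAL record covers every family touched by the batch, which is
        # what keeps a multi-family write all-or-nothing in this model.
        self.wal.append(list(batch))
        self._apply(batch)

    def _apply(self, batch):
        for cf_name, key, value in batch:
            self.families[cf_name].memtable[key] = value

    def recover(self):
        """Simulate a crash: drop every memtable, then replay the shared WAL."""
        for cf in self.families.values():
            cf.memtable.clear()
        for batch in self.wal:
            self._apply(batch)


# Write-optimized families get large buffers; the read-optimized one stays small.
db = ToyDB([
    ColumnFamily("wal", write_buffer_size=256 << 20, cache_policy="none"),
    ColumnFamily("overlay", write_buffer_size=256 << 20, cache_policy="medium"),
    ColumnFamily("onode", write_buffer_size=8 << 20, cache_policy="max"),
])
db.write_batch([("onode", "obj1", "meta"), ("overlay", "obj1/0", "payload")])
db.recover()
```

After recover(), both the onode and the overlay entries come back from the single shared log record, even though they live in different memtables. In real RocksDB the per-family tuning would go through the column family options rather than fields like these.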