Hello,

We'll soon be building out four new Luminous clusters with Bluestore. Our current clusters are running Filestore, so we're not very familiar with Bluestore yet, and I'd like to have an idea of what to expect.

Here are the OSD hardware specs (5x per cluster):

2x 3.0GHz 18c/36t
22x 1.8TB 10K SAS (RAID1 OS + 20 OSDs)
5x 480GB Intel S4610 SSDs (WAL and DB)
192 GB RAM
4x Mellanox 25GB NIC
PERC H730p

With Filestore we've found that we can achieve sub-millisecond write latency by running very fast journals (currently Intel S4610s). My main concern is that Bluestore doesn't use journals and instead writes directly to the higher-latency HDD, in theory resulting in slower acks and higher write latency.

How does Bluestore handle this? Can we expect similar or better performance than our current Filestore clusters?

I've heard it repeated that Bluestore performs better than Filestore, but I've also heard some people claim this is not always the case with HDDs. Is there any truth to that, and if so, is there a configuration we can use to achieve this same type of performance with Bluestore?
Bluestore does use a journal for small writes, but not for big ones. You can try to push big writes through the journal as well by increasing bluestore_prefer_deferred_size, but it's generally pointless because in Bluestore the "journal" is RocksDB's journal (WAL), which creates far too much extra write amplification when big data chunks are put into it. This puts extra load on the SSDs, and write performance does not improve compared to the default.
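To make the small-vs-big split concrete, here is a rough toy model in Python. It is not BlueStore's actual code path: the function name route_write, the strict "smaller than" comparison, and the 32 KiB threshold are assumptions for illustration (check the default of bluestore_prefer_deferred_size for your device class and release). It also ignores metadata and RocksDB's own overhead.

```python
# Toy model only: illustrates the deferred-write idea, not BlueStore internals.
PREFER_DEFERRED_SIZE = 32 * 1024  # assumed threshold in bytes


def route_write(size, prefer_deferred_size=PREFER_DEFERRED_SIZE):
    """Return where the payload of a single write of `size` bytes lands."""
    if size < prefer_deferred_size:
        # Small write: payload goes into the RocksDB WAL (on the SSD) first,
        # and is flushed to the HDD later, so it is written twice.
        return {"path": "deferred", "wal_bytes": size, "hdd_bytes": size}
    # Big write: payload goes straight to the HDD, only metadata hits RocksDB.
    return {"path": "direct", "wal_bytes": 0, "hdd_bytes": size}


if __name__ == "__main__":
    for size in (4 * 1024, 4 * 1024 * 1024):
        print(size, route_write(size))
    # Raising the threshold pushes the 4 MiB write through the WAL as well.
    print(route_write(4 * 1024 * 1024, prefer_deferred_size=8 * 1024 * 1024))
```

In this model, raising the threshold only changes which writes are copied into the WAL; it does not reduce the bytes that eventually have to reach the HDD.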
Bluestore is always better in terms of linear write throughput because it has no double-write for big data chunks. But it's roughly on par, and sometimes may even be slightly worse than filestore, in terms of 4K random writes.
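The "no double-write" point can be put as simple arithmetic. The sketch below is a deliberately simplified assumption: it counts only payload bytes, treats Filestore as writing every byte once to the journal and once to the data device, and ignores metadata and any write amplification inside RocksDB. The function name device_bytes_written is hypothetical.

```python
def device_bytes_written(client_bytes, backend, deferred=False):
    """Payload bytes written across all devices for `client_bytes` of client data."""
    if backend == "filestore":
        # Journal copy plus data copy.
        return 2 * client_bytes
    if backend == "bluestore":
        # Deferred (small) writes are also written twice: WAL, then HDD.
        return 2 * client_bytes if deferred else client_bytes
    raise ValueError("unknown backend: %s" % backend)


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print("filestore :", device_bytes_written(one_gib, "filestore"))
    print("bluestore :", device_bytes_written(one_gib, "bluestore"))
```

Under these assumptions a large sequential write costs half as many device bytes on Bluestore, while small deferred writes cost about the same as on Filestore, which is consistent with the 4K random write comparison above.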
--
With best regards,
Vitaliy Filippov