Could you please explain your test configuration in detail (e.g., how many OSDs, the replication factor, and where the NVRAM is used)? Also, how long did you run the test?

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of myoungwon oh
Sent: Thursday, June 30, 2016 1:40 AM
To: Sage Weil
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: Questions for NVRAM+SATA SSDs with Bluestore

Hi.

As you mentioned, bluestore_min_alloc_size can send data through the WAL path. Performance is better than writing directly to the SSDs (an improvement of more than 10K IOPS). However, bluestore's performance is still lower than filestore's (see below). I think there are many performance options for bluestore, so I need to understand them in order to see its real performance. (If you have recommended options, please let me know.)

Some other observations:

1. More than 70K IOPS is observed during the first 20-30 seconds of the test; after that, performance drops significantly. (One suspected reason is that the metadata size (blob map, extent map) keeps growing.)
2. High latency (note that this is NVRAM).

Thanks.
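[For reference, the bluestore_min_alloc_size redirection discussed above would be set with a ceph.conf fragment like the following. This is a sketch: 65536 is the threshold value under discussion in this thread, not a tuned recommendation.]

```ini
[osd]
# Writes smaller than min_alloc_size are sent through the
# BlueStore WAL (deferred-write) path instead of directly to disk.
bluestore_min_alloc_size = 65536
```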
Bluestore (master branch, 6/29, no configuration changes)
###################################
BW result (MiB/s)
IO_Size    randwrite
4KB        234.42
###################################
IOPS result
IO_Size    randwrite
4KB        60006
###################################
Latency result (ms)
IO_Size    randwrite
4KB        9.595
###################################
CPU utilization (%)
IO_Size    randwrite
4KB        52.53
###################################

Filestore (jewel, 10.2.1, no configuration changes)
###################################
BW result (MiB/s)
IO_Size    randwrite
4KB        260.33
###################################
IOPS result
IO_Size    randwrite
4KB        66640
###################################
Latency result (ms)
IO_Size    randwrite
4KB        8.642
###################################
CPU utilization (%)
IO_Size    randwrite
4KB        56.42
###################################

2016-06-27 21:31 GMT+09:00 Sage Weil <sage@xxxxxxxxxxxx>:
> On Mon, 27 Jun 2016, myoungwon oh wrote:
>> Hi, I have questions about bluestore (the 4K random write case).
>>
>> So far, we have used NVRAM (PCIe) as the journal and SSDs (SATA) as
>> data disks (filestore), so we got a performance gain from the NVRAM
>> journal. However, the current bluestore design writes data (4K
>> aligned) to the data disk first and then writes metadata to the WAL
>> rocksdb. This design removes the "double write" in the objectstore,
>> but in our case the NVRAM cannot be fully utilized.
>>
>> So, my questions are:
>>
>> 1. Can bluestore write to the WAL first, as filestore does?
>
> You can do it indirectly with bluestore_min_alloc_size=65536, which
> will send anything smaller than this value through the wal path.
> Please let us know what effect this has on your latency/performance!
>
>> 2. If not, is using bcache or flashcache for NVRAM on top of the
>> SSDs the right answer?
>
> That is also possible, but I expect we'd like to make this work out
> of the box if we can!
>
> sage
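[As a sanity check on the fio-style numbers reported above: at a 4KiB block size, bandwidth and IOPS should agree, and Little's law (outstanding IOs = IOPS x latency) gives the implied aggregate queue depth. The sketch below uses only the figures from the tables; the assumption that bandwidth is reported in binary MiB/s is the editor's, inferred because it makes the reported BW and IOPS consistent. Both stores come out at an implied depth of about 576 outstanding IOs, which is consistent with the high per-IO latency being a queueing effect rather than device latency.]

```python
# Cross-check the reported 4K randwrite results for consistency.
BLOCK = 4096  # 4KiB block size in bytes

results = {
    "bluestore": {"bw_mib": 234.42, "iops": 60006, "lat_ms": 9.595},
    "filestore": {"bw_mib": 260.33, "iops": 66640, "lat_ms": 8.642},
}

for name, r in results.items():
    # Bandwidth derived from IOPS, in binary MiB/s.
    derived_bw = r["iops"] * BLOCK / 2**20
    # Little's law: mean outstanding IOs = arrival rate * mean latency.
    depth = r["iops"] * r["lat_ms"] / 1000.0
    print(f"{name}: {derived_bw:.2f} MiB/s derived vs "
          f"{r['bw_mib']} reported, implied queue depth ~{depth:.0f}")
```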