Re: Questions for NVRAM+SATA SSDs with Bluestore

Please see the link below:
(http://www.slideshare.net/Inktank_Ceph/af-ceph-ceph-performance-analysis-and-improvement-on-flash/7)

There are 4 OSD nodes. We used a PMC 8GB NVRAM product as the journal device.
Each OSD node has one NVRAM device and runs 4 OSD daemons, so each OSD
daemon uses 2GB for its journal.
We ran fio (3 minutes, 12 jobs, queue depth 12) on each client for the
performance evaluation, with the replication factor set to 2.
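
As a quick sanity check on the setup above, the arithmetic can be sketched in Python. The variable and function names are only illustrative, and the total OSD count assumes that all 4 nodes are configured identically.

```python
# Figures quoted in this message; the names are illustrative only.
OSD_NODES = 4
OSDS_PER_NODE = 4
NVRAM_GB_PER_NODE = 8
FIO_JOBS = 12
FIO_QUEUE_DEPTH = 12


def journal_gb_per_osd(nvram_gb: float, osds_per_node: int) -> float:
    """NVRAM capacity shared evenly among the OSD daemons on one node."""
    return nvram_gb / osds_per_node


def outstanding_ios_per_client(jobs: int, queue_depth: int) -> int:
    """Upper bound on the requests one fio client keeps in flight."""
    return jobs * queue_depth


# Assumes every node runs the same number of OSD daemons.
print("OSDs in total:", OSD_NODES * OSDS_PER_NODE)
print("Journal GB per OSD:", journal_gb_per_osd(NVRAM_GB_PER_NODE, OSDS_PER_NODE))
print("In-flight I/Os per client:", outstanding_ios_per_client(FIO_JOBS, FIO_QUEUE_DEPTH))
```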

Thanks.

2016-07-01 1:15 GMT+09:00 Somnath Roy <Somnath.Roy@xxxxxxxxxxx>:
> Could you please explain your test configuration in detail (e.g., how many OSDs, the replication factor, and where the NVRAM is used)? Also, how long did you run the test?
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of myoungwon oh
> Sent: Thursday, June 30, 2016 1:40 AM
> To: Sage Weil
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: Questions for NVRAM+SATA SSDs with Bluestore
>
> Hi.
>
>
> As you mentioned, bluestore_min_alloc_size can send data to the wal path.
>
> Performance improves compared with writing directly to the SSDs (more than 10K IOPS).
>
> However, the performance of bluestore is lower than that of filestore (see below).
>
> I think there are many performance options for bluestore.
>
> Therefore, I need to understand them in order to see the real performance.
>
> (If you have recommended options, please let me know.)
>
>
>
> Anyway, my other observations are:
>
> 1. More than 70K IOPS is observed during the first 20~30 seconds of the performance test. After that, performance drops significantly.
>
>     (One suspected reason is that the metadata (blob map, extent map) keeps growing.)
>
> 2. High latency (note that this is NVRAM).
>
>
>
> Thanks.
>
>
>
> Bluestore (master branch, 6/29, no configuration changes)
>
> ################################### BW result IO_Size  randwrite
>
> 4KB      234.42
>
> ################################### IOPS result IO_Size  randwrite
>
> 4KB      60006
>
> ################################### Latency result IO_Size  randwrite (ms)
>
> 4KB      9.595
>
> ################################### CPU utilization IO_Size  randwrite
>
> 4KB      52.53
>
>
>
> Filestore (jewel, 10.2.1, no configuration changes)
>
>
> ################################### BW result IO_Size  randwrite
>
> 4KB      260.33
>
> ################################### IOPS result IO_Size  randwrite
>
> 4KB      66640
>
> ################################### Latency result IO_Size  randwrite (ms)
>
> 4KB      8.642
>
> ################################### CPU utilization IO_Size  randwrite
>
> 4KB      56.42
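
The two result sets above can be compared with a short Python sketch. It assumes that the BW column is in MiB/s with 4 KiB requests and that the latency column is a mean completion latency; neither is stated in the thread. The helper names are illustrative.

```python
# Numbers copied from the two result blocks above.
results = {
    "bluestore": {"bw": 234.42, "iops": 60006, "lat_ms": 9.595, "cpu": 52.53},
    "filestore": {"bw": 260.33, "iops": 66640, "lat_ms": 8.642, "cpu": 56.42},
}


def bw_from_iops(iops: float, io_kib: float = 4.0) -> float:
    """Bandwidth in MiB/s implied by an IOPS figure (assumes 4 KiB requests)."""
    return iops * io_kib / 1024.0


def in_flight(iops: float, lat_ms: float) -> float:
    """Little's law: mean requests in flight = throughput x mean latency."""
    return iops * lat_ms / 1000.0


def pct_change(new: float, old: float) -> float:
    """Relative change of `new` against `old`, in percent."""
    return (new - old) / old * 100.0


for name, r in results.items():
    print(name,
          "implied BW (MiB/s): %.2f" % bw_from_iops(r["iops"]),
          "requests in flight: %.1f" % in_flight(r["iops"], r["lat_ms"]))

b, f = results["bluestore"], results["filestore"]
print("IOPS change vs filestore (%%): %.1f" % pct_change(b["iops"], f["iops"]))
print("Latency change vs filestore (%%): %.1f" % pct_change(b["lat_ms"], f["lat_ms"]))
```

Under these assumptions, bluestore delivers about 10% fewer IOPS at about 11% higher latency. Both runs work out to roughly 576 requests in flight, about four times the 12 jobs x 12 queue depth of a single fio client. That would fit four clients, but the client count is not given in the thread.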
>
>
> 2016-06-27 21:31 GMT+09:00 Sage Weil <sage@xxxxxxxxxxxx>:
>> On Mon, 27 Jun 2016, myoungwon oh wrote:
>>> Hi, I have questions about bluestore (4K random write case).
>>>
>>> So far, we have used NVRAM (PCIe) as the journal and SATA SSDs as the
>>> data disks (filestore).
>>> Therefore, we got a performance gain from the NVRAM journal.
>>> However, in the current Bluestore design it seems that data (4K
>>> aligned) is written to the data disk first, and then metadata is
>>> written to the rocksdb WAL.
>>> This design removes the “double write” in the objectstore, but in our
>>> case the NVRAM cannot be fully utilized.
>>>
>>> So, my questions are:
>>>
>>> 1. Can bluestore write to the WAL first, as filestore does?
>>
>> You can do it indirectly with bluestore_min_alloc_size=65536, which
>> will send anything smaller than this value through the wal path.
>> Please let us know what effect this has on your latency/performance!
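
The rule described in the reply above can be modeled in a few lines of Python. This is only a simplified sketch of the stated behavior (anything smaller than bluestore_min_alloc_size goes through the wal path), not BlueStore's actual code; the function name `write_path` and the path labels are hypothetical.

```python
def write_path(length: int, min_alloc_size: int = 65536) -> str:
    """Simplified model: writes smaller than min_alloc_size take the wal path.

    Hypothetical helper for illustration only; the real decision in
    BlueStore may depend on more than the write length.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    return "wal" if length < min_alloc_size else "direct"


# A 4K random write with bluestore_min_alloc_size=65536 is routed to the wal.
print(write_path(4096))
# A write of exactly min_alloc_size is not smaller, so it is written directly.
print(write_path(65536))
```

In this model, a 4K write only reaches the NVRAM-backed wal when the threshold is set above 4096, which is why the suggested value of 65536 changes the behavior of the 4K random write test.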
>>
>>> 2. If not, is using bcache or flashcache for NVRAM on top of the SSDs
>>> the right answer?
>>
>> This is also possible, but I expect we'd like to make this work out of
>> the box if we can!
>>
>> sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


