Re: Questions for NVRAM+SATA SSDs with Bluestore

Hi.


As you mentioned, bluestore_min_alloc_size can send small writes through the WAL path.

Performance is better than writing directly to the SSDs (by more than 10K IOPS).
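
For reference, the setting can go in ceph.conf under [osd]; 65536 is
the value Sage suggested below:

    [osd]
    # writes smaller than min_alloc_size take the WAL path
    bluestore_min_alloc_size = 65536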

However, bluestore performance is still lower than filestore's (see below).

I think there are many performance-tuning options for bluestore, so I
need to understand them in order to see its real performance.

(If you have recommended options, please let me know.)



Anyway, two other observations:

1. More than 70K IOPS is observed during the first 20~30 seconds of the
performance test; after that, performance drops significantly.

    (one suspected reason is that the metadata size (blob map, extent
map) keeps growing; a quick check for this is sketched after item 2)

2. High latency (note that the WAL device is NVRAM).
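
To check the metadata theory, the OSD perf counters can be watched
while the test runs; "ceph daemon <id> perf dump" is the standard
admin-socket command (the exact bluestore counter names depend on the
build, so the grep is only illustrative):

    $ ceph daemon osd.0 perf dump | grep -i bluestore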



Thanks.



Bluestore (master branch, 6/29, no configuration changes), 4KB randwrite:

    BW          234.42 MB/s
    IOPS        60006
    Latency     9.595 ms
    CPU util    52.53 %



Filestore (jewel, 10.2.1, no configuration changes), 4KB randwrite:

    BW          260.33 MB/s
    IOPS        66640
    Latency     8.642 ms
    CPU util    56.42 %
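
For anyone who wants to reproduce the workload, a fio job along these
lines should be close (rbd engine; the pool/image names, runtime, and
iodepth here are placeholders, not necessarily what I ran):

    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=fio-test
    rw=randwrite
    bs=4k
    time_based=1
    runtime=300

    [rbd-4k-randwrite]
    iodepth=32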


2016-06-27 21:31 GMT+09:00 Sage Weil <sage@xxxxxxxxxxxx>:
> On Mon, 27 Jun 2016, myoungwon oh wrote:
>> Hi, I have questions for bluestore (4K random write case).
>>
>> So far, we have used NVRAM(PCIe) as journal and SSD (SATA) as data
>> disk (filestore).
>> Therefore, we got performance gain from NVRAM journal.
>> However, current Bluestore design seems that data (4K aligned) is
>> written to data disk first, then metadata is written to WAL rocksdb.
>> This design can remove “double write” in objectstore, but in our case,
>> NVRAM can not be utilized fully.
>>
>>  So, my questions are that
>>
>> 1. Can bluestore write WAL first as filestore?
>
> You can do it indirectly with bluestore_min_alloc_size=65536, which will
> send anything smaller than this value through the wal path.  Please let
> us know what effect this has on our latency/performance!
>
>> 2. If not, using bcache or flashcache for NVRAM on top of SSDs is right
>> answer?
>
> This is also possible, but I expect we'd like to make this work out of the
> box if we can!
>
> sage