Hi Sage,

I have tried bluestore_min_alloc_size = 4096 so that overwrites always
reallocate into new extents. This avoids the double write in theory, but
on a high-speed device like an NVMe it still has a performance problem
with metadata updates, and the bottleneck is apparently in RocksDB. I
think the compaction and data organization in RocksDB may matter a lot.
There may be a lot of work to do in RocksDB and BlueFS, such as trying
different compaction strategies and using fewer levels in RocksDB. Are
there any guides on those, and what are the future directions for the
current BlueStore performance issues?

Regards
Ning Yao

2016-06-27 20:31 GMT+08:00 Sage Weil <sage@xxxxxxxxxxxx>:
> On Mon, 27 Jun 2016, myoungwon oh wrote:
>> Hi, I have questions about BlueStore (the 4K random write case).
>>
>> So far, we have used NVRAM (PCIe) as the journal and SATA SSDs as the
>> data disks (FileStore), so we got a performance gain from the NVRAM
>> journal. However, in the current BlueStore design, 4K-aligned data
>> seems to be written to the data disk first, and then the metadata is
>> written to the RocksDB WAL. This design removes the "double write" in
>> the object store, but in our case the NVRAM cannot be fully utilized.
>>
>> So, my questions are:
>>
>> 1. Can BlueStore write to the WAL first, as FileStore does?
>
> You can do it indirectly with bluestore_min_alloc_size=65536, which will
> send anything smaller than this value through the wal path. Please let
> us know what effect this has on your latency/performance!
>
>> 2. If not, is using bcache or flashcache for NVRAM on top of the SSDs
>> the right answer?
>
> This is also possible, but I expect we'd like to make this work out of
> the box if we can!
>
> sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
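
[Editor's note: the two knobs discussed in this thread can be set in
ceph.conf roughly as follows. The bluestore_min_alloc_size values are the
ones mentioned above; the bluestore_rocksdb_options string is only an
illustrative sketch of the kind of tuning Ning Yao suggests (fewer levels,
a different compaction style) -- the specific option values are
assumptions, not recommendations from this thread.]

```ini
[osd]
# 65536 sends writes smaller than 64K through the WAL (deferred) path,
# as Sage suggests; 4096 instead forces small overwrites to reallocate
# into new extents, as Ning Yao tried.
bluestore_min_alloc_size = 65536

# Option string passed through to RocksDB. The values below are
# illustrative assumptions for experimenting with fewer levels and
# universal compaction, not tested recommendations.
bluestore_rocksdb_options = compression=kNoCompression,num_levels=4,compaction_style=kCompactionStyleUniversal,max_write_buffer_number=4,write_buffer_size=268435456
```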