Re: latency comparison between 2TB NVMe SSD P3500 and bluestore

On Wed, 12 Jul 2017, 攀刘 wrote:
> Hi Sage,
> 
> Yes, I totally understand that bluestore does much more than a raw
> disk, but the current overhead is a little too big for our usage. I
> will compare bluestore with XFS (which also does metadata tracking,
> allocation, and so on) to see whether XFS has a similar impact.
> 
> I will provide a flamegraph later, but from the perf counters we
> can already see that most of the time is spent in "kv_lat".

That's rocksdb.  And yeah, I think it's pretty clear that either rocksdb 
needs some serious work to really keep up with NVMe (or Optane), or (more 
likely) we need an alternate kv backend that targets high-speed flash.  
I suspect the latter makes the most sense, and I believe there are 
various efforts at Intel looking at alternatives, but no winner just yet.
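
In the meantime, if you want to confirm where the time is going, 
something along these lines should work; the osd id, the pid lookup, and 
the FlameGraph checkout path are just illustrative, not your setup:

  # dump bluestore/rocksdb latency counters from a running OSD's
  # admin socket
  ceph daemon osd.0 perf dump | grep -i kv

  # sample the OSD for 30s and render a flamegraph (assumes Brendan
  # Gregg's FlameGraph scripts are checked out in ./FlameGraph)
  perf record -F 99 -g -p $(pgrep -x ceph-osd) -- sleep 30
  perf script | ./FlameGraph/stackcollapse-perf.pl \
              | ./FlameGraph/flamegraph.pl > osd-flame.svg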

Looking a bit further out, I think a new kv library that natively targets 
persistent memory (e.g., something built on pmem.io) will be the right 
solution.  Although at that point it's probably a question of whether we 
have pmem for metadata and 3D NAND for data, or pure pmem; in the latter 
case a complete replacement for bluestore would make more sense.

> For the FTL, yes, that is a good idea. After we get the flamegraph, we
> can discuss which parts could be improved by the FTL, the firmware, or
> even open-channel SSDs.

Yep!
sage

> 2017-07-12 20:02 GMT+08:00 Sage Weil <sage@xxxxxxxxxxxx>:
> > On Wed, 12 Jul 2017, 攀刘 wrote:
> >> Hi Cephers,
> >>
> >> I did some experiments today to compare the latency of a single
> >> P3500 (2TB NVMe SSD) against bluestore (fio + libfio_objectstore.so):
> >>
> >> For iodepth = 1, the random write latency of bluestore is 276.91 us,
> >> compared with 14.71 us for the raw SSD, which is a big overhead.
> >>
> >> I also tested iodepth = 16; still, there is a big overhead (143 us -> 642 us).
> >>
> >> What is your opinion?
> >
> > There is a lot of work that bluestore is doing over the raw device as it
> > is implementing all of the metadata tracking, checksumming, allocation,
> > and so on.  There's definitely lots of room for improvement, but I'm
> > not sure you can expect to see latencies in the 10s of us.  That said, it
> > would be interesting to see an updated flamegraph to see where the time is
> > being spent and where we can slim this down.  On a new nvme it's possible
> > we can do away with some of the complexity of, say, the allocator, since
> > the FTL is performing a lot of the same work anyway.
> >
> > sage
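
For anyone else trying to reproduce the comparison quoted above, a fio 
job file along these lines should be in the right ballpark.  The engine 
path, the conf file name, and the sizes are assumptions on my part, not 
the original configuration:

  [global]
  # external objectstore engine built from the ceph tree
  ioengine=external:./libfio_objectstore.so
  # engine-specific option: a ceph.conf pointing bluestore at the nvme
  conf=./ceph-bluestore.conf
  rw=randwrite
  bs=4k
  size=1g

  [qd1]
  iodepth=1

  [qd16]
  # wait for the qd1 job to finish before starting this one
  stonewall
  iodepth=16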
