Re: Cache Layer on NVME driver

Sorry, 100% rw.

On Thu, May 19, 2016 at 1:34 PM, Haomai Wang <haomai@xxxxxxxx> wrote:
> 100%, because it's a small rbd image. All the metadata should be cached.
>
> On Thu, May 19, 2016 at 10:39 AM, Jianjian Huo <jianjian.huo@xxxxxxxxxxx> wrote:
>>
>> On Wed, May 18, 2016 at 6:35 PM, Haomai Wang <haomai@xxxxxxxx> wrote:
>>> On Thu, May 19, 2016 at 7:50 AM, Jianjian Huo <jianjian.huo@xxxxxxxxxxx> wrote:
>>>>
>>>> On Wed, May 18, 2016 at 10:19 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>>>>>
>>>>> Hi Rajath!
>>>>>
>>>>> Great to hear you're interested in working with us outside of GSoC!
>>>>>
>>>>> On Wed, 18 May 2016, Haomai Wang wrote:
>>>>> > Hi Rajath,
>>>>> >
>>>>> > We are glad to see your passion. From my view, Sage is planning to
>>>>> > implement a userspace cache in BlueStore itself, like
>>>>> > (https://github.com/ceph/ceph/commit/b9ac31afe5a162176f019baa25348014a77f6ab8#commitcomment-17488250).
>>>>> >
>>>>> > I guess the cache won't be a generic cache interface; instead it will
>>>>> > be bound to the specific objects that need it. Perhaps Sage can give a brief overview?
>>>>>
>>>>> Part of the reason this project wasn't at the top of our list (we got
>>>>> fewer slots than we had projects) is that the BlueStore code is in
>>>>> flux and moving quite quickly.  For the BlueStore side, we are building a
>>>>> simple buffer cache that is tied to an Onode (in-memory per-object
>>>>> metadata structure) and integrated tightly with the read and write IO
>>>>> paths.  This will eliminate our use of the block device buffer cache for
>>>>> user/object data.
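A rough sketch of that idea, using hypothetical types and names rather than the actual BlueStore code: cached extents hang off the in-memory per-object metadata and are consulted and populated directly in the read and write paths.

// Hypothetical sketch of an Onode-attached buffer cache (not the real
// BlueStore code): cached extents live in an offset-sorted map on the
// object's in-memory metadata, so reads can be served from memory and
// writes populate the cache in the same IO path.

#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct Buffer {
  uint64_t offset;           // logical offset within the object
  std::vector<char> data;    // cached bytes for this extent
};

struct Onode {
  std::string oid;                     // object id
  std::map<uint64_t, Buffer> buffers;  // offset -> cached extent

  // Write path: remember the data we just wrote.
  void cache_write(uint64_t off, const char* p, size_t len) {
    buffers[off] = Buffer{off, std::vector<char>(p, p + len)};
  }

  // Read path: return true and fill 'out' on a cache hit (only
  // exact-extent hits are handled in this simplified sketch).
  bool cache_read(uint64_t off, size_t len, std::vector<char>& out) const {
    auto it = buffers.find(off);
    if (it == buffers.end() || it->second.data.size() != len)
      return false;          // miss -> caller falls back to the device
    out = it->second.data;
    return true;
  }
};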
>>>>>
>>>>> The other half of the picture, though, is the BlueFS layer that is
>>>>> consumed by rocksdb: it also needs caching in order for rocksdb to perform
>>>>> at all.  My hope is that the code we write for the user data can be re-used
>>>>> here as well, but it is still evolving.
>>>>
>>>> When BlueStore moves away from the kernel cache to its own buffer cache, RocksDB can use its own buffer cache as well.
>>>> RocksDB has a size-configurable block cache for uncompressed data blocks; it can serve as the buffer cache,
>>>> since BlueStore doesn't compress metadata in RocksDB.
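For reference, that block cache is sized through RocksDB's BlockBasedTableOptions; a minimal standalone example follows (the 1 GiB capacity and the /tmp/testdb path are arbitrary illustrations, not recommended settings).

#include <rocksdb/cache.h>
#include <rocksdb/db.h>
#include <rocksdb/options.h>
#include <rocksdb/table.h>

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  // Uncompressed data blocks are cached here; sizing this cache is the
  // knob being discussed above.
  rocksdb::BlockBasedTableOptions table_options;
  table_options.block_cache = rocksdb::NewLRUCache(1ULL << 30);  // 1 GiB
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/testdb", &db);
  if (s.ok()) delete db;
  return s.ok() ? 0 : 1;
}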
>>>
>>> Actually this does not behave as expected. In my last nvmedevice
>>> benchmark, lots of reads still went down to the device instead of being
>>> cached by rocksdb, even when I set a very large block cache. I guess
>>> there are some gaps between our usage and the rocksdb implementation.
>>
>> What kind of workload did you use for that benchmark, 100% read?
>>
>>>
>>>>
>>>> Jianjian
>>>>>
>>>>> The main missing piece I'd say is a way to string Buffer objects together
>>>>> in a global(ish) LRU (or set of LRUs, or whatever we need for the caching
>>>>> policy that makes sense) so that trimming can be done safely and
>>>>> efficiently.  Right now the code is lock-free because each Onode is only
>>>>> touched under the collection rwlock, but in order to do trimming we need
>>>>> to be able to reap cold buffers from a global context.
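A minimal sketch of that missing piece, under the simplifying assumption of a single global mutex (hypothetical names, not a settled design): buffers are strung on one global list, moved to the hot end when touched by the IO paths, and reaped from the cold end by a trimmer. The single lock here merely stands in for whatever locking would have to coexist with the per-collection rwlock mentioned above.

#include <cstdint>
#include <list>
#include <mutex>

struct Buffer {
  uint64_t length = 0;      // bytes held by this cached extent
  // ... data, back-pointer to the owning Onode, etc.
};

class GlobalBufferLRU {
  std::mutex lock;
  std::list<Buffer*> lru;   // front = hottest, back = coldest
  uint64_t total = 0;

 public:
  // Insert a newly cached buffer at the hot end.
  void add(Buffer* b) {
    std::lock_guard<std::mutex> g(lock);
    lru.push_front(b);
    total += b->length;
  }

  // Mark a buffer as most-recently-used (called from the IO paths).
  void touch(Buffer* b) {
    std::lock_guard<std::mutex> g(lock);
    lru.remove(b);          // O(n) here; an intrusive list would avoid this
    lru.push_front(b);
  }

  // Reap cold buffers from the tail until we are back under 'target' bytes.
  void trim(uint64_t target) {
    std::lock_guard<std::mutex> g(lock);
    while (total > target && !lru.empty()) {
      Buffer* victim = lru.back();
      lru.pop_back();
      total -= victim->length;
      delete victim;        // real code would detach it from its Onode first
    }
  }
};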
>>>>>
>>>>> Anyway, there is no clear or ready answer here yet, but we are ready to
>>>>> discuss design/approach here on the list, and welcome your input (and
>>>>> potentially, contributions to development!).
>>>>>
>>>>> sage
>>>>>
>>>>>
>>>>> >
>>>>> > On Wed, May 18, 2016 at 9:32 PM, Haomai Wang <haomai@xxxxxxxx> wrote:
>>>>> > >
>>>>> > >
>>>>> > > On Wed, May 18, 2016 at 2:44 PM, Rajath Shashidhara
>>>>> > > <rajath.shashidhara@xxxxxxxxx> wrote:
>>>>> > >>
>>>>> > >> Hello,
>>>>> > >>
>>>>> > >> I was a GSoC'16 applicant for the project "Implementing Cache layer on
>>>>> > >> top of NVME Driver". Unfortunately, I was not selected for the
>>>>> > >> internship.
>>>>> > >
>>>>> > >
>>>>> > > Hi Rajath,
>>>>> > >
>>>>> > > We are glad to see your passion. From my view, Sage is planning to implement
>>>>> > > a userspace cache in BlueStore itself, like
>>>>> > > (https://github.com/ceph/ceph/commit/b9ac31afe5a162176f019baa25348014a77f6ab8#commitcomment-17488250).
>>>>> > >
>>>>> > > I guess the cache won't be a generic cache interface; instead it will be
>>>>> > > bound to the specific objects that need it. Perhaps Sage can give a brief overview?
>>>>> > >
>>>>> > >>
>>>>> > >>
>>>>> > >> However, I would be interested in working on the project as an
>>>>> > >> independent contributor to Ceph.
>>>>> > >> I hope to receive the necessary support from the Ceph developer
>>>>> > >> community.
>>>>> > >>
>>>>> > >> In case I missed any important details in my project proposal or have
>>>>> > >> misunderstood the project, please help me figure out the details.
>>>>> > >>
>>>>> > >> Looking forward to contributing!
>>>>> > >>
>>>>> > >> Thank you,
>>>>> > >> Rajath Shashidhara
>>>>> > >
>>>>> > >