Re: status of spdk

Hi Changpeng,
  Thanks a lot for your update.

  Regards,
  James

On 11/8/16, 9:09 PM, "Liu, Changpeng" <changpeng.liu@xxxxxxxxx> wrote:

    Hi James,
    
    Yes, multi-process support in SPDK is under development; Gang is the developer working on this SPDK feature.
    We are targeting the SPDK 16.12 release (WW50) for this feature.
    
    
    > -----Original Message-----
    > From: LIU, Fei [mailto:james.liu@xxxxxxxxxxxxxxx]
    > Sent: Wednesday, November 9, 2016 1:03 PM
    > To: Haomai Wang <haomaiwang@xxxxxxxxx>; Liu, Changpeng
    > <changpeng.liu@xxxxxxxxx>
    > Cc: Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx>; Sage Weil
    > <sweil@xxxxxxxxxx>; ceph-devel <ceph-devel@xxxxxxxxxxxxxxx>
    > Subject: Re: status of spdk
    > 
    > Haomai,
    >    Thanks a lot.
    > 
    >    Regards,
    >    James
    > 
    > Hi Changpeng,
    >    Would you mind updating us on the status of multi-process support in spdk?
    > 
    >    Regards,
    >    James
    > 
    > On 11/8/16, 8:59 PM, "Haomai Wang" <haomaiwang@xxxxxxxxx> wrote:
    > 
    >     On Wed, Nov 9, 2016 at 8:21 AM, LIU, Fei <james.liu@xxxxxxxxxxxxxxx> wrote:
    >     > Hi Yehuda and Haomai,
    >     >    The issue is that drives driven by SPDK cannot be shared by multiple OSDs
    >     > the way a kernel NVMe drive can, since an SPDK-driven device so far cannot be
    >     > shared across multiple processes such as OSDs, right?
    > 
    >     Multi-process support for the spdk nvme driver is an ongoing spdk feature;
    >     it will be implemented via shared memory among the processes.
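    > 
    >     As a rough illustration of the general shared-memory idea (a Python sketch only,
    >     with a made-up segment name and layout; this is not the actual SPDK mechanism):
    > 
    >         # Illustrative only: a "primary" process creates a named shared-memory
    >         # region and publishes some state; a "secondary" process attaches to it
    >         # instead of re-initialising the device itself.
    >         from multiprocessing import shared_memory
    >         import struct
    > 
    >         SHM_NAME = "nvme_shared_state"   # hypothetical segment name
    > 
    >         def primary():
    >             # Create the region and publish, e.g., a version and an IO queue count.
    >             shm = shared_memory.SharedMemory(name=SHM_NAME, create=True, size=64)
    >             struct.pack_into("II", shm.buf, 0, 1, 8)   # version=1, io_queues=8
    >             return shm
    > 
    >         def secondary():
    >             # Attach to the region created by the primary and read the state back.
    >             shm = shared_memory.SharedMemory(name=SHM_NAME)
    >             version, io_queues = struct.unpack_from("II", shm.buf, 0)
    >             print(f"attached: version={version}, io_queues={io_queues}")
    >             shm.close()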
    > 
    >     >
    >     >    Regards,
    >     >    James
    >     >
    >     >
    >     >
    >     > On 11/8/16, 4:06 PM, "Yehuda Sadeh-Weinraub" <ceph-devel-owner@xxxxxxxxxxxxxxx on behalf of yehuda@xxxxxxxxxx> wrote:
    >     >
    >     >     On Tue, Nov 8, 2016 at 3:40 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
    >     >     > On Tue, 8 Nov 2016, Yehuda Sadeh-Weinraub wrote:
    >     >     >> I just started looking at spdk, and have a few comments and questions.
    >     >     >>
    >     >     >> First, it's not clear to me how we should handle the build. At the moment
    >     >     >> the spdk code resides as a submodule in the ceph tree, but it depends
    >     >     >> on dpdk, which currently needs to be downloaded separately. We can add
    >     >     >> it as a submodule (upstream is here: git://dpdk.org/dpdk). That being
    >     >     >> said, getting it to build was a bit tricky and I think it might be
    >     >     >> broken with cmake. In order to get it working I resorted to building a
    >     >     >> system library and using that.
    >     >     >
    >     >     > Note that this PR is about to merge
    >     >     >
    >     >     >         https://github.com/ceph/ceph/pull/10748
    >     >     >
    >     >     > which adds the DPDK submodule, so hopefully this issue will go away when
    >     >     > that merges, or with a follow-on cleanup.
    >     >     >
    >     >     >> The way to configure an osd to use bluestore with spdk currently is to
    >     >     >> create a symbolic link that replaces the bluestore 'block' device and
    >     >     >> points to a file whose name is prefixed with 'spdk:'.
    >     >     >> Originally I assumed that the suffix would be the nvme device id, but it
    >     >     >> seems that it's not really needed; however, the file itself needs to
    >     >     >> contain the device id (see
    >     >     >> https://github.com/yehudasa/ceph/tree/wip-yehuda-spdk for a couple of
    >     >     >> minor fixes).
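    >     >     >>
    >     >     >> A minimal sketch of that setup (the osd data path, file name, and PCIe
    >     >     >> address below are made-up examples, not necessarily the exact format the
    >     >     >> driver expects):
    >     >     >>
    >     >     >>     import os
    >     >     >>
    >     >     >>     osd_data = "/var/lib/ceph/osd/ceph-0"             # hypothetical osd data dir
    >     >     >>     spdk_file = os.path.join(osd_data, "spdk:nvme0")  # name carries the 'spdk:' prefix
    >     >     >>
    >     >     >>     # The file content holds the NVMe device id (a PCIe address, as an example).
    >     >     >>     with open(spdk_file, "w") as f:
    >     >     >>         f.write("0000:01:00.0\n")
    >     >     >>
    >     >     >>     # Replace the usual 'block' link so bluestore opens the SPDK-backed device.
    >     >     >>     block_link = os.path.join(osd_data, "block")
    >     >     >>     if os.path.lexists(block_link):
    >     >     >>         os.remove(block_link)
    >     >     >>     os.symlink(spdk_file, block_link)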
    >     >     >
    >     >     > Open a PR for those?
    >     >
    >     >     Sure
    >     >
    >     >     >
    >     >     >> As I understand it, in order to support multiple osds on the same NVMe
    >     >     >> device we have a few options. We can leverage NVMe namespaces, but
    >     >     >> that's not supported on all devices. We can configure bluestore to
    >     >     >> only use part of the device (device sharding? not sure if it supports
    >     >     >> it). I think it's best if we could keep bluestore out of the loop
    >     >     >> there and have the NVMe driver abstract multiple partitions of the
    >     >     >> NVMe device. The idea is to be able to define multiple partitions on
    >     >     >> the device (e.g., each partition will be defined by the offset, size,
    >     >     >> and namespace), and have the osd set to use a specific partition.
    >     >     >> We'll probably need a special tool to manage it, and potentially keep
    >     >     >> the partition table information on the device itself. The tool could
    >     >     >> also manage the creation of the block link. We should probably rethink
    >     >     >> how the link is structured and what it points at.
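    >     >     >>
    >     >     >> A minimal sketch of what an entry in such an on-device partition table
    >     >     >> might look like (field names, sizes, and the example values are
    >     >     >> hypothetical):
    >     >     >>
    >     >     >>     # Hypothetical on-device partition table entry: namespace id,
    >     >     >>     # byte offset, byte size, and an owner tag for the osd.
    >     >     >>     import struct
    >     >     >>
    >     >     >>     ENTRY_FMT = "<IQQ16s"
    >     >     >>
    >     >     >>     def pack_entry(nsid, offset, size, owner):
    >     >     >>         return struct.pack(ENTRY_FMT, nsid, offset, size, owner.ljust(16, b"\0"))
    >     >     >>
    >     >     >>     def unpack_entry(raw):
    >     >     >>         nsid, offset, size, owner = struct.unpack(ENTRY_FMT, raw)
    >     >     >>         return {"nsid": nsid, "offset": offset, "size": size,
    >     >     >>                 "owner": owner.rstrip(b"\0").decode()}
    >     >     >>
    >     >     >>     # Example: osd.0 owns the first 100 GiB of namespace 1.
    >     >     >>     entry = pack_entry(1, 0, 100 * 2**30, b"osd.0")
    >     >     >>     print(unpack_entry(entry))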
    >     >     >
    >     >     > I agree that bluestore shouldn't get involved.
    >     >     >
    >     >     > Are NVMe namespaces meant to support multiple processes sharing the
    >     >     > same hardware device?
    >     >
    >     >     More of a partitioning solution, but yes (as far as I understand).
    >     >
    >     >     >
    >     >     > Also, if you do that, is it possible to give one of the namespaces to the
    >     >     > kernel?  That might solve the bootstrapping problem we currently have
    >     >
    >     >     Theoretically, but not right now (or ever?). See here:
    >     >
    >     >     https://lists.01.org/pipermail/spdk/2016-July/000073.html
    >     >
    >     >     > where we have nowhere to put the $osd_data filesystem with the device
    >     >     > metadata.  (This is admittedly not necessarily a blocking issue.  Putting
    >     >     > those dirs on / wouldn't be the end of the world; it just means cards
    >     >     > can't be easily moved between boxes.)
    >     >     >
    >     >
    >     >     Maybe we can use bluestore for these too ;) That being said, there
    >     >     might be some kind of loopback solution that could work, but I'm not
    >     >     sure whether it would create major bottlenecks that we'd want to avoid.
    >     >
    >     >     Yehuda
    >     >
    >     >
    >     >
    > 
    > 
    > 
    >     --
    >     Best Regards,
    > 
    >     Wheat
    > 
    > 
    
    


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


