Re: status of spdk

On Wed, Nov 9, 2016 at 8:21 AM, LIU, Fei <james.liu@xxxxxxxxxxxxxxx> wrote:
> Hi Yehuda and Haomai,
>    Drives driven by SPDK cannot be shared by multiple OSDs the way kernel NVMe drives can, because an SPDK process so far cannot share a device across multiple processes such as OSDs, right?

Multi-process support in the SPDK NVMe driver is a feature currently in
progress; it will be implemented via shared memory among the processes.

>
>    Regards,
>    James
>
>
>
> On 11/8/16, 4:06 PM, "Yehuda Sadeh-Weinraub" <ceph-devel-owner@xxxxxxxxxxxxxxx on behalf of yehuda@xxxxxxxxxx> wrote:
>
>     On Tue, Nov 8, 2016 at 3:40 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>     > On Tue, 8 Nov 2016, Yehuda Sadeh-Weinraub wrote:
>     >> I just started looking at spdk, and have a few comments and questions.
>     >>
>     >> First, it's not clear to me how we should handle build. At the moment
>     >> the spdk code resides as a submodule in the ceph tree, but it depends
>     >> on dpdk, which currently needs to be downloaded separately. We can add
>     >> it as a submodule (upstream is here: git://dpdk.org/dpdk). That being
>     >> said, getting it to build was a bit tricky and I think it might be
>     >> broken with cmake. In order to get it working I resorted to building a
>     >> system library and using that.
>     >
>     > Note that this PR is about to merge
>     >
>     >         https://github.com/ceph/ceph/pull/10748
>     >
>     > which adds the DPDK submodule, so hopefully this issue will go away when
>     > that is merged or with a follow-on cleanup.
>     >
>     >> The way to currently configure an osd to use bluestore with spdk is by
>     >> creating a symbolic link that replaces the bluestore 'block' device to
>     >> point to a file that has a name that is prefixed with 'spdk:'.
>     >> Originally I assumed that the suffix would be the nvme device id, but
>     >> it seems that it's not really needed; however, the file itself needs
>     >> to contain the device id (see
>     >> https://github.com/yehudasa/ceph/tree/wip-yehuda-spdk for a couple of
>     >> minor fixes).
>     >
>     > Open a PR for those?
>
>     Sure
>
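
For reference, the 'block' link setup described above could be sketched
roughly as follows. This is only an illustration of the description in this
thread, not the actual Ceph tooling: the helper name link_block_to_spdk, the
"nvme0" suffix, and the device id string are all made up for the example.

```python
import os


def link_block_to_spdk(osd_dir: str, device_id: str, suffix: str = "nvme0") -> str:
    """Make <osd_dir>/block a symlink to a file named 'spdk:<suffix>'.

    The target file holds the device id, as described in the thread.
    All names here are hypothetical and for illustration only.
    """
    target_name = "spdk:" + suffix
    target_path = os.path.join(osd_dir, target_name)

    # The file content, not the file name, carries the device id.
    with open(target_path, "w") as f:
        f.write(device_id + "\n")

    link_path = os.path.join(osd_dir, "block")
    # Replace an existing 'block' link or file, if any.
    if os.path.lexists(link_path):
        os.remove(link_path)

    # Relative link, so the OSD directory can be moved as a unit.
    os.symlink(target_name, link_path)
    return link_path
```

Calling link_block_to_spdk(osd_dir, device_id) on an OSD data directory
leaves 'block' pointing at 'spdk:nvme0' inside that same directory.
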
>     >
>     >> As I understand it, in order to support multiple osds on the same NVMe
>     >> device we have a few options. We can leverage NVMe namespaces, but
>     >> that's not supported on all devices. We can configure bluestore to
>     >> only use part of the device (device sharding? not sure if it supports
>     >> it). I think it's best if we could keep bluestore out of the loop
>     >> there and have the NVMe driver abstract multiple partitions of the
>     >> NVMe device. The idea is to be able to define multiple partitions on
>     >> the device (e.g., each partition will be defined by the offset, size,
>     >> and namespace), and have the osd set to use a specific partition.
>     >> We'll probably need a special tool to manage it, and potentially keep
>     >> the partition table information on the device itself. The tool could
>     >> also manage the creation of the block link. We should probably rethink
>     >> how the link is structured and what it points at.
>     >
>     > I agree that bluestore shouldn't get involved.
>     >
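
To make the partition proposal above a bit more concrete, here is a minimal
sketch of a partition table where each entry is defined by namespace, offset,
and size, plus the kind of check a management tool would need before handing
a partition to an OSD. Nothing like this exists in the tree as far as this
thread says; the Partition class, the validate function, and the byte units
are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Partition:
    """One hypothetical partition entry: (namespace, offset, size) in bytes."""
    name: str
    namespace: int
    offset: int
    size: int

    @property
    def end(self) -> int:
        return self.offset + self.size


def validate(partitions: List[Partition], namespace_sizes: Dict[int, int]) -> None:
    """Raise ValueError if a partition is out of bounds or overlaps another
    partition in the same namespace."""
    by_namespace: Dict[int, List[Partition]] = {}
    for p in partitions:
        if p.namespace not in namespace_sizes:
            raise ValueError(f"{p.name}: unknown namespace {p.namespace}")
        if p.offset < 0 or p.size <= 0 or p.end > namespace_sizes[p.namespace]:
            raise ValueError(f"{p.name}: out of bounds")
        by_namespace.setdefault(p.namespace, []).append(p)

    for parts in by_namespace.values():
        # After sorting by offset, only neighbours can overlap.
        ordered = sorted(parts, key=lambda p: p.offset)
        for a, b in zip(ordered, ordered[1:]):
            if a.end > b.offset:
                raise ValueError(f"{a.name} overlaps {b.name}")
```

Each OSD would then be configured with one partition name, and the driver
would translate I/O offsets by adding the partition's offset.
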
>     > Are NVMe namespaces meant to support multiple processes sharing the
>     > same hardware device?
>
>     More of a partitioning solution, but yes (as far as I understand).
>
>     >
>     > Also, if you do that, is it possible to give one of the namespaces to the
>     > kernel?  That might solve the bootstrapping problem we currently have
>
>     Theoretically, but not right now (or ever?). See here:
>
>     https://lists.01.org/pipermail/spdk/2016-July/000073.html
>
>     > where we have nowhere to put the $osd_data filesystem with the device
>     > metadata.  (This is admittedly not necessarily a blocking issue.  Putting
>     > those dirs on / wouldn't be the end of the world; it just means cards
>     > can't be easily moved between boxes.)
>     >
>
>     Maybe we can use bluestore for these too ;) That being said, there
>     might be some kind of a loopback solution that could work, but not
>     sure if it won't create major bottlenecks that we'd want to avoid.
>
>     Yehuda
>     --
>     To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>     the body of a message to majordomo@xxxxxxxxxxxxxxx
>     More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>



-- 
Best Regards,

Wheat


