Re: single-threaded seastar-osd

On Tue, Jan 8, 2019 at 8:32 PM Radoslaw Zarzynski <rzarzyns@xxxxxxxxxx> wrote:
>

<snipped makes-sense-to-me-stuff>
>
> Personally I perceive the OSD *concept* as a networked ObjectStore instance
> exposed over the RADOS protocol.
>

I remain concerned that this framing is too strong.  Recall that well
before the seastar-osd concept, several teams (Mellanox, folks on my
team, Fujitsu/Piotr, and I think Sam) have asked to flex in the other
direction--mapping a reduced number of network connections to OSDs.
When InfiniBand RC is the transport with Mellanox ConnectX-3 or -4,
each reliable connection consumes one queue pair, and there are 64K
total QPs available on the HCA.  Solutions are in the direction of UD,
or of hybridizing with a shared receive queue (SRQ).  I'm not arguing
that a message-passing/datagram orientation should somehow take
precedence, but I think we need to make space for those setups in what
we design now.  Treating any incidence of cross-core communication as
an intolerable event feels problematic for that.
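
To put rough numbers on that (the node and peer counts below are my
assumptions, not figures from this thread), a back-of-the-envelope
sketch of how quickly per-connection QPs eat into a 64K budget:

  // Hypothetical sizing: with RC, every peer connection pins one QP.
  #include <cstdint>
  #include <iostream>

  int main() {
    const uint64_t qp_limit      = 64 * 1024; // total QPs on the HCA
    const uint64_t osds_per_node = 60;        // assumed dense NVMe box
    const uint64_t peers_per_osd = 1000;      // assumed clients + peer OSDs + mons

    const uint64_t qps_needed = osds_per_node * peers_per_osd;
    std::cout << qps_needed << " QPs needed vs. " << qp_limit << " available\n";
    // 60,000 vs. 65,536 leaves little headroom; giving every core its own
    // set of connections multiplies the left-hand side by the core count,
    // which is why SRQ/UD-style fan-in has to stay on the table.
    return 0;
  }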

>
> One of the fundamental benefits I see is keeping the RADOS name resolver
> intact. It still consists of one level only: the CRUSH name resolution. No
> in-OSD crossbar is necessary. Therefore I expect no desire for a RADOS
> extension that bypasses the new stage by memorizing the mapping it brings.
> That is, in addition to simplifying the crimson-osd design (stripping all
> seastar::sharded<...> and seastar::foreign_ptrs), there would be absolutely
> no modification to the protocol or clients. This means no need for logic
> handling backward compatibility.
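
(For anyone not steeped in Seastar: a minimal sketch of the cross-shard
plumbing being stripped here.  The pg_service type and the shard choice
are made up for illustration; the sharded<>/invoke_on calls are stock
Seastar.)

  #include <seastar/core/app-template.hh>
  #include <seastar/core/future.hh>
  #include <seastar/core/sharded.hh>
  #include <seastar/core/smp.hh>

  // Hypothetical per-shard service standing in for PG/collection state.
  struct pg_service {
    seastar::future<> handle_op() { return seastar::make_ready_future<>(); }
    seastar::future<> stop() { return seastar::make_ready_future<>(); }
  };

  seastar::sharded<pg_service> pgs;

  int main(int argc, char** argv) {
    seastar::app_template app;
    return app.run(argc, argv, [] {
      return pgs.start().then([] {
        // The cross-core hop a single-shard OSD would not need:
        // bounce the op to whichever shard owns the target PG.
        unsigned owner = 1 % seastar::smp::count;
        return pgs.invoke_on(owner, [] (pg_service& s) { return s.handle_op(); });
      }).then([] { return pgs.stop(); });
    });
  }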

>
> > It's a fair point.  To also play devil's advocate: if you are storing
> > a cache per OSD and the size of each cache grows with the number of OSDs,
> > what happens as the number of cores per node grows? Maybe we are ok with
> > current core counts.  Would we still be ok with 256+ cores in a single
> > node if the number of caches and the size of each cache grow together?
>
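
(Purely illustrative arithmetic, no measured numbers: if the cache count
tracks the core count and each cache also grows with the OSD count, total
cache memory grows quadratically in cores.)

  #include <cstdint>
  #include <iostream>

  int main() {
    for (uint64_t cores : {32, 64, 256}) {
      // assumed: one cache per OSD/core, each sized ~4 MB per OSD on the node
      const uint64_t per_cache_mb = 4 * cores;
      std::cout << cores << " cores -> " << cores * per_cache_mb << " MB total\n";
    }
    return 0;
  }
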
<snip>
>
> Regards,
> Radek
>
> [1] Micron® 9200 MAX NVMe™ SSDs + Red Hat® Ceph Storage 3.0,
> Reference Architecture

thanks,

Matt

-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309


