Hi,

----- Original Message -----
> From: "Haomai Wang" <haomaiwang@xxxxxxxxx>
> To: "Sage Weil" <sweil@xxxxxxxxxx>
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Sent: Monday, April 4, 2016 11:30:54 AM
> Subject: Re: hackathon recap
>
> On Mon, Apr 4, 2016 at 11:25 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> > On Mon, 4 Apr 2016, Haomai Wang wrote:
> >> > - Multi-stream SSDs and GC control APIs
> >> >
> >> > Jianjian presented new APIs to control when the SSD does garbage
> >> > collection (stop, start, start but suspend on IO) and streams to
> >> > segregate writes into different erase blocks.
> >>
> >> Where are these new APIs from? For a specific vendor?
> >
> > They are working their way through the standards bodies (for NVMe and
> > SAS/SATA?). They are probably available in some form from specific
> > vendors (Samsung) now?
> >
> >> > - DPDK and SPDK
> >> >
> >> > We spent a lot of time going over some background about what DPDK
> >> > and SPDK do and don't do. Takeaways/questions include
> >> >
> >> > - Which TCP stack are we using with Haomai's DPDK AsyncMessenger
> >> > integration? Should we support multiple options?
> >>
> >> Yes, it will be a backend of AsyncMessenger. These are the implemented
> >> options:
> >>
> >>   ms type = async
> >>   ms async transport type = dpdk
> >>   ms dpdk host ipv4 addr = 10.253.102.119
> >>   ms dpdk gateway ipv4 addr = 10.253.102.1
> >>   ms dpdk netmask ipv4 addr = 255.255.255.0
> >>
> >> These options enable the dpdk backend.
> >>
> >> So far I haven't found any problem between the kernel tcp/ip stack and
> >> the dpdk userspace tcp/ip stack. It even passed test_msgr, which
> >> injects lots of errors.
> >
> > Which userspace tcp/ip stack is it?  Seastar?  ODP?
>
> The main part is from seastar.

In ganesha upstream, there is interest in other stacks (mtcp, odp), and
they have different capabilities and integration options. Eventually,
we'll want to flex this.

> >
> >> > - How much benefit should we expect? The current estimate (based on
> >> > SanDisk's numbers) is that each op consumes around 250us of CPU
> >> > time, about 80us of that is actual IO time on an NVMe device, and
> >> > the most we're likely to cut by bypassing the kernel block stack is
> >> > on the order of 20-30us. Successful users of DPDK/SPDK benefit
> >> > mostly from restructuring the rest of the stack to avoid legacy
> >> > threading models.

That's true. Those reorganizations are critical in general.

> >>
> >> For now, yes, because there are some known bottlenecks that still
> >> need to be solved. The main advantage is that combining dpdk and spdk
> >> actually makes spdk (the userspace nvme driver) effective, because
> >> spdk uses poll mode and requires a physical address when queuing an
> >> io request. Without dpdk, we always need to allocate
> >> physical-address-aware memory and do a copy. With dpdk as the network
> >> stack, we can pass buffers from the NIC straight to the SSD. Thanks
> >> to the dpdk mbuf design, the dpdk stack permits a lot of inflight
> >> mbufs and allocates new memory when it runs short.

--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309
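For anyone who wants to try the dpdk backend, the options Haomai lists above
would sit in ceph.conf roughly like this. The option names are taken verbatim
from his mail; the [global] section placement and the address values are just
the example figures from the thread, so substitute your own:

    [global]
        ms type = async
        ms async transport type = dpdk
        ms dpdk host ipv4 addr = 10.253.102.119
        ms dpdk gateway ipv4 addr = 10.253.102.1
        ms dpdk netmask ipv4 addr = 255.255.255.0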
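To put Sage's estimate above in proportion (same figures, just divided out):

    total CPU per op:             ~250 us
    kernel block stack bypass:    ~20-30 us saved
    => roughly 20/250 .. 30/250 = 8-12% of per-op CPU

which is why the larger wins come from restructuring the threading model
rather than from the bypass itself.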
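Haomai's point about handing NIC memory straight to the SSD is easier to see
in code. Below is a rough C sketch of the two submit paths, using SPDK's
public NVMe/env calls and DPDK's mbuf accessors as I understand them. All of
the setup (rte_eal_init, spdk_nvme_probe, qpair allocation) is omitted,
alignment and LBA-multiple handling is ignored, and none of this is the
actual AsyncMessenger or BlueStore code; it only illustrates the copy vs.
zero-copy difference, and the exact env API names may differ by SPDK version.

    /*
     * Sketch only: copy vs. zero-copy NVMe submission in a poll-mode stack.
     * Environment init, probe, and qpair allocation are assumed elsewhere.
     */
    #include <stdint.h>
    #include <rte_mbuf.h>
    #include <rte_memcpy.h>
    #include <spdk/env.h>
    #include <spdk/nvme.h>

    static void
    write_done(void *cb_arg, const struct spdk_nvme_cpl *cpl)
    {
        /* A real implementation would free the bounce buffer or release
         * the mbuf here, using cb_arg. */
        (void)cb_arg;
        (void)cpl;
    }

    /* Path 1 (no dpdk): data arrives in ordinary memory, so it must be
     * copied into a DMA-able buffer before the poll-mode driver can
     * queue it.  Assumes len is a multiple of the sector size. */
    static int
    write_with_copy(struct spdk_nvme_ns *ns, struct spdk_nvme_qpair *qp,
                    const void *data, uint32_t len, uint64_t lba)
    {
        uint32_t sector = spdk_nvme_ns_get_sector_size(ns);
        void *bounce = spdk_dma_zmalloc(len, sector, NULL);

        if (bounce == NULL)
            return -1;
        rte_memcpy(bounce, data, len);               /* the extra copy */
        return spdk_nvme_ns_cmd_write(ns, qp, bounce, lba, len / sector,
                                      write_done, bounce, 0);
    }

    /* Path 2 (with dpdk): the mbuf payload already lives in hugepage
     * memory the userspace nvme driver can translate, so the buffer the
     * NIC filled is handed to the SSD directly. */
    static int
    write_zero_copy(struct spdk_nvme_ns *ns, struct spdk_nvme_qpair *qp,
                    struct rte_mbuf *m, uint64_t lba)
    {
        uint32_t sector = spdk_nvme_ns_get_sector_size(ns);
        void *payload = rte_pktmbuf_mtod(m, void *);

        return spdk_nvme_ns_cmd_write(ns, qp, payload, lba,
                                      rte_pktmbuf_data_len(m) / sector,
                                      write_done, m, 0);
    }

    /* Both paths are poll mode: completions are reaped by the same loop
     * that polls the NIC queues, with no interrupts or syscalls. */
    static void
    poll_once(struct spdk_nvme_qpair *qp)
    {
        spdk_nvme_qpair_process_completions(qp, 0 /* no limit */);
    }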