On Mon, Apr 4, 2016 at 11:25 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> On Mon, 4 Apr 2016, Haomai Wang wrote:
>> > - Multi-stream SSDs and GC control APIs
>> >
>> > Jianjian presented new APIs to control when the SSD is doing garbage
>> > collection (stop, start, start but suspend on IO) and streams to segregate
>> > writes into different erase blocks.
>>
>> Where are these new APIs from? Are they for a specific vendor?
>
> They are working their way through the standards bodies (for NVMe and
> SAS/SATA?). They are probably available in some form from specific
> vendors (Samsung) now?
>
>> > - DPDK and SPDK
>> >
>> > We spent a lot of time going over some background about what DPDK and SPDK
>> > do and don't do. Takeaways/questions include:
>> >
>> > - Which TCP stack are we using with Haomai's DPDK AsyncMessenger
>> > integration? Should we support multiple options?
>>
>> Yes, it will be a backend of AsyncMessenger, just like the already
>> implemented options:
>>
>> ms type = async
>> ms async transport type = dpdk
>> ms dpdk host ipv4 addr = 10.253.102.119
>> ms dpdk gateway ipv4 addr = 10.253.102.1
>> ms dpdk netmask ipv4 addr = 255.255.255.0
>>
>> These options will enable the dpdk backend.
>>
>> Currently, I haven't found any problem between the kernel tcp/ip stack and
>> the dpdk userspace tcp/ip stack. It even passed test_msgr, which injects
>> lots of errors.
>
> Which userspace tcp/ip stack is it? Seastar? ODP?

The main part is from Seastar.

>> > - How much benefit should we expect? Current estimates (based on
>> > SanDisk's numbers) were that each op consumes around 250us of CPU time,
>> > about 80us of that is actual IO time on an NVMe device, and the max time
>> > we're likely to cut from bypassing the kernel block stack is on the order
>> > of 20-30us. Successful users of DPDK/SPDK benefit mostly from
>> > restructuring the rest of the stack to avoid legacy threading models.
>>
>> For now, there are some known bottlenecks that still need to be solved. The
>> biggest advantage is combining dpdk and spdk, which actually makes spdk (the
>> userspace nvme driver) effective, because spdk uses poll mode and requires a
>> physical address when queuing an io request. Without dpdk, we always need to
>> allocate physical-address-aware memory and do a copy. With dpdk as the
>> network stack, we can use the same memory from the NIC to the SSD. Thanks to
>> the dpdk mbuf design, the dpdk stack permits lots of in-flight mbufs and
>> allocates new memory if it runs short.
>
> Is this a new buffer::raw type?

Yes.

>> The current status is that the osd can boot up with several dedicated dpdk
>> network threads, and the last one runs spdk polling. I really want to make
>> the dpdk network threads take over OSD::ShardedOpWQ::_process (just discard
>> the thread pool and let the dpdk threads poll this function), let each dpdk
>> thread own a shard of PGs, then poll BlueStore::kv_thread, and finally the
>> spdk completion reap threads. The main gaps now are:
>> 1. Signal/Wait pairs ....
>> 2. Async read ...
>> 3. Discarding potential slowness in the fast path.
>>
>> If so, lots of locks could be discarded. My initial idea was not to change
>> the current path too much, but it seems the Signal/Wait and async read
>> can't be bypassed. Looking forward to future/promise and async read :-)
>
> This would be very cool. Sam, I assume/hope this aligns with what you're
> working on?
>
> sage

--
Best Regards,

Wheat
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
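
[Editor's sketch #1] The "new buffer::raw type" mentioned above would keep the
DPDK mbuf alive so the payload received from the NIC can be queued directly to
the userspace NVMe driver without a copy. Below is a minimal, self-contained
C++ sketch of that ownership idea. The types and functions (mbuf_handle,
free_mbuf, mbuf_raw) are illustrative stand-ins, not the real rte_mbuf,
rte_pktmbuf_free, or ceph::buffer::raw interfaces.

// Hypothetical sketch: a zero-copy buffer that owns a DPDK-style mbuf so a
// payload received from the NIC can be handed to a polled NVMe driver without
// an intermediate copy. All names here are stand-ins for illustration only.

#include <cstdint>
#include <cstdlib>
#include <iostream>

// Stand-in for struct rte_mbuf: DMA-able memory with a known physical address.
struct mbuf_handle {
  void*    vaddr;   // virtual address of the payload
  uint64_t paddr;   // physical (IOVA) address, usable for an NVMe PRP/SGL entry
  uint32_t len;     // payload length
};

// Stand-in for rte_pktmbuf_free(): return the buffer to its mempool.
static void free_mbuf(mbuf_handle* m) {
  std::free(m->vaddr);
  delete m;
}

// Analogous in spirit to a ceph::buffer::raw subclass: it pins the mbuf for
// the lifetime of the buffer, so the same memory can be referenced by the
// messenger and then queued directly to the userspace NVMe driver.
class mbuf_raw {
 public:
  explicit mbuf_raw(mbuf_handle* m) : m_(m) {}
  ~mbuf_raw() { free_mbuf(m_); }

  mbuf_raw(const mbuf_raw&) = delete;
  mbuf_raw& operator=(const mbuf_raw&) = delete;

  char*    c_str() const     { return static_cast<char*>(m_->vaddr); }
  uint32_t length() const    { return m_->len; }
  uint64_t phys_addr() const { return m_->paddr; }  // what a poll-mode NVMe driver needs

 private:
  mbuf_handle* m_;
};

int main() {
  // Pretend the NIC driver handed us a filled mbuf (fake physical address).
  auto* m = new mbuf_handle{std::malloc(4096), /*paddr=*/0x1000, 4096};
  mbuf_raw buf(m);

  // The same buffer could now be queued for an NVMe write via buf.phys_addr(),
  // instead of allocating separate DMA-safe memory and memcpy()ing into it.
  std::cout << "payload bytes: " << buf.length() << "\n";
  return 0;
}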
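
[Editor's sketch #2] The "one dpdk thread owns a shard of PGs and polls
everything" plan above replaces signal/wait handoffs between thread pools with
a single run-to-completion loop. Below is a self-contained C++ sketch of that
shape, under the assumption of one poll loop per shard. All functions here
(poll_network, poll_kv_commits, poll_nvme_completions) are illustrative stubs,
not the real OSD::ShardedOpWQ, BlueStore, or SPDK interfaces.

// Hypothetical sketch: a single polling thread that owns one shard of PGs and
// drives network RX, op processing, kv commit work, and NVMe completion
// reaping itself, instead of waking separate worker threads.

#include <atomic>
#include <deque>
#include <functional>
#include <iostream>

using Op = std::function<void()>;

struct ShardContext {
  std::deque<Op> op_queue;          // ops for the PGs owned by this thread
  std::atomic<bool> stop{false};
};

// Stub: poll the userspace TCP/IP stack; decoded requests land in op_queue.
static void poll_network(ShardContext& s) { (void)s; }

// Stub: drain kv transactions that are ready to commit (the kv_sync work).
static void poll_kv_commits() {}

// Stub: reap completions from the polled NVMe driver and run their callbacks.
static void poll_nvme_completions() {}

static void shard_poll_loop(ShardContext& s) {
  while (!s.stop.load(std::memory_order_relaxed)) {
    // 1. Pull new requests off the wire (no signal/wait handoff).
    poll_network(s);

    // 2. Process a bounded batch of ops for this shard's PGs, replacing the
    //    thread-pool dispatch of a sharded work queue with direct execution.
    for (int budget = 32; budget > 0 && !s.op_queue.empty(); --budget) {
      Op op = std::move(s.op_queue.front());
      s.op_queue.pop_front();
      op();
    }

    // 3. Advance kv commit work that would otherwise run in a dedicated thread.
    poll_kv_commits();

    // 4. Reap NVMe completions instead of blocking on an interrupt-driven path.
    poll_nvme_completions();
  }
}

int main() {
  ShardContext shard;
  shard.op_queue.push_back([] { std::cout << "handled one op\n"; });
  shard.op_queue.push_back([&shard] { shard.stop = true; });  // stop after demo
  shard_poll_loop(shard);
  return 0;
}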