On Mon, 4 Apr 2016, Haomai Wang wrote:
> > - Multi-stream SSDs and GC control APIs
> >
> > Jianjian presented about new APIs to control when the SSD is doing
> > garbage collection (stop, start, start but suspend on IO) and streams
> > to segregate writes into different erase blocks.
>
> Where are these new APIs from?  Are they for a specific vendor?

They are working their way through the standards bodies (for NVMe and
SAS/SATA?).  They are probably available in some form from specific
vendors (Samsung) now?

> > - DPDK and SPDK
> >
> > We spent a lot of time going over some background about what DPDK and
> > SPDK do and don't do.  Takeaways/questions include
> >
> > - Which TCP stack are we using with Haomai's DPDK AsyncMessenger
> > integration?  Should we support multiple options?
>
> Yes, it will be a backend of AsyncMessenger, enabled with options like
> these:
>
>   ms type = async
>   ms async transport type = dpdk
>   ms dpdk host ipv4 addr = 10.253.102.119
>   ms dpdk gateway ipv4 addr = 10.253.102.1
>   ms dpdk netmask ipv4 addr = 255.255.255.0
>
> These options enable the dpdk backend.
>
> So far I haven't found any problems between the kernel tcp/ip stack and
> the dpdk userspace tcp/ip stack.  It even passes test_msgr, which
> injects lots of errors.

Which userspace tcp/ip stack is it?  Seastar?  ODP?

> > - How much benefit should we expect?  Current estimates (based on
> > SanDisk's numbers) were that each op consumes around 250us of CPU
> > time, about 80us of that is actual IO time on an NVMe device, and the
> > max time we're likely to cut from bypassing the kernel block stack is
> > on the order of 20-30us.  Successful users of DPDK/SPDK benefit mostly
> > from restructuring the rest of the stack to avoid legacy threading
> > models.
>
> For now, there are some known bottlenecks that need to be solved.  The
> biggest advantage is combining dpdk and spdk, which is what actually
> makes spdk (the userspace nvme driver) effective.  Because spdk still
> uses poll mode and requires a physical address when queuing an io
> request, without dpdk we always need to allocate physical-address-aware
> memory and do a copy.  With dpdk as the network stack, we can use the
> memory straight from the NIC to the SSD.  Thanks to the dpdk mbuf
> design, the dpdk stack permits lots of inflight mbufs and allocates new
> memory when it runs low.

Is this a new buffer::raw type?

> The current status is that the osd can boot up with several dedicated
> dpdk network threads, with the last one running spdk polling.  I really
> want to make the dpdk network thread take over
> OSD::ShardedOpWQ::_process (just discard the thread pool and let the
> dpdk thread poll this function), let each dpdk thread own a shard of
> PGs, then poll BlueStore::kv_thread, and finally the spdk completion
> reap threads.  The main gaps now are:
>
> 1. Signal/Wait pairs ....
> 2. Async read ...
> 3. Discarding potential slowness in the fast path.
>
> If so, lots of locks could be discarded.  My initial idea was not to
> change so much in the current path, but it seems the Signal/Wait and
> async read can't be bypassed.  Looking forward to future/promise and
> async read :-)

This would be very cool.  Sam, I assume/hope this aligns with what you're
working on?

sage
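
Put together, the configuration Haomai describes would sit in ceph.conf
roughly like this (shown under [global] for illustration; the addresses
are his example values from the thread, not defaults):

  [global]
      # select the async messenger and its dpdk transport backend
      ms type = async
      ms async transport type = dpdk

      # addressing for the userspace tcp/ip stack (example values;
      # substitute your own network)
      ms dpdk host ipv4 addr = 10.253.102.119
      ms dpdk gateway ipv4 addr = 10.253.102.1
      ms dpdk netmask ipv4 addr = 255.255.255.0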
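
On the buffer::raw question: a zero-copy receive path would presumably
wrap each received rte_mbuf in a reference-counted raw buffer whose
destructor returns the mbuf to its mempool, so bufferlists can point
straight at NIC memory.  A minimal sketch of the idea, assuming DPDK's
rte_pktmbuf helpers and a simplified stand-in for the real bufferlist
internals (the raw_buffer/raw_mbuf names are illustrative, not existing
Ceph types):

  // Sketch only: ties a buffer's lifetime to the mbuf that holds the
  // received packet data, so no memcpy is needed on receive.
  #include <rte_mbuf.h>   // DPDK: rte_pktmbuf_mtod, rte_pktmbuf_free

  #include <atomic>
  #include <cstdint>

  // Simplified stand-in for a refcounted raw buffer (the real Ceph type
  // is ceph::buffer::raw; this only illustrates the shape).
  struct raw_buffer {
    char *data;
    uint32_t len;
    std::atomic<int> nref{0};

    raw_buffer(char *d, uint32_t l) : data(d), len(l) {}
    virtual ~raw_buffer() = default;
  };

  // Hypothetical raw type that owns an rte_mbuf.  While any reference to
  // it is alive the mbuf stays allocated; when the last reference drops,
  // the segment goes back to the DPDK mempool instead of being copied
  // out and freed.
  struct raw_mbuf : public raw_buffer {
    struct rte_mbuf *mbuf;

    explicit raw_mbuf(struct rte_mbuf *m)
      : raw_buffer(rte_pktmbuf_mtod(m, char *), rte_pktmbuf_data_len(m)),
        mbuf(m) {}

    ~raw_mbuf() override {
      rte_pktmbuf_free(mbuf);   // return the segment to its mempool
    }
  };

Because mbufs come from hugepage-backed mempools with known physical
addresses, data held this way could presumably be queued to the spdk nvme
driver directly, avoiding the bounce copy Haomai mentions.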
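
The run-to-completion layout described above (each dpdk thread owning a
PG shard and polling the network, the op queue, kv work, and spdk
completions itself) would have roughly the shape below.  This is purely a
structural sketch; the poller and shard types and their methods are
invented placeholders, not existing Ceph interfaces:

  // Sketch of a per-core run-to-completion loop: one dpdk thread owns
  // one shard of PGs and polls every stage itself, so no Signal/Wait
  // handoff to other thread pools is needed.  All names are illustrative.
  #include <atomic>
  #include <cstddef>
  #include <vector>

  struct Op {};                      // placeholder for an OSD op

  struct PGShard {                   // the PGs owned by this thread
    std::vector<Op> pending;
    bool dequeue(Op *op) {
      if (pending.empty()) return false;
      *op = pending.back();
      pending.pop_back();
      return true;
    }
    void process(const Op &) { /* what ShardedOpWQ::_process does today */ }
  };

  struct NetPoller  { size_t poll(PGShard &) { return 0; } };  // dpdk rx/tx
  struct KvPoller   { size_t poll() { return 0; } };           // kv commit work
  struct NvmePoller { size_t reap() { return 0; } };           // spdk completions

  void shard_main(PGShard &shard, NetPoller &net, KvPoller &kv,
                  NvmePoller &nvme, std::atomic<bool> &stop) {
    while (!stop.load()) {
      // 1. pull packets off the NIC and turn them into ops for this shard
      net.poll(shard);

      // 2. run queued ops to completion on this core (no cross-thread wakeup)
      Op op;
      while (shard.dequeue(&op))
        shard.process(op);

      // 3. advance the commit work BlueStore's kv_thread would own today
      kv.poll();

      // 4. reap NVMe completions from the spdk queue pair bound to this core
      nvme.reap();
    }
  }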
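
On the Signal/Wait gap: in a polled, single-threaded model a blocking
condition-variable Wait would stall the whole core, so the natural
replacement is a continuation attached to a future that the poll loop
fulfills when the read or commit completes.  A tiny single-threaded
future/promise sketch of that pattern (not any particular library's API;
a real implementation such as Seastar's adds chaining, exceptions, and
scheduling):

  // Minimal single-threaded future/promise: instead of a thread blocking
  // on a condition variable, the work registers a continuation that the
  // polling thread runs when the result arrives.
  #include <functional>
  #include <iostream>
  #include <utility>

  template <typename T> class promise;

  template <typename T>
  class future {
  public:
    // register what to do with the value once it is ready
    void then(std::function<void(T)> cb) { cb_ = std::move(cb); maybe_run(); }

  private:
    friend class promise<T>;
    void set(T v) { value_ = std::move(v); ready_ = true; maybe_run(); }
    void maybe_run() {
      if (ready_ && cb_) {
        auto cb = std::move(cb_);
        cb_ = nullptr;
        cb(value_);
      }
    }
    T value_{};
    bool ready_ = false;
    std::function<void(T)> cb_;
  };

  template <typename T>
  class promise {
  public:
    future<T> *get_future() { return &fut_; }
    void set_value(T v) { fut_.set(std::move(v)); }  // called by the poll loop
  private:
    future<T> fut_;
  };

  int main() {
    promise<int> read_done;

    // instead of cond.Wait(lock), attach the rest of the op as a continuation
    read_done.get_future()->then([](int bytes) {
      std::cout << "read completed: " << bytes << " bytes\n";
    });

    // ... later, the polling thread reaps the completion and fulfills it
    read_done.set_value(4096);
    return 0;
  }

The key difference from a Signal/Wait pair is that nothing blocks: the
continuation simply runs inline on the polling thread when set_value() is
called.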