On Mon, Sep 30, 2019 at 5:17 AM Eelco Chaudron <echaudro@xxxxxxxxxx> wrote:
>
> On 30 Sep 2019, at 13:02, Magnus Karlsson wrote:
>
> > On Mon, Sep 30, 2019 at 11:28 AM Eelco Chaudron <echaudro@xxxxxxxxxx> wrote:
> >>
> >> On 30 Sep 2019, at 8:51, Magnus Karlsson wrote:
> >>
> >>> On Fri, Sep 27, 2019 at 8:09 PM William Tu <u9012063@xxxxxxxxx> wrote:
> >>>>
> >>>> On Fri, Sep 27, 2019 at 12:02 AM Magnus Karlsson <magnus.karlsson@xxxxxxxxx> wrote:
> >>>>>
> >>>>> On Thu, Sep 26, 2019 at 1:34 AM William Tu <u9012063@xxxxxxxxx> wrote:
> >>>>>>
> >>>>>> On Wed, Sep 25, 2019 at 12:48 AM Eelco Chaudron <echaudro@xxxxxxxxxx> wrote:
> >>>>>>>
> >>>>>>> On 25 Sep 2019, at 8:46, Július Milan wrote:
> >>>>>>>
> >>>>>>>> Hi Eelco
> >>>>>>>>
> >>>>>>>>> Currently, OVS uses the mmapped memory directly; however, on egress
> >>>>>>>>> it copies the memory into the egress interface's mmapped memory.
> >>>>>>>>
> >>>>>>>> Great, thanks for making this clear to me.
> >>>>>>>>
> >>>>>>>>> Currently, OVS uses an AF_XDP memory pool per interface, so a further
> >>>>>>>>> optimization could be to use a global memory pool so this extra copy
> >>>>>>>>> is not needed.
> >>>>>>>>
> >>>>>>>> Is it even possible to make this further optimization? Since every
> >>>>>>>> interface has its own non-shared umem, from my point of view at least
> >>>>>>>> one copy is necessary for the case you described above (when the RX
> >>>>>>>> interface is different from the TX interface). Or am I missing something?
> >>>>>>>
> >>>>>>> Someone @Intel told me it would be possible to have one huge mempool
> >>>>>>> that can be shared between interfaces. However, I have not
> >>>>>>> researched/tried it.
> >>>>>>
> >>>>>> I thought about it before, but the problem is that the cq and fq are
> >>>>>> per-umem. So when only one umem is shared among many queues or devices,
> >>>>>> each one has to acquire a lock before it can access the cq or fq. I
> >>>>>> think that might become much slower.
> >>>>>
> >>>>> You basically have to implement a mempool that can be used by multiple
> >>>>> processes. Unfortunately, there is no lean and mean standalone
> >>>>> implementation of a mempool. There is a good one in DPDK, but then you
> >>>>> get the whole DPDK package into your application, which is likely what
> >>>>> you wanted to avoid in the first place. Anyone for writing libmempool?
> >>>>>
> >>>>> /Magnus
> >>>>
> >>>> That's interesting.
> >>>> Do you mean DPDK's rte_mempool, which supports multiple producers?
> >>>
> >>> Yes.
> >>>
> >>>> If I create a shared umem for queue1 and queue2, then each queue has its
> >>>> own tx/rx rings so they can process in parallel. But for handling the
> >>>> per-umem cq/fq, I can create a dedicated thread to process the cq/fq.
> >>>> So for example:
> >>>> Thread 1 for handling the cq/fq
> >>>> Thread 2 for processing queue1's tx/rx rings
> >>>> Thread 3 for processing queue2's tx/rx rings
> >>>> and the mempool should allow multiple producers and consumers.
> >>>>
> >>>> Does this sound correct?
> >>>
> >>> You do not need a dedicated process. Just something in the mempool
> >>> code that enforces mutual exclusion (a mutex or whatever) between
> >>> threads 2 and 3 when they are performing operations on the mempool.
> >>> Going with a dedicated process sounds complicated.
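[As an illustration of what Magnus describes above (mutual exclusion inside the mempool rather than a dedicated thread), a minimal sketch in C might look like the following. All names here (umem_pool, upool_init, upool_get, upool_put) are hypothetical and not part of libbpf, DPDK, or OVS; frame counts and error handling are simplified.]

/* Sketch of a shared umem frame pool protected by a mutex, so that the
 * TX/RX threads for queue1 and queue2 can both allocate and free frames
 * from the same umem area. Hypothetical helper, not library code. */
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

struct umem_pool {
    pthread_mutex_t lock;   /* serializes, e.g., thread 2 and thread 3 */
    uint64_t *free_addrs;   /* stack of free frame addresses in the umem */
    uint32_t free_cnt;      /* number of entries currently on the stack */
};

static int upool_init(struct umem_pool *p, uint32_t num_frames,
                      uint32_t frame_size)
{
    p->free_addrs = malloc(num_frames * sizeof(*p->free_addrs));
    if (!p->free_addrs)
        return -1;
    /* Initially every frame in the umem area is free. */
    for (uint32_t i = 0; i < num_frames; i++)
        p->free_addrs[i] = (uint64_t)i * frame_size;
    p->free_cnt = num_frames;
    return pthread_mutex_init(&p->lock, NULL);
}

/* Called by any TX/RX thread that needs a frame; UINT64_MAX means empty. */
static uint64_t upool_get(struct umem_pool *p)
{
    uint64_t addr = UINT64_MAX;

    pthread_mutex_lock(&p->lock);
    if (p->free_cnt)
        addr = p->free_addrs[--p->free_cnt];
    pthread_mutex_unlock(&p->lock);
    return addr;
}

/* Called when a frame comes back from the completion or fill path. */
static void upool_put(struct umem_pool *p, uint64_t addr)
{
    pthread_mutex_lock(&p->lock);
    p->free_addrs[p->free_cnt++] = addr;
    pthread_mutex_unlock(&p->lock);
}

[If the single lock becomes a bottleneck, a per-thread cache of frames in front of the shared stack (similar in spirit to DPDK's per-lcore mempool cache) is the usual way to reduce contention; that is outside the scope of this sketch.]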
> >> I was trying to see how to experiment with this using libbpf, but it
> >> looks like it's not yet supported?
> >>
> >> I see the following in xsk_socket__create():
> >>
> >> 475         if (umem->refcount) {
> >> 476                 pr_warning("Error: shared umems not supported by libbpf.\n");
> >> 477                 return -EBUSY;
> >> 478         }
> >>
> >
> > Using the XDP_SHARED_UMEM option is not supported in libbpf at this
> > point in time. In this mode you share a single umem, with a single
> > completion queue and a single fill queue, among many xsk sockets tied
> > to the same queue id. But note that you can register the same umem
> > area multiple times (creating multiple umem handles and multiple fqs
> > and cqs) to support xsk sockets that have different queue ids but the
> > same umem area. In both cases you need a mempool that can handle
> > multiple threads.
>
> Cool, this was not clear to me, and it would fit better than the shared
> fqs/cqs.
>
> William, this would be an interesting option for OVS to support zero
> memcpy on tx.

Great, much clearer to me now. I will take a look!

William
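[To make Magnus's suggestion concrete, here is a rough sketch against the libbpf xsk API of registering the same memory area twice so that each queue id gets its own umem handle, and therefore its own fq and cq. The interface name "eth0", the queue ids 0/1, and the frame count are purely illustrative, and all error handling is omitted; a thread-safe mempool such as the one sketched earlier would hand out frame addresses from the shared area.]

/* Two umem registrations over one shared buffer area, one xsk socket per
 * queue id. Illustrative only; not OVS code. */
#include <bpf/xsk.h>
#include <stdlib.h>
#include <unistd.h>

#define NUM_FRAMES 4096
#define FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE

static struct xsk_ring_prod fq1, fq2, tx1, tx2;
static struct xsk_ring_cons cq1, cq2, rx1, rx2;
static struct xsk_umem *umem1, *umem2;
static struct xsk_socket *xsk1, *xsk2;

int main(void)
{
    size_t size = (size_t)NUM_FRAMES * FRAME_SIZE;
    void *area;

    /* One buffer area shared by both registrations. */
    if (posix_memalign(&area, getpagesize(), size))
        return 1;

    /* Register the same area twice: each handle gets its own fq/cq
     * (NULL configs select libbpf defaults). */
    xsk_umem__create(&umem1, area, size, &fq1, &cq1, NULL);
    xsk_umem__create(&umem2, area, size, &fq2, &cq2, NULL);

    /* One socket per queue id, each bound to its own umem handle but
     * backed by the same memory, so TX can reuse RX buffers without a copy. */
    xsk_socket__create(&xsk1, "eth0", 0, umem1, &rx1, &tx1, NULL);
    xsk_socket__create(&xsk2, "eth0", 1, umem2, &rx2, &tx2, NULL);

    return 0;
}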