Re: AF_XDP integration with FDio VPP? (Was: Questions about XDP)

"Eelco Chaudron" <echaudro@xxxxxxxxxx> · Mon, 30 Sep 2019 11:28:08 +0200

On 30 Sep 2019, at 8:51, Magnus Karlsson wrote:

On Fri, Sep 27, 2019 at 8:09 PM William Tu <u9012063@xxxxxxxxx> wrote:

On Fri, Sep 27, 2019 at 12:02 AM Magnus Karlsson
<magnus.karlsson@xxxxxxxxx> wrote:

On Thu, Sep 26, 2019 at 1:34 AM William Tu <u9012063@xxxxxxxxx> wrote:

On Wed, Sep 25, 2019 at 12:48 AM Eelco Chaudron <echaudro@xxxxxxxxxx> wrote:

On 25 Sep 2019, at 8:46, Július Milan wrote:

Hi Eelco

Currently, OVS uses the mmapped memory directly; however, on egress it
copies the memory to the egress interface's mmapped memory.

Great, thanks for making this clear to me.

Currently, OVS uses an AF_XDP memory pool per interface, so a further
optimization could be to use a global memory pool so this extra copy
is not needed.

Is it even possible to make this further optimization? Since every
interface has its own non-shared umem, from my point of view at least
one copy is necessary for the case you described above (when the RX
interface is different from the TX interface). Or am I missing something?

Someone @Intel told me it would be possible to have one huge mempool
that can be shared between interfaces. However, I have not
researched/tried it.
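To make the difference concrete, here is a minimal sketch of the two forwarding paths discussed above. It is only a model: a umem is represented as a flat byte array, and struct umem, struct desc, forward_copy() and forward_shared() are hypothetical names made up for illustration, not OVS or libbpf code. With per-interface umems the payload has to be copied into a frame owned by the egress umem; with one shared umem the same frame address could simply be handed to the TX ring.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SIZE 2048u   /* assumed frame size */
#define NUM_FRAMES 4u      /* assumed number of frames per umem */

/* Hypothetical stand-ins: a umem is modelled as a flat byte array and a
 * descriptor as (addr, len), similar in spirit to struct xdp_desc. */
struct umem { uint8_t area[NUM_FRAMES * FRAME_SIZE]; };
struct desc { uint64_t addr; uint32_t len; };

/* Per-interface umems: the payload must be copied from the RX umem
 * into a free frame of the TX umem before it can be transmitted. */
static struct desc forward_copy(const struct umem *rx, struct desc in,
                                struct umem *tx, uint64_t tx_frame_addr)
{
    struct desc out = { .addr = tx_frame_addr, .len = in.len };

    assert(in.len <= FRAME_SIZE);
    assert(in.addr + in.len <= sizeof(rx->area));
    assert(tx_frame_addr + in.len <= sizeof(tx->area));
    memcpy(&tx->area[tx_frame_addr], &rx->area[in.addr], in.len);
    return out;
}

/* Shared umem: RX and TX refer to the same memory, so the descriptor
 * is passed on unchanged and no bytes are copied. */
static struct desc forward_shared(struct desc in)
{
    return in;
}
```

In the shared case the cost moves from the copy to the bookkeeping of who owns which frame, which is the mempool question raised later in the thread.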

I thought about it before, but the problem is that the cq and fq are
per-umem. So with only one umem shared among many queues or devices,
each one has to acquire a lock before it can access the cq or fq. I
think that might become much slower.

You basically have to implement a mempool that can be used by multiple
processes. Unfortunately, there is no lean and mean standalone
implementation of a mempool. There is a good one in DPDK, but then you
get the whole DPDK package into your application which is likely what
you wanted to avoid in the first place. Anyone for writing libmempool?

/Magnus

That's interesting.
Do you mean the DPDK's rte_mempool which supports multiple-producer?

Yes.

If I create a shared umem for queue1 and queue2, then each queue has its
own tx/rx ring so they can process in parallel. But for handling the
per-umem cq/fq, I can create a dedicated thread to process cq/fq.
So for example:
Thread 1 for handling cq/fq
Thread 2 for processing queue1 tx/rx queue
Thread 3 for processing queue2 tx/rx queue
and the mempool should allow multiple producer and consumer.

Does this sound correct?

You do not need a dedicated process. Just something in the mempool
code that enforces mutual exclusion (a mutex or whatever) between
thread 2 and 3 when they are performing operations on the mempool.
Going with a dedicated process sounds complicated.
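A minimal sketch of that idea, assuming a fixed-size pool of umem frame offsets guarded by a pthread mutex. The names frame_pool, pool_init(), pool_get() and pool_put(), as well as the pool and frame sizes, are hypothetical; this is neither libbpf nor DPDK code, just one simple way to get mutual exclusion between the queue threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_FRAMES 1024u  /* assumed number of frames in the shared umem */
#define FRAME_SIZE  2048u  /* assumed frame size */

/* Hypothetical frame pool: a LIFO stack of free umem offsets. */
struct frame_pool {
    pthread_mutex_t lock;
    uint32_t nfree;
    uint64_t free_addrs[POOL_FRAMES];
};

/* Mark every frame of the umem as free. */
static void pool_init(struct frame_pool *p)
{
    assert(p != NULL);
    pthread_mutex_init(&p->lock, NULL);
    p->nfree = POOL_FRAMES;
    for (uint32_t i = 0; i < POOL_FRAMES; i++)
        p->free_addrs[i] = (uint64_t)i * FRAME_SIZE;
}

/* Take one free frame; returns 0 on success, -1 if the pool is empty. */
static int pool_get(struct frame_pool *p, uint64_t *addr)
{
    int ret = -1;

    assert(p != NULL && addr != NULL);
    pthread_mutex_lock(&p->lock);
    if (p->nfree > 0) {
        *addr = p->free_addrs[--p->nfree];
        ret = 0;
    }
    pthread_mutex_unlock(&p->lock);
    return ret;
}

/* Return a frame to the pool; returns -1 if the pool is already full. */
static int pool_put(struct frame_pool *p, uint64_t addr)
{
    int ret = -1;

    assert(p != NULL);
    pthread_mutex_lock(&p->lock);
    if (p->nfree < POOL_FRAMES) {
        p->free_addrs[p->nfree++] = addr;
        ret = 0;
    }
    pthread_mutex_unlock(&p->lock);
    return ret;
}
```

Threads 2 and 3 would call pool_get() before posting buffers to the fq and pool_put() after reaping completions from the cq. Access to the fq and cq rings themselves would still need to be serialized in the same way.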

I was trying to see how to experiment with this using libbpf, but it
looks like it’s not yet supported?

I see the following in xsk_socket__create():

        if (umem->refcount) {
                pr_warning("Error: shared umems not supported by libbpf.\n");
                return -EBUSY;
        }