On Sun, May 31, 2020 at 11:46:49PM +0200, Lorenzo Bianconi wrote:
> +
> +	prog = READ_ONCE(rcpu->prog);
> 	for (i = 0; i < n; i++) {
> -		void *f = frames[i];
> +		void *f = xdp_frames[i];
> 		struct page *page = virt_to_page(f);
> +		struct xdp_frame *xdpf;
> +		struct xdp_buff xdp;
> +		u32 act;
> +		int err;
> 
> 		/* Bring struct page memory area to curr CPU. Read by
> 		 * build_skb_around via page_is_pfmemalloc(), and when
> 		 * freed written by page_frag_free call.
> 		 */
> 		prefetchw(page);
> +		if (!prog) {
> +			frames[nframes++] = xdp_frames[i];
> +			continue;
> +		}

I'm not sure the compiler will be smart enough to hoist the !prog check
out of the loop. Otherwise the default cpumap case will be a bit slower.

I'd like to see performance numbers before/after, and acks from folks who
are using cpumap, before applying.

Also please add a selftest for it. samples/bpf/ in patch 6 is not enough.

Other than the above the feature looks good to me. It nicely complements
devmap.
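For illustration, the hoisting being suggested can be sketched in plain C. This is a hypothetical stand-alone model, not the kernel code: run_prog_stub() and filter_frames() are invented names, and the stub "program" just passes frames with even pointer values. The point is that the !prog test is taken once, before the loop, so the no-program path pays a single branch plus a bulk copy instead of a per-iteration check.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for running an XDP program on one frame: "pass" frames
 * whose pointer value is even, "drop" the rest. Purely illustrative. */
static int run_prog_stub(void *frame)
{
	return ((uintptr_t)frame & 1) == 0;
}

/* Copy the frames that survive the program into frames[], returning the
 * count. The !prog check is hoisted out of the loop, so the default
 * (no program attached) case does not branch per frame. */
static size_t filter_frames(const void *prog, void **xdp_frames, size_t n,
			    void **frames)
{
	size_t nframes = 0;
	size_t i;

	if (!prog) {
		/* Fast path: no program attached, pass everything through. */
		memcpy(frames, xdp_frames, n * sizeof(*frames));
		return n;
	}

	for (i = 0; i < n; i++) {
		if (run_prog_stub(xdp_frames[i]))
			frames[nframes++] = xdp_frames[i];
	}
	return nframes;
}
```

Whether the compiler performs this transformation on its own (loop unswitching) depends on optimization level and surrounding code, which is why doing it explicitly, or measuring before/after, seems warranted.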