On Tue, Nov 28, 2017 at 6:02 AM, Jesper Dangaard Brouer <brouer@xxxxxxxxxx> wrote:
>
> On Mon, 27 Nov 2017 18:33:10 -0500 "MD I. Islam" <tamim@xxxxxxxxxxx> wrote:
>
>> I was wondering if XDP can scale to a many-core NPU (such as the
>> NPS-400, which has 256 cores)? I need to develop an XCP/RCP-like
>> application that achieves bare-metal performance on each core. The
>> application will run in a run-to-completion model. I see that DPDK can
>> run a userspace application on each core. I'm wondering if XDP has
>> anything like that? Please let me know if you have any suggestions.
>
> Hi Tamim,
>
> I think you are mixing up things a bit here...
>
> You mention a specific NIC (NPS-400) which has many cores inside the
> NIC. You need to understand that XDP is a software solution, where the
> programming language is eBPF. XDP does NOT run inside the NIC; instead,
> XDP runs as the earliest possible step in the Linux kernel network stack.
>
> The only NIC that does hardware offloading of XDP is Netronome[1], see
> their white papers[2].

Hi Jesper

I was looking at
http://events.linuxfoundation.org/sites/events/files/slides/Massively_Multi-Core_LPC_2013.pdf.
It looks like the NPS-400 NIC also runs an embedded Linux itself, and the
packets are processed by the embedded ARC cores. Packet processing, however,
is done in userspace; they use a DPDK-like framework (the OpenNPU/NPS SDK)
to bypass the kernel. Is it possible to achieve something similar using XDP?
Please let me know if I'm getting anything wrong. I'm also not sure whether
it is possible for me (a third-party developer/PhD student) to load a
customized Linux onto their NIC.

>
> [1] https://www.netronome.com/
> [2] https://open-nfp.org/dataplanes-ebpf/technical-papers/
>
> Regarding scaling: XDP scales perfectly for each added CPU core. XDP is
> currently (footnote-1) loaded for the entire NIC, but the XDP/eBPF
> program is executed separately/independently on each NIC RX-ring queue
> (processing up to 64 frames per NAPI poll cycle).
>
> XDP scaling depends on how well the NIC RSS distributes traffic
> across RX-ring queues, which is also true for the normal kernel network
> stack. To address bad RSS distribution, I recently implemented cpumap[3]
> to allow XDP to scale delivery to the normal kernel network
> stack. See sample code[4][5] on how to use it.

I was not looking to offload an eBPF program from the control plane. I would
rather program the dataplane by modifying the embedded Linux. I'm wondering
if I can create kernel threads, pin one to each core, and have XDP feed those
threads with packets (I have sketched what I mean at the end of this mail,
below my signature). Please let me know if you have any suggestions.

> [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/cpumap.c
> [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/samples/bpf/xdp_redirect_cpu_kern.c
> [5] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/samples/bpf/xdp_redirect_cpu_user.c
>
>
> (footnote-1: there are debates regarding loading XDP/eBPF progs on
> specific RX-queue numbers, so this might change.)
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

Many thanks
Tamim
PhD Candidate
Kent State University
http://web.cs.kent.edu/~mislam4/
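
P.S. To make the question more concrete, here is a minimal sketch of what I
understand an XDP/eBPF program to look like. It is not taken from any of the
referenced samples; it is my own toy example, and it assumes a libbpf-based
build (something like: clang -O2 -g -target bpf -c minimal_xdp.c -o
minimal_xdp.o), so the header paths may differ on older trees.

  /* minimal_xdp.c: runs at the earliest point of the kernel RX path,
   * once per received frame, independently on every NIC RX-ring queue. */
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("xdp")
  int xdp_minimal(struct xdp_md *ctx)
  {
          void *data     = (void *)(long)ctx->data;
          void *data_end = (void *)(long)ctx->data_end;

          /* Drop frames too short to hold an Ethernet header (14 bytes),
           * hand everything else on to the normal network stack. */
          if (data + 14 > data_end)
                  return XDP_DROP;

          return XDP_PASS;
  }

  char _license[] SEC("license") = "GPL";

If I understand correctly, it would then be attached with something like
"ip link set dev eth0 xdp obj minimal_xdp.o sec xdp" (the device name is just
a placeholder).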
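
The part I would like to confirm is whether cpumap[3] already gives me the
"pinned kernel thread per core" model I asked about above. My (possibly
wrong) reading of [4][5] is that the XDP program redirects frames into a
BPF_MAP_TYPE_CPUMAP, and the kernel then hands them to a kthread bound to the
chosen CPU, which injects them into the normal stack on that CPU. Below is a
rough sketch of the BPF side, loosely modelled on [4]; the map size, the
CPU-selection rule, the BTF-style map definition, and the use of
ctx->rx_queue_index (which I believe needs a fairly recent kernel) are my own
assumptions, not taken from the sample.

  /* cpumap_redirect.c: spread RX frames across CPUs via a cpumap. */
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  #define MAX_CPUS 64

  struct {
          __uint(type, BPF_MAP_TYPE_CPUMAP);
          __uint(key_size, sizeof(__u32));
          __uint(value_size, sizeof(__u32)); /* per-CPU queue size, set from userspace */
          __uint(max_entries, MAX_CPUS);
  } cpu_map SEC(".maps");

  SEC("xdp")
  int xdp_redirect_cpu_sketch(struct xdp_md *ctx)
  {
          /* Pick a destination CPU from the RX queue index; a real
           * program would more likely hash on the packet headers. */
          __u32 cpu = ctx->rx_queue_index % MAX_CPUS;

          /* Redirect into the cpumap; the frame is then processed by the
           * kthread pinned to that CPU. Flags kept at 0 for portability. */
          return bpf_redirect_map(&cpu_map, cpu, 0);
  }

  char _license[] SEC("license") = "GPL";

As in [5], the userspace loader would still have to populate the cpumap
entries (one per CPU, each with a queue size) before the redirect takes
effect. Is this roughly the model, or am I off track?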