On Fri, Aug 25, 2017 at 8:10 AM, John Fastabend <john.fastabend@xxxxxxxxx> wrote:
> On 08/25/2017 05:45 AM, Jesper Dangaard Brouer wrote:
>> On Thu, 24 Aug 2017 20:36:28 -0700
>> Michael Chan <michael.chan@xxxxxxxxxxxx> wrote:
>>
>>> On Wed, Aug 23, 2017 at 1:29 AM, Jesper Dangaard Brouer
>>> <brouer@xxxxxxxxxx> wrote:
>>>> On Tue, 22 Aug 2017 23:59:05 -0700
>>>> Michael Chan <michael.chan@xxxxxxxxxxxx> wrote:
>>>>
>>>>> On Tue, Aug 22, 2017 at 6:06 PM, Alexander Duyck
>>>>> <alexander.duyck@xxxxxxxxx> wrote:
>>>>>> On Tue, Aug 22, 2017 at 1:04 PM, Michael Chan <michael.chan@xxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> Right, but it's conceivable to add an API to "return" the buffer to
>>>>>>> the input device, right?
>>>>
>>>> Yes, I would really like to see an API like this.
>>>>
>>>>>>
>>>>>> You could, it is just added complexity. "Just free the buffer" in
>>>>>> ixgbe usually amounts to one atomic operation to decrement the
>>>>>> total page count, since page recycling is already implemented in the
>>>>>> driver. You would still have to unmap the buffer regardless of whether
>>>>>> you were recycling it or not, so all you would save is 1.000015259
>>>>>> atomic operations per packet. The fraction is because once every 64K
>>>>>> uses we have to bulk update the count on the page.
>>>>>>
>>>>>
>>>>> If the buffer is returned to the input device, the input device can
>>>>> keep the DMA mapping. All it needs to do is dma_sync it back to
>>>>> the input device when the buffer is returned.
>>>>
>>>> Yes, exactly, return to the input device. I really think we should
>>>> work on a solution where we can keep the DMA mapping around. We have
>>>> an opportunity here to make ndo_xdp_xmit TX queues use a specialized
>>>> page return call to achieve this. (I imagine other archs have a higher
>>>> DMA overhead than Intel.)
>>>>
>>>> I'm not sure how the API should look. The ixgbe recycle mechanism and
>>>> splitting the page (into two packets) actually complicate things, and
>>>> tie us into a page-refcnt based model. We could get around this by
>>>> having each driver implement a page-return callback that allows us to
>>>> return the page to the input device. Then, drivers implementing the
>>>> 1-packet-per-page model can simply check/read the page refcnt, and if
>>>> it is "1", DMA-sync the page and reuse it in the RX queue.
>>>>
>>>
>>> Yeah, based on Alex's description, it's not clear to me whether ixgbe
>>> redirecting to a non-Intel NIC or vice versa will actually work. It
>>> sounds like the output device has to make some assumptions about how
>>> the page was allocated by the input device.
>>
>> Yes, exactly. We are tied into a page-refcnt based scheme.
>>
>> Besides, the ixgbe page recycle scheme (which keeps the DMA RX mapping)
>> is also tied to the RX queue size, plus how fast the pages are returned.
>> This makes it very hard to tune. As I demonstrated, the default ixgbe
>> settings do not work well with XDP_REDIRECT. I needed to increase the
>> TX-ring size, but that broke page recycling (dropping perf from 13 Mpps
>> to 10 Mpps), so I also needed to increase the RX-ring size. But perf is
>> best when the RX-ring size is smaller, so two contradicting tunings are
>> needed.
>>
>
> The changes to decouple the ixgbe page recycle scheme (1 page per
> descriptor, split into two halves, being the default) from the number of
> descriptors don't look too bad IMO. It seems like it could be done by
> having some extra pages allocated upfront and pulling those in when we
> need another page.
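To make the page-return idea above a bit more concrete, here is a rough
sketch of what the RX-side handler could look like for a driver using the
1-packet-per-page model. The struct and function names are invented for
illustration (there is no such ndo today); only dma_sync_single_for_device(),
page_ref_count() and put_page() are existing kernel APIs.

/*
 * Rough sketch only -- the rx_page_slot struct and the return handler are
 * made up to show the shape of a page-return hook the output (TX) driver
 * could call at completion time, instead of freeing the page, so the
 * input (RX) device keeps its DMA mapping alive.
 */
#include <linux/netdevice.h>
#include <linux/dma-mapping.h>
#include <linux/mm.h>

/* Per-buffer state the input (RX) driver already tracks for its ring. */
struct rx_page_slot {
	struct page	*page;
	dma_addr_t	dma;		/* mapping kept for the page's lifetime */
	unsigned int	buf_len;
};

/*
 * Hypothetical callback exposed by the input device, e.g. through a new
 * ndo_xdp_page_return().  Called by the output device when the transmit
 * of a redirected frame has completed.
 */
static void rx_driver_xdp_page_return(struct net_device *dev,
				      struct rx_page_slot *slot)
{
	/*
	 * 1-packet-per-page model: if we hold the only reference, the page
	 * can go straight back on the RX ring.  The DMA mapping was never
	 * torn down, so a sync back to the device is all that is needed.
	 */
	if (page_ref_count(slot->page) == 1) {
		dma_sync_single_for_device(dev->dev.parent, slot->dma,
					   slot->buf_len, DMA_FROM_DEVICE);
		/* driver-specific: put the slot back on the RX free list */
		return;
	}

	/*
	 * Someone else still holds a reference (e.g. a split-page scheme);
	 * fall back to the usual refcount-based release.
	 */
	put_page(slot->page);
}

The nice property is that both the unmap on the TX side and the re-map on
the RX side disappear; the remaining per-packet cost is the refcnt read
and the dma_sync.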
>
> This would be a nice iterative step we could take on the existing API.
>
>>
>>> With a buffer return API,
>>> each driver can cleanly recycle or free its own buffers properly.
>>
>> Yes, exactly. And the RX driver can implement a special memory model
>> for this queue. E.g. the RX driver can know this is a dedicated XDP
>> RX queue which is never used for SKBs, thus opening up new RX memory
>> models.
>>
>> Another advantage of a return API: there is also an opportunity to
>> avoid the DMA map on TX, since we would know the from-device. Thus,
>> we could add a DMA API to query whether the two devices use the same
>> DMA engine, and if so reuse the same DMA address the RX side already
>> knows.
>>
>>
>>> Let me discuss this further with Andy to see if we can come up with a
>>> good scheme.
>>
>> Sounds good, looking forward to hearing what you come up with :-)
>>
>
> I guess by this thread we will see a Broadcom NIC with redirect support
> soon ;)

Yes, Andy actually has finished the coding for XDP_REDIRECT, but the
buffer recycling scheme has some problems. We can make it work for
Broadcom to Broadcom only, but we want a better solution.
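As a strawman for the second idea Jesper mentions above, skipping the DMA
map on TX when both devices sit behind the same DMA engine, something along
these lines could work. The helper names are hypothetical and the "same
engine" test is only a crude approximation; get_dma_ops(),
iommu_get_domain_for_dev(), dma_sync_single_for_device() and dma_map_page()
are the real APIs used.

/*
 * Hypothetical helpers, not an existing kernel API.  A real version would
 * need support from the DMA/IOMMU core; comparing the dma_ops and the
 * IOMMU domain is just meant to show the shape of the check.
 */
#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/iommu.h>

static bool xdp_dma_addr_reusable(struct device *rx_dev, struct device *tx_dev)
{
	return get_dma_ops(rx_dev) == get_dma_ops(tx_dev) &&
	       iommu_get_domain_for_dev(rx_dev) ==
	       iommu_get_domain_for_dev(tx_dev);
}

/* Possible use in the output driver's ndo_xdp_xmit() path. */
static dma_addr_t xdp_tx_dma_addr(struct device *rx_dev, struct device *tx_dev,
				  struct page *page, dma_addr_t rx_dma,
				  unsigned int len)
{
	if (xdp_dma_addr_reusable(rx_dev, tx_dev)) {
		/*
		 * Assumes the input device mapped the page DMA_BIDIRECTIONAL
		 * (as XDP_TX-capable drivers already do), so the RX-side
		 * address is usable for transmit after a sync.
		 */
		dma_sync_single_for_device(tx_dev, rx_dma, len,
					   DMA_BIDIRECTIONAL);
		return rx_dma;
	}

	/* Different DMA engines: fall back to a fresh mapping. */
	return dma_map_page(tx_dev, page, 0, len, DMA_TO_DEVICE);
}

Whether the extra query pays off depends on how expensive dma_map_page()
is on the platform; on x86 without an IOMMU it is nearly free, which is
why this matters more on other archs, as noted above.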