On Thu, Dec 17, 2020 at 5:30 PM David Ahern <dsahern@xxxxxxxxx> wrote:
>
> On 12/16/20 3:53 PM, Alexander Duyck wrote:
> > The problem in my case was based on a past experience where east-west
> > traffic became a problem and it was easily shown that bypassing the
> > NIC for traffic was significantly faster.
>
> If a deployment expects a lot of east-west traffic *within a host* why
> is it using hardware based isolation like a VF. That is a side effect of
> a design choice that is remedied by other options.

I am mostly talking about this from past experience, as I saw it become an issue in a few instances when I was at Intel. Sales and marketing people aren't exactly happy when you tell them "don't sell that" in response to them trying to sell a feature into an area where it doesn't belong. Generally they want a solution. The macvlan offload addressed these issues because the replication and local switching could be handled in software.

The problem is that PCIe DMA wasn't designed to function as a network switch fabric. When we start talking about a 400Gb NIC trying to handle over 256 subfunctions, hardware multicast/broadcast replication will quickly reduce receive/transmit throughput to gigabit speeds or less. With 256 subfunctions, a simple 60B ARP could consume more than 19KB of PCIe bandwidth because the packet has to be duplicated so many times.

In my mind it would be simpler to clone a single skb 256 times, forward the clones to the switchdev ports, and have them perform a bypass (if available) to deliver them to the subfunctions. That's why I was thinking it might be a good time to look at addressing this.