On Tue, Aug 15, 2023 at 9:38 AM David Laight <David.Laight@xxxxxxxxxx> wrote:
>
> From: Mina Almasry
> > Sent: 10 August 2023 02:58
> ...
> > * TL;DR:
> >
> > Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
> > from device memory efficiently, without bouncing the data to a host memory
> > buffer.
>
> Doesn't that really require peer-to-peer PCIe transfers?
> IIRC these aren't supported by many root hubs and have
> fundamental flow control and/or TLP credit problems.
>
> I'd guess they are also pretty incompatible with IOMMU?

Yes, this is a form of PCI_P2PDMA and all the limitations of that apply.

> I can see how you might manage to transmit frames from
> some external memory (eg after encryption) but surely
> processing receive data that way needs the packets to
> be filtered by both IP addresses and port numbers before
> being redirected to the (presumably limited) external
> memory.

This feature depends on NIC receive header split: the TCP/IP headers
are stored to host memory, the payload to device memory.

On devices that do not support explicit header split but do support
scatter-gather I/O, a constant, known header size can serve as a weak
substitute. That comes with additional caveats wrt unexpected traffic
whose payload must be host visible (e.g., ICMP).

> OTOH isn't the kernel going to need to run code before
> the packet is actually sent and just after it is received?
> So all you might gain is a bit of latency?
> And a bit less utilisation of host memory??
> But if your system is really limited by cpu-memory bandwidth
> you need more cache :-)
>
> So how much benefit is there over efficient use of host
> memory bounce buffers??

Among other things, on a PCIe tree this makes it possible to load up
machines with many NICs + GPUs.
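
To make the header-split point above concrete, a rough driver-side
sketch. Each RX descriptor is posted with two scatter entries: a small
host buffer for protocol headers and a device-memory chunk carved out
of a user-bound dma-buf for the payload. Every struct and helper name
below is hypothetical, for illustration only; none of it is the API
from the series:

    /* Hypothetical sketch of receive header split. Each RX descriptor
     * carries two scatter entries: a small host-memory buffer for the
     * TCP/IP headers and a device-memory (dma-buf) chunk for payload.
     */
    struct rx_split_desc {
            dma_addr_t hdr_dma;     /* host memory: headers land here */
            u32        hdr_len;     /* e.g. 128 or 256 bytes */
            dma_addr_t pay_dma;     /* device memory, mapped via dma-buf */
            u32        pay_len;
    };

    static int post_rx_split(struct rx_ring *ring)
    {
            struct rx_split_desc *d = rx_ring_next_desc(ring);

            /* Header buffer from an ordinary host page pool. */
            d->hdr_dma = host_hdr_pool_alloc(ring, &d->hdr_len);

            /* Payload buffer from the dma-buf bound to this RX queue;
             * the CPU never needs to map or touch it. */
            d->pay_dma = devmem_chunk_alloc(ring->binding, &d->pay_len);

            if (!d->hdr_dma || !d->pay_dma)
                    return -ENOMEM;

            rx_ring_post(ring, d);  /* hand the descriptor to the NIC */
            return 0;
    }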
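
And the corresponding userspace receive flow might look roughly like
this. recvmsg() and the CMSG_* accessors are standard; MSG_SOCK_DEVMEM,
SCM_DEVMEM and struct devmem_cmsg are stand-ins I'm making up for the
sketch, so the real uAPI may well differ:

    /* Userspace sketch of a devmem-TCP style receive loop. The payload
     * stays in device memory; the kernel describes each fragment to the
     * application via control messages.
     */
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    #define MSG_SOCK_DEVMEM 0x2000000       /* assumed flag value */
    #define SCM_DEVMEM      42              /* assumed cmsg type  */

    struct devmem_cmsg {                    /* assumed layout */
            unsigned int frag_offset;       /* offset into the dma-buf */
            unsigned int frag_size;         /* payload bytes at offset */
            unsigned int frag_token;        /* to give the frag back   */
    };

    static void rx_loop(int fd)
    {
            char linear[256];   /* host-visible (linear) bytes, if any */
            char ctrl[CMSG_SPACE(sizeof(struct devmem_cmsg)) * 16];

            for (;;) {
                    struct iovec iov = {
                            .iov_base = linear, .iov_len = sizeof(linear),
                    };
                    struct msghdr msg = {
                            .msg_iov = &iov, .msg_iovlen = 1,
                            .msg_control = ctrl,
                            .msg_controllen = sizeof(ctrl),
                    };

                    if (recvmsg(fd, &msg, MSG_SOCK_DEVMEM) <= 0)
                            break;

                    for (struct cmsghdr *cm = CMSG_FIRSTHDR(&msg); cm;
                         cm = CMSG_NXTHDR(&msg, cm)) {
                            struct devmem_cmsg dc;

                            if (cm->cmsg_type != SCM_DEVMEM)
                                    continue;
                            memcpy(&dc, CMSG_DATA(cm), sizeof(dc));

                            /* Payload never touched host memory: hand
                             * {frag_offset, frag_size} to whatever owns
                             * the dma-buf, then return frag_token. */
                            printf("frag at 0x%x, %u bytes\n",
                                   dc.frag_offset, dc.frag_size);
                    }
            }
    }

Both sketches are only meant to show the split: the kernel still runs
its normal protocol code on the headers in host memory, only the
payload bounce through a host buffer is avoided.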