On 7/16/23 8:05 PM, Mina Almasry wrote:
>>
>> For the driver and hardware queue: don't you need a dedicated queue
>> for the flow(s) in question?
>
> In the RFC and the implementation I'm thinking of, the queue is
> 'dedicated' in that each queue will be a devmem TCP queue or a regular
> queue. devmem queues generate devmem skbs and non-devmem queues
> generate non-devmem skbs. We support switching queues between devmem
> mode and non-devmem mode via a uapi.

ethtool APIs or something else?

>
>> If not, how can you properly handle the teardown case (e.g., the app
>> crashes and you need to ensure all references to GPU memory are
>> removed from NIC descriptors)?
>
> Jason and Christian will correct me if I'm wrong, but AFAICT the
> dma-buf API requires the dma-buf provider to keep the attachment
> mapping alive as long as the importer requires it. The dma-buf API
> gives the importer dma_buf_map_attachment() and
> dma_buf_unmap_attachment() APIs, but there is no callback for the
> exporter to inform the importer that it has to take the mapping away.

Isn't the importer the application that terminated (cleanly or
otherwise)? That was my thinking, but I guess there are other designs
that can span more than a single application.

> The closest thing I saw was the move_notify() callback, but that is
> optional.
>
> In my mind the way it works is that there will be some uapi that binds
> a dma-buf to an RX queue; that will create the attachment and the
> mapping. If the user crashes or closes the dma-buf handle then that
> will unbind the dma-buf from the RX queue, but the mapping will remain
> alive (via some refcounting) until all the NIC descriptors are freed
> and the mapping is no longer in use. Usually this will happen at the
> next driver reset, which destroys and recreates the rx queues, thereby
> freeing all the NIC descriptors (but it could be a new API so that we
> don't rely on a driver reset).
>
>> If you agree on this point, then you can require the dedicated queue
>> management in the driver to use and expect only the alternative frag
>> addressing scheme. ie., it knows the address is not a struct page
>> (validated by checking an skb flag, frag flag or address magic), but
>> a reference to, say, a page_pool entry (if you are using page_pool
>> for management of the dmabuf slices) which contains the metadata
>> needed for the use case.
>
> Honestly, if my understanding above doesn't match what you want, I
> could implement 'dedicated queues' instead; just let me know what you
> want at some future iteration. Right now I'm more worried about this
> memory format issue, and I'm working on an RX prototype without struct
> pages. So far, purely technically speaking, it seems possible.
>

My comment was only a suggestion on how to simplify driver changes.
ie., a queue is either pages (based on the standard page_pool or
alloc_pages) or some "special" page_pool (ie., a new abstraction), but
not mixed. In that case it knows how to handle the overloaded 'address'
in skb_frag in a clean manner.
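
For discussion, a rough sketch of the importer-side lifecycle described
above: a uapi binds a dma-buf to an RX queue, the binding takes the
attachment and mapping, and a refcount keeps the mapping alive until the
last in-flight NIC descriptor is reclaimed. Only the dma_buf_*() and
kref_*() calls are the existing kernel APIs; struct devmem_binding and
the helpers around them are hypothetical names for illustration only.

/*
 * Sketch of the "bind dma-buf to RX queue" lifecycle, for discussion.
 * The dma_buf_*() / kref_*() calls are real kernel APIs; the binding
 * structure and helper names are made up.
 */
#include <linux/dma-buf.h>
#include <linux/dma-direction.h>
#include <linux/err.h>
#include <linux/kref.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

struct devmem_binding {
	struct kref ref;	/* RX queue binding + in-flight descriptors */
	struct dma_buf *dmabuf;
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;
};

/* uapi "bind dma-buf to RX queue" path: attach and map the dma-buf. */
static struct devmem_binding *devmem_bind(struct device *dev, int dmabuf_fd)
{
	struct devmem_binding *b;
	int err;

	b = kzalloc(sizeof(*b), GFP_KERNEL);
	if (!b)
		return ERR_PTR(-ENOMEM);

	b->dmabuf = dma_buf_get(dmabuf_fd);
	if (IS_ERR(b->dmabuf)) {
		err = PTR_ERR(b->dmabuf);
		goto err_free;
	}

	b->attach = dma_buf_attach(b->dmabuf, dev);
	if (IS_ERR(b->attach)) {
		err = PTR_ERR(b->attach);
		goto err_put;
	}

	b->sgt = dma_buf_map_attachment(b->attach, DMA_FROM_DEVICE);
	if (IS_ERR(b->sgt)) {
		err = PTR_ERR(b->sgt);
		goto err_detach;
	}

	kref_init(&b->ref);	/* reference held by the RX queue binding */
	return b;

err_detach:
	dma_buf_detach(b->dmabuf, b->attach);
err_put:
	dma_buf_put(b->dmabuf);
err_free:
	kfree(b);
	return ERR_PTR(err);
}

/* Last reference dropped: no NIC descriptor uses the mapping anymore. */
static void devmem_binding_release(struct kref *ref)
{
	struct devmem_binding *b = container_of(ref, struct devmem_binding, ref);

	dma_buf_unmap_attachment(b->attach, b->sgt, DMA_FROM_DEVICE);
	dma_buf_detach(b->dmabuf, b->attach);
	dma_buf_put(b->dmabuf);
	kfree(b);
}

/*
 * Called on unbind/close (including an app crash) and again as each
 * posted RX descriptor is reclaimed (e.g. on driver reset); the mapping
 * outlives the application until the refcount hits zero.
 */
static void devmem_binding_put(struct devmem_binding *b)
{
	kref_put(&b->ref, devmem_binding_release);
}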
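
On the last point about the overloaded 'address' in skb_frag, here is a
purely illustrative sketch of the "address magic" idea: tag non-page
entries with the pointer's low bit so a queue's handling code can tell a
real struct page from an entry of the "special" page_pool. The
devmem_frag_entry type, the magic value, and the helpers are not an
existing kernel interface, and the sketch assumes the current
bio_vec-based skb_frag_t layout.

/*
 * Illustrative only: a low-bit "magic" on the frag address to
 * distinguish a struct page from a devmem pool entry.  Assumes the
 * bio_vec-based skb_frag_t; devmem_frag_entry and the magic are
 * hypothetical.
 */
#include <linux/skbuff.h>

#define DEVMEM_FRAG_MAGIC	0x1UL

struct devmem_frag_entry;	/* hypothetical entry describing a dmabuf slice */

static inline bool skb_frag_is_devmem(const skb_frag_t *frag)
{
	return (unsigned long)frag->bv_page & DEVMEM_FRAG_MAGIC;
}

static inline struct devmem_frag_entry *skb_frag_devmem(const skb_frag_t *frag)
{
	return (struct devmem_frag_entry *)
	       ((unsigned long)frag->bv_page & ~DEVMEM_FRAG_MAGIC);
}

/* A non-devmem queue only ever sees real pages, so this stays cheap. */
static inline struct page *skb_frag_page_or_null(const skb_frag_t *frag)
{
	return skb_frag_is_devmem(frag) ? NULL : frag->bv_page;
}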