On 2024/11/1 0:18, Toke Høiland-Jørgensen wrote:

...

>>> Eliding the details above, but yeah, you're right, there are probably
>>> some pernicious details to get right if we want to flush all caches. So
>>> I wouldn't do that to start with. Instead, just add the waiting to start
>>> with, then wait and see if this actually turns out to be a problem in
>>> practice. And if it is, identify the source of that problem, deal with
>>> it, rinse and repeat :)
>>
>> I am not sure if I have mentioned to you that Jakub had an RFC for the
>> waiting, see [1]. And Yonglong (Cc'ed) had tested it; the waiting caused
>> the driver unload to stall forever and some tasks to hang, see [2].
>>
>> The root cause for the above case is skb_defer_free_flush() not being
>> called, as mentioned before.
>
> Well, let's fix that, then! We already have logic to flush backlogs when a
> netdevice is going away, so AFAICT all that's needed is to add the

Is there a possibility that the page_pool-owned page might still be
handled/cached somewhere in the networking stack, if netif_rx_internal()
has already been called for the corresponding skb and
skb_attempt_defer_free() is called after the skb_defer_free_flush() added
in the patch below has run?

Maybe add a timeout mechanism, like a timer, to call
kick_defer_list_purge() if you treat 'outstanding forever' as leaked? I
actually thought about this, but have not found an elegant way to add the
timeout.

> skb_defer_free_flush() to that logic. Totally untested patch below, that
> we should maybe consider applying in any case.

I am not sure about that because of the timing window mentioned above, but
it does seem we might need to do something similar in dev_cpu_dead().
>
>> I am not sure I understand the reasoning behind the above suggestion to
>> 'wait and see if this actually turns out to be a problem' when we already
>> know that there are some cases which need cache kicking/flushing for the
>> waiting to work, that the kicking/flushing may not be easy and may take
>> indefinite time too, and that there might be other cases needing
>> kicking/flushing that we don't know of yet.
>>
>> Is there any reason not to consider recording the inflight pages, so that
>> unmapping can be done for inflight pages before the driver is unbound,
>> supposing a dynamic number of inflight pages can be supported?
>>
>> IOW, is there any reason you and Jesper take it as axiomatic that
>> recording the inflight pages is bad, supposing the inflight pages can be
>> unlimited and the recording can be done with minimal performance
>> overhead?
>
> Well, page pool is a memory allocator, and it already has a mechanism to
> handle returning of memory to it. You're proposing to add a second,
> orthogonal, mechanism to do this, one that adds both overhead and

I would call it a replacement/improvement for the old one rather than 'a
second, orthogonal' mechanism, as the old one doesn't really exist after
this patch.

> complexity, yet doesn't handle all cases (cf your comment about devmem).

I am not sure yet whether unmapping only needs to be done using devmem's
own version of the DMA API, but it seems waiting might also need its own
version of kicking/flushing for devmem, as devmem pages might be held from
user space?

> And even if it did handle all cases, force-releasing pages in this way
> really feels like it's just papering over the issue. If there are pages
> being leaked (or that are outstanding forever, which basically amounts
> to the same thing), that is something we should be fixing the root cause
> of, not just working around it like this series does.

If there were a definite bound on the waiting time, I would probably agree
with the above.