On Wed, 3 May 2023 17:49:34 +0200 Jesper Dangaard Brouer wrote:
> On 03/05/2023 13.18, Toke Høiland-Jørgensen wrote:
> > Jakub Kicinski <kuba@xxxxxxxxxx> writes:
> >> We can remove the warning without removing the entire delayed freeing
> >> scheme. I definitely like the SHUTDOWN flag and patch 2, but I'm a bit
> >> less clear on why the complexity of datapath freeing is justified.
> >> Can you explain?
> >
> > You mean just let the workqueue keep rescheduling itself every minute
> > for the (potentially) hours that skbs will stick around? Seems a bit
> > wasteful, doesn't it? :)
>
> I agree that this workqueue that keeps rescheduling is wasteful.
> It actually reschedules every second, which is even more wasteful.
> NIC drivers will have many HW RX-queues, with separate PP instances,
> each of which can start a workqueue that reschedules every second.

There are a lot of work items flying around on a working system. I don't
think the rare (and very cheap) PP check work is going to move the
needle. You should see how many work items DIM schedules :(

I'd think that potentially having the extra memory pinned is much more
of an issue than a periodic check, and that does not go away by changing
the checking mechanism.

> Eric has convinced me that SKBs can "stick around" for longer than the
> assumptions in PP. The old PP assumptions came from the XDP-return path.
> It is time to clean up.

I see. Regardless - should we have some count/statistic for the number
of "zombie" page pools sitting around in the system? Could be useful
for debugging.

> > We did see an issue where creating and tearing down lots of page pools
> > in a short period of time caused significant slowdowns due to the
> > workqueue mechanism. Lots being "thousands per second". This is
> > possible using the live packet mode of bpf_prog_run() for XDP, which
> > will set up and destroy a page pool for each syscall...
>
> Yes, the XDP live packet mode of bpf_prog_run is IMHO abusing the
> page_pool API. We should fix that somehow, at least the case where live
> packet mode injects only a single packet but still creates a PP
> instance. The PP in live packet mode IMHO only makes sense when
> repeatedly sending packets that get recycled and are pre-initialized by
> the PP.

+1, FWIW. I was surprised that an init_callback which sits in the fast
path exists purely for testing.

> This use of PP does exemplify why it is problematic to keep the
> workqueue.
>
> I have considered (and could be convinced of) delaying the free via
> call_rcu, but it also creates an unfortunate backlog of work in the
> case of live packet mode of bpf_prog_run.

Maybe let the PP used by BPF testing be reusable? No real driver will
create thousands of PPs a second, that's not sane.

Anyway, you'll choose what you'll choose. I just wanted to cast my vote
for the work queue rather than the tricky lockless release code.
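
For reference, the deferred-release pattern being debated looks roughly
like the sketch below: a delayed work item that checks whether all pages
have come back and, if not, re-arms itself about a second later. This is
a simplified illustration, not the exact in-tree code; the names
(page_pool_release(), release_dw, DEFER_TIME) loosely follow
net/core/page_pool.c, and the helper declaration is a stand-in:

#include <linux/workqueue.h>
#include <net/page_pool.h>	/* struct page_pool, release_dw member */

#define DEFER_TIME msecs_to_jiffies(1000)	/* retry interval, ~1 sec */

/* Stand-in for the helper of the same name in net/core/page_pool.c:
 * returns the number of pages still in flight, and frees the pool
 * (returning 0) once everything has been returned.
 */
static int page_pool_release(struct page_pool *pool);

/* Simplified sketch of the deferred release: a delayed work item that
 * re-arms itself while pages are still in flight.
 */
static void page_pool_release_retry(struct work_struct *wq)
{
	struct delayed_work *dwq = to_delayed_work(wq);
	struct page_pool *pool =
		container_of(dwq, struct page_pool, release_dw);

	if (!page_pool_release(pool))
		return;		/* all pages back, pool already freed */

	/* Pages still outstanding, check again in ~1 second */
	schedule_delayed_work(&pool->release_dw, DEFER_TIME);
}

This is the "workqueue that keeps rescheduling" referred to above; the
alternative under discussion is to drop the work item and free the pool
from the data path when the last in-flight page returns.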
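
As for the "zombie" page-pool count suggested above, a minimal version
could be a single global counter bumped when a destroyed pool still has
pages in flight and dropped when it is finally freed. The sketch below
is purely hypothetical: no such counter exists in the tree, and the hook
points and any export mechanism (debugfs, ethtool stats, etc.) are
assumptions.

#include <linux/atomic.h>

/* Hypothetical debug counter for page pools that have been destroyed
 * but are still waiting for in-flight pages ("zombies").
 */
static atomic_t pp_zombie_pools = ATOMIC_INIT(0);

/* Call where the deferred release is first scheduled */
static inline void pp_zombie_enter(void)
{
	atomic_inc(&pp_zombie_pools);
}

/* Call where the pool is finally freed */
static inline void pp_zombie_exit(void)
{
	atomic_dec(&pp_zombie_pools);
}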