On 2020-07-21 23:53, Rajkumar Manoharan wrote:
> On 2020-07-21 10:14, Rakesh Pillai wrote:
>> NAPI instance gets scheduled on a CPU core on which
>> the IRQ was triggered. The processing of rx packets
>> can be CPU intensive and since NAPI cannot be moved
>> to a different CPU core, to get better performance,
>> its better to move the gist of rx packet processing
>> in a high priority thread.
>>
>> Add the init/deinit part for a thread to process the
>> receive packets.
>>
> IMHO this defeat the whole purpose of NAPI. Originally in ath10k
> irq processing happened in tasklet (high priority) context which in
> turn push more data to net core even though net is unable to process
> driver data as both happen in different context (fast producer - slow
> consumer) issue. Why can't CPU governor schedule the interrupts in
> less loaded CPU core?
> Otherwise you can play with different RPS and affinity settings to
> meet your requirement.
>
> IMO introducing high priority tasklets/threads is not viable solution.

I'm beginning to think that the main problem with NAPI here is that the
work done by poll functions on 802.11 drivers is significantly more CPU
intensive compared to ethernet drivers, possibly more than what NAPI was
designed for.

I'm considering testing a different approach (with mt76 initially):
- Add a mac80211 rx function that puts processed skbs into a list
  instead of handing them to the network stack directly.
- Move all rx processing to a high priority thread, keep a driver
  internal queue for fully processed packets.
- Schedule NAPI poll on completion.
- NAPI poll function pulls from the internal queue and passes to the
  network stack.

With this approach, the network stack retains some control over the
processing rate of rx packets, while the scheduler can move the CPU
intensive processing around to where it fits best.

What do you think?

- Felix