On Fri, May 15, 2020 at 2:56 PM Bartosz Golaszewski <brgl@xxxxxxxx> wrote:
> On Fri, May 15, 2020 at 2:04 PM Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > On Fri, May 15, 2020 at 9:11 AM Bartosz Golaszewski <brgl@xxxxxxxx> wrote:
> > > >
> > > > It looks like most of the stuff inside of the loop can be pulled out
> > > > and only done once here.
> > > >
> > >
> > > I did that in one of the previous submissions, but it was pointed out
> > > to me that a parallel TX path may fill up the queue before I wake it.
> >
> > Right, I see you plugged that hole, however the way you hold the
> > spinlock across the expensive DMA management but then give it
> > up in each loop iteration feels like it is not the most efficient
> > way.
> >
>
> Maybe my thinking is wrong here, but I assumed that with a spinlock
> it's better to give other threads the chance to run in between each
> iteration. I didn't benchmark it, though.

It depends. You want to avoid lock contention (two threads trying to
get the lock at the same time), but you also want to avoid bouncing
the spinlock's cache line between the CPUs.

In the contention case, what I think would happen here is that the
cleanup thread gives up the lock and the xmit function takes it, but
then the cleanup thread is spinning again, so you are still blocked
on one of the two CPUs while also paying the overhead of synchronizing
between the two.

Holding the lock the whole time would speed up both the good case
(no contention) and the bad case (bouncing the lock) a little bit,
because it saves some locking overhead. Holding the lock for shorter
times (i.e. not during the cache maintenance) would reduce the amount
of lock contention but not help in the good case.

Not needing a lock at all is generally best, but getting it right
is tricky ;-)

       Arnd
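
For illustration, here is a rough sketch of the two locking patterns
under discussion. None of this is the actual driver code: the names
(struct mydev_priv, tx_lock, mydev_tx_clean_*, the ring size) are made
up, and the reclaim logic is simplified to show only where the lock is
taken and released relative to the DMA unmap work.

#include <linux/spinlock.h>
#include <linux/netdevice.h>
#include <linux/dma-mapping.h>
#include <linux/skbuff.h>

#define MYDEV_TX_RING_SIZE 256

struct mydev_ring_entry {
	struct sk_buff *skb;
	dma_addr_t dma;
	size_t len;
};

struct mydev_priv {
	spinlock_t tx_lock;	/* shared with ndo_start_xmit() */
	struct net_device *ndev;
	struct device *dev;
	struct mydev_ring_entry ring[MYDEV_TX_RING_SIZE];
	unsigned int dirty;	/* next entry to reclaim */
};

/*
 * Variant A: take the lock once around the whole reclaim loop.  This
 * saves the repeated lock/unlock overhead and avoids bouncing the lock
 * between CPUs, but the xmit path keeps spinning for the full duration
 * of the loop, including the (potentially slow) dma_unmap_single() calls.
 */
static void mydev_tx_clean_hold(struct mydev_priv *p, unsigned int done)
{
	unsigned long flags;

	spin_lock_irqsave(&p->tx_lock, flags);
	while (done--) {
		struct mydev_ring_entry *e;

		e = &p->ring[p->dirty++ % MYDEV_TX_RING_SIZE];
		dma_unmap_single(p->dev, e->dma, e->len, DMA_TO_DEVICE);
		dev_kfree_skb_irq(e->skb);
		e->skb = NULL;
	}
	if (netif_queue_stopped(p->ndev))
		netif_wake_queue(p->ndev);
	spin_unlock_irqrestore(&p->tx_lock, flags);
}

/*
 * Variant B: drop and re-take the lock on every iteration.  This gives
 * the xmit path a window to slip in between entries, but when both
 * sides are active it bounces the lock's cache line back and forth and
 * pays the lock/unlock cost once per descriptor.
 */
static void mydev_tx_clean_drop(struct mydev_priv *p, unsigned int done)
{
	unsigned long flags;

	while (done--) {
		struct mydev_ring_entry *e;

		spin_lock_irqsave(&p->tx_lock, flags);
		e = &p->ring[p->dirty++ % MYDEV_TX_RING_SIZE];
		dma_unmap_single(p->dev, e->dma, e->len, DMA_TO_DEVICE);
		dev_kfree_skb_irq(e->skb);
		e->skb = NULL;
		if (netif_queue_stopped(p->ndev))
			netif_wake_queue(p->ndev);
		spin_unlock_irqrestore(&p->tx_lock, flags);
	}
}

A third option hinted at above would be to only advance the ring index
under the lock and do the dma_unmap_single()/dev_kfree_skb_irq() work on
the reclaimed entries outside of it, which shrinks the critical section
at the cost of some extra bookkeeping.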