On 2021/4/20 7:55, Michal Kubecek wrote:
> On Mon, Apr 19, 2021 at 05:29:46PM +0200, Michal Kubecek wrote:
>>
>> As pointed out in the discussion on v3, this patch may result in
>> significantly higher CPU consumption with multiple threads competing on
>> a saturated outgoing device. I missed this submission so that I haven't
>> checked it yet but given the description of v3->v4 changes above, it's
>> quite likely that it suffers from the same problem.
>
> And it indeed does. However, with the additional patch from the v3
> discussion, the numbers are approximately the same as with an unpatched
> mainline kernel.
>
> As with v4, I tried this patch on top of 5.12-rc7 with real devices.
> I used two machines with 10Gb/s Intel ixgbe NICs, sender has 16 CPUs
> (2 8-core CPUs with HT disabled) and 16 Rx/Tx queues, receiver has
> 48 CPUs (2 12-core CPUs with HT enabled) and 48 Rx/Tx queues.
>
>   threads    5.12-rc7    5.12-rc7 + v4    5.12-rc7 + v4 + stop
>         1       25.1%            38.1%                   22.9%
>         8       66.2%           277.0%                   74.1%
>        16       90.1%           150.7%                   91.0%
>        32      107.2%           272.6%                  108.3%
>        64      116.3%           487.5%                  118.1%
>       128      126.1%           946.7%                  126.9%
>
> (The values are normalized to one core, i.e. 100% corresponds to one
> fully used logical CPU.)
>
> So it seems that repeated scheduling while the queue was stopped is
> indeed the main performance issue and that other cases of the logic
> being too pessimistic do not play significant role. There is an
> exception with 8 connections/threads and the result with just this
> series also looks abnormally high (e.g. much higher than with
> 16 threads). It might be worth investigating what happens there and
> what do the results with other thread counts around 8 look like.

I will try to investigate the 8 connections/threads case.

>
> I'll run some more tests with other traffic patterns tomorrow and
> I'm also going to take a closer look at the additional patch.

Thanks for the detailed testing and for taking a look.

>
> Michal
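
To put the quoted table in relative terms, here is a minimal sketch that divides each patched column by the unpatched 5.12-rc7 baseline. The numbers are copied from the table above; the names `CPU_USAGE` and `relative_to_baseline` are made up for illustration and are not part of the patch or the test setup.

```python
# CPU usage in percent of one logical CPU, copied from the quoted table.
# Layout: threads -> (5.12-rc7, 5.12-rc7 + v4, 5.12-rc7 + v4 + stop)
CPU_USAGE = {
    1: (25.1, 38.1, 22.9),
    8: (66.2, 277.0, 74.1),
    16: (90.1, 150.7, 91.0),
    32: (107.2, 272.6, 108.3),
    64: (116.3, 487.5, 118.1),
    128: (126.1, 946.7, 126.9),
}


def relative_to_baseline(usage):
    """Return {threads: (v4 / baseline, v4_plus_stop / baseline)}."""
    return {
        threads: (v4 / base, stop / base)
        for threads, (base, v4, stop) in usage.items()
    }


if __name__ == "__main__":
    # Print one line per thread count with both ratios.
    for threads, (v4, stop) in relative_to_baseline(CPU_USAGE).items():
        print(f"{threads:>7}  v4: {v4:5.2f}x  v4 + stop: {stop:5.2f}x")
```

With these numbers, the "v4 + stop" column stays within roughly 12% of the baseline at every thread count, while v4 alone reaches about 7.5x the baseline at 128 threads.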