On 2020-02-09 11:58, Magnus Karlsson wrote:
On Fri, Feb 7, 2020 at 10:41 PM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:On 2/7/20 1:34 PM, Maxim Mikityanskiy wrote:On 2020-02-07 11:37, Magnus Karlsson wrote:The commit 4b638f13bab4 ("xsk: Eliminate the RX batch size") introduced a much more lazy way of updating the global consumer pointers from the kernel side, by only doing so when running out of entries in the fill or Tx rings (the rings consumed by the kernel). This can result in a deadlock with the user application if the kernel requires more than one entry to proceed and the application cannot put these entries in the fill ring because the kernel has not updated the global consumer pointer since the ring is not empty.[...]Acked-by: Maxim Mikityanskiy <maximmi@xxxxxxxxxxxx> Is it intentional that you didn't send it to bpf and netdev mailing lists?Yep, please resend with Maxim's ACK to bpf + netdev in Cc. Thanks!It was intentional, but maybe confusing. In the mail just before this patch I suggested that this patch should be part of a patch set that we send to bpf and netdev. The purpose of sending it was for you Maxim to ok it, and you did with your ack :-). But I will repeat the other questions from that mail here.
OK, I see. Sorry, I didn't see the previous email (and still can't find it, BTW).
Does the Mellanox driver set the need_wakeup flag on Rx when it needs more buffers in the fill ring to form its stride and the HW Rx ring is empty? If yes, great. If not, please submit a patch for this.
Yes, it does (regardless of emptiness of the HW RX ring). If xsk_umem_has_addrs_rq returns false, the driver sets the need_wakeup flag.
I will produce a documentation patch that clarifies that when the need_wakeup flag is set for the fill ring, the user need to put more packets on the fill ring and wakeup the kernel. It is already mentioned in the documentation, but I will make it more explicit.
Great, thanks!There's still room for optimization here, though: how many is "more"? If the application puts them one by one and wakes up the driver every time, it's not very effective comparing to the case if it knew the exact amount of missing frames. On the other hand, normally the application would put enough frames in advance, and it shouldn't get to this point.