On Fri, Jun 12, 2020 at 12:41:16PM -0700, Gerd Rausch wrote:
> This issue appears to only exist in Linux versions
> 2.6.26 through 4.14 inclusively:
>
> With the introduction of commit
> f56bcd8013566 ("IPoIB: Use separate CQ for UD send completions")
>
> work completions are only processed once there are
> more than 17 outstanding TX work requests.
>
> Unfortunately, that also delays the processing of the
> completion handler and holds on to references
> held by the "skb" since "dev_kfree_skb_any"
> won't be called for a very long time.
>
> E.g. we've observed "nf_conntrack_cleanup_net_list" spin
> around for hours until "net->ct.count" goes down to zero
> on a sufficiently idle interface.
>
> This fix arms the TX CQ after those "poll_tx" loops,
> in order for "ipoib_send_comp_handler" to do its thing:
>
> While it's obvious that processing completions one-by-one
> is more costly than doing so in bulk,
> holding on to "skb" resources for a potentially unlimited
> amount of time appears to be a less favorable trade-off.
>
> This issue appears to no longer exist in Linux-4.15
> and younger, because the following commit does
> call "ib_req_notify_cq" on "send_cq":
> 8966e28d2e40c ("IB/ipoib: Use NAPI in UD/TX flows")

I'm not really clear what you want to happen to this patch - are you
proposing a stable patch that is not just a backport? Why can't you
backport the fix above instead?

You'll need to follow everything in

Documentation/process/stable-kernel-rules.rst

or the stable maintainers won't even look at this.

Jason