Re: [PATCH] ipoib: clean ib tx ring periodically

On Wed, Mar 1, 2017 at 11:07 AM, Paolo Abeni <pabeni@xxxxxxxxxx> wrote:
> On Wed, 2017-03-01 at 09:28 +0200, Erez Shitrit wrote:
>> On Thu, Feb 16, 2017 at 5:35 PM, Paolo Abeni <pabeni@xxxxxxxxxx> wrote:
>> > The skbs transmitted via ipoib_send() are freed only if there are
>> > 16 or more outstanding work requests or if the send queue is full.
>> >
>> > If there is very little networking activity, the transmitted skbs
>> > can be held by the device driver for an unlimited amount of time,
>> > starving other subsystems.
>> >
>> > E.g. assuming ipv6 is enabled, with the following sequence:
>> >
>> > systemctl start firewalld
>> > modprobe ib_ipoib
>> > ip addr add dev ib0 fc00::1/64
>> > systemctl stop firewalld
>> >
>> > a cpu will hang: rmmod conntrack will keep a core busy
>> > spinning for nf_conntrack_untracked going to 0, since some ICMP6
>> > ND packets are generated and transmitted when the ipv6 address
>> > is attached to the device, and such packets get a notrack ct
>> > entry.
>> >
>> > This change addresses the issue by introducing a periodic timer that
>> > performs "garbage collection" on the send ring at low frequency (once
>> > every second).
>> >
>> > This new timer runs independently of the currently used poll_timer,
>> > so that no additional delay is introduced when cleaning the ring after
>> > errors or a ring-full event.
>>
>> Hi,
>>
>> Adding a new timer is not the right solution; it is a workaround for
>> the TX path in the ipoib driver.
>> The real solution, IMHO, is to use the napi mechanism for TX in a
>> similar way to how it is done for RX (as is done in many network
>> drivers).
>>
>> We (Mellanox) are planning to send such a solution in the next few days.
>
> Thank you for jumping in on this.
>
> I think that the tx napi polling implementation for the ipoib driver is
> not so straightforward because, afaics, the ib completion callback is
> intentionally avoided for tx - except in exceptional scenarios -
> possibly for performance reasons.

I am not sure that was the reason; it is probably there for other,
historical reasons.

You can see that in the CM mode the IPoIB driver uses napi for the tx
completion.

(It works for us in the UD mode, and without napi we will not be able
to implement time-sensitive features on top of ipoib, time stamping
for example.)
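
For readers following the thread, here is a rough userspace model of the
behaviour described in the quoted patch. It is not ipoib code. The names
tx_ring, tx_send, tx_reclaim and tx_gc_tick and the ring size of 64 are
hypothetical; only the threshold of 16 and the once-per-second cleanup
come from the patch description. The model also assumes every posted
send has already completed by the time it is reclaimed.

```c
#include <assert.h>

#define RING_SIZE      64  /* hypothetical ring size for this model */
#define RECLAIM_THRESH 16  /* "16 or more outstanding work requests" */

struct tx_ring {
    unsigned int head;  /* next slot to post */
    unsigned int tail;  /* oldest slot not yet reclaimed */
};

/* Number of transmitted buffers still held by the modelled driver. */
static unsigned int tx_outstanding(const struct tx_ring *r)
{
    return r->head - r->tail;
}

/* Release every held buffer and return how many were released.
 * Assumption: all posted sends have completed. */
static unsigned int tx_reclaim(struct tx_ring *r)
{
    unsigned int n = tx_outstanding(r);

    r->tail = r->head;
    return n;
}

/* Send path: post one buffer and reclaim only under pressure.
 * The ring-full branch mirrors the patch description; it is never
 * taken here because the threshold is smaller than the ring size. */
static void tx_send(struct tx_ring *r)
{
    if (tx_outstanding(r) == RING_SIZE)
        tx_reclaim(r);                 /* "the send queue is full" */
    assert(tx_outstanding(r) < RING_SIZE);

    r->head++;
    if (tx_outstanding(r) >= RECLAIM_THRESH)
        tx_reclaim(r);
}

/* What a low-frequency (e.g. once per second) timer callback would do
 * in this model: clean the ring regardless of how many are pending. */
static unsigned int tx_gc_tick(struct tx_ring *r)
{
    return tx_reclaim(r);
}
```

In this model, three calls to tx_send() on a zeroed ring leave three
buffers held indefinitely if no further traffic arrives; one call to
tx_gc_tick() releases them. A napi-based tx completion path would instead
release each buffer when its completion is processed, so no such timer
would be needed.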

>
> Anyway, if you can fix this in a cleaner way, I'll be more than happy.
>
> Thank you,
>
> Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html