Re: [PATCH for-rc v2] IB/mlx4: Fix leak in id_map_find_del

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 23, 2020 at 04:55:21PM +0100, Håkon Bugge wrote:
> Using CX-3 virtual functions, either from a bare-metal machine or
> pass-through from a VM, MAD packets are proxied through the PF driver.
>
> Since the VF drivers have separate name spaces for MAD Transaction Ids
> (TIDs), the PF driver has to re-map the TIDs and keep the book keeping
> in a cache.
>
> Following the RDMA Connection Manager (CM) protocol, it is clear when
> an entry has to evicted from the cache. When a DREP is sent from
> mlx4_ib_multiplex_cm_handler(), id_map_find_del() is called. Similar
> when a REJ is received by the mlx4_ib_demux_cm_handler(),
> id_map_find_del() is called.
>
> This function wipes out the TID in use from the IDR or XArray and
> removes the id_map_entry from the table.
>
> In short, it does everything except the topping of the cake, which is
> to remove the entry from the list and free it. In other words, for the
> REJ case enumerated above, one id_map_entry will be leaked.
>
> For the other case above, a DREQ has been received first. The
> reception of the DREQ will trigger queuing of a delayed work to delete
> the id_map_entry, for the case where the VM doesn't send back a DREP.
>
> In the normal case, the VM _will_ send back a DREP, and
> id_map_find_del() will be called.
>
> But this scenario introduces a secondary leak. First, when the DREQ is
> received, a delayed work is queued. The VM will then return a DREP,
> which will call id_map_find_del(). As stated above, this will free the
> TID used from the XArray or IDR. Now, there is window where that
> particular TID can be re-allocated, lets say by an outgoing REQ. This
> TID will later be wiped out by the delayed work, when the function
> id_map_ent_timeout() is called. But the id_map_entry allocated by the
> outgoing REQ will not be de-allocated, and we have a leak.
>
> Both leaks are fixed by removing the id_map_find_del() function and
> only using schedule_delayed(). Of course, a check in
> schedule_delayed() to see if the work already has been queued, has
> been added.
>
> Another benefit of always using the delayed version for deleting
> entries, is that we do get a TimeWait effect; a TID no longer in use,
> will occupy the XArray or IDR for CM_CLEANUP_CACHE_TIMEOUT time,
> without any ability of being re-used for that time period.
>
> Signed-off-by: Håkon Bugge <haakon.bugge@xxxxxxxxxx>
>

Thanks a lot,
I added it to our regression system for over weekend run.



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux