Re: Race condition between cm_migrate() and cm_remove_one()

On Wed, Nov 04, 2020 at 05:08:59PM -0500, Ryan Stone wrote:

> If anybody could give input on my analysis and the proposed solution
> I'd really appreciate it.

Yikes, this whole thing is just wrong..

1) We can't migrate QPs across devices, so av and alt_av must be in
   the same cm_dev. We never check this when forming the AV and alt
   AVs during LAP.

2) cm_remove_one needs to remove all the cm_dev's because it is going
   to kfree them. Using altr_send_port_not_ready is foolish because
   what we really want is to NULL the port pointer (we are freeing
   that too)

3) Touching the AV after cm_remove_one(), e.g. for
   rdma_destroy_ah_attr(), is wrong. The AV is part of the cm_dev and
   has to be cleaned up before cm_remove_one() can return.

4) The flush_workqueue() in cm_remove_one() is wishful thinking: there
   are many places still using the mad_agent that are not on that
   workqueue.

   A proper 'av_lock' rwsem is going to be needed here.

   Which is another example of why every time I see some idiotic
   'is_closed' flag it is just a sign of wrong, wrong, wrong.

Fixing it requires a full audit of all the places using the AV :\
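
As one small piece of that audit, the missing check from point 1 could
look roughly like the sketch below. The structs are simplified
stand-ins that only model the AV -> port -> device chain, and
cm_avs_share_device() is a hypothetical helper name, not an existing
function in cm.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins modelling only the AV -> port -> device chain. */
struct cm_device {
	int id;
};

struct cm_port {
	struct cm_device *cm_dev;
};

struct cm_av {
	struct cm_port *port;
};

/*
 * Hypothetical helper: true only if both AVs resolve to a port and
 * both ports belong to the same device.
 */
static bool cm_avs_share_device(const struct cm_av *av,
				const struct cm_av *alt_av)
{
	if (!av->port || !alt_av->port)
		return false;
	return av->port->cm_dev == alt_av->port->cm_dev;
}
```

A check along these lines would presumably have to run wherever the
alt AV is formed, so that a cross-device alternate path is rejected
up front.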

Jason


