On Tue, May 03, 2022 at 11:24:01AM -0300, Thadeu Lima de Souza Cascardo wrote:
> On Tue, May 03, 2022 at 03:49:15PM +0200, Greg KH wrote:
> > On Mon, May 02, 2022 at 05:49:24PM -0300, Thadeu Lima de Souza Cascardo wrote:
> > > When dropping the rtnl_lock for looking up for a module, the device may be
> > > removed, releasing the qdisc and class memory. Right after trying to load
> > > the module, cl_ops->put is called, leading to a potential use-after-free.
> > >
> > > Though commit e368fdb61d8e ("net: sched: use Qdisc rcu API instead of
> > > relying on rtnl lock") fixes this, it involves a lot of refactoring of the
> > > net/sched/ code, complicating its backport.
> >
> > What about 4.14.y? We can not take a commit for 4.9.y with it also
> > being broken in 4.14.y, and yet fixed in 4.19.y, right? Anyone who
> > updates from 4.9 to 4.14 will have a regression.
> >
> > thanks,
> >
> > greg k-h
>
> 4.14.y does not call cl_ops->put (the get/put and class refcount has been done
> with on 4.14.y). However, on the error path after the lock has been dropped,
> tcf_chain_put is called. But it does not touch the qdisc, but only the chain
> and block objects, which cannot be released on a race condition, as far as I
> was able to investigate.

So what changed between 4.9 and 4.14 that requires this out-of-tree
change to 4.9 for the issue? Shouldn't we backport that change instead
of this custom one?

thanks,

greg k-h