> On Thu, 28 Oct 2021 08:45:03 -0300 Jason Gunthorpe wrote: >>> But will make all the callers of vlan_dev_real_dev() feel like they >>> should NULL-check the result, which is not necessary. >> >> Isn't it better to reliably return NULL instead of a silent UAF in >> this edge case? > > I don't know what the best practice is for maintaining sanity of > unregistered objects. > > If there really is a requirement for the real_dev pointer to be sane we > may want to move the put_device(real_dev) to vlan_dev_free(). There > should not be any risk of circular dependency but I'm not 100% sure. > >>> RDMA must be calling this helper on a vlan which was already >>> unregistered, can we fix RDMA instead? >> >> RDMA holds a get on the netdev which prevents unregistration, however >> unregister_vlan_dev() does: >> >> unregister_netdevice_queue(dev, head); >> dev_put(real_dev); >> >> Which corrupts the still registered vlan device while it is sitting in >> the queue waiting to unregister. So, it is not true that a registered >> vlan device always has working vlan_dev_real_dev(). > > That's not my reading, unless we have a different definition of > "registered". The RDMA code in question runs from a workqueue, at the > time the UNREGISTER notification is generated all objects are still > alive and no UAF can happen. Past UNREGISTER extra care is needed when > accessing the object. > > Note that unregister_vlan_dev() may queue the unregistration, without > running it. If it clears real_dev the UNREGISTER notification will no > longer be able to access real_dev, which used to be completely legal. > . > I am sorry. I have made a misunderstanding and given a wrong conclusion that unregister_vlan_dev() just move the vlan_ndev to a list to unregister later and it is possible the real_dev has been freed when we access in netdevice_queue_work(). real_ndev UNREGISTE trigger NETDEV_UNREGISTER notification in vlan_device_event(), unregister_vlan_dev() and unregister_netdevice_many() are within real_ndev UNREGISTE process. real_dev and vlan_ndev are all alive before real_ndev UNREGISTE finished. Above is the correction for my previous misunderstanding. But the real scenario of the problem is as following: __rtnl_newlink vlan_newlink register_vlan_dev(vlan_ndev, ...) register_netdevice(vlan_ndev) netdevice_queue_work(..., vlan_ndev) [dev_hold(vlan_ndev)] queue_work(gid_cache_wq, ...) ... rtnl_configure_link(vlan_ndev, ...) [failed] ops->dellink(vlan_ndev, &list_kill) [unregister_vlan_dev] unregister_netdevice_many(&list_kill) ... ppp_release unregister_netdevice(real_dev) ppp_destroy_interface free_netdev(real_dev) netdev_freemem(real_dev) [real_dev freed] ... netdevice_event_work_handler [vlan_ndev NETDEV_REGISTER notifier work] is_eth_port_of_netdev_filter vlan_dev_real_dev [real_dev UAF] So my first solution as following for the problem is correct. https://lore.kernel.org/linux-rdma/20211025163941.GA393143@xxxxxxxxxx/T/#m44abbf1ea5e4b5237610c1b389c3340d92a03b8d Thank you!