On Wed, Jun 23, 2021 at 02:50:07PM -0700, Luis Chamberlain wrote: > It's possible today to have a device attribute read or store > race against device removal. This is known to happen as follows: > > write system call --> > ksys_write () --> > vfs_write() --> > __vfs_write() --> > kernfs_fop_write_iter() --> > sysfs_kf_write() --> > dev_attr_store() --> > null reference > > This happens because the dev_attr->store() callback can be > removed prior to its call, after dev_attr_store() was initiated. > The null dereference is possible because the sysfs ops can be > removed on module removal, for instance, when device_del() is > called, and a sysfs read / store is not doing any kobject reference > bumps either. This allows a read/store call to initiate, a > device_del() to kick off, and then the read/store call can be > gone by the time to execute it. > > The sysfs filesystem is not doing any kobject reference bumps during a > read / store ops to prevent this. > > To fix this in a simplified way, just bump the kobject reference when > we create a directory and remove it on directory removal. > > The big unfortunate eye-sore is addressing the manual kobject reference > assumption on the networking code, which leads me to believe we should > end up replacing that eventually with another sort of check. > > Suggested-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> > Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx> > --- > > This v4 moves to fixing the race condition on dev_attr_store() and > dev_attr_read() to sysfs by bumping the kobject reference count > on directory creation / deletion as suggested by Greg. > > Unfortunately at least the networking core has a manual refcount > assumption, which needs to be adjusted to account for this change. > This should also mean there is runtime for other kobjects which may > not be explored yet which may need fixing as well. We may want to > change the check to something else on the networking front, but its > not clear to me yet what to use. > > fs/sysfs/dir.c | 3 +++ > net/core/dev.c | 4 ++-- > 2 files changed, 5 insertions(+), 2 deletions(-) > > diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c > index 59dffd5ca517..6c47aa4af6f5 100644 > --- a/fs/sysfs/dir.c > +++ b/fs/sysfs/dir.c > @@ -56,12 +56,14 @@ int sysfs_create_dir_ns(struct kobject *kobj, const void *ns) > > kobject_get_ownership(kobj, &uid, &gid); > > + kobject_get(kobj); > kn = kernfs_create_dir_ns(parent, kobject_name(kobj), > S_IRWXU | S_IRUGO | S_IXUGO, uid, gid, > kobj, ns); > if (IS_ERR(kn)) { > if (PTR_ERR(kn) == -EEXIST) > sysfs_warn_dup(parent, kobject_name(kobj)); > + kobject_put(kobj); > return PTR_ERR(kn); > } > > @@ -100,6 +102,7 @@ void sysfs_remove_dir(struct kobject *kobj) > if (kn) { > WARN_ON_ONCE(kernfs_type(kn) != KERNFS_DIR); > kernfs_remove(kn); > + kobject_put(kobj); > } > } Shouldn't this be taken on open() not sysfs creation, or is the problem that the kobject is held the module memory rather than duplicated by sysfs? > diff --git a/net/core/dev.c b/net/core/dev.c > index 222b1d322c96..3a0ffa603d14 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -10429,7 +10429,7 @@ static void netdev_wait_allrefs(struct net_device *dev) > rebroadcast_time = warning_time = jiffies; > refcnt = netdev_refcnt_read(dev); > > - while (refcnt != 1) { > + while (refcnt != 3) { > if (time_after(jiffies, rebroadcast_time + 1 * HZ)) { > rtnl_lock(); > > @@ -10544,7 +10544,7 @@ void netdev_run_todo(void) > netdev_wait_allrefs(dev); > > /* paranoia */ > - BUG_ON(netdev_refcnt_read(dev) != 1); > + BUG_ON(netdev_refcnt_read(dev) != 3); And surely there are things besides netdevs that would suffer from this change? > BUG_ON(!list_empty(&dev->ptype_all)); > BUG_ON(!list_empty(&dev->ptype_specific)); > WARN_ON(rcu_access_pointer(dev->ip_ptr)); > -- > 2.27.0 > -- Kees Cook