Re: [PATCH v3] drivers/base/core: refcount kobject and bus on device attribute read / store

Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> · Wed, 23 Jun 2021 18:51:42 +0200

On Wed, Jun 23, 2021 at 09:14:34AM -0700, Luis Chamberlain wrote:
> kernfs / sysfs is in not struct device specific, it has no semantics for
> modules or even devices. The aspect of kernfs which deals with locking
> a struct kernfs_node is kernfs_get_active(). The refcount there of
> importance is the call to atomic_inc_unless_negative(&kn->active).
> 
> struct kernfs_node *kernfs_get_active(struct kernfs_node *kn)                   
> {                                                                               
> 	if (unlikely(!kn))
> 		return NULL;                                                    
> 
> 	if (!atomic_inc_unless_negative(&kn->active))                           
> 		return NULL;                                                    
> 
> 	if (kernfs_lockdep(kn))
> 		rwsem_acquire_read(&kn->dep_map, 0, 1, _RET_IP_);               
> 	return kn;                                                              
> }
> 
> BTW this was also precisely where I had suggested to extend the
> kernfs_node with an optional kn->owner and if set we try_module_get() to
> prevent the deadlock case if the module exit routine also happens to use
> a lock also used by a sysfs attribute store/read.
> 
> The flow would be (from a real live gdb crash backtrace from an original
> bug report from a customer):
> 
> write system call -->
>   ksys_write ()
>     vfs_write() -->
>       __vfs_write() -->
>        kernfs_fop_write() (note now this is kernfs_fop_write_iter()) -->
>          sysfs_kf_write() -->
>            dev_attr_store() -->
> 	     null reference
> 
> The dereference is because the dev_attr_store() call which we are
> modifying
> 
> LINE-001 static ssize_t dev_attr_store(struct kobject *kobj, struct attribute *attr,                                                                                         
> LINE-002 const char *buf, size_t count)                                                                                                                 
> LINE-003 {
> LINE-004	struct device_attribute *dev_attr = to_dev_attr(attr);
> LINE-005	struct device *dev = kobj_to_dev(kobj);
> LINE-006	ssize_t ret = -EIO;
> LINE-007
> LINE-008	if (dev_attr->store)
> LINE-009		ret = dev_attr->store(dev, dev_attr, buf, count);
> 	...
> 	}
> 
> The race happens because a sysfs store / read can be triggered, the CPU
> could be preempted after LINE-008, and the ->store is gone by LINE-009.
> This begs the question if kernfs_fop_write_iter() or sysfs protects this
> somehow? Let's see.
> 
> For kernfs we have:
> 
> static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter) 
> {
> 	struct kernfs_open_file *of = kernfs_of(iocb->ki_filp);
> 	...
> 	mutex_lock(&of->mutex);
> 	if (!kernfs_get_active(of->kn)) {
> 		mutex_unlock(&of->mutex);
> 		len = -ENODEV;
> 		goto out_free;
> 	}
> 
> 	ops = kernfs_ops(of->kn);
> 	if (ops->write)
> 		len = ops->write(of, buf, len, iocb->ki_pos);
> 	else
> 		len = -EINVAL;
> 
> 	kernfs_put_active(of->kn);
> 	mutex_unlock(&of->mutex);
> 	...
> }
> 
> And the write call here is a syfs calls:
> 
> static ssize_t sysfs_kf_write(struct kernfs_open_file *of, char *buf,           
> 			      size_t count, loff_t pos)                         
> {                                                                               
> 	const struct sysfs_ops *ops = sysfs_file_ops(of->kn);                   
> 	struct kobject *kobj = of->kn->parent->priv;                            
> 
> 	if (!count)                                                             
> 		return 0;                                                       
> 
> 	return ops->store(kobj, of->kn->priv, buf, count);                      
> }           
> 
> As we have observed already, the active reference obtained through
> kernfs_get_active() was for the struct kernfs_node. Sure, the
> ops->write() is valid, in this case it sysfs_kf_write().
> 
> sysfs isn't doing any active reference check for the kobject device
> attribute as it doesn't care for them. So syfs calls
> dev_attr_store(), but the dev_attr_store() is not preventing the device
> attribute ops to go fishing, and we destroy them while we're destroying
> the device on module removal.

Ah, but sysfs _should_ be doing this properly.

I think the issue is that when we store the kobject pointer in kernfs,
it is NOT incremented.  Look at sysfs_create_dir_ns(), if we call
kobject_get(kobj) right before we call kernfs_create_dir_ns(), and then
properly clean things up later on when we remove the sysfs directory
(i.e. the kobject), it _should_ fix this problem.

Then, we know, whenever show/store/whatever is called, when we cast out
of kernfs the private pointer to a kobject, that the kobject really is
still alive, so we can use it properly.

Can you try that, it should be a much "simpler" change here.

thanks,

greg k-h