On Wed, Jun 23, 2021 at 09:14:34AM -0700, Luis Chamberlain wrote: > kernfs / sysfs is in not struct device specific, it has no semantics for > modules or even devices. The aspect of kernfs which deals with locking > a struct kernfs_node is kernfs_get_active(). The refcount there of > importance is the call to atomic_inc_unless_negative(&kn->active). > > struct kernfs_node *kernfs_get_active(struct kernfs_node *kn) > { > if (unlikely(!kn)) > return NULL; > > if (!atomic_inc_unless_negative(&kn->active)) > return NULL; > > if (kernfs_lockdep(kn)) > rwsem_acquire_read(&kn->dep_map, 0, 1, _RET_IP_); > return kn; > } > > BTW this was also precisely where I had suggested to extend the > kernfs_node with an optional kn->owner and if set we try_module_get() to > prevent the deadlock case if the module exit routine also happens to use > a lock also used by a sysfs attribute store/read. > > The flow would be (from a real live gdb crash backtrace from an original > bug report from a customer): > > write system call --> > ksys_write () > vfs_write() --> > __vfs_write() --> > kernfs_fop_write() (note now this is kernfs_fop_write_iter()) --> > sysfs_kf_write() --> > dev_attr_store() --> > null reference > > The dereference is because the dev_attr_store() call which we are > modifying > > LINE-001 static ssize_t dev_attr_store(struct kobject *kobj, struct attribute *attr, > LINE-002 const char *buf, size_t count) > LINE-003 { > LINE-004 struct device_attribute *dev_attr = to_dev_attr(attr); > LINE-005 struct device *dev = kobj_to_dev(kobj); > LINE-006 ssize_t ret = -EIO; > LINE-007 > LINE-008 if (dev_attr->store) > LINE-009 ret = dev_attr->store(dev, dev_attr, buf, count); > ... > } > > The race happens because a sysfs store / read can be triggered, the CPU > could be preempted after LINE-008, and the ->store is gone by LINE-009. > This begs the question if kernfs_fop_write_iter() or sysfs protects this > somehow? Let's see. > > For kernfs we have: > > static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter) > { > struct kernfs_open_file *of = kernfs_of(iocb->ki_filp); > ... > mutex_lock(&of->mutex); > if (!kernfs_get_active(of->kn)) { > mutex_unlock(&of->mutex); > len = -ENODEV; > goto out_free; > } > > ops = kernfs_ops(of->kn); > if (ops->write) > len = ops->write(of, buf, len, iocb->ki_pos); > else > len = -EINVAL; > > kernfs_put_active(of->kn); > mutex_unlock(&of->mutex); > ... > } > > And the write call here is a syfs calls: > > static ssize_t sysfs_kf_write(struct kernfs_open_file *of, char *buf, > size_t count, loff_t pos) > { > const struct sysfs_ops *ops = sysfs_file_ops(of->kn); > struct kobject *kobj = of->kn->parent->priv; > > if (!count) > return 0; > > return ops->store(kobj, of->kn->priv, buf, count); > } > > As we have observed already, the active reference obtained through > kernfs_get_active() was for the struct kernfs_node. Sure, the > ops->write() is valid, in this case it sysfs_kf_write(). > > sysfs isn't doing any active reference check for the kobject device > attribute as it doesn't care for them. So syfs calls > dev_attr_store(), but the dev_attr_store() is not preventing the device > attribute ops to go fishing, and we destroy them while we're destroying > the device on module removal. Ah, but sysfs _should_ be doing this properly. I think the issue is that when we store the kobject pointer in kernfs, it is NOT incremented. Look at sysfs_create_dir_ns(), if we call kobject_get(kobj) right before we call kernfs_create_dir_ns(), and then properly clean things up later on when we remove the sysfs directory (i.e. the kobject), it _should_ fix this problem. Then, we know, whenever show/store/whatever is called, when we cast out of kernfs the private pointer to a kobject, that the kobject really is still alive, so we can use it properly. Can you try that, it should be a much "simpler" change here. thanks, greg k-h