On Fri, Apr 5, 2024 at 1:01 AM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote:
>
> > This specifically cannot happen because sysfs is not allowed as an
> > upper layer only as a lower layer, so overlayfs itself will not be writing to
> > /sys/power/resume.
>
> Then how could you possibly get a deadlock there?  What would your minimal
> deadlocked set look like?
>
> 1.  Something is blocked in lookup_bdev() called from resume_store(), called
> from sysfs_kf_write(), called from kernfs_write_iter(), which has acquired
> ->mutex of struct kernfs_open_file that had been allocated by
> kernfs_fop_open() back when the file had been opened.  Note that each
> struct file instance gets a separate struct kernfs_open_file.  Since we are
> calling ->write_iter(), the file *MUST* have been opened for write.
>
> 2.  Something is blocked in kernfs_fop_llseek() on the same of->mutex,
> i.e. using the same struct file as (1).  That something is holding an
> overlayfs inode lock, which is what the next thread is blocked on.
>
> + at least one more thread, to complete the cycle.
>
> Right?  How could that possibly happen without overlayfs opening /sys/power/resume
> for write?  Again, each struct file instance gets a separate of->mutex;
> for a deadlock you need a cycle of threads and a cycle of locks, such
> that each thread is holding the corresponding lock and is blocked on
> attempt to get the lock that comes next in the cyclic order.

Absolutely right.
I had it in my mind that this was a node lock.
Did not look closely.

>
> If overlayfs never writes to that sucker, it can't participate in that
> cycle.  Sure, you can get overlayfs llseek grabbing of->mutex of *ANOTHER*
> struct file opened for the same sysfs file.  Since it's not the same
> struct file and since each struct file there gets a separate kernfs_open_file
> instance, the mutex won't be the same.
>
> Unless I'm missing something else, that can't deadlock.  For a quick and
> dirty experiment, try to give of->mutex on r/o opens a class separate from
> that on r/w and w/o opens (mutex_init() in kernfs_fop_open()) and see
> if lockdep warnings persist.
>
> Something like
>
>         if (has_mmap)
>                 mutex_init(&of->mutex);
>         else if (file->f_mode & FMODE_WRITE)
>                 mutex_init(&of->mutex);
>         else
>                 mutex_init(&of->mutex);

Why a quick experiment?
Why not a permanent kludge?

It is not any better or worse than the already existing has_mmap subclass
annotation.

huh?

Thanks,
Amir.