On Thu, Apr 04, 2024 at 09:11:22AM +0100, Al Viro wrote:
> On Thu, Apr 04, 2024 at 09:54:35AM +0300, Amir Goldstein wrote:
> >
> > In the lockdep dependency chain, overlayfs inode lock is taken
> > before kernfs internal of->mutex, where kernfs (sysfs) is the lower
> > layer of overlayfs, which is sane.
> >
> > With /sys/power/resume (and probably other files), sysfs also
> > behaves as a stacking filesystem, calling vfs helpers, such as
> > lookup_bdev() -> kern_path(), which is a behavior of a stacked
> > filesystem, without all the precautions that comes with behaving
> > as a stacked filesystem.
>
> No.  This is far worse than anything stacked filesystems do - it's
> an arbitrary pathname resolution while holding a lock.
> It's not local.  Just about anything (including automounts, etc.)
> can be happening there and it pushes the lock in question outside
> of *ALL* pathwalk-related locks.  Pathname doesn't have to
> resolve to anything on overlayfs - it can just go through
> a symlink on it, or walk into it and traverse a bunch of ..
> afterwards, etc.
>
> Don't confuse that with stacking - it's not even close.
> You can't use that anywhere near overlayfs layers.
>
> Maybe isolate it into a separate filesystem, to be automounted
> on /sys/power.  And make anyone playing with overlayfs with
> sysfs as a layer mount the damn thing on top of power/ in your
> overlayfs.  But using that thing as a part of layer is
> a non-starter.

Incidentally, why do you need to lock overlayfs inode to call
vfs_llseek() on the underlying file?  It might (or might not)
need to lock the underlying file (for things like ->i_size, etc.),
but that will be done by ->llseek() instance and it would deal
with the inode in the layer, not overlayfs one.

Similar question applies to ovl_write_iter() - why do you
need to hold the overlayfs inode locked during the call of
backing_file_write_iter()?
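
For illustration only, a toy user-space model of the ordering problem
discussed above.  This is not kernel code and not how lockdep is
implemented; it only records "B was taken while holding A" edges and
reports when a new edge closes a cycle.  The class name LockOrderGraph
and the labels "ovl_inode", "of_mutex" and "pathwalk_locks" are
hypothetical, and the third edge rests on the simplifying assumption
that a pathname walk may end up taking an overlayfs inode lock.

```python
class LockOrderGraph:
    """Toy tracker: an edge A -> B means 'B was taken while holding A'."""

    def __init__(self):
        self.edges = {}

    def _reachable(self, src, dst):
        # Iterative depth-first search over the recorded ordering edges.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get(node, ()))
        return False

    def acquire_while_holding(self, held, taken):
        """Record held -> taken; return True if this edge closes a cycle."""
        closes_cycle = self._reachable(taken, held)
        self.edges.setdefault(held, set()).add(taken)
        return closes_cycle


graph = LockOrderGraph()

# overlayfs llseek/write on a sysfs-backed file:
# overlayfs inode lock first, then kernfs of->mutex.
first = graph.acquire_while_holding("ovl_inode", "of_mutex")

# write to /sys/power/resume: of->mutex is held across kern_path(),
# so it ends up outside the pathwalk-related locks.
second = graph.acquire_while_holding("of_mutex", "pathwalk_locks")

# ASSUMPTION (simplified): the pathname walk may take an overlayfs
# inode lock, e.g. when the path passes through an overlayfs mount.
third = graph.acquire_while_holding("pathwalk_locks", "ovl_inode")

print(first, second, third)  # False False True
```

In this model, dropping the first edge (not holding the overlayfs
inode lock around the call into the lower layer) means the third
acquisition no longer closes a cycle, which is the direction the
questions above point at.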