On Thu, Apr 04, 2024 at 09:54:35AM +0300, Amir Goldstein wrote:

> In the lockdep dependency chain, overlayfs inode lock is taken
> before kernfs internal of->mutex, where kernfs (sysfs) is the lower
> layer of overlayfs, which is sane.
>
> With /sys/power/resume (and probably other files), sysfs also
> behaves as a stacking filesystem, calling vfs helpers, such as
> lookup_bdev() -> kern_path(), which is a behavior of a stacked
> filesystem, without all the precautions that comes with behaving
> as a stacked filesystem.

No.  This is far worse than anything stacked filesystems do - it's
an arbitrary pathname resolution while holding a lock.
It's not local.  Just about anything (including automounts, etc.)
can be happening there and it pushes the lock in question outside
of *ALL* pathwalk-related locks.  Pathname doesn't have to
resolve to anything on overlayfs - it can just go through
a symlink on it, or walk into it and traverse a bunch of ..
afterwards, etc.

Don't confuse that with stacking - it's not even close.
You can't use that anywhere near overlayfs layers.

Maybe isolate it into a separate filesystem, to be automounted
on /sys/power.  And make anyone playing with overlayfs with
sysfs as a layer mount the damn thing on top of power/ in your
overlayfs.  But using that thing as a part of layer is
a non-starter.
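
For readers who want to see why the ordering above closes a loop, here is a rough toy model of the dependency chain, written in Python rather than kernel C. It is only a sketch: the class `LockOrderGraph` and the node names `ovl_inode_lock`, `kernfs_of_mutex` and `pathwalk_locks` are made up for illustration, "pathwalk_locks" lumps every lock a pathname resolution might take into one node, and none of this is how lockdep itself is implemented.

```python
from collections import defaultdict


class LockOrderGraph:
    """Toy stand-in for a lock-order checker (hypothetical, not lockdep)."""

    def __init__(self):
        # edges[a] holds every lock that was taken while a was held
        self.edges = defaultdict(set)

    def record(self, held, acquired):
        """Note that `acquired` was taken while `held` was already held."""
        self.edges[held].add(acquired)

    def find_cycle(self):
        """Return one dependency cycle as a list of lock names, or None."""
        done = set()  # nodes already proven not to lead into a cycle

        def visit(node, path):
            if node in path:
                # Walked back onto the current path: that is a cycle.
                return path[path.index(node):] + [node]
            if node in done:
                return None
            for nxt in sorted(self.edges.get(node, ())):
                cycle = visit(nxt, path + [node])
                if cycle:
                    return cycle
            done.add(node)
            return None

        for start in sorted(self.edges):
            cycle = visit(start, [])
            if cycle:
                return cycle
        return None


graph = LockOrderGraph()

# overlayfs with sysfs as a lower layer: overlayfs inode lock first,
# then the kernfs of->mutex. On its own this ordering is fine.
graph.record("ovl_inode_lock", "kernfs_of_mutex")
without_pathwalk = graph.find_cycle()

# Assumed model of a /sys/power/resume write: of->mutex is held across
# kern_path(), so it now sits outside everything a pathwalk can take.
graph.record("kernfs_of_mutex", "pathwalk_locks")
# Assumption: the resolved path may cross overlayfs (symlink, "..", etc.),
# so the pathwalk can end up taking an overlayfs inode lock.
graph.record("pathwalk_locks", "ovl_inode_lock")
with_pathwalk = graph.find_cycle()
```

In this model `without_pathwalk` stays `None`, while `with_pathwalk` is a closed chain through all three nodes. The point it is meant to illustrate is that the cycle does not depend on the pathname resolving to something on overlayfs, only on the walk being able to pass through it.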