On Wed 12-01-22 14:16:15, Jan Kara wrote:
> On Tue 11-01-22 00:08:56, Tetsuo Handa wrote:
> > On 2022/01/10 22:42, Jan Kara wrote:
> > > a) We didn't fully establish a real deadlock scenario from the lockdep
> > > report, did we? The lockdep report involved suspend locks, some locks on
> > > accessing files in /proc etc. and it was not clear whether it all reflects
> > > a real deadlock possibility or just a fact that lockdep tracking is rather
> > > coarse-grained at times. Now lockdep report is unpleasant and loop device
> > > locking was ugly anyway so your async change made sense but once lockdep is
> > > silenced we should really establish whether there is real deadlock and more
> > > work is needed or not.
> > 
> > Not /proc files but /sys/power/resume file.
> > Here is a reproducer but I can't test whether we can trigger a real deadlock.
> > 
> > ----------------------------------------
> > #include <stdio.h>
> > #include <sys/types.h>
> > #include <sys/stat.h>
> > #include <fcntl.h>
> > #include <unistd.h>
> > #include <sys/ioctl.h>
> > #include <linux/loop.h>
> > #include <sys/sendfile.h>
> > 
> > int main(int argc, char *argv[])
> > {
> > 	const int file_fd = open("testfile", O_RDWR | O_CREAT, 0600);
> > 	ftruncate(file_fd, 1048576);
> > 	char filename[128] = { };
> > 	const int loop_num = ioctl(open("/dev/loop-control", 3), LOOP_CTL_GET_FREE, 0);
> > 	snprintf(filename, sizeof(filename) - 1, "/dev/loop%d", loop_num);
> > 	const int loop_fd_1 = open(filename, O_RDWR);
> > 	ioctl(loop_fd_1, LOOP_SET_FD, file_fd);
> > 	const int loop_fd_2 = open(filename, O_RDWR);
> > 	ioctl(loop_fd_1, LOOP_CLR_FD, 0);
> > 	const int sysfs_fd = open("/sys/power/resume", O_RDWR);
> > 	sendfile(file_fd, sysfs_fd, 0, 1048576);
> > 	sendfile(loop_fd_2, file_fd, 0, 1048576);
> > 	write(sysfs_fd, "700", 3);
> > 	close(loop_fd_2);
> > 	close(loop_fd_1); // Lockdep complains.
> > 	close(file_fd);
> > 	return 0;
> > }
> > ----------------------------------------
> 
> Thanks for the reproducer.
> I will try to simplify it even more and figure
> out whether there is a real deadlock potential in the lockdep complaint or
> not...

OK, so I think I understand the lockdep complaint better. Lockdep
essentially complains about the following scenario:

blkdev_put()
  lock disk->open_mutex
  lo_release
    __loop_clr_fd()
        |
        | wait for IO to complete
        v
loop worker
  write to backing file
    sb_start_write(file)
        |
        | wait for fs with backing file to unfreeze
        v
process holding fs frozen
  freeze_super()
        |
        | wait for remaining writers on the fs so that fs can be frozen
        v
sendfile()
  sb_start_write()
  do_splice_direct()
        |
        | blocked on read from /sys/power/resume, waiting for kernfs file
        | lock
        v
write() "/dev/loop0" (in whatever form) to /sys/power/resume
  calls blkdev_get_by_dev("/dev/loop0")
    lock disk->open_mutex => deadlock

So there are three other ingredients to this locking loop besides loop
device behavior:

1) sysfs files which serialize reads & writes
2) sendfile which holds freeze protection while reading data to write
3) "resume" file behavior opening bdev from write to sysfs file

We cannot sensibly do anything about 1) AFAICT. You cannot get a coherent
view of a sysfs file while it is changing.

We could actually change 2) (to only hold freeze protection while splicing
from the pipe) but that would fix the problem only for sendfile(2). E.g.
splice(2) may also block waiting for data to become available in the pipe
while holding freeze protection. Changing that would require some surgery
in our splice infrastructure and at this point I'm not sure whether we
would not introduce other locking problems due to pipe_lock lock ordering.

For 3), we could consider stuff like not resuming from a loop device or
postponing resume to a workqueue but it all looks ugly.

Maybe the most disputable thing in this locking chain is splicing from
sysfs files. That does not seem terribly useful and, due to the special
locking and behavior of sysfs files, it allows for creating interesting
lock dependencies.
OTOH maybe there is someone out there who (possibly inadvertently through
some library) ends up using splice on sysfs files, so the chances of
userspace breakage, if we disable splice for sysfs, would be
non-negligible. Hum, tough.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR