On Thu, 18 Nov 2010 19:18:22 +1100 Nick Piggin <npiggin@xxxxxxxxx> wrote: > On Wed, Nov 17, 2010 at 10:28:34PM -0800, Andrew Morton wrote: > > > Logically I'd expect i_mutex to nest inside s_umount. Because s_umount > > is a per-superblock thing, and i_mutex is a per-file thing, and files > > live under superblocks. Nesting s_umount outside i_mutex creates > > complex deadlock graphs between the various i_mutexes, I think. > > You mean i_mutex outside s_umount? > Nope. i_mutex should nest inside s_umount. Just as inodes nest inside superblocks! Seems logical to me ;) > > Someone tell me if btrfs has the same bug, via its call to > > writeback_inodes_sb_nr_if_idle()? > > > > I don't see why these functions need s_umount at all, if they're called > > from within ->write_begin against an inode on that superblock. If the > > superblock can get itself disappeared while we're running ->write_begin > > on it, we have problems, no? > > > > In which case I'd suggest just removing the down_read(s_umount) and > > specifying that the caller must pin the superblock via some means. > > > > Only we can't do that because we need to hold s_umount until the > > bdi_queue_work() worker has done its work. > > > > The fact that a call to ->write_begin can randomly return with s_umount > > held, to be randomly released at some random time in the future is a > > bit ugly, isn't it? write_begin is a pretty low-level, per-inode > > thing. > > Yeah that whole writeback_inodes_if_idle is nasty > > > > It'd be better if we pinned these superblocks via refcounting, not via > > holding s_umount but even then, having ->write_begin randomly bump sb > > refcounts for random periods of time is still pretty ugly. > > > > What a pickle. > > > > Can we just delete writeback_inodes_sb_nr_if_idle() and > > writeback_inodes_sb_if_idle()? The changelog for 17bd55d037a02 is > > pretty handwavy - do we know that deleting these things would make a > > jot of difference? > > > > And why _do_ we need to hold s_umount during the bdi_queue_work() > > handover? Would simply bumping s_count suffice? > > s_count just prevents it from going away, but s_umount is still needed > to keep umount, remount,ro, freezing etc activity away. I don't think > there is an easy way to do it. > > Perhaps filesystem should have access to the dirty throttling path, kick > writeback or delayed allocation etc as needed, and throttle against > outstanding work that needs to be done, going through the normal > writeback paths? I just cannot believe that we need s_mount inside ->write_begin. Is it really the case that someone can come along and unmount or remount or freeze our filesystem while some other process is down performing a ->write_begin against one of its files? Kidding? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html