On Wed, 2010-12-01 at 15:31 -0800, Linus Torvalds wrote:
> On Wed, Dec 1, 2010 at 2:38 PM, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > OK, the stop_machine() plugs a lot of potential race-vs-module-unload
> > things. But Trond is referring to races against vmscan inode reclaim,
> > unmount, etc.
>
> So?
>
> A filesystem module cannot be unloaded while it's still mounted.
>
> And unmount doesn't succeed until all inodes are gone.
>
> And getting rid of an inode doesn't succeed until all pages associated
> with it are gone.
>
> And getting rid of the pages involves locking them (whether in
> truncate or vmscan) and removing them from all lists.
>
> Ergo: the fact that vmscan has a locked page leads to the filesystem
> being guaranteed to not be unmounted. And that, in turn, guarantees
> that the module won't be unloaded until the machine has gone through
> an idle cycle.
>
> It really is that simple. There's nothing subtle there. The reason
> spin_unlock(&mapping->tree_lock) is safe is exactly the above trivial
> chain of dependencies. And it's also exactly why
> mapping->a_ops->freepage() would also be safe.
>
> This is pretty much how all the module races are handled. Doing module
> ref-counts per page (or per packet in flight for things like
> networking) would be prohibitively expensive. There's no way we can
> ever do that.

Although the page is locked, it may no longer be visible to the lockless
page lookup once the radix_tree_delete() completes in
__remove_from_page_cache(). Furthermore, if the same routine causes
mapping->nrpages to go to zero before iput_final() hits
truncate_inode_pages(), then the latter exits immediately.

Both these cases would appear to allow iput_final() to release the
inode before vmscan gets round to unlocking the mapping->tree_lock,
since truncate_inode_pages() no longer thinks it has any work to do.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com
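
[Editor's sketch] For readers who don't have the source in front of them,
here is a simplified sketch of the two paths being argued about. The
identifiers follow the thread (mapping->tree_lock, mapping->a_ops->freepage,
mapping->nrpages, radix_tree_delete(), truncate_inode_pages()), but the
function names sketch_vmscan_remove() and sketch_evict() are made up for
the illustration, and the bodies are abbreviated and partly reconstructed
from memory. Treat it as a picture of the interleaving, not as the actual
2.6.37-era kernel code.

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/radix-tree.h>

/* Side A: roughly what vmscan does once it holds the page lock. */
static void sketch_vmscan_remove(struct page *page)
{
	struct address_space *mapping = page->mapping;
	void (*freepage)(struct page *);

	spin_lock_irq(&mapping->tree_lock);
	freepage = mapping->a_ops->freepage;	/* the proposed callback */
	radix_tree_delete(&mapping->page_tree, page->index);
	page->mapping = NULL;
	mapping->nrpages--;			/* may now reach zero */
	spin_unlock_irq(&mapping->tree_lock);	/* the window Trond worries about */

	if (freepage)
		freepage(page);
	/* the page itself is unlocked and freed later by the caller */
}

/* Side B: iput_final()/evict() running concurrently on another CPU. */
static void sketch_evict(struct inode *inode)
{
	/*
	 * The nrpages check (here, or inside truncate itself) is the point
	 * of contention: if side A has already taken nrpages to zero,
	 * nothing on this path waits for the page lock or for
	 * mapping->tree_lock before the inode is torn down.
	 */
	if (inode->i_data.nrpages)
		truncate_inode_pages(&inode->i_data, 0);
	/* ... inode is subsequently destroyed and may be freed ... */
}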