On Sat, 3 Jul 2010 13:41:23 +1000 Nick Piggin <npiggin@xxxxxxx> wrote:
> On Fri, Jul 02, 2010 at 07:03:55PM -0700, Andrew Morton wrote:
> > On Wed, 30 Jun 2010 22:05:02 +1000 Nick Piggin <npiggin@xxxxxxx> wrote:
> > 
> > > On Wed, Jun 30, 2010 at 05:27:02PM +1000, Dave Chinner wrote:
> > > > On Thu, Jun 24, 2010 at 01:02:41PM +1000, npiggin@xxxxxxx wrote:
> > > > > Protect inode->i_count with i_lock, rather than having it atomic.
> > > > > Next step should also be to move things together (eg. the refcount increment
> > > > > into d_instantiate, which will remove a lock/unlock cycle on i_lock).
> > > > .....
> > > > > Index: linux-2.6/fs/inode.c
> > > > > ===================================================================
> > > > > --- linux-2.6.orig/fs/inode.c
> > > > > +++ linux-2.6/fs/inode.c
> > > > > @@ -33,14 +33,13 @@
> > > > >   * inode_hash_lock protects:
> > > > >   *   inode hash table, i_hash
> > > > >   * inode->i_lock protects:
> > > > > - *   i_state
> > > > > + *   i_state, i_count
> > > > >   *
> > > > >   * Ordering:
> > > > >   * inode_lock
> > > > >   *   sb_inode_list_lock
> > > > >   *     inode->i_lock
> > > > > - * inode_lock
> > > > > - *   inode_hash_lock
> > > > > + *       inode_hash_lock
> > > > 
> > > > I thought that the rule governing the use of inode->i_lock was that
> > > > it can be used anywhere as long as it is the innermost lock.
> > > > 
> > > > Hmmm, no references in the code or documentation. Google gives a
> > > > pretty good reference:
> > > > 
> > > > http://www.mail-archive.com/linux-ext4@xxxxxxxxxxxxxxx/msg02584.html
> > > > 
> > > > Perhaps a different/new lock needs to be used here?
> > > 
> > > Well I just changed the order (and documented it to boot :)). It's
> > > pretty easy to verify that LOR is no problem. inode hash is only
> > > taken in a very few places so other code outside inode.c is fine to
> > > use i_lock as an innermost lock.
> > 
> > um, nesting a kernel-wide singleton lock inside a per-inode lock is
> > plain nutty.
> 
> I think it worked out OK. Because the kernel-wide locks are really
> restricted in where they are to be used (ie. not in filesystems). So
> they're really pretty constrained to the inode management subsystem.
> So filesystems still get to really use i_lock as an innermost lock
> for their purposes.
> 
> And filesystems get to take i_lock and stop all manipulation of inode
> now. No changing of flags, moving it on/off lists etc behind its back.
> It really is about locking the data rather than the code.
> 
> The final ordering outcome looks like this:
> 
> * inode->i_lock
> *   inode_list_lglock
> *   zone->inode_lru_lock
> *   wb->b_lock
> *   inode_hash_bucket lock

Apart from the conceptual vandalism, it means that any contention times
and cache transfer times on those singleton locks will increase
worst-case hold times of our nice, fine-grained i_lock.

> And it works like that because when you want to add or remove an inode
> from various data structures, you take the i_lock

Well that would be wrong. i_lock protects things *within* its inode.
It's nonsensical to take i_lock when the inode is being added to or
removed from external containers because i_lock doesn't protect those
containers!

> and then take each
> of these locks in turn, inside it. The alternative is to build a bigger
> lock ordering graph, and take all the locks up-front before taking
> i_lock. I did actually try that and it ended up being worse, so I went
> this route.
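To make the nesting being argued about concrete, here is a minimal
illustrative sketch of an unhash path under the ordering Nick lists
above; the helper name is hypothetical, and the bucket lock parameter
stands in for whichever per-bucket hash lock the series uses:

	/*
	 * Sketch only: i_lock is taken first and held across the
	 * container manipulation, with the hash bucket lock nested
	 * inside it.  While i_lock is held, i_state and i_count
	 * cannot change behind our back.
	 */
	static void sketch_unhash_inode(struct inode *inode,
					spinlock_t *bucket_lock)
	{
		spin_lock(&inode->i_lock);
		spin_lock(bucket_lock);
		hlist_del_init(&inode->i_hash);
		spin_unlock(bucket_lock);
		spin_unlock(&inode->i_lock);
	}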
> 
> I think taking a global lock in mark_inode_dirty is nutty (especially
> when that global lock is shared with hash management, LRU scanning,
> writeback, i_flags access... :) It's just a question of which is less
> nutty.

Yes, inode_lock is bad news.
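For reference, a minimal sketch of what the patch's "protect
inode->i_count with i_lock" means for a reference grab, assuming the
patched struct inode where i_count is a plain integer rather than an
atomic_t (the helper name is hypothetical):

	/*
	 * Sketch only: with the patch, bumping the refcount means
	 * taking the per-inode spinlock instead of doing an atomic
	 * increment, so i_count changes are serialized with i_state
	 * under the one lock.
	 */
	static void sketch_iref(struct inode *inode)
	{
		spin_lock(&inode->i_lock);
		inode->i_count++;	/* plain int in the patched inode */
		spin_unlock(&inode->i_lock);
	}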