On Sat, Oct 16, 2010 at 12:29:38AM +1100, Nick Piggin wrote: > On Sat, Oct 16, 2010 at 12:03:00AM +1100, Nick Piggin wrote: > > On Fri, Oct 15, 2010 at 09:59:43PM +1100, Dave Chinner wrote: > > > My series uses i_lock only to protect i_state and i_ref. It does not > > > need to protect any more of the inode than that as other locks > > > protect the other list fields. As a result, it's still the inermost > > > lock and there are no trylocks in the code at all. > > We discussed it and I didn't think latencies would be any worse > > a problem than they are today. I agreed it may be an issue and > > pointed out there are ways forward to fix it. > > BTW. if a few trylocks are your biggest issue, this is a joke. I told > you how they can be fixed with incremental patches on top of the series > (which basically whittle down the lock coverage of the old inode_lock, > and so IMO need to be done in small chunks well bisectable and with > good rationale). So why you didn't submit a couple of incremental > patches to do just that is beyond me. > > I've had prototypes in my tree actually to do that from a while back, > but actually I'm thinking that using RCU may be a better way to go > now that Linus has agreed on it and we have sketched a design to do > slab-free-RCU. > > Either way, it's much easier to compare pros and cons of each, when > they are done incrementally on top of the existing base. To illustrate my point, if it needs to be spelled out. If you rcu lock a list then locking can be quite changed. Starting with my basic first transform which has inode protected by i_lock: /* pass1, dumb */ spin_lock(&inode->i_lock); spin_lock(&list_lock); list_del(&inode->list); ... spin_lock(&list_lock); list_for_each_entry(inode, list) { if (inode->blah ...) { if (!spin_trylock(&inode->i_lock)) goto repeat; /* oh no, it's unmaintainable */ do_something(inode); } } Common pattern, used by several of the inode lists, OK? /* pass2a, RCU optimised */ rcu_read_lock() list_for_each_entry(inode, list) { if (inode->blah ...) { spin_lock(&inode->i_lock); /* no trylock. easy. */ if (unlikely(list_empty(&inode->i_list))) continue; do_something(inode); } } /* pass2b, reduced i_lock width */ /* no i_lock */ spin_lock(&list_lock); list_del(&inode->list); ... spin_lock(&list_lock); list_for_each_entry(inode, list) { if (inode->blah ...) { do_something(inode); } } OK? Do you see a problem yet? Try turning into RCU. rcu_read_lock(); list_for_each_entry(inode, list) { if (inode->blah ...) { spin_lock(&list_lock); if (unlikely(list_empty(&inode->i_list))) continue; do_something(inode); } } So you've saved one (extremely fine-grained, cache hot) lock in the slowpath, but you've exchanged a fine grained, cache hot lock in the fastpath for a more expensive lock. You've also lost the really nice (for maintainability and simplicity) ability to freeze all modifications to the inode with the single i_lock. So... you just happened to decide for yourself that your approach is better, just because. All your extensive trials and experiemnts with different locking orders lead you to believe that my preferred RCU approach is wrong and yours is right? Can you please share your working out with the rest of the class? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html