Re: [patch 1/6] fs: icache RCU free inodes

Nick Piggin <npiggin@xxxxxxxxx> · Fri, 12 Nov 2010 17:49:11 +1100

On Fri, Nov 12, 2010 at 05:02:02PM +1100, Nick Piggin wrote:
> On Thu, Nov 11, 2010 at 08:48:38PM -0800, Linus Torvalds wrote:
> > On Thu, Nov 11, 2010 at 5:24 PM, Nick Piggin <npiggin@xxxxxxxxx> wrote:
> > >
> > > So this is really not an "oh, maybe someone will see 10-20% slowdown", or even
> > > 1-2% slowdown.
> > 
> > You ignored my bigger issue: the _normal_ way - and the better way -
> > to handle these things is with SLAB_DESTROY_BY_RCU.
> 
> Well I tried to answer that in the other threads.
> 
> SLAB_DESTROY_BY_RCU is indeed quite natural for a lot of RCU usages,
> because even with standard RCU you almost always have the pattern like
> 
> rcu_read_lock();
> obj = lookup_data_structure(key);
> if (obj) {
>   lock(obj);
>   verify_obj_in_structure(obj, key);
>   /* blah... (eg. take refcount) */
>   unlock(obj);
> }
> rcu_read_unlock();
> 
> And in this pattern, SLAB_DESTROY_BY_RCU takes almost zero work.
> 
> OK, but rcu-walk doesn't have that. In rcu-walk, we can't take a lock
> or a reference on either the dentry _or_ the inode, because the whole
> point is to reduce atomics (for single threaded performance), and
> stores into shared cachelines along the path (for scalability).
> 
> It gets really interesting when you have crazy stuff going on like
> inode->i_ops changing from underneath you while you're trying to do
> permission lookups, or inode type changing from link to dir to reg
> in the middle of the traversal.
> 
> 
> > So what are the advantages of using the inferior approach?  I really
> > don't see why you push the whole "free the damn things individually"
> > approach.
> 
> I'm not pushing _that_ aspect of it. I'm pushing the "don't go away and
> come back as something else" aspect.
> 
> Yes it may be _possible_ to do store-free walking with SLAB_DESTROY_BY_RCU,
> and I have some ideas. But it is hairy. More hairy than rcu-walk, by
> quite a long shot.

So in short, that is my justification. 12% is a worst case regression,
but the demonstration is obviously absurdly worst case, and is merely
there as a "ok, the sky won't fall on anybody's head" upper bound.

In reality, it's likely to be well under 0.1% in any real workload, even
an inode intensive one. So I much prefer to err on the side of less
complexity, to start with. There just isn't much risk of regression
AFAIKS, and much more risk of becoming unmaintainably complex.
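
For what it's worth, here is a rough userspace sketch of the
lookup/lock/verify pattern quoted above, and of why type-stable
(SLAB_DESTROY_BY_RCU-style) memory costs almost nothing there. This is
not kernel code: the slot table, the spinlock built on C11 atomic_flag,
and every function name (table_init, slot_assign, find_unlocked,
lock_and_verify, demo_recycle_detected) are made up for illustration,
and the unlocked scan merely stands in for a lookup done under
rcu_read_lock().

```c
/* Userspace model only: none of these names are kernel APIs. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NSLOTS   4
#define FREE_KEY (-1)

/* A type-stable object: the memory always holds a struct obj, but the
 * slot may be recycled for a different key at any time. */
struct obj {
    atomic_flag lock;   /* per-object spinlock */
    _Atomic int key;    /* identity; changes when the slot is reused */
    int refcount;       /* only touched with the lock held */
};

static struct obj slots[NSLOTS];

static void obj_lock(struct obj *o)
{
    while (atomic_flag_test_and_set_explicit(&o->lock, memory_order_acquire))
        ;   /* spin */
}

static void obj_unlock(struct obj *o)
{
    atomic_flag_clear_explicit(&o->lock, memory_order_release);
}

void table_init(void)
{
    for (size_t i = 0; i < NSLOTS; i++) {
        atomic_flag_clear(&slots[i].lock);
        atomic_store(&slots[i].key, FREE_KEY);
        slots[i].refcount = 0;
    }
}

/* Models "free the object, then reuse the same memory for another key". */
void slot_assign(int idx, int key)
{
    assert(idx >= 0 && idx < NSLOTS);
    obj_lock(&slots[idx]);
    atomic_store(&slots[idx].key, key);
    slots[idx].refcount = 0;
    obj_unlock(&slots[idx]);
}

/* Lockless lookup: the result may already be stale when it is returned. */
struct obj *find_unlocked(int key)
{
    for (size_t i = 0; i < NSLOTS; i++)
        if (atomic_load(&slots[i].key) == key)
            return &slots[i];
    return NULL;
}

/* Take the lock, recheck identity, and only then take a reference. */
int lock_and_verify(struct obj *o, int key)
{
    int ok;

    obj_lock(o);
    ok = (atomic_load(&o->key) == key);
    if (ok)
        o->refcount++;
    obj_unlock(o);
    return ok;
}

/* Returns 1 if reuse between lookup and lock is caught by the recheck. */
int demo_recycle_detected(void)
{
    table_init();
    slot_assign(0, 7);
    struct obj *o = find_unlocked(7);   /* lockless lookup finds key 7 */
    slot_assign(0, 9);                  /* slot recycled underneath us */
    return o != NULL && !lock_and_verify(o, 7) && o->refcount == 0;
}
```

The recheck of the key under the lock is the entire extra cost in this
pattern. rcu-walk takes no lock and no reference, so it has no
equivalent point at which to do that recheck, which is the difficulty
described above.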
