On Sun, Oct 17, 2010 at 01:55:33PM +1100, Nick Piggin wrote:
> On Sun, Oct 17, 2010 at 01:47:59PM +1100, Dave Chinner wrote:
> > On Sun, Oct 17, 2010 at 04:55:15AM +1100, Nick Piggin wrote:
> > > On Sat, Oct 16, 2010 at 07:13:54PM +1100, Dave Chinner wrote:
> > > > This patch set is just the basic inode_lock breakup patches plus a
> > > > few more simple changes to the inode code. It stops short of
> > > > introducing RCU inode freeing because those changes are not
> > > > completely baked yet.
> > >
> > > It also doesn't contain per-zone locking and lrus, or scalability of
> > > superblock list locking.
> >
> > Sure - that's all explained in the description of what the series
> > actually contains later on.
> >
> > > And while the rcu-walk path walking is not fully baked, it has been
> > > reviewed by Linus and is in pretty good shape. So I prefer to utilise
> > > RCU locking here too, seeing as we know it will go in.
> >
> > I deliberately left out the RCU changes as we know that the version
> > that is in your tree causes significant performance regressions for
> > single threaded and some parallel workloads on small (<=8p)
> > machines.
>
> The worst-case microbenchmark is not a "significant performance
> regression". It is a worst case demonstration. With the parallel
> workloads, are you referring to your postmark xfs workload? It was
> actually due to lazy LRU, IIRC.

Actually, I wasn't referring to the regressions I reported from
fs_mark runs on XFS - I was referring to your "worst case
demonstration" numbers and the comments made during the discussion
that followed. It wasn't clear to me whether the plan was to use
SLAB_DESTROY_BY_RCU or not, and the commit messages didn't help, so
I left it out because I was not about to bite off more than I could
chew for .37.

As it is, the lazy LRU code doesn't appear to cause any fs_mark
performance regressions in the testing I've done of my series on
either ext4 or XFS.
Hence I don't think that was the cause of any of the performance
problems I originally measured using fs_mark. And you are right that
it wasn't RCU overhead, because....

> I didn't think RCU overhead was noticeable there actually.

.... I later noticed you never converted the XFS inode cache to use
RCU inode freeing. Which means that none of the RCU tree walks are
actually protected by RCU when XFS is used with your tree. Maybe
that was causing problems. But if it's not RCU freeing (or lack
thereof) or lazy LRU, it's one of the other scalability patches that
I left out of my series that was causing the problem.

> Anyway, I've already gone over this a couple of months ago when we
> were discussing it. We know it could cause some small regressions;
> if they are small, that is considered acceptable and greatly
> outweighed by the fastpath speedup. And I have a design to do slab
> RCU which can be used if regressions are large. Linus signed off on
> this, in fact. Why weren't you debating it then?

I try not to debate stuff I don't understand or have no information
about. That discussion is where I first learnt about the existence
of SLAB_DESTROY_BY_RCU. Clueless is not a great position to start
from in a discussion with Linus...

Anyway, that is ancient history. Now I've got patches to convert the
XFS inode cache to use RCU freeing via SLAB_DESTROY_BY_RCU thanks to
what I learnt from that discussion. The patches don't show any
performance degradation at up to 16p in the benchmarking I've done
so far when combined with the inode-scale series and the .37 XFS
queue. Hence I think XFS will be ready for RCU freed inodes in .38
regardless of whether the VFS gets there or not.

And as a result of XFS being able to implement this functionality
independently of the VFS, I'm completely ambivalent as to how the
VFS goes about implementing RCU inode freeing.
If the VFS maintainers want to go straight to using
SLAB_DESTROY_BY_RCU to minimise the worst-case overhead, then that's
what I'll do...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx