On Thu, Dec 09, 2010 at 05:16:44PM +1100, Nick Piggin wrote:
> On Thu, Dec 09, 2010 at 04:43:43PM +1100, Dave Chinner wrote:
> > On Mon, Nov 29, 2010 at 09:57:33PM +1100, Nick Piggin wrote:
> > > Hey,
> > >
> > > What was the reason behind not using my approach of fast per-cpu
> > > counters for the inode and dentry counters, and instead using the
> > > percpu_counter lib (which is not useful unless very fast approximate
> > > access to the global counter is required, or performance is not
> > > critical, which is somewhat of an oxymoron if you're using per-cpu
> > > counters in the first place)? It is the difference between this:
> >
> > Hi Nick - sorry for being slow to answer this - I only just found
> > this email.
> >
> > The reason for using the generic counters is that the shrinkers
> > read the current value of the global counter on every call, and hence
> > it can be read thousands of times a second. The only way to do that
> > efficiently is to use the approximate value the generic counters
> > provide.
>
> That is not what is happening, though, so I assume that no measurements
> were done.
>
> In fact, what happens now is that *both* types of counter use the crappy
> percpu_counter library, and the shrinkers actually do a per-cpu loop
> over the counters to get the sum.
>
> But anyway, even if that operation were fast, it would be silly to use
> a per-cpu counter for nr_unused, because it is tied fundamentally to
> the LRU, so you can't get any more scalability than the LRU operations
> allow anyway!
>
> I'm all for breaking out patches and pulling things ahead where they
> make sense, but it seems like things have just been done without much
> thought, measurement, or critical discussion of why the changes were
> made.
>
> There wasn't even any point in making the total counter per-cpu yet,
> either: seeing as there is still a lot of global locking in there, it
> would not have made any difference to scalability, and would only have
> slowed things down.
>
> What it _should_ look like is exactly what I had in my tree: proper,
> fast total object counters with a per-cpu loop for the sum once the
> global locks in the create/destroy path are lifted, and a per-LRU
> counter for nr_unused, protected together with the lru lock.

In fact, I should revise my regression fix to go back to global LRU
counters until per-zone LRUs are implemented. Sigh, yay for more tree
breakage.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html