On Thu, Dec 09, 2010 at 05:16:44PM +1100, Nick Piggin wrote:
> On Thu, Dec 09, 2010 at 04:43:43PM +1100, Dave Chinner wrote:
> > On Mon, Nov 29, 2010 at 09:57:33PM +1100, Nick Piggin wrote:
> > > Hey,
> > >
> > > What was the reason behind not using my approach of fast per-cpu
> > > counters for the inode and dentry counters, and instead using the
> > > percpu_counter lib (which is not useful unless very fast approximate
> > > access to the global counter is required, or performance is not
> > > critical, which is somewhat of an oxymoron if you're using per-cpu
> > > counters in the first place)? It is the difference between this:
> >
> > Hi Nick - sorry for being slow to answer this - I only just found
> > this email.
> >
> > The reason for using the generic counters is that the shrinkers
> > read the current value of the global counter on every call, and hence
> > it can be read thousands of times a second. The only way to do that
> > efficiently is to use the approximate value the generic counters
> > provide.
>
> That is not what is happening, though, so I assume that no measurements
> were done.
>
> In fact, what happens now is that *both* types of counter use the crappy
> percpu_counter library, and the shrinkers actually do a per-cpu loop
> over the counters to get the sum.
>
> But anyway, even if that operation were fast, it would be silly to use
> a per-cpu counter for nr_unused, because it is tied fundamentally to
> the LRU, so you can't get any more scalability than the LRU operations
> allow anyway!
>
> I'm all for breaking out patches and pulling things ahead where they
> make sense, but it seems like things have just been done without much
> thought, measurement, or critical discussion of why the changes were
> made.
>
> There wasn't even any point in making the total counter per-cpu yet,
> either: seeing as there is still a lot of global locking in there, it
> would not have made any difference to scalability, and would only have
> slowed things down.
>
> What it _should_ look like is exactly what I had in my tree: proper,
> fast total object counters with a per-cpu loop for the sum once the
> global locks in the create/destroy path are lifted, and a per-LRU
> counter for nr_unused, protected together with the lru lock.

In fact, I should revise my regression fix to go back to global LRU
counters until per-zone LRUs are implemented. Sigh, yay for more tree
breakage.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html