On Sat, 16 Oct 2010 10:29:08 +0200 Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote: > Le samedi 16 octobre 2010 __ 18:55 +1100, Nick Piggin a __crit : > > On Thu, Sep 30, 2010 at 04:10:39PM +1000, Dave Chinner wrote: > > > On Wed, Sep 29, 2010 at 09:53:22PM -0700, Andrew Morton wrote: > > > > On Wed, 29 Sep 2010 22:18:48 +1000 Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > > > > > > > > From: Eric Dumazet <dada1@xxxxxxxxxxxxx> > > > > > > > > > > The number of inodes allocated does not need to be tied to the > > > > > addition or removal of an inode to/from a list. If we are not tied > > > > > to a list lock, we could update the counters when inodes are > > > > > initialised or destroyed, but to do that we need to convert the > > > > > counters to be per-cpu (i.e. independent of a lock). This means that > > > > > we have the freedom to change the list/locking implementation > > > > > without needing to care about the counters. > > > > > > > > > > > > > > > ... > > > > > > > > > > +int get_nr_inodes(void) > > > > > +{ > > > > > + int i; > > > > > + int sum = 0; > > > > > + for_each_possible_cpu(i) > > > > > + sum += per_cpu(nr_inodes, i); > > > > > + return sum < 0 ? 0 : sum; > > > > > +} > > > > > > > > This reimplements percpu_counter_sum_positive(), rather poorly > > > > Why is it poorly? > > Nick > > Some people believe percpu_counter object is the right answer to such > distributed counters, because the loop is done on 'online' cpus instead > of 'possible' cpus. "It must be better if number of possible cpus is > 4096 and only one or two cpus are online"... > > But if we do this loop only on rare events, like > "cat /proc/sys/fs/inode-nr", then the percpu_counter() is more > expensive, because percpu_add() _is_ more expensive : > > - Its a function call and lot of instructions/cycles per call, while > this_cpu_inc(nr_inodes) is a single instruction, using no register on > x86. You want an inlined percpu_counter_inc() then write one! Bonus points for writing this_cpu_add_return() and doing it without a preempt_disable(). It collapses to just a few instructions. It's extremely poor form to say "oh X sucks so I'm going to implement my own" without first addressing why X allegedly sucks. > - Its possibly accessing a shared spinlock and counter when the percpu > counter reaches the batch limit. That's in the noise foor. > To recap : nr_inodes is not a counter that needs to be estimated in real > time, since we have not limit on number of inodes in the machine (limit > is the memory allocator). > > Unless someone can prove "cat /proc/sys/fs/inode-nr" must be performed > thousand of times per second on their setup, the choice I made to scale > nr_inodes is better over the 'obvious percpu_counter choice' > > This choice was made to scale some counters in network stack some years > ago, and this rocks. And we get open-coded reimplementations of the same damn thing all over the tree. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html