On Thu, Jun 24, 2010 at 11:48:13AM +0200, Andi Kleen wrote:
> npiggin@xxxxxxx writes:
>
> > From: Eric Dumazet <dada1@xxxxxxxxxxxxx>
> >
> > new_inode() dirties a contended cache line to get increasing inode numbers.
> >
> > Solve this problem by providing to each cpu a per_cpu variable, feeded by the
> > shared last_ino, but once every 1024 allocations.
>
> Most file systems don't even need this because they
> allocate their own inode numbers, right?. So perhaps it could be turned
> off for all of those, e.g. with a superblock flag.

That's right. More or less it just requires alloc_inode to be exported;
adding more branches in new_inode would not be a good way to go.

But I didn't want to start microoptimisations in filesystems just yet.

> I guess the main customer is sockets only.

I guess. Sockets and ram based filesystems. Interestingly, I don't really
know what it's for (in socket code it's mostly for reporting and hashing,
it seems). It sure isn't guaranteed to be unique.

Anyway, it's outside the scope of this patchset to change functionality
at all.

> > +#ifdef CONFIG_SMP
> > +/*
> > + * Each cpu owns a range of 1024 numbers.
> > + * 'shared_last_ino' is dirtied only once out of 1024 allocations,
> > + * to renew the exhausted range.
> > + *
> > + * On a 32bit, non LFS stat() call, glibc will generate an EOVERFLOW
> > + * error if st_ino won't fit in target struct field. Use 32bit counter
> > + * here to attempt to avoid that.
>
> I don't understand how the 32bit counter should prevent that.

Well, I think glibc will convert the 64 bit stat struct to 32 bit for old
apps. It detects if the ino can't fit in 32 bits.

> > +static DEFINE_PER_CPU(int, last_ino);
> > +static atomic_t shared_last_ino;
>
> With the 1024 skip, isn't overflow much more likely, just scaling
> with the number of CPUs on a large CPU number systems, even if there
> aren't that many new inodes?

Well, EOVERFLOW should never happen with only the low 32 significant bits
set in the inode.

If you are worried about wrapping the counter, then no, I don't think it
is much more likely, because each CPU will only reserve another interval
of 1024 numbers after it has already allocated 1024 numbers. So the most
wastage you will get is (1024-1)*NR_CPUS -- somewhere around 1/1000th of
the available range.

I guess overflow will be more common now because it will be possible to
allocate inodes much faster on such a huge machine :)

> > +static int last_ino_get(void)
> > +{
> > +	int *p = &get_cpu_var(last_ino);
> > +	int res = *p;
> > +
> > +	if (unlikely((res & 1023) == 0))
> > +		res = atomic_add_return(1024, &shared_last_ino) - 1024;
>
> The magic numbers really want to be defines?

Sure, OK.
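
For anyone skimming the thread, here is a rough userspace sketch of the
range-per-CPU scheme and the wastage bound discussed above. It is not the
kernel code: a plain array indexed by a simulated CPU number stands in for
the per-cpu variable, C11 atomics stand in for atomic_t, and the names
INO_BATCH, NR_SIM_CPUS, sim_last_ino_get and max_wasted are made up for
illustration. The tail of the function (storing the incremented value
back) is an assumption too, since the quoted hunk stops before it.

```c
#include <assert.h>
#include <stdatomic.h>

#define INO_BATCH	1024	/* hypothetical define for the magic number */
#define NR_SIM_CPUS	4	/* hypothetical: number of simulated CPUs */

/* Stand-in for the per-cpu 'last_ino'; one slot per simulated CPU. */
static unsigned int last_ino[NR_SIM_CPUS];
/* Shared counter, touched only when a CPU's range is exhausted. */
static atomic_uint shared_last_ino;

/* Hand out the next number on behalf of the given simulated CPU. */
static unsigned int sim_last_ino_get(int cpu)
{
	unsigned int res;

	assert(cpu >= 0 && cpu < NR_SIM_CPUS);
	res = last_ino[cpu];

	/*
	 * Range exhausted (or never claimed): reserve a fresh block.
	 * atomic_fetch_add() returns the old value, which matches
	 * atomic_add_return(1024, ...) - 1024 in the quoted patch.
	 */
	if ((res & (INO_BATCH - 1)) == 0)
		res = atomic_fetch_add(&shared_last_ino, INO_BATCH);

	/* Assumed tail: bump, remember, and return the new number. */
	last_ino[cpu] = ++res;
	return res;
}

/*
 * Worst-case count of reserved-but-unused numbers: every CPU has
 * claimed a block and handed out only one number from it.
 */
static unsigned long long max_wasted(unsigned int nr_cpus)
{
	return (unsigned long long)(INO_BATCH - 1) * nr_cpus;
}
```

In this sketch the first call on CPU 0 returns 1 and the first call on
CPU 1 returns 1025; the shared counter is not touched again until one of
them has used up its 1024 numbers. max_wasted(4096) gives 4190208, which
is roughly 1/1025 of the 2^32 range.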