On Sun, Sep 01, 2013 at 03:48:01PM -0700, Linus Torvalds wrote:
> I made DEFINE_LGLOCK use DEFINE_PER_CPU_SHARED_ALIGNED for the
> spinlock, so that each local lock gets its own cacheline, and the
> total loops jumped to 62M (from 52-54M before).  So when I looked at
> the numbers, I thought "oh, that helped".
>
> But then I looked closer, and realized that I just see a fair amount
> of boot-to-boot variation anyway (probably a lot to do with cache
> placement and how dentries got allocated etc).  And it didn't actually
> help at all, the problem is still there, and lg_local_lock is still
> really really high on the profile, at 8% cpu time:
>
> -   8.00%  lg_local_lock
>    - lg_local_lock
>       + 64.83% mntput_no_expire
>       + 33.81% path_init
>       +  0.78% mntput
>       +  0.58% path_lookupat
>
> which just looks insane. And no, no lg_global_lock visible anywhere..
>
> So it's not false sharing. But something is bouncing *that* particular
> lock around. Hrm...

It excludes sharing between the locks, all right.  AFAICS, that won't
exclude sharing with plain per-cpu vars, will it?  Could you tell what
vfsmount_lock is sharing with on that build?  The stuff between it and
files_lock doesn't have any cross-CPU writers, but with that change
it's the stuff after it that becomes interesting...