On 04/10/2014 12:51 PM, Andi Kleen wrote:
> Richard Yao <ryao@xxxxxxxxxx> writes:
>
>> Performance analysis of software compilation by Gentoo portage on an
>> Intel E5-2620 with 64GB of RAM revealed that a sizeable amount of time,
>> anywhere from 5% to 15%, was spent in get_vmalloc_info(), with at least
>> 40% of that time spent in the _raw_spin_lock() invoked by it.
>
> I don't think that's the right fix. We want to be able
> to debug kernels without having to recompile them.

There are plenty of other features for debugging the VM subsystem that
are disabled in production kernels because they are too expensive. I see
no reason why this should not be one of them.

If someone reading this has a use for this functionality in production
systems, I would love to hear about it. I am having trouble finding uses
for this in production.

That being said, we are clearly spending plenty of time blocked on list
traversal. I imagine that we could use an extent tree to track free
space for even bigger gains, but I have difficulty seeing why
/proc/vmallocinfo should be available on non-debug kernels. Allowing
userland to hold a critical lock indefinitely on production systems is a
deadlock waiting to happen.

> And switching locking around dynamically like this is very
> ugly and hard to maintain.

I welcome suggestions on how to make the changes I have made in this
patch more maintainable.

> Besides are you sure the spin lock is not needed elsewhere?
>
> How are writers to the list protected?

The spinlock is needed elsewhere, but not to protect this list.
Modifications to this list are done under RCU. The only thing stopping
RCU from being enough to avoid a spinlock is /proc/vmallocinfo, which
does locking to prevent modification while userland is reading the list.