* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> > performance I don't think we should be too worried about at this
> > moment - this code is so rarely used that it should be driven by
> > robustness I think.
>
> That really isn't true. This isn't done just once. It's done many
> thousands of times.
>
> I agree that it has to be robust, but if we want to make
> suspend/resume be instantaneous (and we do), performance does actually
> matter. Yes, this is probably much less of a problem than waiting for
> devices, and no, I haven't timed it, but if I counted right, we'll
> literally be doing almost ten thousand of these calls over a
> suspend/resume cycle.
>
> That's not "rarely used".

yeah, it's done 2800 times on my box with a distro .config.

No strong feeling either way - but I don't think there's any cross-CPU
TLB flush done in this case within vmap()/vunmap(). Why? Because when
alternative_instructions() runs, we have just a single CPU in
cpu_online_map.

So I think it's only direct vmap()/vunmap() overhead, on a single CPU.
We do a kmalloc/kfree, which is rather fast - sub-microsecond. We
install the pages in the PTEs - this is rather fast as well -
sub-microsecond. Even assuming cache-cold lines (which they are most of
the time), taken thousands of times that's at most a few milliseconds
IMO.

In fact, most of the actual vmap()-related overhead should be
well-cached (the kmalloc bits) - the main cost should come from walking
through all the instruction sites and modifying them.

I just measured the actual costs: the UP/SMP offline/online transition
time (with Jiri's patch applied) is:

 # time echo 0 > /sys/devices/system/cpu/cpu1/online
 real    0m0.116s
 user    0m0.000s
 sys     0m0.008s

 # time echo 1 > /sys/devices/system/cpu/cpu1/online
 real    0m0.095s
 user    0m0.000s
 sys     0m0.069s

With your fixmap patch:

 # time echo 0 > /sys/devices/system/cpu/cpu1/online
 real    0m0.110s
 user    0m0.001s
 sys     0m0.003s

 # time echo 1 > /sys/devices/system/cpu/cpu1/online
 real    0m0.099s
 user    0m0.000s
 sys     0m0.072s

(I ran it multiple times and picked a representative run.)

I also did a third control run, with a kernel that had
alternative_instructions() disabled. The offline/online cost there is:

 # time echo 0 > /sys/devices/system/cpu/cpu1/online
 real    0m0.108s
 user    0m0.000s
 sys     0m0.000s

 # time echo 1 > /sys/devices/system/cpu/cpu1/online
 real    0m0.096s
 user    0m0.000s
 sys     0m0.068s

_Perhaps_ there's a decrease in time, but I couldn't say for sure,
because in the 'go online' case the numbers are so similar. In the
go-offline case there seems to be a gradual decrease, but that could be
statistical noise.

(The user/sys times are not reliable because most of this happens with
irqs off, but the 'real' portion should be reliable.)

	Ingo
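
For reference, a minimal sketch of how several samples of the above
transition time could be collected in one go (assumptions: a bash shell
run as root, cpu1 as the CPU being toggled, and five iterations - the
loop itself is illustrative, not part of the measurements quoted above):

 #!/bin/bash
 # Toggle cpu1 offline and back online several times, timing each
 # transition; pick a representative 'real' figure from the output.
 # The user/sys figures are unreliable here, since most of the work
 # happens with irqs off.
 for i in $(seq 1 5); do
         time echo 0 > /sys/devices/system/cpu/cpu1/online
         time echo 1 > /sys/devices/system/cpu/cpu1/online
 done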