* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Mon, 15 Jun 2009, Ingo Molnar wrote:
> >
> > See the numbers in the other mail: about 33 million pagefaults
> > happen in a typical kernel build - that's ~400K/sec - and that
> > is not a particularly really pagefault-heavy workload.
>
> Did you do any function-level profiles?
>
> Last I looked at it, the real cost of page faults were all in the
> memory copies and page clearing, and while it would be nice to
> speed up the kernel entry and exit, the few tens of cycles we
> might be able to get from there really aren't all that important.

Yeah.

Here are the function-level profiles of a typical kernel build on a
Nehalem box:

 $ perf report --sort symbol

 #
 # (14317328 samples)
 #
 # Overhead  Symbol
 # ........  ......
 #
    44.05%  0x000000001a0b80
     5.09%  0x0000000001d298
     3.56%  0x0000000005742c
     2.48%  0x0000000014026d
     2.31%  0x00000000007b1a
     2.06%  0x00000000115ac9
     1.83%  [.] _int_malloc
     1.71%  0x00000000064680
     1.50%  [.] memset
     1.37%  0x00000000125d88
     1.28%  0x000000000b7642
     1.17%  [k] clear_page_c
     0.87%  [k] page_fault
     0.78%  [.] is_defined_config
     0.71%  [.] _int_free
     0.68%  [.] __GI_strlen
     0.66%  0x000000000699e8
     0.54%  [.] __GI_memcpy

Most of it is dominated by user-space symbols. (There is no proper
ELF+debuginfo on this box, so they are unnamed.) It also shows that
page clearing and pagefault handling dominate the kernel overhead -
but are dwarfed by other overhead. Any page-fault-entry costs are a
drop in the bucket.

In fact with call-chain graphs we can get a precise picture, as we
can do a non-linear 'slice' set operation over the samples and
filter out the ones that have the 'page_fault' pattern in one of
their parent functions:

 $ perf report --sort symbol --parent page_fault

 #
 # (14317328 samples)
 #
 # Overhead  Symbol
 # ........  ......
 #
     1.12%  [k] clear_page_c
     0.87%  [k] page_fault
     0.43%  [k] get_page_from_freelist
     0.25%  [k] _spin_lock
     0.24%  [k] do_page_fault
     0.23%  [k] perf_swcounter_ctx_event
     0.16%  [k] perf_swcounter_event
     0.15%  [k] handle_mm_fault
     0.15%  [k] __alloc_pages_nodemask
     0.14%  [k] __rmqueue
     0.12%  [k] find_get_page
     0.11%  [k] copy_page_c
     0.11%  [k] find_vma
     0.10%  [k] _spin_lock_irqsave
     0.10%  [k] __wake_up_bit
     0.09%  [k] _spin_unlock_irqrestore
     0.09%  [k] do_anonymous_page
     0.09%  [k] __inc_zone_state

This "sub-profile" shows the true summary overhead that 'page_fault'
and all its child functions have. Note that for example clear_page_c
decreased from 1.17% to 1.12%:

     1.12%  [k] clear_page_c
     1.17%  [k] clear_page_c

because there's 0.05% of other callers to clear_page_c() that do not
involve page_fault. Those are filtered out via --parent
filtering/matching.

	Ingo
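
For readers who want to see the --parent 'slice' spelled out, here is a
minimal Python sketch of the idea. It is not how perf implements it, only
an illustration under stated assumptions: each sample is modelled as a
(symbol, call chain) pair, the chain includes the sampled function itself,
and overhead is reported relative to all samples rather than to the
filtered subset (which is consistent with page_fault staying at 0.87% in
both profiles above). The sample counts, the helper overhead_by_symbol()
and the names 'other_caller', 'user_code' and 'main' are made up for the
example.

```python
import re
from collections import Counter


def overhead_by_symbol(samples, parent=None):
    """Per-symbol overhead as a percentage of ALL samples.

    If `parent` is given, only samples whose call chain contains a
    function matching that pattern are counted (the 'slice').
    """
    total = len(samples)
    pattern = re.compile(parent) if parent else None
    counts = Counter()
    for symbol, callchain in samples:
        if pattern and not any(pattern.search(func) for func in callchain):
            continue  # sample is outside the requested slice
        counts[symbol] += 1
    return {sym: 100.0 * n / total for sym, n in counts.items()}


# Toy data set of 10000 samples (hypothetical, sized to echo the mail).
fault_chain = ["clear_page_c", "do_anonymous_page", "handle_mm_fault",
               "do_page_fault", "page_fault"]
samples = (
    [("clear_page_c", fault_chain)] * 112                       # via page_fault
    + [("clear_page_c", ["clear_page_c", "other_caller"])] * 5  # other callers
    + [("page_fault", ["page_fault"])] * 87
    + [("user_code", ["user_code", "main"])] * 9796
)

full = overhead_by_symbol(samples)
sub = overhead_by_symbol(samples, parent="page_fault")

for name, profile in (("full", full), ("page_fault slice", sub)):
    print(name)
    for sym, pct in sorted(profile.items(), key=lambda kv: -kv[1]):
        print(f"  {pct:6.2f}%  {sym}")
```

With this toy data clear_page_c shows 1.17% in the full profile and 1.12%
in the slice, the difference being the 5 samples (0.05%) whose call chain
never passes through page_fault.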