On Fri, Apr 17, 2015 at 2:28 PM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > On Fri, Apr 17, 2015 at 4:39 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote: >> >> On my box, vclock_gettime using kvm-clock is about 40 ns. An empty >> syscall is about 33 ns. clock_gettime *should* be around 17 ns. >> >> The clock_gettime syscall is about 73 ns. >> >> Could we figure out why clock_gettime (the syscall) is so slow > > If we only could profile some random program (let's call it "a.out" > that did the syscall(__NR_gettime_syscall) a couple million times. > > Oh, lookie here, Santa came around: > > 21.83% [k] system_call > 12.85% [.] syscall > 9.76% [k] __audit_syscall_exit > 9.55% [k] copy_user_enhanced_fast_string > 4.68% [k] __getnstimeofday64 > 4.08% [k] syscall_trace_enter_phase1 > 3.85% [k] __audit_syscall_entry > 3.77% [k] unroll_tree_refs > 3.15% [k] sys_clock_gettime > 2.92% [k] int_very_careful > 2.73% [.] main > 2.35% [k] syscall_trace_leave > 2.28% [k] read_tsc > 1.73% [k] int_restore_rest > 1.73% [k] int_with_check > 1.48% [k] syscall_return > 1.32% [k] dput > 1.24% [k] system_call_fastpath > 1.21% [k] syscall_return_via_sysret > 1.21% [k] tracesys > 0.81% [k] do_audit_syscall_entry > 0.80% [k] current_kernel_time > 0.73% [k] getnstimeofday64 > 0.68% [k] path_put > 0.66% [k] posix_clock_realtime_get > 0.61% [k] int_careful > 0.60% [k] mntput > 0.49% [k] kfree > 0.36% [k] _copy_to_user > 0.31% [k] int_ret_from_sys_call_irqs_off > > looks to me like it's spending a lot of time in system call auditing. > Which makes no sense to me, since none of this should be triggering > any auditing. And there's a lot of time in low-level kernel system > call assembly code. Muahaha. The auditors have invaded your system. (I did my little benchmark with a more sensible configuration -- see way below). Can you send the output of: # auditctl -s # auditctl -l Are you, perchance, using Fedora? I lobbied rather heavily, and successfully, to get the default configuration to stop auditing. Unfortunately, the fix wasn't retroactive, so, unless you have a very fresh install, you might want to apply the fix yourself: https://fedorahosted.org/fesco/ticket/1311 > > If I only remembered the name of the crazy person who said "Ok" when I > suggest he just be the maintainer of the code since he has spent a lot > of time sending patches for it. Something like Amdy Letorsky. No, that > wasn't it. Hmm. It's on the tip of my tongue. > > Oh well. Maybe somebody can remember the guys name. It's familiar for > some reason. Andy? Amdy Lumirtowsky thinks he meant to attach a condition to his maintainerish activities: he will do his best to keep the audit code *out* of the low-level stuff, but he's going to try to avoid ever touching the audit code itself, because if he ever had to change it, he might accidentally delete the entire file. Seriously, wasn't there a TAINT_PERFORMANCE thing proposed at some point? I would love auditing to set some really loud global warning that you've just done a Bad Thing (tm) performance-wise by enabling it. Back to timing. With kvm-clock, I see: 23.80% timing_test_64 [kernel.kallsyms] [k] pvclock_clocksource_read 15.57% timing_test_64 libc-2.20.so [.] syscall 12.39% timing_test_64 [kernel.kallsyms] [k] system_call 12.35% timing_test_64 [kernel.kallsyms] [k] copy_user_generic_string 10.95% timing_test_64 [kernel.kallsyms] [k] system_call_after_swapgs 7.35% timing_test_64 [kernel.kallsyms] [k] ktime_get_ts64 6.20% timing_test_64 [kernel.kallsyms] [k] sys_clock_gettime 3.62% timing_test_64 [kernel.kallsyms] [k] system_call_fastpath 2.08% timing_test_64 timing_test_64 [.] main 1.72% timing_test_64 timing_test_64 [.] syscall@plt 1.58% timing_test_64 [kernel.kallsyms] [k] kvm_clock_get_cycles 1.22% timing_test_64 [kernel.kallsyms] [k] _copy_to_user 0.65% timing_test_64 [kernel.kallsyms] [k] posix_ktime_get_ts 0.13% timing_test_64 [kernel.kallsyms] [k] apic_timer_interrupt We've got some silly indirection, a uaccess that probably didn't get optimized very well, and the terrifying function pvclock_clocksource_read. By comparison, using tsc: 19.51% timing_test_64 libc-2.20.so [.] syscall 15.52% timing_test_64 [kernel.kallsyms] [k] system_call 15.25% timing_test_64 [kernel.kallsyms] [k] copy_user_generic_string 14.34% timing_test_64 [kernel.kallsyms] [k] system_call_after_swapgs 8.66% timing_test_64 [kernel.kallsyms] [k] ktime_get_ts64 6.95% timing_test_64 [kernel.kallsyms] [k] sys_clock_gettime 5.93% timing_test_64 [kernel.kallsyms] [k] native_read_tsc 5.12% timing_test_64 [kernel.kallsyms] [k] system_call_fastpath 2.62% timing_test_64 timing_test_64 [.] main That's better, although the uaccess silliness is still there. (No PEBS -- I don't think it works right in KVM.) --Andy > > Linus -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html