On Sat, Oct 4, 2014 at 1:13 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Fri, Oct 03, 2014 at 02:15:24PM -0700, Andy Lutomirski wrote:
>> On Fri, Oct 3, 2014 at 2:12 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> > On Fri, Oct 03, 2014 at 02:04:53PM -0700, Andy Lutomirski wrote:
>> >> On Fri, Oct 3, 2014 at 2:02 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> >
>> >> > Something like so.. slightly less ugly and possibly with more
>> >> > complicated conditions setting the cr4 if you want to fix tsc vs seccomp
>> >> > as well.
>> >>
>> >> This will crash anything that tries rdpmc in an allow-everything
>> >> seccomp sandbox. It's also not very compatible with my grand scheme
>> >> of allowing rdtsc to be turned off without breaking clock_gettime. :)
>> >
>> > Well, we clear cap_user_rdpmc, so everybody who still tries it gets what
>> > he deserves, no problem there.
>>
>> Oh, interesting.
>>
>> To continue playing devil's advocate, what if you do perf_event_open,
>> then mmap it, then start the seccomp sandbox?
>
> We update that cap bit on every update to the self-monitor state, and in
> a perfect world people would also check the cap bit every time they try
> and read it, and fall back to the syscall. So we could just clear it..
> but I can imagine reality ruining things here.

If nothing else, the fact that rdpmc fails with SIGSEGV instead of
returning some nonsense value means that this will always be racy.

>> My draft patches are currently tracking the number of perf_event mmaps
>> per mm. I'm not thrilled with it, but it's straightforward. And I
>> still need to benchmark cr4 writes, which is tedious, because I can't
>> do it from user code.
>
> Should be fairly straight fwd from kernel space, get a tsc stamp,
> read+write cr4 1000 times, get another tsc read, and maybe do that
> several times. No?

I tried it. Rough numbers on my 2.7 GHz Sandy Bridge laptop:

Writing to cr4 in VMX non-root (changing PCE) takes ~48ns. A cr4
read-modify-write takes roughly 51ns. IMO neither of these is costly
enough to be worth worrying *that* much about when switching into or
out of a perf-using task, but you might disagree with me.

Changing TSD takes ~700ns, because KVM has the VMCS programmed wrong;
I'll send a patch. I suspect the same experiment would run faster on
bare metal.

--Andy
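
P.S. For anyone following along, the "check the cap bit, fall back to
the syscall" pattern Peter describes looks roughly like the sketch
below. This is based on the self-monitoring documentation in
include/uapi/linux/perf_event.h, not on code from this thread; assume
"fd" is a perf_event_open() fd for a single plain counter and "pc" is
its mmap()ed first page.

#include <stdint.h>
#include <unistd.h>
#include <linux/perf_event.h>

#define barrier() asm volatile("" ::: "memory")

static uint64_t rdpmc(uint32_t counter)
{
	uint32_t lo, hi;

	asm volatile("rdpmc" : "=a" (lo), "=d" (hi) : "c" (counter));
	return (uint64_t)hi << 32 | lo;
}

static uint64_t read_count(int fd, volatile struct perf_event_mmap_page *pc)
{
	uint32_t seq, idx;
	int64_t pmc;
	uint64_t count;

	do {
		seq = pc->lock;
		barrier();
		idx = pc->index;
		count = pc->offset;
		if (!pc->cap_user_rdpmc || !idx) {
			/* rdpmc revoked (or never granted): use the syscall */
			read(fd, &count, sizeof(count));
			return count;
		}
		pmc = rdpmc(idx - 1);
		pmc <<= 64 - pc->pmc_width;	/* sign-extend the raw */
		pmc >>= 64 - pc->pmc_width;	/* pmc_width-bit value  */
		count += pmc;
		barrier();
	} while (pc->lock != seq);	/* retry if the kernel updated the page */

	return count;
}

Note that even this pattern doesn't fix the race above: if cr4.PCE is
cleared between the cap_user_rdpmc check and the rdpmc instruction
itself, you still get SIGSEGV rather than a nonsense value, which is
exactly why I don't think the cap bit alone makes seccomp transitions
safe.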
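
P.P.S. The timing loop was essentially what Peter suggested. A minimal
kernel-module sketch of the shape of it (not my actual test code;
assumes x86_64, uses get_cycles() for the TSC stamps, and spells the
cr4 accessors read_cr4()/write_cr4() as in kernels of this vintage):

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/irqflags.h>
#include <asm/timex.h>
#include <asm/processor.h>

static int __init cr4_bench_init(void)
{
	unsigned long cr4;
	cycles_t t0, t1;
	int i;

	local_irq_disable();
	t0 = get_cycles();
	for (i = 0; i < 1000; i++) {
		cr4 = read_cr4();
		write_cr4(cr4 ^ X86_CR4_PCE);	/* RMW, toggling PCE */
	}
	t1 = get_cycles();
	local_irq_enable();

	/* 1000 is even, so PCE ends up where it started. */
	pr_info("cr4 RMW: %llu cycles/iteration\n",
		(unsigned long long)(t1 - t0) / 1000);

	return -EAGAIN;	/* fail the load; we only wanted the printk */
}

module_init(cr4_bench_init);
MODULE_LICENSE("GPL");

At 2.7 GHz, ~51ns works out to about 138 cycles per iteration. Run it
a few times, pinned to one CPU if you care, to shake out the noise.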