Re: [tip:perfcounters/core] perf_counter: x86: Fix call-chain support to use NMI-safe methods

Ingo Molnar <mingo@xxxxxxx> · Mon, 15 Jun 2009 20:08:29 +0200

* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> Then, the NMI handler would be changed to always write that value 
> to %cr2 after it has done the operation that could fault, and do 
> an atomic increment of the NMI sequence count. Then, we can do 
> something like this in the page fault handler:
> 
> 	if (cr2 == MAGIC_CR2) {
> 		static unsigned long my_seqno = -1;
> 		if (my_seqno != nmi_seqno) {
> 			my_seqno = nmi_seqno;
> 			return;
> 		}
> 	}
> 
> where the whole (and only) point of that "seqno" is to protect against 
> user space doing something like
> 
> 	int i = *(int *)MAGIC_CR2;
> 
> and causing infinite faults.
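Below is a rough user-space model of the scheme quoted above. It is only a sketch and not kernel code. A plain variable stands in for %cr2. The names `struct sim_cpu`, `sim_hw_fault()`, `sim_nmi()` and `sim_fault_should_retry()` are hypothetical, and so is the value of `MAGIC_CR2`. Unlike the quoted snippet, `my_seqno` lives in the simulated CPU struct rather than in a function-local static.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sentinel: any address that is never legitimately mapped. */
#define MAGIC_CR2 0xdead0000UL

/* Simulated state; cr2 models the %cr2 register. */
struct sim_cpu {
	unsigned long cr2;
	atomic_ulong nmi_seqno;   /* bumped on every NMI exit */
	unsigned long my_seqno;   /* fault handler's last-seen seqno */
};

static void sim_init(struct sim_cpu *c)
{
	c->cr2 = 0;
	atomic_init(&c->nmi_seqno, 0);
	c->my_seqno = (unsigned long)-1;
}

/* Hardware side of a page fault: the CPU loads the address into %cr2. */
static void sim_hw_fault(struct sim_cpu *c, unsigned long addr)
{
	c->cr2 = addr;
}

/* An NMI whose own fault clobbers %cr2. On exit it does the quoted
 * write_cr2(MAGIC_CR2); atomic_inc(&nmi_seqno); sequence. */
static void sim_nmi(struct sim_cpu *c, unsigned long nmi_fault_addr)
{
	sim_hw_fault(c, nmi_fault_addr);   /* %cr2 now corrupted */
	c->cr2 = MAGIC_CR2;
	atomic_fetch_add(&c->nmi_seqno, 1);
}

/* Page fault handler entry: true means "just return", so the faulting
 * instruction re-executes and regenerates the correct %cr2. */
static bool sim_fault_should_retry(struct sim_cpu *c)
{
	if (c->cr2 == MAGIC_CR2) {
		unsigned long seq = atomic_load(&c->nmi_seqno);

		if (c->my_seqno != seq) {
			c->my_seqno = seq;
			return true;
		}
	}
	return false;   /* handle normally, e.g. SIGSEGV for a user access */
}
```

In this model, a real NMI that arrives between the hardware fault and the handler's read of %cr2 costs exactly one retry. A user-space load from `MAGIC_CR2` with no NMIs in flight gets one spurious retry and is then handled normally, so it does not fault forever.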

Heh - this is so tricky that it's disgusting! Lovely.

And, since this appears to be a competition of sick ideas, an even 
more disgusting hack might be to write to the IDT from the NMI 
handler, and install a NULL entry at #PF and rely on the double 
fault handler to detect faults - double faults don't clobber the cr2,
I think ...

( I think to protect the fragile and pure fabric of lkml against
  moral corruption, disgusting patches must remain unsent and
  disgusting ideas like this must absolutely stay unspoken. Hence
  I have removed lkml from the Cc:. [Oops, I didn't ... too late, 
  and this mail has already been sent! :-/ ])

> If a real NMI happens, then nmi_seqno will always be different, 
> and we'll just retry the fault (the NMI handler would do something 
> like
> 
> 	write_cr2(MAGIC_CR2);
> 	atomic_inc(&nmi_seqno);
> 
> to set it all up).
> 
> Anyway, I do think that the _correct_ solution is to not do page 
> faults from within NMIs, but the above is an outline of how we 
> could _try_ to handle it if we really really wanted to. IOW, the 
> fact that cr2 gets corrupted is not insurmountable, exactly 
> because we _could_ always just retrigger the page fault, and thus 
> "re-create" the corrupted %cr2 value.
> 
> Hacky, hacky. And I'm not sure how happy CPUs even are to have 
> %cr2 written to, so we could hit CPU issues.

If cr2 cannot be safely written to on a CPU, that could be worked 
around by counting the number of NMIs via a 
percpu_add(this_nmi_count, 1) and retrying faults if any NMI 
happened between the previous fault and this fault.

This has the disadvantage of potentially doubling the number of 
pagefaults though. But it would certainly work as a tricky quirk to 
this quirk which is added to a rather quirky code-path to begin 
with.
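The NMI-counting alternative never writes %cr2, and it can be modelled the same way. This is only a user-space sketch. The name `this_nmi_count` comes from the text above, and a plain increment stands in for percpu_add(). `struct nmi_fault_state`, `on_nmi()`, `fault_should_retry()` and `take_fault()` are hypothetical. The handler has no way to tell whether an NMI actually clobbered %cr2. It therefore retries conservatively whenever the count has changed since the previous fault.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated per-CPU state (hypothetical names). */
struct nmi_fault_state {
	unsigned long this_nmi_count;   /* bumped by every NMI */
	unsigned long seen_nmi_count;   /* count observed at the previous fault */
	unsigned long faults;           /* hardware faults taken, incl. retries */
};

static void state_init(struct nmi_fault_state *s)
{
	s->this_nmi_count = 0;
	s->seen_nmi_count = 0;
	s->faults = 0;
}

/* NMI entry: kernel version would be percpu_add(this_nmi_count, 1). */
static void on_nmi(struct nmi_fault_state *s)
{
	s->this_nmi_count++;
}

/* Fault entry: true if an NMI ran since the previous fault, so %cr2 may
 * be stale and the fault should be retried rather than handled. */
static bool fault_should_retry(struct nmi_fault_state *s)
{
	unsigned long now = s->this_nmi_count;

	s->faults++;
	if (now != s->seen_nmi_count) {
		s->seen_nmi_count = now;
		return true;
	}
	return false;
}

/* One faulting access, retried until handled; returns faults it cost. */
static unsigned long take_fault(struct nmi_fault_state *s)
{
	unsigned long before = s->faults;

	while (fault_should_retry(s))
		;   /* re-executing the instruction re-faults with a fresh %cr2 */
	return s->faults - before;
}
```

If an NMI lands before every fault, as it might under heavy NMI-based profiling, each access costs two faults instead of one. That is the "potentially doubling" cost described above.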

	Ingo