On 12/02/21 22:58, Peter Zijlstra wrote:
> But AFAICT recursive #VE is entirely possible. The moment #VE reads that ve_info thing, NMIs can happen, which can trigger another #VE which then clobbers your stack and we're irrecoverably screwed again.
Yes, you need to zero the handler-active word in the info structure, and at that point recursion can happen.
A while ago Andy proposed re-enabling #VE from an interrupt; that would have worked at the time, since we were concerned with asynchronous page faults, but it wouldn't extend to TDX.
Unlike NMIs, however, #VE handlers can be written so that only a single nesting happens. A few months ago, also while discussing #VE for asynchronous page faults, I came up with a scheme that did exactly that and handled recursion by flipping the IST between two stacks (https://lkml.org/lkml/2020/5/15/1239). It should work and it'd be almost entirely C code, but I don't expect you or Thomas to be ecstatic about it...
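To make the idea concrete, here is a toy user-space model of "at most one nesting, on the other stack". It is only a sketch and not the code from the linked message: every name in it (struct ve_model, raise_ve, ve_handler) is made up, and it assumes that delivery sets the handler-active word and that no #VE is delivered while the word is set. The outermost handler flips the IST slot before clearing the word; a nested handler leaves the word set until it returns, so a third level is never delivered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of these names exist in the kernel. */
struct ve_model {
	bool busy;       /* stands in for the handler-active word */
	int  ist_stack;  /* stack (0 or 1) the next #VE would land on */
	int  depth;      /* current nesting depth */
	int  max_depth;  /* deepest nesting seen so far */
	int  in_use[2];  /* live handler frames per stack */
	int  delivered;  /* #VEs that reached the handler */
	int  blocked;    /* #VEs refused because the word was set */
};

static bool raise_ve(struct ve_model *m, int nested_attempts);

/* Handler body; 'nested_attempts' models NMIs that each cause one more #VE. */
static void ve_handler(struct ve_model *m, int stack, int nested_attempts)
{
	if (m->depth == 1) {
		/* Outermost level: flip the IST first, then allow recursion. */
		m->ist_stack = stack ^ 1;
		m->busy = false;
	}
	/* A nested level keeps the word set, so nothing nests inside it. */
	for (int i = nested_attempts; i > 0; i--)
		raise_ve(m, i - 1);
}

/* Model of #VE delivery (assumption: refused while the word is set). */
static bool raise_ve(struct ve_model *m, int nested_attempts)
{
	if (m->busy) {
		m->blocked++;
		return false;
	}
	m->busy = true;

	int stack = m->ist_stack;
	assert(m->in_use[stack] == 0);  /* a live frame is never clobbered */
	m->in_use[stack]++;
	if (++m->depth > m->max_depth)
		m->max_depth = m->depth;
	m->delivered++;

	ve_handler(m, stack, nested_attempts);

	m->depth--;
	m->in_use[stack]--;
	if (m->depth == 0)
		m->ist_stack = 0;       /* outermost exit: back to the first stack */
	m->busy = false;
	return true;
}
```

Starting from a zeroed struct ve_model and calling raise_ve(&m, 2), the model never goes deeper than two levels and never delivers onto a stack that still holds a live frame. The real thing would of course have to do the flip and the exit path atomically with respect to NMIs, which the model ignores.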
> (also, inhibiting NMI is a seriously dodgy hack, the very last thing x86 needs is more duct tape on the recursion rules)
I can't disagree about that, but then again I don't see many alternatives.

Paolo