On December 20, 2022 1:55:31 AM PST, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx> wrote:
>On 20/12/2022 9:45 am, Peter Zijlstra wrote:
>> On Mon, Dec 19, 2022 at 10:36:48PM -0800, Xin Li wrote:
>>
>>> +    wrmsrl(MSR_IA32_FRED_STKLVLS,
>>> +           FRED_STKLVL(X86_TRAP_DB, 1) |
>>> +           FRED_STKLVL(X86_TRAP_NMI, 2) |
>>> +           FRED_STKLVL(X86_TRAP_MC, 2) |
>>> +           FRED_STKLVL(X86_TRAP_DF, 3));
>>> +
>>> +    /* The FRED equivalents to IST stacks... */
>>> +    wrmsrl(MSR_IA32_FRED_RSP1, __this_cpu_ist_top_va(DB));
>>> +    wrmsrl(MSR_IA32_FRED_RSP2, __this_cpu_ist_top_va(NMI));
>>> +    wrmsrl(MSR_IA32_FRED_RSP3, __this_cpu_ist_top_va(DF));
>> Not quite.. IIRC fred only switches to another stack when the level of
>> the exception is higher. Specifically, if we trigger #DB while inside
>> #NMI we will not switch to the #DB stack (since 1 < 2).
>
>There needs to be a new stack for #DF, and just possibly one for #MC.
>NMI and #DB do not need separate stacks under FRED.
>
>> Now, as mentioned elsewhere, it all nests a lot saner, but stack
>> exhaustion is still a thing, given the above, what happens when a #DB
>> hits an #NMI which tickles a #VE or something?
>>
>> I don't think we've increased the exception stack size, but perhaps we
>> should for FRED?
>
>Not sure if it matters too much - it doesn't seem usefully different to
>IDT delivery. #DB shouldn't get too deep, and NMI gets properly
>inhibited now.
>
>~Andrew
>

I still don't think you want to take #DB or – especially – NMI on the
task stack while in the kernel. In fact, the plan is to get rid of the
software irqstack handling, too, but at tglx's request that will be a
later changeset (correctness first, then optimization.)
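
For reference, the stack-level rule discussed above can be modelled in a few lines of user-space C. This is only a sketch, not kernel code. It assumes that the stack-levels MSR holds two bits per vector (which is what the FRED_STKLVL() usage in the quoted patch suggests) and that delivery moves to the higher of the current level and the event's level, as Peter describes. The names STKLVL(), event_level(), deliver() and example_stklvls() are made up for this illustration.

```c
#include <stdint.h>

/* x86 vector numbers for the exceptions named in the thread. */
enum { TRAP_DB = 1, TRAP_NMI = 2, TRAP_DF = 8, TRAP_MC = 18, TRAP_VE = 20 };

/* Assumption: two bits of stack level per vector, packed by vector number. */
#define STKLVL(vec, lvl) ((uint64_t)(lvl) << (2 * (vec)))

/* Stack level configured for a vector; vectors never set read back as 0. */
static unsigned event_level(uint64_t stklvls, unsigned vec)
{
    return (unsigned)((stklvls >> (2 * vec)) & 3);
}

/*
 * Level the handler runs at: the stack only changes when the event's
 * level is strictly higher than the current one, otherwise delivery
 * stays on the current stack.
 */
static unsigned deliver(uint64_t stklvls, unsigned cur_level, unsigned vec)
{
    unsigned lvl = event_level(stklvls, vec);
    return lvl > cur_level ? lvl : cur_level;
}

/* The level assignment from the quoted patch. */
static uint64_t example_stklvls(void)
{
    return STKLVL(TRAP_DB, 1) | STKLVL(TRAP_NMI, 2) |
           STKLVL(TRAP_MC, 2) | STKLVL(TRAP_DF, 3);
}
```

With that assignment, an NMI arriving at level 0 moves to level 2, a #DB inside it stays at level 2 (so the level 1 stack is not used), a #VE on top of that also stays at level 2, and only #DF moves on to level 3. That is the nesting case behind the stack-exhaustion question: everything between NMI entry and a #DF shares the level 2 stack.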