Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes: > Suppose some kernel code (a syscall or kernel thread) changes PKRS > then takes a page fault. The page fault handler needs a fresh > PKRS. Then the page fault handler (say a VMA’s .fault handler) changes > PKRS. The we get an interrupt. The interrupt *also* needs a fresh > PKRS and the page fault value needs to be saved somewhere. > > So we have more than one saved value per thread, and thread_struct > isn’t going to solve this problem. A stack of 7 entries and an index needs 32bytes total which is a reasonable amount and solves the problem including scheduling from #PF nicely. Make it 15 and it's still only 64 bytes. > But idtentry_state is also not great for a couple reasons. Not all > entries have idtentry_state, and the unwinder can’t find it for > debugging. For that matter, the page fault logic probably wants to > know the previous PKRS, so it should either be stashed somewhere > findable or it should be explicitly passed around. > > My suggestion is to enlarge pt_regs. The save and restore logic can > probably be in C, but pt_regs is the logical place to put a register > that is saved and restored across all entries. Kinda, but that still sucks because schedule from #PF will get it wrong unless you do extra nasties. > Whoever does this work will have the delightful job of figuring out > whether BPF thinks that the layout of pt_regs is ABI and, if so, > fixing the resulting mess. > > The fact the new fields will go at the beginning of pt_regs will make > this an entertaining prospect. Good luck with all of that. Thanks, tglx