On Thu, Dec 17, 2020 at 02:07:01PM +0100, Thomas Gleixner wrote: > On Fri, Dec 11 2020 at 14:14, Andy Lutomirski wrote: > > On Mon, Nov 23, 2020 at 10:10 PM <ira.weiny@xxxxxxxxx> wrote: > > After contemplating this for a bit, I think this isn't really the > > right approach. It *works*, but we've mostly just created a bit of an > > unfortunate situation. Our stack, on a (possibly nested) entry looks > > like: > > > > previous frame (or empty if we came from usermode) > > --- > > SS > > RSP > > FLAGS > > CS > > RIP > > rest of pt_regs > > > > C frame > > > > irqentry_state_t (maybe -- the compiler is within its rights to play > > almost arbitrary games here) > > > > more C stuff > > > > So what we've accomplished is having two distinct arch register > > regions, one called pt_regs and the other stuck in irqentry_state_t. > > This is annoying because it means that, if we want to access this > > thing without passing a pointer around or access it at all from outer > > frames, we need to do something terrible with the unwinder, and we > > don't want to go there. > > > > So I propose a somewhat different solution: lay out the stack like this. > > > > SS > > RSP > > FLAGS > > CS > > RIP > > rest of pt_regs > > PKS > > ^^^^^^^^ extended_pt_regs points here > > > > C frame > > more C stuff > > ... > > > > IOW we have: > > > > struct extended_pt_regs { > > bool rcu_whatever; > > other generic fields here; > > struct arch_extended_pt_regs arch_regs; > > struct pt_regs regs; > > }; > > > > and arch_extended_pt_regs has unsigned long pks; > > > > and instead of passing a pointer to irqentry_state_t to the generic > > entry/exit code, we just pass a pt_regs pointer. > > While I agree vs. PKS which is architecture specific state and needed in > other places e.g. #PF, I'm not convinced that sticking the existing > state into the same area buys us anything more than an indirect access. > > Peter? Agreed; that immediately solves the confusion Ira had as well. While extending pt_regs sounds scary, I think we've isolated our pt_regs implementation from actual ABI pretty well, but of course, that would need an audit. We don't want to leak this into signals for example.