On Fri, Apr 09, 2021 at 05:05:58PM -0500, Madhavan T. Venkataraman wrote: > > FWIW, over the years we've had zero issues with encoding the frame > > pointer on x86. After you save pt_regs, you encode the frame pointer to > > point to it. Ideally in the same macro so it's hard to overlook. > > > > I had the same opinion. In fact, in my encoding scheme, I have additional > checks to make absolutely sure that it is a true encoding and not stack > corruption. The chances of all of those values accidentally matching are, > well, null. Right, stack corruption -- which is already exceedingly rare -- would have to be combined with a miracle or two in order to come out of the whole thing marked as 'reliable' :-) And really, we already take a similar risk today by "trusting" the frame pointer value on the stack to a certain extent. > >> I think there's a lot more code that we cannot unwind, e.g. KVM > >> exception code, or almost anything marked with SYM_CODE_END(). > > > > Just a reminder that livepatch only unwinds blocked tasks (plus the > > 'current' task which calls into livepatch). So practically speaking, it > > doesn't matter whether the 'unreliable' detection has full coverage. > > The only exceptions which really matter are those which end up calling > > schedule(), e.g. preemption or page faults. > > > > Being able to consistently detect *all* possible unreliable paths would > > be nice in theory, but it's unnecessary and may not be worth the extra > > complexity. > > > > You do have a point. I tried to think of arch_stack_walk_reliable() as > something that should be implemented independent of livepatching. But > I could not really come up with a single example of where else it would > really be useful. > > So, if we assume that the reliable stack trace is solely for the purpose > of livepatching, I agree with your earlier comments as well. One thought: if folks really view this as a problem, it might help to just rename things to reduce confusion. For example, instead of calling it 'reliable', we could call it something more precise, like 'klp_reliable', to indicate that its reliable enough for live patching. Then have a comment above 'klp_reliable' and/or stack_trace_save_tsk_klp_reliable() which describes what that means. Hm, for that matter, even without renaming things, a comment above stack_trace_save_tsk_reliable() describing the meaning of "reliable" would be a good idea. -- Josh