Re: [PATCH] x86/traps: Don't force in_interrupt() to return true in IST handlers

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Tue, 24 May 2016 08:43:57 -0700

On Tue, May 24, 2016 at 1:59 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Mon, May 23, 2016 at 08:57:05PM -0700, Andy Lutomirski wrote:
>> Forcing in_interrupt() to return true if we're not in a bona fide
>> interrupt confuses the softirq code.  This fixes warnings like:
>>
>> NOHZ: local_softirq_pending 282
>>
>> that can happen when running things like selftests/x86.
>>
>> Cc: stable@xxxxxxxxxxxxxxx
>> Fixes: 959274753857 ("x86, traps: Track entry into and exit from IST context")
>> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
>
>> +/*
>> + * We want to cause in_atomic() to return true while in an IST handler
>> + * so that attempts to schedule will warn.
>> + *
>> + * We cannot use HARDIRQ_OFFSET or otherwise cause in_interrupt() to
>> + * return true: the softirq code assumes that in_interrupt() only
>> + * returns true if we will soon execute softirqs, and we can't do that
>> + * if an IST entry interrupts kernel code with interrupts disabled.
>> + *
>> + * Using 3 * PREEMPT_OFFSET instead of just PREEMPT_OFFSET is pure
>> + * paranoia.
>> + */
>> +#define IST_OFFSET (3 * PREEMPT_OFFSET)
>
> So this has implications for code like
> kernel/events/internal.h:get_recursion_context() and
> kernel/trace/trace.c:get_trace_buf().
>
> Which use a sequence of: in_nmi(), in_irq(), in_softirq() to pick 1 out
> of 4 possible contexts.
>
> I would really like the Changelog to reflect on this. The current code
> will have ISTs land in in_irq(), with this change, not so much.

I can change the changelog.
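
To make the effect concrete, here is a rough userspace model of the
in_nmi() / in_irq() / in_softirq() selection.  It is only a sketch: the
bit layout is what I assume preempt_count currently looks like, the
index values are illustrative, and pick_context() is a hypothetical
stand-in for the selection logic, not the real helper.

```c
#include <assert.h>

/* Assumed preempt_count layout: 8 preempt bits, 8 softirq bits,
 * 4 hardirq bits, 1 NMI bit.  Check include/linux/preempt.h. */
#define PREEMPT_OFFSET	(1UL << 0)
#define SOFTIRQ_OFFSET	(1UL << 8)
#define HARDIRQ_OFFSET	(1UL << 16)
#define NMI_OFFSET	(1UL << 20)

#define SOFTIRQ_MASK	(0xffUL << 8)
#define HARDIRQ_MASK	(0xfUL << 16)
#define NMI_MASK	(0x1UL << 20)

/* Hypothetical model: pick 1 of 4 contexts from a preempt_count value. */
static int pick_context(unsigned long preempt_count)
{
	if (preempt_count & NMI_MASK)		/* in_nmi() */
		return 3;
	if (preempt_count & HARDIRQ_MASK)	/* in_irq() */
		return 2;
	if (preempt_count & SOFTIRQ_MASK)	/* in_softirq() */
		return 1;
	return 0;				/* task context */
}
```

With an IST entry that adds HARDIRQ_OFFSET on top of task context,
pick_context() returns the in_irq() slot.  With one that adds only
PREEMPT_OFFSET, it returns whatever slot the interrupted code was
already using.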

>
> Now ISTs as a whole invalidate the whole 'we have only 4 contexts' and
> the mapping back to those 4 is going to be somewhat arbitrary I suspect,
> but changes like this should be very much aware of these things. And
> make an explicit choice.

I'm not so comfortable with trying to make any particular guarantees
about what all the in_xyz() things will return for different entry
types and how they nest.  For example, debug can nest inside itself
quite easily (at one point I even had a user program to force it to
happen) -- this can trigger when a watchpoint nests inside a
single-step trap, and it can also happen when a watchpoint handler is
interrupted by an NMI that then recursively triggers the watchpoint.
The latter could easily result in nested NMIs that are directly
visible to the trace or event code.

On x86, there's also MCE, which is NMI-ish, and NMI and MCE can freely
nest inside each other.  (Blech.)

Would it make more sense to adjust the trace code to have a percpu
nesting count and to match up get_trace_buf with put_trace_buf to
decrement the count?  The event code looks like the same thing could
happen.
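
Something like the following is what I have in mind, as a userspace
sketch only.  struct trace_buf_stack, MAX_NESTING and the sizes are made
up for illustration, the get/put signatures here are not the existing
ones, and a real version would use per-cpu storage, run with preemption
disabled, and need compiler barriers around the count updates.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_BUF_SIZE	1024
#define MAX_NESTING	4	/* hypothetical nesting limit */

/* Stand-in for one CPU's data; the kernel would use DEFINE_PER_CPU. */
struct trace_buf_stack {
	int nesting;
	char buffer[MAX_NESTING][TRACE_BUF_SIZE];
};

static struct trace_buf_stack cpu0_bufs;

/* Hand out the buffer for the current nesting level, or NULL if
 * we are nested deeper than MAX_NESTING. */
static char *get_trace_buf(struct trace_buf_stack *s)
{
	if (s->nesting >= MAX_NESTING)
		return NULL;
	return s->buffer[s->nesting++];
}

/* Must be paired with a successful get_trace_buf(). */
static void put_trace_buf(struct trace_buf_stack *s)
{
	assert(s->nesting > 0);
	s->nesting--;
}
```

The point is that the buffer choice depends only on how deeply we are
nested on this CPU, not on which in_xyz() helper happens to return
true, so debug-in-debug or NMI-in-MCE nesting gets distinct buffers
as long as every get is matched by a put.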

Also, on further reflection, I'm going to get rid of the 3* hack.

--Andy