Re: [PATCH RFC] r8152: Ensure that napi_schedule() is handled

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Sat, 15 May 2021 21:06:01 +0200

On Sat, May 15 2021 at 15:09, Peter Zijlstra wrote:
> On Sat, May 15, 2021 at 01:23:02AM +0200, Thomas Gleixner wrote:
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -691,7 +691,9 @@ void flush_smp_call_function_from_idle(v
>>  	cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->idle, CFD_SEQ_NOCPU,
>>  		      smp_processor_id(), CFD_SEQ_IDLE);
>>  	local_irq_save(flags);
>> +	lockdep_set_softirq_raise_safe();
>>  	flush_smp_call_function_queue(true);
>> +	lockdep_clear_softirq_raise_safe();
>>  	if (local_softirq_pending())
>>  		do_softirq();
>
> I think it might make more sense to raise hardirq_count() in/for
> flush_smp_call_function_queue() callers that aren't already from hardirq
> context. That's this site and smpcfd_dying_cpu().
>
> Then we can do away with this new special case.

Right.
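
For illustration only, a userspace toy model of that suggestion. This
is not kernel code: every *_model name is made up here, and the bit
layout merely mimics the hardirq field of preempt_count(). It assumes
the queued callback raises a softirq the way napi_schedule() would, and
shows that bumping the hardirq count around the flush makes the raise
look like it came from hardirq context, so the caller can run the
pending softirq afterwards without a lockdep special case.

```c
#include <stdbool.h>

/* Model of the hardirq bits in preempt_count(); values are assumed. */
#define HARDIRQ_SHIFT	16
#define HARDIRQ_OFFSET	(1U << HARDIRQ_SHIFT)
#define HARDIRQ_MASK	(0xfU << HARDIRQ_SHIFT)

static unsigned int preempt_count_model;
static bool softirq_pending_model;
static bool raised_from_hardirq;
static int softirqs_run;

static bool in_hardirq_model(void)
{
	return (preempt_count_model & HARDIRQ_MASK) != 0;
}

/* Stand-in for a queued function that raises a softirq. */
static void queued_callback_model(void)
{
	/* Record which context the raise was observed in. */
	raised_from_hardirq = in_hardirq_model();
	softirq_pending_model = true;
}

/* Stand-in for flush_smp_call_function_queue(). */
static void flush_queue_model(void)
{
	queued_callback_model();
}

/* Stand-in for the idle-path caller with the hardirq count raised. */
static void flush_from_idle_model(void)
{
	preempt_count_model += HARDIRQ_OFFSET;	/* pretend hardirq */
	flush_queue_model();
	preempt_count_model -= HARDIRQ_OFFSET;

	/* Back in task context: handle what the callbacks raised. */
	if (softirq_pending_model) {
		softirq_pending_model = false;
		softirqs_run++;
	}
}
```

Calling flush_from_idle_model() once leaves raised_from_hardirq set and
softirqs_run at 1, with the count back at zero.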

Though I just checked smpcfd_dying_cpu(). That one does not run
softirqs after flushing the function queue, and it can't, because
that's in the CPU dying phase with interrupts disabled, where the CPU
is already half torn down.

Especially as softirq processing enables interrupts, which might cause
even more havoc.

Anyway, how is it safe to run arbitrary functions there after the CPU
has removed itself from the online mask? That's daft, to put it mildly.

Thanks,

        tglx