Hi Jeroen,

On 2020-07-10 6:50 p.m., Jeroen Roovers wrote:
> On Thu, 9 Jul 2020 09:39:33 -0400
> John David Anglin <dave.anglin@xxxxxxxx> wrote:
>
>> On 2020-07-09 9:26 a.m., Rolf Eike Beer wrote:
>>> On Friday, 3 July 2020, 22:32:35 CEST, John David Anglin wrote:
>>>> Stalls are quite frequent with recent kernels. When the stall is
>>>> detected by rcu_sched, we get a backtrace similar to the
>>>> following:
>>> With this patch on top of 5.7.7 I still get:
>> Suggest enabling CONFIG_LOCKUP_DETECTOR=y and
>> CONFIG_SOFTLOCKUP_DETECTOR=y so we can see where the stall occurs.
>>
>> Dave
>>
> Attached is kernel output while running the futex_requeue_pi test from
> the kernel selftests. It failed this way on the second try while it
> passed on the first try. The output it gave is with the kernel
> configuration options as set out above.

Unfortunately, the soft lockup detector didn't trigger in the output you
attached, so it's not clear where the futex_requeue_pi test is stuck.

There are no spinlocks in check_preempt_curr() that I can see:

void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
{
        const struct sched_class *class;

        if (p->sched_class == rq->curr->sched_class) {
                rq->curr->sched_class->check_preempt_curr(rq, p, flags);
        } else {
                for_each_class(class) {
                        if (class == rq->curr->sched_class)
                                break;
                        if (class == p->sched_class) {
                                resched_curr(rq);
                                break;
                        }
                }
        }

        /*
         * A queue event has occurred, and we're going to schedule. In
         * this case, we can save a useless back to back clock update.
         */
        if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr))
                rq_clock_skip_update(rq);
}

There's one loop in the above code, the for_each_class() walk.

I have CONFIG_PREEMPT_NONE=y in my kernel builds.
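For what it's worth, that loop should be strictly bounded. A minimal sketch
of the relevant definitions, quoted from memory from kernel/sched/sched.h as
of 5.7 (please double-check against your tree):

/* As I read it in v5.7; the exact definitions may differ in other trees. */
#ifdef CONFIG_SMP
#define sched_class_highest (&stop_sched_class)
#else
#define sched_class_highest (&dl_sched_class)
#endif

/* Walk the scheduler class list from highest to lowest priority. */
#define for_each_class(class) \
        for (class = sched_class_highest; class; class = class->next)

Unless that list were somehow corrupted, the loop terminates after at most
five classes (stop, dl, rt, fair, idle), so I don't see how
check_preempt_curr() itself could spin.

Regards,
Dave

-- 
John David Anglin  dave.anglin@xxxxxxxx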