Re: [PATCH] bpf: fix rq lock recursion issue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 6/13/22 11:09 PM, Yonghong Song wrote:


On 6/13/22 6:10 PM, Satya Durga Srinivasu Prabhala wrote:

On 6/13/22 2:49 PM, Alexei Starovoitov wrote:
On Mon, Jun 13, 2022 at 2:35 PM Satya Durga Srinivasu Prabhala
<quic_satyap@xxxxxxxxxxx> wrote:

On 6/13/22 2:01 PM, Alexei Starovoitov wrote:
is doesn't solve anything.
Please provide a reproducer.
I'm trying to find an easy way to repro the issue, so far, unsuccessful.

iirc the task's affinity change can race even with preemption disabled
on this cpu. Why would s/migrate/preemption/ address the deadlock ?
I don't think task's affinity change races with preemption disabled/enabled.

Switching to preemption disable/enable calls helps as it's just simple
counter increment and decrement with a barrier, but with migrate
disable/enable when task's affinity changes, we run into recursive bug
due to rq lock.
As Yonghong already explained, replacing migrate_disable
with preempt_disable around bpf prog invocation is not an option.

If I understand correctly, Yonghong mentioned that replacing migrate_
with preempt_ won't work for RT Kernels and migrate_ APIs were introduced
for RT Kernels is what he was pointing to. I went back and cross checked
on 5.10 LTS Kernel, I see that the migrate_ calls end up just calling into preemt_ calls [1]. So far non-RT kernels, sticking to preemt_ calls should
still work. Please let me know if I missed anything.

Yes, old kernel migrate_disable/enable() implementation with
simply preempt_disable/enable() are transitional. You can check
5.12 kernel migrate_disable/enable() implementation. Note that
your patch, if accepted, will apply to the latest kernel. So we
cannot simply replace migrate_disable() with prempt_disable(),
which won't work for RT kernel.

Thanks for getting back and apologies for the delay. I understand that
we may break RT kernels with this patch. So, I was proposing to stick to
migrate_disable/enable() calls on RT Kernels and use preempt_disable/enable()
in case of non-RT Kernel. Which warrants change in scheduler, will push patch and try get feedback from Scheduler experts.

While I'm here I would like to cross check with you xperts on ideas to reproduce the issue easily and consistently. Your inputs will immensely help to debug issue further.


[1]
https://android.googlesource.com/kernel/common/+/refs/heads/android12-5.10/include/linux/preempt.h#335





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux