On 6/13/22 11:09 PM, Yonghong Song wrote:
On 6/13/22 6:10 PM, Satya Durga Srinivasu Prabhala wrote:
On 6/13/22 2:49 PM, Alexei Starovoitov wrote:
On Mon, Jun 13, 2022 at 2:35 PM Satya Durga Srinivasu Prabhala
<quic_satyap@xxxxxxxxxxx> wrote:
On 6/13/22 2:01 PM, Alexei Starovoitov wrote:
is doesn't solve anything.
Please provide a reproducer.
I'm trying to find an easy way to repro the issue, so far,
unsuccessful.
iirc the task's affinity change can race even with preemption
disabled
on this cpu. Why would s/migrate/preemption/ address the deadlock ?
I don't think task's affinity change races with preemption
disabled/enabled.
Switching to preemption disable/enable calls helps as it's just simple
counter increment and decrement with a barrier, but with migrate
disable/enable when task's affinity changes, we run into recursive bug
due to rq lock.
As Yonghong already explained, replacing migrate_disable
with preempt_disable around bpf prog invocation is not an option.
If I understand correctly, Yonghong mentioned that replacing migrate_
with preempt_ won't work for RT Kernels and migrate_ APIs were
introduced
for RT Kernels is what he was pointing to. I went back and cross checked
on 5.10 LTS Kernel, I see that the migrate_ calls end up just calling
into
preemt_ calls [1]. So far non-RT kernels, sticking to preemt_ calls
should
still work. Please let me know if I missed anything.
Yes, old kernel migrate_disable/enable() implementation with
simply preempt_disable/enable() are transitional. You can check
5.12 kernel migrate_disable/enable() implementation. Note that
your patch, if accepted, will apply to the latest kernel. So we
cannot simply replace migrate_disable() with prempt_disable(),
which won't work for RT kernel.
Thanks for getting back and apologies for the delay. I understand that
we may break RT kernels with this patch. So, I was proposing to stick to
migrate_disable/enable() calls on RT Kernels and use preempt_disable/enable()
in case of non-RT Kernel. Which warrants change in scheduler, will push
patch and try get feedback from Scheduler experts.
While I'm here I would like to cross check with you xperts on ideas to
reproduce the issue easily and consistently. Your inputs will immensely help to
debug issue further.
[1]
https://android.googlesource.com/kernel/common/+/refs/heads/android12-5.10/include/linux/preempt.h#335