Re: [GIT PULL] Pin control fixes for v6.7

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Nathan, Nick,

(just picking some LLVM compiler people I know of... and trust)

Context is this patch:
https://lore.kernel.org/linux-gpio/20231115102824.23727-1-quic_aiquny@xxxxxxxxxxx/

On Thu, Nov 30, 2023 at 6:37 AM Aiqun(Maria) Yu <quic_aiquny@xxxxxxxxxxx> wrote:
> On 11/29/2023 11:08 PM, Linus Walleij wrote:
> > On Wed, Nov 29, 2023 at 3:56 PM Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >> On Wed, 29 Nov 2023 at 04:09, Linus Walleij <linus.walleij@xxxxxxxxxx> wrote:
> >>>
> >>> The most interesting patch is the list iterator fix in the core by Maria
> >>> Yu, it took a while for me to realize what was going on there.
> >>
> >> That commit message still doesn't explain what the problem was.
> >>
> >> Why is p->state volatile there? It seems to be a serious locking bug
> >> if p->state can randomly change there, and the READ_ONCE() looks like
> >> a "this hides the problem" rather than an actual real fix.
>
> This is indeed an interesting issue. Thx for the comment, Linus.
> **Let me explain how: "p->state becomes volatile in the list iterator",
> and "why READ_ONCE is suggested".
>
> The current critical code is:
>    list_for_each_entry(setting, &p->state->settings, node)
>
> after elaborating the define list_for_each_entry, so above critical code
> will be:
>    for (setting = list_head(&p->state->settings, typeof(*setting), node); \
> &setting->node != (&p->state->settings); \
> setting = list_next(setting , node))
>
> The asm code(refer result from Clang version 10.0) can cleared explain
> the step of p->state reload actions:
> loop:
> ldr x22,[x22] ; x22=list_next(setting , node))
> add x9,x8,#0x18 ; x9=&p->state->setting
> cmp x22,x9 ; setting,x9
> b.eq 0xFFFFFF9B24483530
>
> ldr w9,[x22,#0x10] ; w9,[setting,#16]
> cmp w9,#0x2 ; w9,#2
> b.ne 0xFFFFFF9B24483504
>
> mov x0,x22 ; x0,setting
> bl 0xFFFFFF9B24486048 ; pinmux_disable_setting
>
> ldr x8,[x19,#0x28] ; x19=p, x8=[p->state], *reload p->state*
> b loop
>
> The *reload p->state* inside the loop for checking is not needed and can
> cause possible infinite loop. So READ_ONCE is highly suggested even if
> p->state is not randomly changed. And then unnecessary "ldr
> x8,[x19,#0x28]" can be removed from above loop code.
>
> **Comments about the locking bug:
> currently pinctrl_select_state is an export symbol and don't have
> effective reentrance protect design. That's why current infinite loop
> issue was observed with customer's multi thread call with
> pinctrl_select_state without lock protection. pinctrl_select_state
> totally rely on driver module user side to ensure the reentrant state.
>
> Usually the suggested usage from driver side who are using pinctrl would be:
> LOCK;
> pinctrl_select_state();
> gpio pulling;
> udelay();
> check state;
> other hardware behaviors;
> UNLOCK;
>
> So the locking bug fix I have told customer side to fix from their own
> driver part. Since usually not only a simple pinctrl_select_state call
> can finish the hardware state transaction.
>
> I myself also is fine to have a small per pinctrl lock to only protect
> the current pinctrl_select_state->pinctrl_commit_state reentrance
> issues. Pls any pinctrl maintainer help to comment to suggest or not and
> I can prepare a change as well.

Luckily I am the pin control maintainer :D
And I also ha my morning coffee and looked over the patch again.

So tilting the compiler to generate code that is less prone to race
conditions with READ_ONCE() isn't really the solution is it? We need
to introduce a proper lock that stops this from happening if it is
a problem people are facing.

Can you try to make a patch that removes READ_ONCE()
and introduce a lock instead?

Racing is rarely an issue in pin control for reasons explained
in another context here:
https://lore.kernel.org/linux-gpio/CACRpkdZ0cnJpYuzU=47-oW-7N_YGMo2vXpKOeXeNi5PhPY7QMA@xxxxxxxxxxxxxx/

...but if people still manage to run into it, we better have a lock
there. Just make sure it is not just an issue with outoftree code,
but a real problem?

The change that changes the code to use the old_state variable
should stay in, it makes the code more readable, it's just the
READ_ONCE() macro which is dubious.

Yours,
Linus Walleij





[Index of Archives]     [Linux SPI]     [Linux Kernel]     [Linux ARM (vger)]     [Linux ARM MSM]     [Linux Omap]     [Linux Arm]     [Linux Tegra]     [Fedora ARM]     [Linux for Samsung SOC]     [eCos]     [Linux Fastboot]     [Gcc Help]     [Git]     [DCCP]     [IETF Announce]     [Security]     [Linux MIPS]     [Yosemite Campsites]

  Powered by Linux