Hi Nathan, Nick, (just picking some LLVM compiler people I know of... and trust) Context is this patch: https://lore.kernel.org/linux-gpio/20231115102824.23727-1-quic_aiquny@xxxxxxxxxxx/ On Thu, Nov 30, 2023 at 6:37 AM Aiqun(Maria) Yu <quic_aiquny@xxxxxxxxxxx> wrote: > On 11/29/2023 11:08 PM, Linus Walleij wrote: > > On Wed, Nov 29, 2023 at 3:56 PM Linus Torvalds > > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > >> On Wed, 29 Nov 2023 at 04:09, Linus Walleij <linus.walleij@xxxxxxxxxx> wrote: > >>> > >>> The most interesting patch is the list iterator fix in the core by Maria > >>> Yu, it took a while for me to realize what was going on there. > >> > >> That commit message still doesn't explain what the problem was. > >> > >> Why is p->state volatile there? It seems to be a serious locking bug > >> if p->state can randomly change there, and the READ_ONCE() looks like > >> a "this hides the problem" rather than an actual real fix. > > This is indeed an interesting issue. Thx for the comment, Linus. > **Let me explain how: "p->state becomes volatile in the list iterator", > and "why READ_ONCE is suggested". > > The current critical code is: > list_for_each_entry(setting, &p->state->settings, node) > > after elaborating the define list_for_each_entry, so above critical code > will be: > for (setting = list_head(&p->state->settings, typeof(*setting), node); \ > &setting->node != (&p->state->settings); \ > setting = list_next(setting , node)) > > The asm code(refer result from Clang version 10.0) can cleared explain > the step of p->state reload actions: > loop: > ldr x22,[x22] ; x22=list_next(setting , node)) > add x9,x8,#0x18 ; x9=&p->state->setting > cmp x22,x9 ; setting,x9 > b.eq 0xFFFFFF9B24483530 > > ldr w9,[x22,#0x10] ; w9,[setting,#16] > cmp w9,#0x2 ; w9,#2 > b.ne 0xFFFFFF9B24483504 > > mov x0,x22 ; x0,setting > bl 0xFFFFFF9B24486048 ; pinmux_disable_setting > > ldr x8,[x19,#0x28] ; x19=p, x8=[p->state], *reload p->state* > b loop > > The *reload p->state* inside the loop for checking is not needed and can > cause possible infinite loop. So READ_ONCE is highly suggested even if > p->state is not randomly changed. And then unnecessary "ldr > x8,[x19,#0x28]" can be removed from above loop code. > > **Comments about the locking bug: > currently pinctrl_select_state is an export symbol and don't have > effective reentrance protect design. That's why current infinite loop > issue was observed with customer's multi thread call with > pinctrl_select_state without lock protection. pinctrl_select_state > totally rely on driver module user side to ensure the reentrant state. > > Usually the suggested usage from driver side who are using pinctrl would be: > LOCK; > pinctrl_select_state(); > gpio pulling; > udelay(); > check state; > other hardware behaviors; > UNLOCK; > > So the locking bug fix I have told customer side to fix from their own > driver part. Since usually not only a simple pinctrl_select_state call > can finish the hardware state transaction. > > I myself also is fine to have a small per pinctrl lock to only protect > the current pinctrl_select_state->pinctrl_commit_state reentrance > issues. Pls any pinctrl maintainer help to comment to suggest or not and > I can prepare a change as well. Luckily I am the pin control maintainer :D And I also ha my morning coffee and looked over the patch again. So tilting the compiler to generate code that is less prone to race conditions with READ_ONCE() isn't really the solution is it? We need to introduce a proper lock that stops this from happening if it is a problem people are facing. Can you try to make a patch that removes READ_ONCE() and introduce a lock instead? Racing is rarely an issue in pin control for reasons explained in another context here: https://lore.kernel.org/linux-gpio/CACRpkdZ0cnJpYuzU=47-oW-7N_YGMo2vXpKOeXeNi5PhPY7QMA@xxxxxxxxxxxxxx/ ...but if people still manage to run into it, we better have a lock there. Just make sure it is not just an issue with outoftree code, but a real problem? The change that changes the code to use the old_state variable should stay in, it makes the code more readable, it's just the READ_ONCE() macro which is dubious. Yours, Linus Walleij