On Thu, Nov 25, 2021 at 11:56 AM Shreeya Patel <shreeya.patel@xxxxxxxxxxxxx> wrote:
> On 16/11/21 1:23 am, Gabriel Krisman Bertazi wrote:
> > Emil Velikov <emil.velikov@xxxxxxxxxxxxx> writes:
> >> Hi Shreeya, all,
> >>
> >> On 2021/11/09, Shreeya Patel wrote:
> >>> There is a race in registering of gc->irq.domain when
> >>> probing the I2C driver.
> >>> This sometimes leads to a Kernel NULL pointer dereference
> >>> in gpiochip_to_irq function which uses the domain variable.
> >>>
> >>> To avoid this issue, set gc->to_irq after domain is
> >>> initialized. This will make sure whenever gpiochip_to_irq
> >>> is called, it has domain already initialized.
> >>>
> >> What is stopping the next developer to moving the assignment to the
> >> incorrect place? Aka should we add an inline comment about this?
> > I agree with Emil. The patch seems like a workaround that doesn't
> > really solve the underlying issue. I'm not familiar with this code, but
> > it seems that gc is missing a lock during this initialization, to prevent
> > it from exposing a partially initialized gc->irq.
>
> I do not see any locking mechanism used for protecting the use of gc
> members before they are initialized. We faced a very similar problem
> with gc->to_irq as well where we had to return EPROBE_DEFER until it
> was initialized and ready to be used.
>
> Linus, do you have any suggestion on what would be the correct way to
> fix this issue of race in registration of gc members?

Not really, we just haven't faced the issue until now because it is
only now that people have actually added all these devlinks and
deferred probing and what not that actually starts to stress the
system and now that results in it being less stable, right?

How do other subsystems do it?

Yours,
Linus Walleij