On Sun, Dec 4, 2022 at 9:17 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Sat, Dec 3, 2022 at 7:05 AM Bartosz Golaszewski <brgl@xxxxxxxx> wrote: > > > > Here's a fixed PR from the GPIO subsystem for the next rc. > > No, this cannot be right. > > That last commit seems *very* dubious, and in particular all those > > if (!down_read_trylock(&gdev->sem)) > return EPOLLHUP | EPOLLERR; > > are a sign that something is very very wrong there. > > Either the lock is necessary or it isn't, and "trylock" isn't the way > to deal with it, with random failures if you cannot take the lock. > > If you are using "trylock" because the data structure might go away > from under you, you have already lost, and the code is buggy. > > And if the data structure cannot go away from under you, you should > do an unconditional lock, and then check "gdev->chip" for being NULL > once you have gotten the lock (the same way you did in open()). > No, the data can't be removed with these locks in place. It's just to avoid going into the callbacks if gpiochip_remove() is already in progress (the only reason why trylock would fail). > But a "trylock and return error if it failed" just means that now you > are randomly returning errors to user space, which is entirely > undebuggable and makes no sense. > Technically these are the same errors we'll return later if gdev->chip is NULL but I get your point. > Or, alternatively, the trylock succeeds - because it hits fully > *after* gpiochip_remove() has finished, and now ->chip is NULL anyway, > which is what you claim to protect against. > > End result: "trylock" can never be right in this kind of context. > > That "call_locked() helper might make sense more along the lines of > > ret = -ENODEV; > > down_read(&gdev->sem)) > // Does the device still exist? > if (gdev->chip) > ret = func(file, cmd, arg); > up_read(&gdev->sem); > > return ret; > This is a good suggestion, thanks. And with it, the two patches can get squashed into one for easier backporting. > or similar. Not with that odd "try to lock, and if that fails, assume error". > > And again - if the trylock is there because 'gdev' itself might go > away at any time and you can't afford to wait on the lock, then it's > broken regardless (and the above suggestion won't help either) > > Anyway: the end result of this all is that I think this is a > fundamental bug in the gpio layer, and rc7 (soon to be rc8) is too > late to try these kinds of unfinished games. > > Fix it properly for 6.2, and make it back-portable, because I'm not > pulling something like this right now. > > Linus Will do. I will still resend the PR with only the resource leak fixes. Bartosz