pt., 13 mar 2020 o 16:04 Bartosz Golaszewski <brgl@xxxxxxxx> napisał(a): > > pt., 13 mar 2020 o 09:44 Linus Walleij <linus.walleij@xxxxxxxxxx> napisał(a): > > > > On Thu, Mar 12, 2020 at 7:25 PM Bartosz Golaszewski > > <bgolaszewski@xxxxxxxxxxxx> wrote: > > > > > I believe this is not correct. The resources managed by devres are > > > released when the device is detached from a driver, not when the > > > device's reference count goes to 0. When the latter happens, the > > > device's specific (or its device_type's) release callback is called - > > > for gpiolib this is gpiodevice_release(). > > > > Yeah you're right, I even point that out in my second letter :/ > > > > It's a bit of confusion for everyone (or it's just me). > > > > No, it is confusing and I only recently understood it while trying to > fix a memory leak in nvmem. > > > > The kref inside struct device will not go down to zero until you call > > > device_del() (if you previously called device_add() that is which > > > increases the reference count by a couple points). But what I'm > > > thinking about is making the call to device_del() depend not on the > > > call to gpiochip_remove() but on the kref on the gpio device going > > > down to zero. As for the protection against module removal - this > > > should be handled by module_get/put(). > > > > Right. At the end of gpiochip_remove(): > > > > cdev_device_del(&gdev->chrdev, &gdev->dev); > > put_device(&gdev->dev); > > > > That last put_device() should in best case bring the refcount > > to zero. > > > > So the actual way we lifecycle GPIO chips is managed > > resources using only devm_* but the reference count does work > > too: reference count should normally land at zero since the > > gpiochip_remove() call is ended with a call to > > put_device() and that should (ideally) bring it to zero. > > > > It's just that this doesn't really trigger anything. > > > > Not necessarily - if the new kref for GPIO device would be detached > from device reference counting and what it would trigger at release is > this: > > cdev_device_del(&gdev->chrdev, &gdev->dev); > put_device(&gdev->dev); > > Or to be even more clear: "getting" the gpiodevice would not be the > same as "getting" a device - in fact only when the gpiodevice kref > goes down to 0 do we put the reference to the device object. > > > I think there is no way out of the fact that we have to > > forcefully remove the gpio_chip when devm_* destructors > > kicks in: the driver is indeed getting removed at that > > point. > > > > There does seem to be a way around that though: the clock framework > does it by creating a clock "core" object which is reference counted > and if the provider is removed while consumers still hold references > to it, then it does a couple things to "numb" the provider (as you > nicely put it) like replacing all ops callbacks with NULL pointers but > keeps the structure alive until the consumers also give up all their > references. > > That being said: I'm not saying this is necessary or even useful. I > started the discussion because I was under the impression I wasn't > clear enough when writing about reference counting for descriptors. If > nobody complains about the current implementation then let's not fix > something that's not broken. > > Bartosz > > > In gpiochip_remove() we "numb" the chip so that any > > gpio_desc:s currently in use will just fail silently and not crash, > > since they are not backed by a driver any more. The descs > > stay around until the consumer releases them, but if we probe the > > same GPIO device again they will certainly not re-attach or > > something. > > > > Arguably it is a bit of policy. Would it make more sense to > > have rmmod fail if the kref inside gdev->dev->kobj->kref > > is != 1? I suppose that is what things like storage > > drivers pretty much have to do. > > > > The problem with that is that as soon as you have a consumer > > that is compiled into the kernel it makes it impossible to > > remove the gpio driver with rmmod. > > > > I really needed to refresh this a bit, so the above is maybe > > a bit therapeutic. > > > > I don't really see how we could do things differently without > > creating some other problem though. > > > > Yours, > > Linus Walleij Hi Linus, what is your decision on this? Because if we don't merge this, then we need to make sure nvmem doesn't call gpiod_put() for descriptors it didn't obtain itself and we should probably fix it this week. Bart