On 2020-09-01 21:42, Pan, Xinhui wrote: > If you take a look at the below function, you should not use driver's release to free adev. As dev is embedded in adev. Do you mean "look at the function below", using "below" as an adverb? "below" is not an adjective. I know dev is embedded in adev--I did that patchset. > > 809 static void drm_dev_release(struct kref *ref) > 810 { > 811 struct drm_device *dev = container_of(ref, struct drm_device, ref); > 812 > 813 if (dev->driver->release) > 814 dev->driver->release(dev); > 815 > 816 drm_managed_release(dev); > 817 > 818 kfree(dev->managed.final_kfree); > 819 } That's simple--this comes from change c6603c740e0e3 and it should be reverted. Simple as that. The version before this change was absolutely correct: static void drm_dev_release(struct kref *ref) { if (dev->driver->release) dev->driver->release(dev); else drm_dev_fini(dev); } Meaning, "the kref is now 0"--> if the driver has a release, call it, else use our own. But note that nothing can be assumed after this point, about the existence of "dev". It is exactly because struct drm_device is statically embedded into a container, struct amdgpu_device, that this change above should be reverted. This is very similar to how fops has open/release but no close. That is, the "release" is called only when the last kref is released, i.e. when kref goes from non-zero to zero. This uses the kref infrastructure which has been around for about 20 years in the Linux kernel. I suggest reading the comments in drm_dev.c mostly, "DOC: driver instance overview" starting at line 240 onwards. This is right above drm_put_dev(). There is actually an example of a driver in the comment. Also the comment to drm_dev_init(). Now, take a look at this: /** * drm_dev_put - Drop reference of a DRM device * @dev: device to drop reference of or NULL * * This decreases the ref-count of @dev by one. The device is destroyed if the * ref-count drops to zero. */ void drm_dev_put(struct drm_device *dev) { if (dev) kref_put(&dev->ref, drm_dev_release); } EXPORT_SYMBOL(drm_dev_put); Two things: 1. It is us, who kzalloc the amdgpu device, which contains the drm_device (you'll see this discussed in the reading material I pointed to above). We do this because we're probing the PCI device whether we'll work it it or not. 2. Using the kref infrastructure, when the ref goes to 0, drm_dev_release is called. And here's the KEY: Because WE allocated the container, we should free it--after the release method is called, DRM cannot assume anything about the drm device or the container. The "release" method is final. We allocate, we free. And we free only when the ref goes to 0. DRM can, in due time, "free" itself of the DRM device and stop having knowledge of it--that's fine, but as long as the ref is not 0, the amdgpu device and thus the contained DRM device, cannot be freed. > > You have to make another change something like > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c > index 13068fdf4331..2aabd2b4c63b 100644 > --- a/drivers/gpu/drm/drm_drv.c > +++ b/drivers/gpu/drm/drm_drv.c > @@ -815,7 +815,8 @@ static void drm_dev_release(struct kref *ref) > > drm_managed_release(dev); > > - kfree(dev->managed.final_kfree); > + if (dev->driver->final_release) > + dev->driver->final_release(dev); > } No. What's this? There is no such thing as "final" release, nor is there a "partial" release. When the kref goes to 0, the device disappears. Simple. If someone is using it, they should kref-get it, and when they're done with it, they should kref-put it. The whole point is that this is done implicitly, via the kref infrastructure. drm_dev_init() which we call in our PCI probe function, sets the kref to 1--all as per the documentation I pointed you to above. Another point is that we can do some other stuff in the release function, notify someone, write some registers, free memory we use for that PCI device, etc. If the "managed resources" infrastructure wants to stay, it should hook itself into drm_dev_fini() and into drm_dev_init() or drm_dev_register(). It shouldn't have to be so out-of-place like in patch 2/3 of this series, where the drmm_add_final_kfree() is smack-dab in the middle of our PCI discovery function, surrounded on top and bottom by drm_dev_init() and drm_dev_register(). The "managed resources" infra should be non-invasive and drivers shouldn't have to change to use it--it should be invisible to them. Then our kref would just work. > > And in the final_release callback we free the dev. But that is a little complex now. so I prefer still using final_kfree. > Of course we can do some cleanup work in the driver's release callback. BUT no kfree. No! No final_kfree. It's a hack. Read the documentation in drm_drv.c I noted above--it lays out how this happens. Reading is required. Regards, Luben > > -----原始邮件----- > 发件人: "Tuikov, Luben" <Luben.Tuikov@xxxxxxx> > 日期: 2020年9月2日 星期三 09:07 > 收件人: "amd-gfx@xxxxxxxxxxxxxxxxxxxxx" <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>, "dri-devel@xxxxxxxxxxxxxxxxxxxxx" <dri-devel@xxxxxxxxxxxxxxxxxxxxx> > 抄送: "Deucher, Alexander" <Alexander.Deucher@xxxxxxx>, Daniel Vetter <daniel@xxxxxxxx>, "Pan, Xinhui" <Xinhui.Pan@xxxxxxx>, "Tuikov, Luben" <Luben.Tuikov@xxxxxxx> > 主题: [PATCH 0/3] Use implicit kref infra > > Use the implicit kref infrastructure to free the container > struct amdgpu_device, container of struct drm_device. > > First, in drm_dev_register(), do not indiscriminately warn > when a DRM driver hasn't opted for managed.final_kfree, > but instead check if the driver has provided its own > "release" function callback in the DRM driver structure. > If that is the case, no warning. > > Remove drmm_add_final_kfree(). We take care of that, in the > kref "release" callback when all refs are down to 0, via > drm_dev_put(), i.e. the free is implicit. > > Remove superfluous NULL check, since the DRM device to be > suspended always exists, so long as the underlying PCI and > DRM devices exist. > > Luben Tuikov (3): > drm: No warn for drivers who provide release > drm/amdgpu: Remove drmm final free > drm/amdgpu: Remove superfluous NULL check > > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 --- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 -- > drivers/gpu/drm/drm_drv.c | 3 ++- > 3 files changed, 2 insertions(+), 6 deletions(-) > > -- > 2.28.0.394.ge197136389 > > > _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel