On Tue, Sep 01, 2020 at 11:46:18PM -0400, Luben Tuikov wrote: > On 2020-09-01 21:42, Pan, Xinhui wrote: > > If you take a look at the below function, you should not use driver's release to free adev. As dev is embedded in adev. > > Do you mean "look at the function below", using "below" as an adverb? > "below" is not an adjective. > > I know dev is embedded in adev--I did that patchset. > > > > > 809 static void drm_dev_release(struct kref *ref) > > 810 { > > 811 struct drm_device *dev = container_of(ref, struct drm_device, ref); > > 812 > > 813 if (dev->driver->release) > > 814 dev->driver->release(dev); > > 815 > > 816 drm_managed_release(dev); > > 817 > > 818 kfree(dev->managed.final_kfree); > > 819 } > > That's simple--this comes from change c6603c740e0e3 > and it should be reverted. Simple as that. > > The version before this change was absolutely correct: > > static void drm_dev_release(struct kref *ref) > { > if (dev->driver->release) > dev->driver->release(dev); > else > drm_dev_fini(dev); > } > > Meaning, "the kref is now 0"--> if the driver > has a release, call it, else use our own. > But note that nothing can be assumed after this point, > about the existence of "dev". > > It is exactly because struct drm_device is statically > embedded into a container, struct amdgpu_device, > that this change above should be reverted. > > This is very similar to how fops has open/release > but no close. That is, the "release" is called > only when the last kref is released, i.e. when > kref goes from non-zero to zero. > > This uses the kref infrastructure which has been > around for about 20 years in the Linux kernel. > > I suggest reading the comments > in drm_dev.c mostly, "DOC: driver instance overview" > starting at line 240 onwards. This is right above > drm_put_dev(). There is actually an example of a driver > in the comment. Also the comment to drm_dev_init(). > > Now, take a look at this: > > /** > * drm_dev_put - Drop reference of a DRM device > * @dev: device to drop reference of or NULL > * > * This decreases the ref-count of @dev by one. The device is destroyed if the > * ref-count drops to zero. > */ > void drm_dev_put(struct drm_device *dev) > { > if (dev) > kref_put(&dev->ref, drm_dev_release); > } > EXPORT_SYMBOL(drm_dev_put); > > Two things: > > 1. It is us, who kzalloc the amdgpu device, which contains > the drm_device (you'll see this discussed in the reading > material I pointed to above). We do this because we're > probing the PCI device whether we'll work it it or not. > > 2. Using the kref infrastructure, when the ref goes to 0, > drm_dev_release is called. And here's the KEY: > Because WE allocated the container, we should free it--after the release > method is called, DRM cannot assume anything about the drm > device or the container. The "release" method is final. > > We allocate, we free. And we free only when the ref goes to 0. > > DRM can, in due time, "free" itself of the DRM device and stop > having knowledge of it--that's fine, but as long as the ref > is not 0, the amdgpu device and thus the contained DRM device, > cannot be freed. > > > > > You have to make another change something like > > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c > > index 13068fdf4331..2aabd2b4c63b 100644 > > --- a/drivers/gpu/drm/drm_drv.c > > +++ b/drivers/gpu/drm/drm_drv.c > > @@ -815,7 +815,8 @@ static void drm_dev_release(struct kref *ref) > > > > drm_managed_release(dev); > > > > - kfree(dev->managed.final_kfree); > > + if (dev->driver->final_release) > > + dev->driver->final_release(dev); > > } > > No. What's this? > There is no such thing as "final" release, nor is there a "partial" release. > When the kref goes to 0, the device disappears. Simple. > If someone is using it, they should kref-get it, and when they're > done with it, they should kref-put it. > > The whole point is that this is done implicitly, via the kref infrastructure. > drm_dev_init() which we call in our PCI probe function, sets the kref to 1--all > as per the documentation I pointed you to above. > > Another point is that we can do some other stuff in the release > function, notify someone, write some registers, free memory we use > for that PCI device, etc. > > If the "managed resources" infrastructure wants to stay, it should hook > itself into drm_dev_fini() and into drm_dev_init() or drm_dev_register(). > It shouldn't have to be so out-of-place like in patch 2/3 of this series, > where the drmm_add_final_kfree() is smack-dab in the middle of our PCI > discovery function, surrounded on top and bottom by drm_dev_init() > and drm_dev_register(). The "managed resources" infra should be non-invasive > and drivers shouldn't have to change to use it--it should be invisible to them. > Then our kref would just work. Unfortunately some part of that drm managed series is stuck still waiting for review (I guess I need to resubmit), but with that the docs would tell you to use devm_drm_dev_alloc instead of hand-rolling all this. Definitely not any of the ideas proposed in this discussion or patches :-) I'll cc you on that series when I send it out again. Cheers, Daniel > > > > > And in the final_release callback we free the dev. But that is a little complex now. so I prefer still using final_kfree. > > Of course we can do some cleanup work in the driver's release callback. BUT no kfree. > > No! No final_kfree. It's a hack. > > Read the documentation in drm_drv.c I noted above--it lays out how this happens. Reading is required. > > Regards, > Luben > > > > > > -----原始邮件----- > > 发件人: "Tuikov, Luben" <Luben.Tuikov@xxxxxxx> > > 日期: 2020年9月2日 星期三 09:07 > > 收件人: "amd-gfx@xxxxxxxxxxxxxxxxxxxxx" <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>, "dri-devel@xxxxxxxxxxxxxxxxxxxxx" <dri-devel@xxxxxxxxxxxxxxxxxxxxx> > > 抄送: "Deucher, Alexander" <Alexander.Deucher@xxxxxxx>, Daniel Vetter <daniel@xxxxxxxx>, "Pan, Xinhui" <Xinhui.Pan@xxxxxxx>, "Tuikov, Luben" <Luben.Tuikov@xxxxxxx> > > 主题: [PATCH 0/3] Use implicit kref infra > > > > Use the implicit kref infrastructure to free the container > > struct amdgpu_device, container of struct drm_device. > > > > First, in drm_dev_register(), do not indiscriminately warn > > when a DRM driver hasn't opted for managed.final_kfree, > > but instead check if the driver has provided its own > > "release" function callback in the DRM driver structure. > > If that is the case, no warning. > > > > Remove drmm_add_final_kfree(). We take care of that, in the > > kref "release" callback when all refs are down to 0, via > > drm_dev_put(), i.e. the free is implicit. > > > > Remove superfluous NULL check, since the DRM device to be > > suspended always exists, so long as the underlying PCI and > > DRM devices exist. > > > > Luben Tuikov (3): > > drm: No warn for drivers who provide release > > drm/amdgpu: Remove drmm final free > > drm/amdgpu: Remove superfluous NULL check > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 -- > > drivers/gpu/drm/drm_drv.c | 3 ++- > > 3 files changed, 2 insertions(+), 6 deletions(-) > > > > -- > > 2.28.0.394.ge197136389 > > > > > > > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx