> 2020年9月2日 11:46,Tuikov, Luben <Luben.Tuikov@xxxxxxx> 写道: > > On 2020-09-01 21:42, Pan, Xinhui wrote: >> If you take a look at the below function, you should not use driver's release to free adev. As dev is embedded in adev. > > Do you mean "look at the function below", using "below" as an adverb? > "below" is not an adjective. > > I know dev is embedded in adev--I did that patchset. > >> >> 809 static void drm_dev_release(struct kref *ref) >> 810 { >> 811 struct drm_device *dev = container_of(ref, struct drm_device, ref); >> 812 >> 813 if (dev->driver->release) >> 814 dev->driver->release(dev); >> 815 >> 816 drm_managed_release(dev); >> 817 >> 818 kfree(dev->managed.final_kfree); >> 819 } > > That's simple--this comes from change c6603c740e0e3 > and it should be reverted. Simple as that. > > The version before this change was absolutely correct: > > static void drm_dev_release(struct kref *ref) > { > if (dev->driver->release) > dev->driver->release(dev); > else > drm_dev_fini(dev); > } > > Meaning, "the kref is now 0"--> if the driver > has a release, call it, else use our own. > But note that nothing can be assumed after this point, > about the existence of "dev". > > It is exactly because struct drm_device is statically > embedded into a container, struct amdgpu_device, > that this change above should be reverted. > > This is very similar to how fops has open/release > but no close. That is, the "release" is called > only when the last kref is released, i.e. when > kref goes from non-zero to zero. > > This uses the kref infrastructure which has been > around for about 20 years in the Linux kernel. > > I suggest reading the comments > in drm_dev.c mostly, "DOC: driver instance overview" > starting at line 240 onwards. This is right above > drm_put_dev(). There is actually an example of a driver > in the comment. Also the comment to drm_dev_init(). > > Now, take a look at this: > > /** > * drm_dev_put - Drop reference of a DRM device > * @dev: device to drop reference of or NULL > * > * This decreases the ref-count of @dev by one. The device is destroyed if the > * ref-count drops to zero. > */ > void drm_dev_put(struct drm_device *dev) > { > if (dev) > kref_put(&dev->ref, drm_dev_release); > } > EXPORT_SYMBOL(drm_dev_put); > > Two things: > > 1. It is us, who kzalloc the amdgpu device, which contains > the drm_device (you'll see this discussed in the reading > material I pointed to above). We do this because we're > probing the PCI device whether we'll work it it or not. > that is true. My understanding of the drm core code is like something below. struct B { strcut A } we initialize A firstly and initialize B in the end. But destroy B firstly and destory A in the end. But yes, practice is more complex. if B has nothing to be destroyed. we can destory A directly, otherwise destroy B firstly. in this case, we can do something below in our release() //some cleanup work of B drm_dev_fini(dev);//destroy A kfree(adev) > 2. Using the kref infrastructure, when the ref goes to 0, > drm_dev_release is called. And here's the KEY: > Because WE allocated the container, we should free it--after the release > method is called, DRM cannot assume anything about the drm > device or the container. The "release" method is final. > > We allocate, we free. And we free only when the ref goes to 0. > > DRM can, in due time, "free" itself of the DRM device and stop > having knowledge of it--that's fine, but as long as the ref > is not 0, the amdgpu device and thus the contained DRM device, > cannot be freed. > >> >> You have to make another change something like >> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c >> index 13068fdf4331..2aabd2b4c63b 100644 >> --- a/drivers/gpu/drm/drm_drv.c >> +++ b/drivers/gpu/drm/drm_drv.c >> @@ -815,7 +815,8 @@ static void drm_dev_release(struct kref *ref) >> >> drm_managed_release(dev); >> >> - kfree(dev->managed.final_kfree); >> + if (dev->driver->final_release) >> + dev->driver->final_release(dev); >> } > > No. What's this? > There is no such thing as "final" release, nor is there a "partial" release. > When the kref goes to 0, the device disappears. Simple. > If someone is using it, they should kref-get it, and when they're > done with it, they should kref-put it. I just take an example here. add another release in the end. then no one could touch us. IOW, final_release. A destroy B by a callback, then A destroy itself. It assumes B just free its own resource. but that makes trouble if some resource of A is allocated by B. Because B must take care of these common resource shared between A and B. yes, that logical is more complex. So I think we can revert drm_dev_release to its previous version. > > The whole point is that this is done implicitly, via the kref infrastructure. > drm_dev_init() which we call in our PCI probe function, sets the kref to 1--all > as per the documentation I pointed you to above. I am not taking about kref. what we are discussing is all about the release sequence. > > Another point is that we can do some other stuff in the release > function, notify someone, write some registers, free memory we use > for that PCI device, etc. > > If the "managed resources" infrastructure wants to stay, it should hook > itself into drm_dev_fini() and into drm_dev_init() or drm_dev_register(). > It shouldn't have to be so out-of-place like in patch 2/3 of this series, > where the drmm_add_final_kfree() is smack-dab in the middle of our PCI > discovery function, surrounded on top and bottom by drm_dev_init() > and drm_dev_register(). The "managed resources" infra should be non-invasive > and drivers shouldn't have to change to use it--it should be invisible to them. > Then our kref would just work. > yep, that make sense. But you need more changes to fix this issue. this patchset is insufficient. >> >> And in the final_release callback we free the dev. But that is a little complex now. so I prefer still using final_kfree. >> Of course we can do some cleanup work in the driver's release callback. BUT no kfree. > > No! No final_kfree. It's a hack. > > Read the documentation in drm_drv.c I noted above--it lays out how this happens. Reading is required. > > Regards, > Luben > > >> >> -----原始邮件----- >> 发件人: "Tuikov, Luben" <Luben.Tuikov@xxxxxxx> >> 日期: 2020年9月2日 星期三 09:07 >> 收件人: "amd-gfx@xxxxxxxxxxxxxxxxxxxxx" <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>, "dri-devel@xxxxxxxxxxxxxxxxxxxxx" <dri-devel@xxxxxxxxxxxxxxxxxxxxx> >> 抄送: "Deucher, Alexander" <Alexander.Deucher@xxxxxxx>, Daniel Vetter <daniel@xxxxxxxx>, "Pan, Xinhui" <Xinhui.Pan@xxxxxxx>, "Tuikov, Luben" <Luben.Tuikov@xxxxxxx> >> 主题: [PATCH 0/3] Use implicit kref infra >> >> Use the implicit kref infrastructure to free the container >> struct amdgpu_device, container of struct drm_device. >> >> First, in drm_dev_register(), do not indiscriminately warn >> when a DRM driver hasn't opted for managed.final_kfree, >> but instead check if the driver has provided its own >> "release" function callback in the DRM driver structure. >> If that is the case, no warning. >> >> Remove drmm_add_final_kfree(). We take care of that, in the >> kref "release" callback when all refs are down to 0, via >> drm_dev_put(), i.e. the free is implicit. >> >> Remove superfluous NULL check, since the DRM device to be >> suspended always exists, so long as the underlying PCI and >> DRM devices exist. >> >> Luben Tuikov (3): >> drm: No warn for drivers who provide release >> drm/amdgpu: Remove drmm final free >> drm/amdgpu: Remove superfluous NULL check >> >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 -- >> drivers/gpu/drm/drm_drv.c | 3 ++- >> 3 files changed, 2 insertions(+), 6 deletions(-) >> >> -- >> 2.28.0.394.ge197136389 >> >> >> > _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel