On Wed, 19 Feb 2020 at 16:22, Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
>
> On Wed, Feb 19, 2020 at 5:09 PM Emil Velikov <emil.l.velikov@xxxxxxxxx> wrote:
> >
> > On Wed, 19 Feb 2020 at 14:23, Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
> > >
> > > On Wed, Feb 19, 2020 at 2:33 PM Greg Kroah-Hartman
> > > <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Feb 19, 2020 at 03:28:47PM +0200, Laurent Pinchart wrote:
> > > > > Hi Daniel,
> > > > >
> > > > > Thank you for the patch.
> > > > >
> > > > > On Wed, Feb 19, 2020 at 11:20:33AM +0100, Daniel Vetter wrote:
> > > > > > We have lots of these. And the cleanup code tends to be of dubious
> > > > > > quality. The biggest wrong pattern is that developers use devm_, which
> > > > > > ties the release action to the underlying struct device, whereas
> > > > > > all the userspace visible stuff attached to a drm_device can long
> > > > > > outlive that one (e.g. after a hotunplug while userspace has open
> > > > > > files and mmap'ed buffers). Give people what they want, but with more
> > > > > > correctness.
> > > > > >
> > > > > > Mostly copied from devres.c, with types adjusted to fit drm_device and
> > > > > > a few simplifications - I didn't (yet) copy over everything. Since
> > > > > > the types don't match code sharing looked like a hopeless endeavour.
> > > > > >
> > > > > > For now it's only super simplified, no groups, you can't remove
> > > > > > actions (but kfree exists, we'll need that soon). Plus all specific to
> > > > > > drm_device ofc, including the logging. Which I didn't bother to make
> > > > > > compile-time optional, since none of the other drm logging is compile
> > > > > > time optional either.
> > > > > >
> > > > > > One tricky bit here is the chicken&egg between allocating your
> > > > > > drm_device structure and initializing it with drm_dev_init.
> > > > > > For perfect onion unwinding we'd need to have the action to kfree
> > > > > > the allocation registered before drm_dev_init registers any of its
> > > > > > own release handlers. But drm_dev_init doesn't know where exactly
> > > > > > the drm_device is embedded into the overall structure, and by the
> > > > > > time it returns it'll all be too late. And forcing drivers to be
> > > > > > able to clean up everything except the one kzalloc is silly.
> > > > > >
> > > > > > Work around this by having a very special final_kfree pointer. This
> > > > > > also avoids troubles with the list head possibly disappearing from
> > > > > > underneath us when we release all resources attached to the
> > > > > > drm_device.
> > > > >
> > > > > This is all a very good idea ! Many subsystems are plagued by drivers
> > > > > using devm_k*alloc to allocate data accessible by userspace. Since the
> > > > > introduction of devm_*, we've likely reduced the number of memory leaks,
> > > > > but I'm pretty sure we've increased the risk of crashes as I've seen
> > > > > some drivers that used .release() callbacks correctly being naively
> > > > > converted to incorrect devm_* usage :-(
> > > > >
> > > > > This leads me to a question: if other subsystems have the same problem,
> > > > > could we turn this implementation into something more generic ? It
> > > > > doesn't have to be done right away and shouldn't block merging this
> > > > > series, but I think it would be very useful.
> > > >
> > > > It shouldn't be that hard to tie this into a drv_m() type of a thing
> > > > (driver_memory?)
> > > >
> > > > And yes, I think it's much better than devm_* for the obvious reasons of
> > > > this being needed here.
> > >
> > > There's two reasons I went with copypasta instead of trying to share code:
> > > - Type checking, I definitely don't want people to mix up devm_ with
> > >   drmm_.
> > >   But even if we do a drv_m that subsystems could embed we do
> > >   have quite a few different types of component drivers (and with
> > >   drm_panel and drm_bridge even standardized), and I don't want people
> > >   to be able to pass the wrong kind of struct to e.g. a managed
> > >   drmm_connector_init - it really needs to be the drm_device, not a
> > >   panel or bridge or something else.
> > >
> > > - We could still share the code as a kind of implementation/backend
> > >   library. But it's not much, and with embedding I could use the drm
> > >   device logging stuff which is kinda nice. But if there's more demand
> > >   for this I can definitely see the point in sharing this, as Laurent
> > >   pointed out with the tiny optimization with not allocating a NULL void
> > >   * that I've done (and screwed up) it's not entirely trivial code.
> > >
> >
> > My 2c as they say, although closer to a brain dump :-)
> >
> > On one hand the drm_device has an embedded struct device. On the other
> > drm_device preserves state which outlives the embedded struct device.
> >
> > Would it make sense to keep drm_device better related to the
> > underlying device? Effectively moving the $misc state to drm_driver.
> > This idea does raise another question - struct drm_driver unlike many
> > other struct $foo_driver, does not embed device_driver :-(
> > So if one is to cover the above two, then the embedding concerns will
> > be elevated.
>
> drm_driver isn't a bus device driver in the linux driver model sense,
> but an uapi thing that sits on top of some underlying device. So maybe
> better to rename drm_driver to drm_interface_driver, and drm_device to
> drm_interface. But that would be gigantic churn and probably lots of
> confusion. We do require a link between drm_device->struct device
> nowadays, but that's just to guarantee userspace can find the
> drm_device in sysfs somewhere and make sense of what it actually
> drives.
>
> That's also why the lifetimes for the two things are totally
> different. The device driver and all its resources are tied to the
> underlying physical device, and resources can be released when that
> driver<->device link is broken (either unbind or hotunplug). That's
> what devm_ does. The drm_driver/drm_device otoh is tied to the
> userspace api, and can only disappear once all the userspace handles
> have been cleaned up and released. And we have an enormous amount of
> those, with all the mmaps, and shared fd for dma-buf, sync_file,
> syncobj and whatever else. The drm_device can only be cleaned up once
> userspace has closed all these things, or we'll go boom somewhere. The
> only connection is that the userspace interface drives the underlying
> hw (as long as it's still there) and the hw side holds a reference on
> the uapi side (drm_dev_get/put) to make sure the userspace side
> doesn't go poof and disappear when no one has the /dev node open :-)
>
> But aside from these links they're completely separate worlds, and
> mixing up the lifetimes results in all kinds of bad things happening.
> Ofc normally these two things exist at the same time, but hotunplug
> makes things very interesting here. And traditionally we've handled it
> badly, if at all in drm.
>

Seems like my drm_device/drm_driver definitions were off.
Thanks a lot for clarifying.

-Emil
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
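
The lifetime split discussed in this thread can be modelled in plain
userspace C. What follows is only a sketch under stated assumptions, not
the drmm_ code from the patch: model_dev, model_driver, dev_init, dev_get,
dev_put, add_action and final_free are hypothetical names, and
malloc/free stand in for the kernel allocators. It illustrates two points
made above: release actions run in reverse registration order only once
the last reference is dropped, and the containing allocation is freed
last through a separate pointer rather than as an ordinary action.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel API. */
typedef void (*release_fn)(void *data);

struct action {
	struct action *next;	/* newest action first, so unwinding is LIFO */
	release_fn release;
	void *data;
};

struct model_dev {
	int refcount;		/* hw-side reference plus userspace handles */
	struct action *actions;	/* managed release actions */
	void *final_free;	/* containing allocation, freed after all actions */
};

struct model_driver {		/* driver structure embedding the device */
	int hw_state;
	struct model_dev dev;
};

int release_log[8];		/* ids in the order they were released */
int release_count;

void log_release(void *data)
{
	if (release_count < 8)
		release_log[release_count++] = *(int *)data;
}

/* The container pointer is recorded up front, before any action exists. */
void dev_init(struct model_dev *dev, void *container)
{
	dev->refcount = 1;	/* held by the hw side */
	dev->actions = NULL;
	dev->final_free = container;
}

void dev_get(struct model_dev *dev)
{
	dev->refcount++;
}

int add_action(struct model_dev *dev, release_fn fn, void *data)
{
	struct action *a = malloc(sizeof(*a));

	if (!a)
		return -1;
	a->release = fn;
	a->data = data;
	a->next = dev->actions;
	dev->actions = a;
	return 0;
}

void dev_put(struct model_dev *dev)
{
	void *container;

	assert(dev->refcount > 0);
	if (--dev->refcount > 0)
		return;		/* someone still holds a reference */

	/*
	 * Read the container pointer first. If freeing the container were
	 * an ordinary action, the list head inside dev would be freed
	 * while the list is still being walked.
	 */
	container = dev->final_free;
	while (dev->actions) {
		struct action *a = dev->actions;

		dev->actions = a->next;
		a->release(a->data);
		free(a);
	}
	free(container);	/* dev must not be touched after this */
}
```

With a model_driver allocated by calloc and passed to dev_init as its own
container, a dev_get for an open userspace handle followed by a dev_put
for the hw side (the hotunplug case) releases nothing. Only the second
dev_put, when the handle is closed, runs the actions newest first and
then frees the container.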