On Wed, May 5, 2021 at 12:28 PM Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
> On 5/5/21 11:17 AM, Andy Shevchenko wrote:
> > On Wed, May 5, 2021 at 12:07 PM Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
> >> On 5/4/21 9:52 AM, Andy Shevchenko wrote:
> >>> On Monday, May 3, 2021, Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
> >
> > ...
> >
> >>> + fwnode = device_get_next_child_node(kdev, fwnode);
> >
> >>> Who is dropping reference counting on fwnode ?
> >>
> >> We are dealing with ACPI fwnode-s here and those are not ref-counted, they
> >> are embedded inside a struct acpi_device and their lifetime is tied to
> >> that struct. They should probably still be ref-counted (with the count
> >> never dropping to 0) so that the generic fwnode functions behave the same
> >> anywhere but atm the ACPI nodes are not refcounted, see: acpi_get_next_subnode()
> >> in drivers/acpi/property.c which is the get_next_child_node() implementation
> >> for ACPI fwnode-s.
> >
> > Yes, ACPI currently is exceptional, but fwnode API is not.
> > If you may guarantee that this case won't ever be outside of ACPI
>
> Yes I can guarantee that currently this code (which is for the i915
> driver only) only deals with ACPI fwnodes.
>
> > and
> > even though if ACPI won't ever gain a reference counting for fwnodes,
> > we can leave it as is.
>
> Would it not be better to add fake ref-counting to the ACPI fwnode
> next_child_node() op though. I believe just getting a reference
> on the return value there should work fine; and then all fwnode
> implementations would be consistent ?

But it's already there by absent put/get callbacks. On fwnode level it
is like you described. So, talking for a good pattern we have to call
the fwnode_handle_put() independently and always for for_each_child
and get_next_child usages.

> (note I did not check that the of and swnode code do return
> a reference but I would assume so).

Yes, it's only ACPI that survives w/o reference counting.

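To illustrate the pattern, here is a self-contained toy model. It is not
kernel code: struct toy_node and all toy_*() helpers are made up for this
sketch, and the iterator semantics are an assumption (it takes a reference
on the node it returns and drops the reference on the node passed in).
Under that assumption, a loop that runs to completion is balanced by the
iterator itself, while an early exit leaves one reference that the caller
has to put.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a refcounted firmware node (hypothetical type). */
struct toy_node {
	int refcount;
	int id;
	struct toy_node *first_child;
	struct toy_node *next_sibling;
};

static struct toy_node *toy_get(struct toy_node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void toy_put(struct toy_node *n)
{
	if (!n)
		return;
	assert(n->refcount > 0);	/* an unbalanced put would trip here */
	n->refcount--;
}

/*
 * Assumed iterator semantics: return the next child with a reference
 * held, and drop the reference on @prev.
 */
static struct toy_node *toy_get_next_child(struct toy_node *parent,
					   struct toy_node *prev)
{
	struct toy_node *next = prev ? prev->next_sibling : parent->first_child;

	toy_get(next);
	toy_put(prev);
	return next;
}

/* Returns the matching child with a reference the caller must put. */
static struct toy_node *toy_find_child(struct toy_node *parent, int id)
{
	struct toy_node *child = NULL;

	while ((child = toy_get_next_child(parent, child))) {
		if (child->id == id)
			return child;	/* early exit: reference goes to caller */
	}
	return NULL;	/* full walk: the iterator dropped every reference */
}

/* Returns 0 if every refcount is back at its initial value of 1. */
static int toy_demo(void)
{
	struct toy_node c = { .refcount = 1, .id = 3 };
	struct toy_node b = { .refcount = 1, .id = 2, .next_sibling = &c };
	struct toy_node a = { .refcount = 1, .id = 1, .next_sibling = &b };
	struct toy_node parent = { .refcount = 1, .first_child = &a };
	struct toy_node *found;

	found = toy_find_child(&parent, 2);
	if (!found || found->refcount != 2)
		return -1;
	toy_put(found);		/* caller drops the reference it was handed */

	if (toy_find_child(&parent, 42))	/* no match: full walk */
		return -1;

	return (a.refcount == 1 && b.refcount == 1 && c.refcount == 1) ? 0 : -1;
}
```

In this model a put that matches a reference we know we took cannot bring
the count below its starting value; only a put without a matching get
would hit the assert.
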

> >>> I’m in the middle of a pile of fixes for fwnode refcounting when for_each_child or get_next_child is used. So, please double check you drop a reference.
> >>
> >> The kdoc comments on device_get_next_child_node() / fwnode_get_next_child_node()
> >> do not mention anything about these functions returning a reference.
> >
> > It's possible. I dunno if it had to be done earlier. Sakari?
> >
> >> So I think we need to first make up our mind here how we want this all to
> >> work and then fix the actual implementation and docs before fixing callers.
> >
> > We have already issues, so I prefer not to wait for a documentation
> > update, because for old kernels it will still be an issue.
>
> I wonder if we really have issues though, in practice fwnodes are
> generated from an devicetree or ACPI tables (or by platform codes
> adding swnodes) and then these pretty much stick around for ever.

Overlays. Not for ever.

> IOW the initial refcount of 1 is never dropped at least for of-nodes
> and ACPI nodes.
> I know there are some exceptions like device-tree
> overlays which I guess may also be dynamically removed again, but those
> exceptions are not widely used.

ACPI overlays are quite used (at least by two people I know and a few
more that asked questions about them here and there), but luckily it
doesn't require refcounting (yet?).

> And if we forget to drop a reference in the worst case we have a small
> non-re-occuring (so not growing) memleak.

And is it good to provoke all kinds of tools (kmemleak, *SANs, etc)? I
do not think so. If we are writing good code it should be good enough.

> Where as if we start adding
> put() calls everywhere we may end up freeing things which are still
> in use; or dropping refcounts below 0 triggering WARNs in various
> places (IIRC).

Which is good. Then we will discover real issues.

> So it seems the cure is potentially worse then the disease in this
> case.

I tend to disagree with you.
How in this case we can go below 0 in case we know that we took a counter?
If somewhere else the code will do that, it is a problem that has to
be fixed on case-by-case basis.

> So if you want to work on this, then IMHO it would be best to first make
> sure that all the fwnode implementations behave in the same way wrt
> ref-counting, before adding the missing put() calls in various
> places.
>
> And once the behavior is consistent

It's consistent now independently of the beneath layer from fwnode API p.o.v.

> then we can also document this
> properly making it easier for other people to do the right thing
> when using these functions.

-- 
With Best Regards,
Andy Shevchenko