Hi Oleksij, On Fri, Feb 07, 2025 at 09:24:10AM +0100, Oleksij Rempel wrote: > On Fri, Feb 07, 2025 at 09:31:09AM +0800, Zhang Zekun wrote: > > There are many drivers use of_find_node_by_name() with a not-NULL > > device_node pointer, and a number of callers would require a call to > > of_node_get() before using it. There are also some drivers who forget > > to call of_node_get() which would cause a ref count leak[1]. So, Add a > > wraper function for of_find_node_by_name(), drivers may use this function > > to call of_find_node_by_name() with the refcount already balanced. > > > > [1] https://lore.kernel.org/all/20241024015909.58654-1-zhangzekun11@xxxxxxxxxx/ > > Hi Zhang Zekun, > > thank you for working on this issue! > > First of all, let's take a step back and analyze the initial problem. > Everything following is only my opinion... > > The main issue I see is that the current API - of_find_node_by_name - > modifies the refcount of its input by calling of_node_put(from) as part > of its search. Typically, a "find" function is expected to treat its > input as read-only. That is, when you pass an object into such a > function, you expect its reference count to remain unchanged unless > ownership is explicitly transferred. In this case, lowering the refcount > on the input node is counterintuitive and already lead to unexpected > behavior and subtle bugs. > > To address this, the workaround introduces a wrapper function, > of_find_node_by_name_balanced, which first increments the input’s > refcount (via of_node_get()) before calling the original function. While > this "balances" the refcount change, the naming remains problematic from > my perspective. The "_balanced" suffix isn’t part of our common naming > conventions (traditions? :)). Most drivers expect that a function > starting with "find" will not alter the reference count of its input. > The term "balanced" doesn’t clearly convey that the input's refcount is > being explicitly managed - it instead obscures the underlying behavior, > leaving many developers confused about what guarantees the API provides. > > In my view, a more natural solution would be to redesign the API so that > it doesn’t modify the input object’s refcount at all. Instead, it should > solely increase the refcount of the returned node (if found) for safe > asynchronous usage. This approach would align with established > conventions where "find" implies no side effects on inputs or output, > and a "get" indicates that the output comes with an extra reference. For > example, a function named of_get_node_by_name would clearly signal that > only the returned node is subject to a refcount increase while leaving > the input intact. > > Thus, while the current workaround "balances" the reference count, it > doesn't address the underlying design flaw. The naming still suggests a > "find" function that should leave the input untouched, which isn’t the > case here. A redesign of the API - with both the behavior and naming > aligned to common expectations - would be a clearer and more robust > solution. > > Nevertheless, it is only my POV, and the final decision rests with the > OpenFirmware framework maintainers. I agree overall that the naming is not optimal. Looking at the other patches in the series, I think at least some of them misuse of_find_node_by_name(). For instance, drivers/media/i2c/max9286.c calls the function to find a *child* node of the device's of_node named "i2c-mux", while of_find_node_by_name() looks at children first but will then walk the *whole* DT to find a named node. I haven't checked all patches, but other ones seem to suffer from the same misuse. Assuming that the named node those drivers are looking for is a direct child of the node passed as argument to of_find_node_by_name(), the right fix would tbe to use of_get_child_by_name(). If it's not a direct child, a recursive version of of_get_child_by_name() could be useful. -- Regards, Laurent Pinchart