Re: Fixing property memory leaks on device tree overlay removal

On Wed, 26 Jun 2024 09:24:46 +0200
Krzysztof Kozlowski <krzysztof.kozlowski@xxxxxxxxxx> wrote:

> On 25/06/2024 19:02, Rob Herring wrote:
> > On Mon, Jun 24, 2024 at 3:21 PM Luca Ceresoli <luca.ceresoli@xxxxxxxxxxx> wrote:  
...
> >> ===================
> >> Problem description
> >> ===================
> >>
> >> In the kernel every 'struct device_node' is refcounted so the OF core
> >> knows when to free it. There are of course get/put imbalance bugs
> >> around, but these are "just" bugs that need to be fixed as they are
> >> found.
> >>
> >> On the other hand, there is no refcounting for 'struct property'. Yet
> >> some of the internal kernel APIs to access properties, e.g.
> >> of_property_read_string(), return either a 'struct property' pointer or
> >> a copy of the 'char *value' field. This is not a bug, it is an API
> >> design flaw: any user (e.g. any OF driver) can take a pointer to
> >> property data that was allocated and should be deallocated by the OF
> >> core, but the OF core has no idea of when that pointer will stop being
> >> used.
> >>
> >> Now, when loading a DT overlay there are three possible cases:
> >>
> >>  1. both the property and the containing node are in the base tree
> >>  2. both the property and the containing node are in the same overlay
> >>  3. the property is in an overlay and the containing node is either
> >>     in the base tree or in a previously-loaded overlay
> >>
> >> Cases 1 and 2 are not problematic. In case 1 the data allocated for the
> >> properties is never removed. In case 2 the properties are removed when
> >> removing the parent node, which gets removed when removing the overlay
> >> thanks to 'struct device_node' refcounting, based on the assumption
> >> that the property lifetime is a subset of the parent node lifetime. The
> >> problem exists in case 3. Properties in case 3 are usually a small part
> >> of all the properties but there can be some (and there are some in the
> >> product we are working on), and that's what needs to be addressed.  
> > 
> > I'd like to better understand what are the cases where you need to
> > change/add properties in a node (other than "status"). I'm not
> > entirely convinced we should even allow that.  
> 
> Just to clarify that I understand the problem correctly - we talk only
> about memory leaks, not about accessing released memory (use-after-free)?

Well, the "unsafe" property accessors do return a pointer to struct
property or to its value, so they would lead to a use-after-free when
both of these happen: a) the struct property is freed (which can happen
with overlays) and b) the caller keeps the pointer until after the
property is freed.

To avoid use-after-free, all properties falling in case 3 are put into
a "deadprops" list within the struct device_node and will be released
only when the node is released, which is never for nodes in the base
tree. This trades a use-after-free for a memory leak.
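
To make the trade-off concrete, here is a minimal userspace sketch of
the idea. It is not the kernel code: the two structs are simplified
stand-ins, and node_add_property(), node_read_string(),
node_remove_property(), node_release() and demo_case3() are
hypothetical names made up for this illustration. Only the "deadprops"
list name comes from the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins, NOT the real kernel definitions. */
struct property {
	char *name;
	char *value;
	struct property *next;
};

struct device_node {
	struct property *properties;	/* live properties */
	struct property *deadprops;	/* removed, but not freed yet */
};

static char *dup_str(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

/* Hypothetical helper: attach a new property to the node. */
static struct property *node_add_property(struct device_node *np,
					  const char *name, const char *value)
{
	struct property *pp = calloc(1, sizeof(*pp));

	if (!pp)
		return NULL;
	pp->name = dup_str(name);
	pp->value = dup_str(value);
	if (!pp->name || !pp->value) {
		free(pp->name);
		free(pp->value);
		free(pp);
		return NULL;
	}
	pp->next = np->properties;
	np->properties = pp;
	return pp;
}

/* "Unsafe" accessor: hands out a pointer into node-owned memory. */
static const char *node_read_string(const struct device_node *np,
				    const char *name)
{
	for (const struct property *pp = np->properties; pp; pp = pp->next)
		if (strcmp(pp->name, name) == 0)
			return pp->value;
	return NULL;
}

/* Removal unlinks the property and parks it on deadprops, no free. */
static int node_remove_property(struct device_node *np, const char *name)
{
	for (struct property **link = &np->properties; *link;
	     link = &(*link)->next) {
		struct property *pp = *link;

		if (strcmp(pp->name, name) == 0) {
			*link = pp->next;
			pp->next = np->deadprops;
			np->deadprops = pp;
			return 0;
		}
	}
	return -1;
}

static size_t free_list(struct property *pp)
{
	size_t n = 0;

	while (pp) {
		struct property *next = pp->next;

		free(pp->name);
		free(pp->value);
		free(pp);
		pp = next;
		n++;
	}
	return n;
}

/* Only node release frees memory; returns how many properties it freed. */
static size_t node_release(struct device_node *np)
{
	size_t n = free_list(np->properties) + free_list(np->deadprops);

	np->properties = NULL;
	np->deadprops = NULL;
	return n;
}

/* Case 3 scenario: overlay property on a node that outlives the overlay. */
size_t demo_case3(void)
{
	struct device_node base = { NULL, NULL };
	const char *label;

	if (!node_add_property(&base, "label", "from-overlay"))
		return 0;
	label = node_read_string(&base, "label");	/* caller keeps this */
	node_remove_property(&base, "label");		/* overlay removed */

	/* The property is no longer visible on the node... */
	assert(node_read_string(&base, "label") == NULL);
	/* ...but the old pointer is still valid: no use-after-free. */
	printf("stale but valid: %s\n", label);

	/* A base tree node is never released, so this would be a leak. */
	return node_release(&base);
}
```

Calling demo_case3() from a main() shows the point: the removed
property is freed only by node_release(), which for a base tree node
never runs.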

> I think that during EOSS 2024 discussions we reached consensus that in
> general you will not have use-after-free problem with DT properties at
> all. If all devices are unbound, their resources get released (including
> some core structures registered in subsystems) thus nothing will use any
> of properties. With proper kernel code there will be no use of device
> node properties after device is unbound.

I agree this is the normal situation, but I'm not sure there is
consensus about that. My reply to Rob in this thread aims at clarifying
exactly what problem we need to solve. Totally eradicating the "unsafe"
property accessors would eliminate all possible use-after-free or leak
of property data. However I agree it would be a large effort to fix a
small number of issues, which can be avoided by trusting drivers a bit
more.

> >> Preventing new usages of old accessors will be important. Tools to
> >> achieve that:
> >>
> >>  * Extend checkpatch to report an error on their usage
> >>  * Add a 'K:' entry to MAINTAINERS so that patches trying to use them
> >>    will be reported (to me at least)  
> 
> Or just use lore/lei with proper keywords. I track a few misuses of
> kernel code that way.

Didn't know about lore+lei, interesting. Thanks, it really looks like a
good tool for this task.

Luca

-- 
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com




