On Thu, 2023-02-16 at 00:36 -0800, Dan Williams wrote: > While experimenting with CXL region removal the following corruption of > /proc/iomem appeared. > > Before: > f010000000-f04fffffff : CXL Window 0 > f010000000-f02fffffff : region4 > f010000000-f02fffffff : dax4.0 > f010000000-f02fffffff : System RAM (kmem) > > After (modprobe -r cxl_test): > f010000000-f02fffffff : **redacted binary garbage** > f010000000-f02fffffff : System RAM (kmem) > > ...and testing further the same is visible with persistent memory > assigned to kmem: > > Before: > 480000000-243fffffff : Persistent Memory > 480000000-57e1fffff : namespace3.0 > 580000000-243fffffff : dax3.0 > 580000000-243fffffff : System RAM (kmem) > > After (ndctl disable-region all): > 480000000-243fffffff : Persistent Memory > 580000000-243fffffff : ***redacted binary garbage*** > 580000000-243fffffff : System RAM (kmem) > > The corrupted data is from a use-after-free of the "dax4.0" and "dax3.0" > resources, and it also shows that the "System RAM (kmem)" resource is > not being removed. The bug does not appear after "modprobe -r kmem", it > requires the parent of "dax4.0" and "dax3.0" to be removed which > re-parents the leaked "System RAM (kmem)" instances. Those in turn > reference the freed resource as a parent. > > First up for the fix is release_mem_region_adjustable() needs to > reliably delete the resource inserted by add_memory_driver_managed(). > That is thwarted by a check for IORESOURCE_SYSRAM that predates the > dax/kmem driver, from commit: > > 65c78784135f ("kernel, resource: check for IORESOURCE_SYSRAM in release_mem_region_adjustable") > > That appears to be working around the behavior of HMM's > "MEMORY_DEVICE_PUBLIC" facility that has since been deleted. With that > check removed the "System RAM (kmem)" resource gets removed, but > corruption still occurs occasionally because the "dax" resource is not > reliably removed. > > The dax range information is freed before the device is unregistered, so > the driver can not reliably recall (another use after free) what it is > meant to release. Lastly if that use after free got lucky, the driver > was covering up the leak of "System RAM (kmem)" due to its use of > release_resource() which detaches, but does not free, child resources. > The switch to remove_resource() forces remove_memory() to be responsible > for the deletion of the resource added by add_memory_driver_managed(). > > Fixes: c2f3011ee697 ("device-dax: add an allocation interface for device-dax instances") > Cc: <stable@xxxxxxxxxxxxxxx> > Cc: Oscar Salvador <osalvador@xxxxxxx> > Cc: David Hildenbrand <david@xxxxxxxxxx> > Cc: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx> > Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> Reviewed-by: Vishal Verma <vishal.l.verma@xxxxxxxxx> > --- > drivers/dax/bus.c | 2 +- > drivers/dax/kmem.c | 4 ++-- > kernel/resource.c | 14 -------------- > 3 files changed, 3 insertions(+), 17 deletions(-) > > diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c > index 012d576004e9..67a64f4c472d 100644 > --- a/drivers/dax/bus.c > +++ b/drivers/dax/bus.c > @@ -441,8 +441,8 @@ static void unregister_dev_dax(void *dev) > dev_dbg(dev, "%s\n", __func__); > > kill_dev_dax(dev_dax); > - free_dev_dax_ranges(dev_dax); > device_del(dev); > + free_dev_dax_ranges(dev_dax); > put_device(dev); > } > > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c > index 918d01d3fbaa..7b36db6f1cbd 100644 > --- a/drivers/dax/kmem.c > +++ b/drivers/dax/kmem.c > @@ -146,7 +146,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax) > if (rc) { > dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n", > i, range.start, range.end); > - release_resource(res); > + remove_resource(res); > kfree(res); > data->res[i] = NULL; > if (mapped) > @@ -195,7 +195,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax) > > rc = remove_memory(range.start, range_len(&range)); > if (rc == 0) { > - release_resource(data->res[i]); > + remove_resource(data->res[i]); > kfree(data->res[i]); > data->res[i] = NULL; > success++; > diff --git a/kernel/resource.c b/kernel/resource.c > index ddbbacb9fb50..b1763b2fd7ef 100644 > --- a/kernel/resource.c > +++ b/kernel/resource.c > @@ -1343,20 +1343,6 @@ void release_mem_region_adjustable(resource_size_t start, resource_size_t size) > continue; > } > > - /* > - * All memory regions added from memory-hotplug path have the > - * flag IORESOURCE_SYSTEM_RAM. If the resource does not have > - * this flag, we know that we are dealing with a resource coming > - * from HMM/devm. HMM/devm use another mechanism to add/release > - * a resource. This goes via devm_request_mem_region and > - * devm_release_mem_region. > - * HMM/devm take care to release their resources when they want, > - * so if we are dealing with them, let us just back off here. > - */ > - if (!(res->flags & IORESOURCE_SYSRAM)) { > - break; > - } > - > if (!(res->flags & IORESOURCE_MEM)) > break; > >