On Mon, Oct 04, 2021 at 01:08:38AM -0500, Tyler Hicks wrote: > From: Rob Herring <robh@xxxxxxxxxx> > > commit 9885440b16b8fc1dd7275800fd28f56a92f60896 upstream. > > The PCI code has several paths where the struct pci_host_bridge is freed > directly. This is wrong because it contains a struct device which is > refcounted and should be freed using put_device(). This can result in > use-after-free errors. I think this problem has existed since 2012 with > commit 7b5436635800 ("PCI: add generic device into pci_host_bridge > struct"). It generally hasn't mattered as most host bridge drivers are > still built-in and can't unbind. > > The problem is a struct device should never be freed directly once > device_initialize() is called and a ref is held, but that doesn't happen > until pci_register_host_bridge(). There's then a window between allocating > the host bridge and pci_register_host_bridge() where kfree should be used. > This is fragile and requires callers to do the right thing. To fix this, we > need to split device_register() into device_initialize() and device_add() > calls, so that the host bridge struct is always freed by using a > put_device(). > > devm_pci_alloc_host_bridge() is using devm_kzalloc() to allocate struct > pci_host_bridge which will be freed directly. Instead, we can use a custom > devres action to call put_device(). > > Link: https://lore.kernel.org/r/20200513223859.11295-2-robh@xxxxxxxxxx > Reported-by: Anders Roxell <anders.roxell@xxxxxxxxxx> > Tested-by: Anders Roxell <anders.roxell@xxxxxxxxxx> > Signed-off-by: Rob Herring <robh@xxxxxxxxxx> > Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> > Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx> > Acked-by: Arnd Bergmann <arnd@xxxxxxxx> > [tyhicks: Minor contextual change in pci_init_host_bridge() due to the > lack of a native_dpc member in the pci_host_bridge struct. It was added > in v5.7 with commit ac1c8e35a326 ("PCI/DPC: Add Error Disconnect > Recover (EDR) support")] > Signed-off-by: Tyler Hicks <tyhicks@xxxxxxxxxxxxxxxxxxx> > --- > > This commit has been identified as a fix for random memory corruption > that we're experiencing in production. The memory corruption is easily > reproducible on 5.4.150 and we get a nice KASAN splat that led us to > discovering the upstream fix that wasn't marked for stable inclusion. I > don't see any obvious reasons why this wouldn't be a valid linux-5.4.y > candidate and hope we can get it applied there. > > I've verified that the KASAN splat goes away and I don't see any other > evidence of the memory corruption issue once this commit is applied to > 5.4.150. Now queued up,t hanks. greg k-h