On 09/22/2020 12:30 PM, Sinan Kaya wrote:
On 9/21/2020 10:11 PM, Huacai Chen wrote:
This sounds like a quirk to me rather than a behavior that should be
applied to all platforms.
Yes, this looks very much like a quirk, but it seems that a lot of
platforms have this problem, and removing pci_disable_device()
has no side effect.
Why is there no side effect?
AFAIK, kexec goes through the shutdown path, and you are leaving a PCI
device enabled during the kexec boot, which can corrupt the booting
OS's memory.
Hi,
The kexec-related step (turning off Bus Master) is already performed
afterwards by pci_device_shutdown(). This was done by commit
4fc9bbf98fd6 ("PCI: Disable Bus Master only on kexec reboot") and commit
6e0eda3c3898 ("PCI: Don't try to disable Bus Master on disconnected PCI
devices").
drivers/pci/pci-driver.c
static void pci_device_shutdown(struct device *dev)
{
	...
	if (drv && drv->shutdown)
		drv->shutdown(pci_dev);

	/*
	 * If this is a kexec reboot, turn off Bus Master bit on the
	 * device to tell it to not continue to do DMA. Don't touch
	 * devices in D3cold or unknown states.
	 * If it is not a kexec reboot, firmware will hit the PCI
	 * devices with big hammer and stop their DMA any way.
	 */
	if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
		pci_clear_master(pci_dev);
}
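To make the condition in pci_device_shutdown() concrete, here is a small user-space model of it (not kernel code). The enum and model_should_clear_master() are hypothetical names. The sketch only assumes that the power states are ordered D0 < D1 < D2 < D3hot < D3cold < unknown, which is what the "<= PCI_D3hot" comparison relies on.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the PCI power states; only the ordering
 * D0 < D1 < D2 < D3hot < D3cold < unknown matters for this check. */
enum model_power_state {
	MODEL_D0,
	MODEL_D1,
	MODEL_D2,
	MODEL_D3HOT,
	MODEL_D3COLD,
	MODEL_UNKNOWN,
};

/* Hypothetical helper: returns true when pci_device_shutdown() would
 * clear the Bus Master bit for a device in @state. */
static bool model_should_clear_master(bool kexec_in_progress,
				      enum model_power_state state)
{
	assert(state <= MODEL_UNKNOWN);
	/* D3cold and unknown states are left untouched, as in the kernel. */
	return kexec_in_progress && state <= MODEL_D3HOT;
}
```

So on a kexec reboot, D0 through D3hot devices get Bus Master cleared, while on a normal reboot nothing is cleared and firmware is expected to stop DMA.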
device_shutdown()
  dev->bus->shutdown() == pci_device_shutdown()
    drv->shutdown() == pcie_portdrv_shutdown()
      pci_disable_device()
[ 36.159446] Call Trace:
[ 36.241688] [<ffffffff80211434>] show_stack+0x9c/0x130
[ 36.326619] [<ffffffff80661b70>] dump_stack+0xb0/0xf0
[ 36.410403] [<ffffffff806a8240>] pcie_portdrv_shutdown+0x18/0x78
[ 36.495302] [<ffffffff8069c6b4>] pci_device_shutdown+0x44/0x90
[ 36.580027] [<ffffffff807aac90>] device_shutdown+0x130/0x290
[ 36.664486] [<ffffffff80265448>] kernel_power_off+0x38/0x80
[ 36.748272] [<ffffffff80265634>] __do_sys_reboot+0x1a4/0x258
[ 36.831985] [<ffffffff80218b90>] syscall_common+0x34/0x58
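A toy user-space model of the shutdown ordering traced above, assuming a single device in D0 and reducing its state to two booleans. struct model_dev and the model_* functions are hypothetical. The point is only that the driver's ->shutdown() runs first and Bus Master is cleared afterwards on kexec, even if the port driver no longer calls pci_disable_device().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one device's state (not kernel code). */
struct model_dev {
	bool enabled;    /* stands in for the device being enabled */
	bool bus_master; /* stands in for PCI_COMMAND_MASTER */
	void (*drv_shutdown)(struct model_dev *dev); /* driver ->shutdown() */
};

/* Port-driver shutdown that disables the device (current behavior).
 * In the kernel, pci_disable_device() also clears Bus Master once the
 * device's enable count drops to zero. */
static void model_portdrv_shutdown_disable(struct model_dev *dev)
{
	dev->enabled = false;
	dev->bus_master = false;
}

/* Port-driver shutdown with the pci_disable_device() call removed. */
static void model_portdrv_shutdown_noop(struct model_dev *dev)
{
	(void)dev;
}

/* Mirrors the order in pci_device_shutdown(): driver callback first,
 * then Bus Master is cleared if a kexec is in progress. The D-state
 * check is omitted because the device is assumed to be in D0. */
static void model_pci_device_shutdown(struct model_dev *dev,
				      bool kexec_in_progress)
{
	assert(dev != NULL);
	if (dev->drv_shutdown != NULL)
		dev->drv_shutdown(dev);
	if (kexec_in_progress)
		dev->bus_master = false;
}
```

With model_portdrv_shutdown_noop and kexec_in_progress set, the device ends up still enabled but with bus_master cleared; on a normal reboot both stay set, leaving DMA shutdown to firmware. Whether leaving the device enabled is otherwise harmless is not something this model can show.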
Early discussions:
https://lore.kernel.org/patchwork/patch/1304917/#1499666
https://lore.kernel.org/patchwork/patch/1305067/
Thanks,
Tiezhu
I don't think you can generalize a behavior based on a few quirky
devices. You should be quirking only the device that has a problem
rather than changing the behavior of all other platforms.
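A rough sketch of the per-device alternative suggested here: consult a quirk table and skip pci_disable_device() only for listed devices. Everything below is hypothetical, including the table layout, the function names and the placeholder IDs. In-tree code would more likely use a struct pci_device_id table or a DECLARE_PCI_FIXUP_*() hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quirk entry: devices whose port-driver shutdown should
 * skip pci_disable_device(). */
struct model_quirk {
	uint16_t vendor;
	uint16_t device;
};

static const struct model_quirk model_skip_disable_quirks[] = {
	{ 0x1234, 0x5678 }, /* placeholder IDs, not a real affected device */
};

/* Returns true if (vendor, device) is listed in the quirk table. */
static bool model_needs_skip_disable(uint16_t vendor, uint16_t device)
{
	size_t n = sizeof(model_skip_disable_quirks) /
		   sizeof(model_skip_disable_quirks[0]);

	for (size_t i = 0; i < n; i++) {
		if (model_skip_disable_quirks[i].vendor == vendor &&
		    model_skip_disable_quirks[i].device == device)
			return true;
	}
	return false;
}

/* Shutdown path that keeps the default behavior for every device
 * except the quirked ones. */
static void model_portdrv_shutdown(uint16_t vendor, uint16_t device,
				   bool *enabled)
{
	assert(enabled != NULL);
	if (!model_needs_skip_disable(vendor, device))
		*enabled = false; /* stands in for pci_disable_device() */
}
```

This keeps the existing behavior for all other platforms and confines the change to the devices that actually need it.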