This is a note to let you know that I've just added the patch titled Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally" to the 5.15-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: revert-pci-hv-fix-a-timing-issue-which-causes-kdump-to-fail-occasionally.patch and it can be found in the queue-5.15 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. >From a847234e24d03d01a9566d1d9dcce018cc018d67 Mon Sep 17 00:00:00 2001 From: Dexuan Cui <decui@xxxxxxxxxxxxx> Date: Wed, 14 Jun 2023 21:44:50 -0700 Subject: Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally" From: Dexuan Cui <decui@xxxxxxxxxxxxx> commit a847234e24d03d01a9566d1d9dcce018cc018d67 upstream. This reverts commit d6af2ed29c7c1c311b96dac989dcb991e90ee195. The statement "the hv_pci_bus_exit() call releases structures of all its child devices" in commit d6af2ed29c7c is not true: in the path hv_pci_probe() -> hv_pci_enter_d0() -> hv_pci_bus_exit(hdev, true): the parameter "keep_devs" is true, so hv_pci_bus_exit() does *not* release the child "struct hv_pci_dev *hpdev" that is created earlier in pci_devices_present_work() -> new_pcichild_device(). The commit d6af2ed29c7c was originally made in July 2020 for RHEL 7.7, where the old version of hv_pci_bus_exit() was used; when the commit was rebased and merged into the upstream, people didn't notice that it's not really necessary. The commit itself doesn't cause any issue, but it makes hv_pci_probe() more complicated. Revert it to facilitate some upcoming changes to hv_pci_probe(). Signed-off-by: Dexuan Cui <decui@xxxxxxxxxxxxx> Reviewed-by: Michael Kelley <mikelley@xxxxxxxxxxxxx> Acked-by: Wei Hu <weh@xxxxxxxxxxxxx> Cc: stable@xxxxxxxxxxxxxxx Link: https://lore.kernel.org/r/20230615044451.5580-5-decui@xxxxxxxxxxxxx Signed-off-by: Wei Liu <wei.liu@xxxxxxxxxx> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> --- drivers/pci/controller/pci-hyperv.c | 71 +++++++++++++++++------------------- 1 file changed, 34 insertions(+), 37 deletions(-) --- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -2889,8 +2889,10 @@ static int hv_pci_enter_d0(struct hv_dev struct pci_bus_d0_entry *d0_entry; struct hv_pci_compl comp_pkt; struct pci_packet *pkt; + bool retry = true; int ret; +enter_d0_retry: /* * Tell the host that the bus is ready to use, and moved into the * powered-on state. This includes telling the host which region @@ -2917,6 +2919,38 @@ static int hv_pci_enter_d0(struct hv_dev if (ret) goto exit; + /* + * In certain case (Kdump) the pci device of interest was + * not cleanly shut down and resource is still held on host + * side, the host could return invalid device status. + * We need to explicitly request host to release the resource + * and try to enter D0 again. + */ + if (comp_pkt.completion_status < 0 && retry) { + retry = false; + + dev_err(&hdev->device, "Retrying D0 Entry\n"); + + /* + * Hv_pci_bus_exit() calls hv_send_resource_released() + * to free up resources of its child devices. + * In the kdump kernel we need to set the + * wslot_res_allocated to 255 so it scans all child + * devices to release resources allocated in the + * normal kernel before panic happened. + */ + hbus->wslot_res_allocated = 255; + + ret = hv_pci_bus_exit(hdev, true); + + if (ret == 0) { + kfree(pkt); + goto enter_d0_retry; + } + dev_err(&hdev->device, + "Retrying D0 failed with ret %d\n", ret); + } + if (comp_pkt.completion_status < 0) { dev_err(&hdev->device, "PCI Pass-through VSP failed D0 Entry with status %x\n", @@ -3162,7 +3196,6 @@ static int hv_pci_probe(struct hv_device struct hv_pcibus_device *hbus; u16 dom_req, dom; char *name; - bool enter_d0_retry = true; int ret; /* @@ -3298,47 +3331,11 @@ static int hv_pci_probe(struct hv_device if (ret) goto free_fwnode; -retry: ret = hv_pci_query_relations(hdev); if (ret) goto free_irq_domain; ret = hv_pci_enter_d0(hdev); - /* - * In certain case (Kdump) the pci device of interest was - * not cleanly shut down and resource is still held on host - * side, the host could return invalid device status. - * We need to explicitly request host to release the resource - * and try to enter D0 again. - * Since the hv_pci_bus_exit() call releases structures - * of all its child devices, we need to start the retry from - * hv_pci_query_relations() call, requesting host to send - * the synchronous child device relations message before this - * information is needed in hv_send_resources_allocated() - * call later. - */ - if (ret == -EPROTO && enter_d0_retry) { - enter_d0_retry = false; - - dev_err(&hdev->device, "Retrying D0 Entry\n"); - - /* - * Hv_pci_bus_exit() calls hv_send_resources_released() - * to free up resources of its child devices. - * In the kdump kernel we need to set the - * wslot_res_allocated to 255 so it scans all child - * devices to release resources allocated in the - * normal kernel before panic happened. - */ - hbus->wslot_res_allocated = 255; - ret = hv_pci_bus_exit(hdev, true); - - if (ret == 0) - goto retry; - - dev_err(&hdev->device, - "Retrying D0 failed with ret %d\n", ret); - } if (ret) goto free_irq_domain; Patches currently in stable-queue which might be from decui@xxxxxxxxxxxxx are queue-5.15/pci-hv-add-a-per-bus-mutex-state_lock.patch queue-5.15/pci-hv-fix-a-race-condition-in-hv_irq_unmask-that-can-cause-panic.patch queue-5.15/revert-pci-hv-fix-a-timing-issue-which-causes-kdump-to-fail-occasionally.patch queue-5.15/pci-hv-remove-the-useless-hv_pcichild_state-from-struct-hv_pci_dev.patch queue-5.15/drivers-hv-vmbus-call-hv_synic_free-if-hv_synic_alloc-fails.patch queue-5.15/pci-hv-fix-a-race-condition-bug-in-hv_pci_query_relations.patch