> -----Original Message----- > From: Dexuan Cui <decui@xxxxxxxxxxxxx> > Sent: Tuesday, March 28, 2023 2:33 PM > To: bhelgaas@xxxxxxxxxx; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx; > Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>; Jake Oshins > <jakeo@xxxxxxxxxxxxx>; kuba@xxxxxxxxxx; kw@xxxxxxxxx; KY Srinivasan > <kys@xxxxxxxxxxxxx>; leon@xxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; > lpieralisi@xxxxxxxxxx; Michael Kelley (LINUX) <mikelley@xxxxxxxxxxxxx>; > pabeni@xxxxxxxxxx; robh@xxxxxxxxxx; saeedm@xxxxxxxxxx; > wei.liu@xxxxxxxxxx; Long Li <longli@xxxxxxxxxxxxx>; boqun.feng@xxxxxxxxx; > Wei Hu <weh@xxxxxxxxxxxxx> > Cc: linux-hyperv@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux- > rdma@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx > Subject: RE: [PATCH 4/6] Revert "PCI: hv: Fix a timing issue which causes > kdump to fail occasionally" > > > From: Dexuan Cui <decui@xxxxxxxxxxxxx> > > Sent: Monday, March 27, 2023 9:51 PM > > To: bhelgaas@xxxxxxxxxx; davem@xxxxxxxxxxxxx; Dexuan Cui > > <decui@xxxxxxxxxxxxx>; edumazet@xxxxxxxxxx; Haiyang Zhang > > <haiyangz@xxxxxxxxxxxxx>; Jake Oshins <jakeo@xxxxxxxxxxxxx>; > > kuba@xxxxxxxxxx; kw@xxxxxxxxx; KY Srinivasan <kys@xxxxxxxxxxxxx>; > > leon@xxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; lpieralisi@xxxxxxxxxx; > > Michael Kelley (LINUX) <mikelley@xxxxxxxxxxxxx>; pabeni@xxxxxxxxxx; > > robh@xxxxxxxxxx; saeedm@xxxxxxxxxx; wei.liu@xxxxxxxxxx; Long Li > > <longli@xxxxxxxxxxxxx>; boqun.feng@xxxxxxxxx > > Cc: linux-hyperv@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; > > linux-rdma@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx > > Subject: [PATCH 4/6] Revert "PCI: hv: Fix a timing issue which causes > > kdump to fail occasionally" > > > > This reverts commit d6af2ed29c7c1c311b96dac989dcb991e90ee195. > > > > The statement "the hv_pci_bus_exit() call releases structures of all > > its child devices" in commit d6af2ed29c7c is not true: in the path > > hv_pci_probe() -> hv_pci_enter_d0() -> hv_pci_bus_exit(hdev, true): > > the parameter "keep_devs" is true, so hv_pci_bus_exit() does *not* > > release the child "struct hv_pci_dev *hpdev" that is created earlier > > in > > pci_devices_present_work() -> new_pcichild_device(). > > > > The commit d6af2ed29c7c was originally made in July 2020 for RHEL 7.7, > > where the old version of hv_pci_bus_exit() was used; when the commit > > was rebased and merged into the upstream, people didn't notice that > > it's not really necessary. The commit itself doesn't cause any issue, > > but it makes hv_pci_probe() more complicated. Revert it to facilitate > > some upcoming changes to hv_pci_probe(). > > > > Signed-off-by: Dexuan Cui <decui@xxxxxxxxxxxxx> > > --- > > drivers/pci/controller/pci-hyperv.c | 71 > > ++++++++++++++--------------- > > 1 file changed, 34 insertions(+), 37 deletions(-) > > > > diff --git a/drivers/pci/controller/pci-hyperv.c > > b/drivers/pci/controller/pci-hyperv.c > > index 46df6d093d68..48feab095a14 100644 > > --- a/drivers/pci/controller/pci-hyperv.c > > +++ b/drivers/pci/controller/pci-hyperv.c > > @@ -3225,8 +3225,10 @@ static int hv_pci_enter_d0(struct hv_device > *hdev) > > struct pci_bus_d0_entry *d0_entry; > > struct hv_pci_compl comp_pkt; > > struct pci_packet *pkt; > > + bool retry = true; > > int ret; > > > > +enter_d0_retry: > > /* > > * Tell the host that the bus is ready to use, and moved into the > > * powered-on state. This includes telling the host which region @@ > > -3253,6 +3255,38 @@ static int hv_pci_enter_d0(struct hv_device *hdev) > > if (ret) > > goto exit; > > > > + /* > > + * In certain case (Kdump) the pci device of interest was > > + * not cleanly shut down and resource is still held on host > > + * side, the host could return invalid device status. > > + * We need to explicitly request host to release the resource > > + * and try to enter D0 again. > > + */ > > + if (comp_pkt.completion_status < 0 && retry) { > > + retry = false; > > + > > + dev_err(&hdev->device, "Retrying D0 Entry\n"); > > + > > + /* > > + * Hv_pci_bus_exit() calls hv_send_resource_released() > > + * to free up resources of its child devices. > > + * In the kdump kernel we need to set the > > + * wslot_res_allocated to 255 so it scans all child > > + * devices to release resources allocated in the > > + * normal kernel before panic happened. > > + */ > > + hbus->wslot_res_allocated = 255; > > + > > + ret = hv_pci_bus_exit(hdev, true); > > + > > + if (ret == 0) { > > + kfree(pkt); > > + goto enter_d0_retry; > > + } > > + dev_err(&hdev->device, > > + "Retrying D0 failed with ret %d\n", ret); > > + } > > + > > if (comp_pkt.completion_status < 0) { > > dev_err(&hdev->device, > > "PCI Pass-through VSP failed D0 Entry with > status %x\n", @@ > > -3493,7 +3527,6 @@ static int hv_pci_probe(struct hv_device *hdev, > > struct hv_pcibus_device *hbus; > > u16 dom_req, dom; > > char *name; > > - bool enter_d0_retry = true; > > int ret; > > > > /* > > @@ -3633,47 +3666,11 @@ static int hv_pci_probe(struct hv_device *hdev, > > if (ret) > > goto free_fwnode; > > > > -retry: > > ret = hv_pci_query_relations(hdev); > > if (ret) > > goto free_irq_domain; > > > > ret = hv_pci_enter_d0(hdev); > > - /* > > - * In certain case (Kdump) the pci device of interest was > > - * not cleanly shut down and resource is still held on host > > - * side, the host could return invalid device status. > > - * We need to explicitly request host to release the resource > > - * and try to enter D0 again. > > - * Since the hv_pci_bus_exit() call releases structures > > - * of all its child devices, we need to start the retry from > > - * hv_pci_query_relations() call, requesting host to send > > - * the synchronous child device relations message before this > > - * information is needed in hv_send_resources_allocated() > > - * call later. > > - */ > > - if (ret == -EPROTO && enter_d0_retry) { > > - enter_d0_retry = false; > > - > > - dev_err(&hdev->device, "Retrying D0 Entry\n"); > > - > > - /* > > - * Hv_pci_bus_exit() calls hv_send_resources_released() > > - * to free up resources of its child devices. > > - * In the kdump kernel we need to set the > > - * wslot_res_allocated to 255 so it scans all child > > - * devices to release resources allocated in the > > - * normal kernel before panic happened. > > - */ > > - hbus->wslot_res_allocated = 255; > > - ret = hv_pci_bus_exit(hdev, true); > > - > > - if (ret == 0) > > - goto retry; > > - > > - dev_err(&hdev->device, > > - "Retrying D0 failed with ret %d\n", ret); > > - } > > if (ret) > > goto free_irq_domain; > > > > -- > > 2.25.1 Looks good to me. Thanks for fixing this. Wei