> From: Lorenzo Pieralisi <lpieralisi@xxxxxxxxxx> > Sent: Thursday, May 25, 2023 3:16 AM > > --- a/drivers/pci/controller/pci-hyperv.c > > +++ b/drivers/pci/controller/pci-hyperv.c > > @@ -643,6 +643,11 @@ static void hv_arch_irq_unmask(struct irq_data > > *data) > > pbus = pdev->bus; > > hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata); > > int_desc = data->chip_data; > > + if (!int_desc) { > > + dev_warn(&hbus->hdev->device, "%s() can not unmask irq %u\n", > > + __func__, data->irq); > > + return; > > + } > > That's a check that should be there regardless ? Yes. Normally data->chip_data is set at the end of hv_compose_msi_msg(), and it should not be NULL. However, in rare circumstances, we might see a failure in hv_compose_msi_msg(), e.g. the hypervisor/host might return an error in comp.comp_pkt.completion_status (at least this is possible in theory). In case we see a failure in hv_compose_msi_msg(), data->chip_data stays with its default value of NULL; because the return type of hv_compose_msi_msg() is "void", we can not return an error to the upper layer; later, when the upper layer calls request_irq() -> ... -> hv_irq_unmask(), hv_arch_irq_unmask() crashes because data->chip_data is NULL -- with this check, we're able to error out gracefully, and the user can better understand what goes wrong. > > spin_lock_irqsave(&hbus->retarget_msi_interrupt_lock, flags); > > > > @@ -1911,12 +1916,6 @@ static void hv_compose_msi_msg(struct > irq_data *data, struct msi_msg *msg) > > hv_pci_onchannelcallback(hbus); > > spin_unlock_irqrestore(&channel->sched_lock, flags); > > > > - if (hpdev->state == hv_pcichild_ejecting) { > > - dev_err_once(&hbus->hdev->device, > > - "the device is being ejected\n"); > > - goto enable_tasklet; > > - } > > - > > udelay(100); > > } > > I don't understand why this code is in hv_compose_msi_msg() in the first > place (and why only in that function ?) to me this looks like you are > adding plasters in the code that can turn out to be problematic while > ejecting a device, this does not seem robust at all - that's my opinion. The code was incorrectly added by de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()") de0aa7b2f97d says " 2. If the host is ejecting the VF device before we reach hv_compose_msi_msg(), in a UP VM, we can hang in hv_compose_msi_msg() forever, because at this time the host doesn't respond to the CREATE_INTERRUPT request. " de0aa7b2f97d implies that the host doesn't respond to the guest's CREATE_INTERRUPT request once the guest receives the PCI_EJECT message -- this is incorrect: after the guest receives the PCI_EJECT message, actually the host still responds to the guest's request, as long as the guest sends the request within 1 minute AND the guest doesn't send a PCI_EJECTION_COMPLETE message to the host in hv_eject_device_work(). The real issue is that currently the guest can send PCI_EJECTION_COMPLETE to the host before the guest finishes the device-probing/removing handling -- once the guest sends PCI_EJECTION_COMPLETE, the host unassigns the PCI device from the guest and ignores any request from the guest. So here the check "hpdev->state == hv_pcichild_ejecting" is incorrect. We should remove the check since it can cause a panic (see the commit messsage for the detailed explanation) The "premature PCI_EJECTION_COMPLETE" issue is resolved by: [PATCH v3 5/6] PCI: hv: Add a per-bus mutex state_lock > Feel free to merge this code, I can't ACK it, sorry. > > Lorenzo Thanks for sharing the thougths!