On Tue, Apr 19, 2022 at 02:23:25PM +0200, Andrea Parri (Microsoft) wrote: > Dexuan wrote: > > "[...] when we disable AccelNet, the host PCI VSP driver sends a > PCI_EJECT message first, and the channel callback may set > hpdev->state to hv_pcichild_ejecting on a different CPU. This can > cause hv_compose_msi_msg() to exit from the loop and 'return', and > the on-stack variable 'ctxt' is invalid. Now, if the response > message from the host arrives, the channel callback will try to > access the invalid 'ctxt' variable, and this may cause a crash." > > Schematically: > > Hyper-V sends PCI_EJECT msg > hv_pci_onchannelcallback() > state = hv_pcichild_ejecting > hv_compose_msi_msg() > alloc and init comp_pkt > state == hv_pcichild_ejecting > Hyper-V sends VM_PKT_COMP msg > hv_pci_onchannelcallback() > retrieve address of comp_pkt > 'free' comp_pkt and return > comp_pkt->completion_func() > > Dexuan also showed how the crash can be triggered after introducing > suitable delays in the driver code, thus validating the 'assumption' > that the host can still normally respond to the guest's compose_msi > request after the host has started to eject the PCI device. > > Fix the synchronization by leveraging the requestor lock as follows: > > - Before 'return'-ing in hv_compose_msi_msg(), remove the ID (while > holding the requestor lock) associated to the completion packet. > > - Retrieve the address *and call ->completion_func() within a same > (requestor) critical section in hv_pci_onchannelcallback(). > > Reported-by: Wei Hu <weh@xxxxxxxxxxxxx> > Reported-by: Dexuan Cui <decui@xxxxxxxxxxxxx> > Suggested-by: Michael Kelley <mikelley@xxxxxxxxxxxxx> > Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@xxxxxxxxx> > Reviewed-by: Michael Kelley <mikelley@xxxxxxxxxxxxx> > --- > v1 included: > > Fixes: de0aa7b2f97d3 ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()") > > as a reference, but not a reference for the stable-kernel bots. > The back-port would depend on the entire series which, in turn, > shouldn't be backported to commits preceding bf5fd8cae3c8f. > > drivers/pci/controller/pci-hyperv.c | 33 +++++++++++++++++++++++------ > 1 file changed, 27 insertions(+), 6 deletions(-) Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx> > diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c > index 0252b4bbc8d15..59f0197b94c78 100644 > --- a/drivers/pci/controller/pci-hyperv.c > +++ b/drivers/pci/controller/pci-hyperv.c > @@ -1695,7 +1695,7 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) > struct pci_create_interrupt3 v3; > } int_pkts; > } __packed ctxt; > - > + u64 trans_id; > u32 size; > int ret; > > @@ -1757,10 +1757,10 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) > goto free_int_desc; > } > > - ret = vmbus_sendpacket(hpdev->hbus->hdev->channel, &ctxt.int_pkts, > - size, (unsigned long)&ctxt.pci_pkt, > - VM_PKT_DATA_INBAND, > - VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); > + ret = vmbus_sendpacket_getid(hpdev->hbus->hdev->channel, &ctxt.int_pkts, > + size, (unsigned long)&ctxt.pci_pkt, > + &trans_id, VM_PKT_DATA_INBAND, > + VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); > if (ret) { > dev_err(&hbus->hdev->device, > "Sending request for interrupt failed: 0x%x", > @@ -1839,6 +1839,15 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) > > enable_tasklet: > tasklet_enable(&channel->callback_event); > + /* > + * The completion packet on the stack becomes invalid after 'return'; > + * remove the ID from the VMbus requestor if the identifier is still > + * mapped to/associated with the packet. (The identifier could have > + * been 're-used', i.e., already removed and (re-)mapped.) > + * > + * Cf. hv_pci_onchannelcallback(). > + */ > + vmbus_request_addr_match(channel, trans_id, (unsigned long)&ctxt.pci_pkt); > free_int_desc: > kfree(int_desc); > drop_reference: > @@ -2717,6 +2726,7 @@ static void hv_pci_onchannelcallback(void *context) > struct pci_dev_inval_block *inval; > struct pci_dev_incoming *dev_message; > struct hv_pci_dev *hpdev; > + unsigned long flags; > > buffer = kmalloc(bufferlen, GFP_ATOMIC); > if (!buffer) > @@ -2751,8 +2761,11 @@ static void hv_pci_onchannelcallback(void *context) > switch (desc->type) { > case VM_PKT_COMP: > > - req_addr = chan->request_addr_callback(chan, req_id); > + lock_requestor(chan, flags); > + req_addr = __vmbus_request_addr_match(chan, req_id, > + VMBUS_RQST_ADDR_ANY); > if (req_addr == VMBUS_RQST_ERROR) { > + unlock_requestor(chan, flags); > dev_err(&hbus->hdev->device, > "Invalid transaction ID %llx\n", > req_id); > @@ -2760,9 +2773,17 @@ static void hv_pci_onchannelcallback(void *context) > } > comp_packet = (struct pci_packet *)req_addr; > response = (struct pci_response *)buffer; > + /* > + * Call ->completion_func() within the critical section to make > + * sure that the packet pointer is still valid during the call: > + * here 'valid' means that there's a task still waiting for the > + * completion, and that the packet data is still on the waiting > + * task's stack. Cf. hv_compose_msi_msg(). > + */ > comp_packet->completion_func(comp_packet->compl_ctxt, > response, > bytes_recvd); > + unlock_requestor(chan, flags); > break; > > case VM_PKT_DATA_INBAND: > -- > 2.25.1 >