> -----Original Message----- > From: Bjorn Helgaas <helgaas@xxxxxxxxxx> > Sent: Tuesday, August 6, 2019 2:55 PM > To: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx> > Cc: sashal@xxxxxxxxxx; lorenzo.pieralisi@xxxxxxx; linux- > hyperv@xxxxxxxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; KY Srinivasan > <kys@xxxxxxxxxxxxx>; Stephen Hemminger <sthemmin@xxxxxxxxxxxxx>; > olaf@xxxxxxxxx; vkuznets <vkuznets@xxxxxxxxxx>; linux- > kernel@xxxxxxxxxxxxxxx > Subject: Re: [PATCH] PCI: hv: Detect and fix Hyper-V PCI domain number > collision > > On Fri, Aug 02, 2019 at 06:52:56PM +0000, Haiyang Zhang wrote: > > Due to Azure host agent settings, the device instance ID's bytes 8 and > > 9 are no longer unique. This causes some of the PCI devices not > > showing up in VMs with multiple passthrough devices, such as GPUs. So, > > as recommended by Azure host team, we now use the bytes 4 and 5 which > > usually provide unique numbers. > > What does "Azure host agent settings" mean? Would it be useful to say > something more specific, so users could ready this and say "oh, I'm using the > Azure host agent settings mentioned here, so I need this patch"? Is this > related to a specific Azure host agent commit or release? > > "This causes some of the PCI devices ..." is not a sentence. I think I > understand what you're saying -- "This sometimes causes device passthrough > to VMs to fail." Is there something about GPUs that makes them more > susceptible to this problem? > > I think there are really two changes in this patch: > > 1) Start with a domain number from bytes 4-5 instead of bytes 8-9. > > 2) If the domain number is not unique, allocate another one using > the bitmap. > > It sounds like part 2) by itself would be enough to solve the problem, and > including part 1) just reduces the likelihood of having to allocate another > domain number. > > > In the rare cases of collision, we will detect and find another number > > that is not in use. > > Thanks to Michael Kelley <mikelley@xxxxxxxxxxxxx> for proposing this > idea. > > This looks like two paragraphs and should have a blank line between them. > > > Signed-off-by: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx> > > --- > > drivers/pci/controller/pci-hyperv.c | 91 > > +++++++++++++++++++++++++++++++------ > > 1 file changed, 78 insertions(+), 13 deletions(-) > > > > diff --git a/drivers/pci/controller/pci-hyperv.c > > b/drivers/pci/controller/pci-hyperv.c > > index 82acd61..6b9cc6e60a 100644 > > --- a/drivers/pci/controller/pci-hyperv.c > > +++ b/drivers/pci/controller/pci-hyperv.c > > @@ -37,6 +37,8 @@ > > * the PCI back-end driver in Hyper-V. > > */ > > > > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > > + > > #include <linux/kernel.h> > > #include <linux/module.h> > > #include <linux/pci.h> > > @@ -2507,6 +2509,47 @@ static void put_hvpcibus(struct > hv_pcibus_device *hbus) > > complete(&hbus->remove_event); > > } > > > > +#define HVPCI_DOM_MAP_SIZE (64 * 1024) static > > +DECLARE_BITMAP(hvpci_dom_map, HVPCI_DOM_MAP_SIZE); > > + > > +/* PCI domain number 0 is used by emulated devices on Gen1 VMs, so > > +define 0 > > + * as invalid for passthrough PCI devices of this driver. > > + */ > > Please use the usual multi-line comment style: > > /* > * PCI domain number ... > */ > > > +#define HVPCI_DOM_INVALID 0 > > + > > +/** > > + * hv_get_dom_num() - Get a valid PCI domain number > > + * Check if the PCI domain number is in use, and return another > > +number if > > + * it is in use. > > + * > > + * @dom: Requested domain number > > + * > > + * return: domain number on success, HVPCI_DOM_INVALID on failure */ > > +static u16 hv_get_dom_num(u16 dom) { > > + unsigned int i; > > + > > + if (test_and_set_bit(dom, hvpci_dom_map) == 0) > > + return dom; > > + > > + for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) { > > + if (test_and_set_bit(i, hvpci_dom_map) == 0) > > + return i; > > + } > > + > > + return HVPCI_DOM_INVALID; > > +} > > + > > +/** > > + * hv_put_dom_num() - Mark the PCI domain number as free > > + * @dom: Domain number to be freed > > + */ > > +static void hv_put_dom_num(u16 dom) > > +{ > > + clear_bit(dom, hvpci_dom_map); > > +} > > + > > /** > > * hv_pci_probe() - New VMBus channel probe, for a root PCI bus > > * @hdev: VMBus's tracking struct for this root PCI bus > > @@ -2518,6 +2561,7 @@ static int hv_pci_probe(struct hv_device *hdev, > > const struct hv_vmbus_device_id *dev_id) { > > struct hv_pcibus_device *hbus; > > + u16 dom_req, dom; > > int ret; > > > > /* > > @@ -2532,19 +2576,32 @@ static int hv_pci_probe(struct hv_device > *hdev, > > hbus->state = hv_pcibus_init; > > > > /* > > - * The PCI bus "domain" is what is called "segment" in ACPI and > > - * other specs. Pull it from the instance ID, to get something > > - * unique. Bytes 8 and 9 are what is used in Windows guests, so > > - * do the same thing for consistency. Note that, since this code > > - * only runs in a Hyper-V VM, Hyper-V can (and does) guarantee > > - * that (1) the only domain in use for something that looks like > > - * a physical PCI bus (which is actually emulated by the > > - * hypervisor) is domain 0 and (2) there will be no overlap > > - * between domains derived from these instance IDs in the same > > - * VM. > > + * The PCI bus "domain" is what is called "segment" in ACPI and other > > + * specs. Pull it from the instance ID, to get something usually > > + * unique. In rare cases of collision, we will find out another number > > + * not in use. > > + * Note that, since this code only runs in a Hyper-V VM, Hyper-V > > + * together with this guest driver can guarantee that (1) The only > > + * domain used by Gen1 VMs for something that looks like a physical > > + * PCI bus (which is actually emulated by the hypervisor) is domain 0. > > + * (2) There will be no overlap between domains (after fixing possible > > + * collisions) in the same VM. > > Please use blank lines between paragraphs. > > > */ > > - hbus->sysdata.domain = hdev->dev_instance.b[9] | > > - hdev->dev_instance.b[8] << 8; > > + dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4]; > > + dom = hv_get_dom_num(dom_req); > > + > > + if (dom == HVPCI_DOM_INVALID) { > > + pr_err("Unable to use dom# 0x%hx or other numbers", > > + dom_req); > > + ret = -EINVAL; > > + goto free_bus; > > + } > > + > > + if (dom != dom_req) > > + pr_info("PCI dom# 0x%hx has collision, using 0x%hx", > > + dom_req, dom); > > Can these use "dev_err/info(&hdev->device, ...)" like the other message in > this function? It's always nicer to have a specific device reference when one > is available. Probably don't need the new pr_fmt() definition if you do this. > > > + > > + hbus->sysdata.domain = dom; > > > > hbus->hdev = hdev; > > refcount_set(&hbus->remove_lock, 1); @@ -2559,7 +2616,7 @@ > static > > int hv_pci_probe(struct hv_device *hdev, > > hbus->sysdata.domain); > > if (!hbus->wq) { > > ret = -ENOMEM; > > - goto free_bus; > > + goto free_dom; > > } > > > > ret = vmbus_open(hdev->channel, pci_ring_size, pci_ring_size, NULL, > > 0, @@ -2636,6 +2693,8 @@ static int hv_pci_probe(struct hv_device > *hdev, > > vmbus_close(hdev->channel); > > destroy_wq: > > destroy_workqueue(hbus->wq); > > +free_dom: > > + hv_put_dom_num(hbus->sysdata.domain); > > free_bus: > > free_page((unsigned long)hbus); > > return ret; > > @@ -2717,6 +2776,9 @@ static int hv_pci_remove(struct hv_device *hdev) > > put_hvpcibus(hbus); > > wait_for_completion(&hbus->remove_event); > > destroy_workqueue(hbus->wq); > > + > > + hv_put_dom_num(hbus->sysdata.domain); > > + > > free_page((unsigned long)hbus); > > return 0; > > } > > @@ -2744,6 +2806,9 @@ static void __exit exit_hv_pci_drv(void) > > > > static int __init init_hv_pci_drv(void) { > > + /* Set the invalid domain number's bit, so it will not be used */ > > + set_bit(HVPCI_DOM_INVALID, hvpci_dom_map); > > + > > return vmbus_driver_register(&hv_pci_drv); > > } > > > > -- > > 1.8.3.1 Thanks for the comments. I will make the recommended changes. - Haiyang