> -----Original Message-----
> From: Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx>
> Sent: Thursday, August 15, 2019 12:11 PM
> To: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> Cc: sashal@xxxxxxxxxx; bhelgaas@xxxxxxxxxx; linux-hyperv@xxxxxxxxxxxxxxx;
> linux-pci@xxxxxxxxxxxxxxx; KY Srinivasan <kys@xxxxxxxxxxxxx>; Stephen
> Hemminger <sthemmin@xxxxxxxxxxxxx>; olaf@xxxxxxxxx; vkuznets
> <vkuznets@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v5,1/2] PCI: hv: Detect and fix Hyper-V PCI domain
> number collision
>
> On Wed, Aug 14, 2019 at 03:52:15PM +0000, Haiyang Zhang wrote:
> > Currently in Azure cloud, for passthrough devices, the host sets the device
> > instance ID's bytes 8 - 15 to a value derived from the host HWID, which is
> > the same on all devices in a VM. So, the device instance ID's bytes 8 and 9
> > provided by the host are no longer unique. This affects all Azure hosts
> > since last year, and can cause device passthrough to VMs to fail because
>
> Bjorn already asked, can you be a bit more specific than "since last
> year" here please?
>
> It would be useful to understand when/how this became an issue.

The host change happened around July 2018. An Azure rollout takes multiple
weeks, so there is no single specific date. I will include the month and
year in the commit log.

> > the bytes 8 and 9 are used as PCI domain number. Collision of domain
> > numbers will cause the second device with the same domain number to
> > fail to load.
> >
> > In the cases of collision, we will detect and find another number that is
> > not in use.
> >
> > Suggested-by: Michael Kelley <mikelley@xxxxxxxxxxxxx>
> > Signed-off-by: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> > Acked-by: Sasha Levin <sashal@xxxxxxxxxx>
> > ---
> >  drivers/pci/controller/pci-hyperv.c | 92 +++++++++++++++++++++++++++++++------
> >  1 file changed, 79 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> > index 40b6254..31b8fd5 100644
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -2510,6 +2510,48 @@ static void put_hvpcibus(struct hv_pcibus_device *hbus)
> >  	complete(&hbus->remove_event);
> >  }
> >
> > +#define HVPCI_DOM_MAP_SIZE	(64 * 1024)
> > +static DECLARE_BITMAP(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
> > +
> > +/*
> > + * PCI domain number 0 is used by emulated devices on Gen1 VMs, so define 0
> > + * as invalid for passthrough PCI devices of this driver.
> > + */
> > +#define HVPCI_DOM_INVALID	0
> > +
> > +/**
> > + * hv_get_dom_num() - Get a valid PCI domain number
> > + * Check if the PCI domain number is in use, and return another number if
> > + * it is in use.
> > + *
> > + * @dom: Requested domain number
> > + *
> > + * return: domain number on success, HVPCI_DOM_INVALID on failure
> > + */
> > +static u16 hv_get_dom_num(u16 dom)
> > +{
> > +	unsigned int i;
> > +
> > +	if (test_and_set_bit(dom, hvpci_dom_map) == 0)
> > +		return dom;
> > +
> > +	for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) {
> > +		if (test_and_set_bit(i, hvpci_dom_map) == 0)
> > +			return i;
> > +	}
>
> Don't you need locking around code reading/updating hvpci_dom_map?

No locking is necessary here. test_and_set_bit() tests and sets the bit in
a single atomic operation. If another caller claims a bit after
for_each_clear_bit() has found it clear but before we reach it,
test_and_set_bit() returns 1 instead of 0, and the loop simply continues to
the next clear bit until a test_and_set_bit() succeeds.

Thanks,
- Haiyang
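
P.S. To illustrate why the race is benign, here is a minimal user-space
sketch. It uses C11 atomic_fetch_or() as a stand-in for the kernel's
test_and_set_bit(); the names (dom_map, test_and_set, get_dom_num,
MAP_BITS) are made up for this sketch and are not from the driver:

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define MAP_BITS	(64 * 1024)
#define BITS_PER_WORD	64
#define MAP_WORDS	(MAP_BITS / BITS_PER_WORD)

static _Atomic uint64_t dom_map[MAP_WORDS];

/* Atomically set bit @n; return nonzero if it was already set. */
static int test_and_set(unsigned int n)
{
	uint64_t mask = (uint64_t)1 << (n % BITS_PER_WORD);

	return (atomic_fetch_or(&dom_map[n / BITS_PER_WORD], mask) & mask) != 0;
}

static uint16_t get_dom_num(uint16_t dom)
{
	unsigned int i;

	/* Fast path: the requested number is free, so claim it. */
	if (!test_and_set(dom))
		return dom;

	/*
	 * Slow path: scan for any free number, skipping 0 (invalid,
	 * mirroring HVPCI_DOM_INVALID). If another thread claims bit i
	 * between our scan and the fetch_or, test_and_set() reports it
	 * as taken and we simply try the next candidate, so the worst
	 * case is an extra retry, never a corrupted map.
	 */
	for (i = 1; i < MAP_BITS; i++)
		if (!test_and_set(i))
			return i;

	return 0; /* all numbers exhausted */
}

int main(void)
{
	printf("%u\n", get_dom_num(42)); /* prints 42: first claim wins */
	printf("%u\n", get_dom_num(42)); /* prints 1: collision, fallback */
	return 0;
}

Two racing callers can both observe the same bit as clear, but only the one
whose atomic test-and-set sees the bit as previously clear gets to return
that number; the loser keeps scanning, which is exactly the behavior of the
for_each_clear_bit() loop in the patch.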