> -----Original Message-----
> From: Bjorn Helgaas <helgaas@xxxxxxxxxx>
> Sent: Wednesday, August 14, 2019 12:34 AM
> To: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> Cc: sashal@xxxxxxxxxx; lorenzo.pieralisi@xxxxxxx;
> linux-hyperv@xxxxxxxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; KY Srinivasan
> <kys@xxxxxxxxxxxxx>; Stephen Hemminger <sthemmin@xxxxxxxxxxxxx>;
> olaf@xxxxxxxxx; vkuznets <vkuznets@xxxxxxxxxx>;
> linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v4,1/2] PCI: hv: Detect and fix Hyper-V PCI domain
> number collision
>
> Thanks for splitting these; I think that makes more sense.
>
> On Wed, Aug 14, 2019 at 12:38:54AM +0000, Haiyang Zhang wrote:
> > Currently in Azure cloud, for passthrough devices including GPU, the host
> > sets the device instance ID's bytes 8 - 15 to a value derived from the host
> > HWID, which is the same on all devices in a VM. So, the device instance
> > ID's bytes 8 and 9 provided by the host are no longer unique. This can
> > cause device passthrough to VMs to fail because the bytes 8 and 9 are used
> > as PCI domain number. Collision of domain numbers will cause the second
> > device with the same domain number fail to load.
>
> I think this patch is fine. I could be misunderstanding the commit
> log, but when you say "the ID bytes 8 and 9 are *no longer* unique",
> that suggests that they *used* to be unique but stopped being unique
> at some point, which of course raises the question of *when* they
> became non-unique.
>
> The specific information about that point would be useful to have in
> the commit log, e.g., is this related to a specific version of Azure,
> a configuration change, etc?

The host-side change happened last year and was rolled out to all Azure
hosts. I will put "all current Azure hosts" in the commit log.

> Does this problem affect GPUs more than other passthrough devices? If
> all passthrough devices are affected, why mention GPUs in particular?
> I can't tell whether that information is relevant or superfluous.
We initially found this issue on multiple passthrough GPUs; I mentioned
them just as an example. I will remove the word, because any PCI device
may be affected.

Thanks,
- Haiyang
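
For readers following the thread, the collision described in the commit log can be illustrated with a small userspace sketch. This is not the patch itself. The names `domain_tracker`, `preferred_domain` and `claim_domain` are hypothetical, the byte order used to combine bytes 8 and 9 is an assumption, and so is the choice to start the fallback search at domain 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DOMAIN_COUNT 0x10000u /* PCI domain numbers treated as 16-bit here */

/* Hypothetical tracker: one flag per possible domain number. */
struct domain_tracker {
	bool in_use[DOMAIN_COUNT];
};

/*
 * Derive the preferred domain number from bytes 8 and 9 of the
 * 16-byte device instance ID. The byte order is an assumption.
 */
static uint16_t preferred_domain(const uint8_t instance_id[16])
{
	return (uint16_t)((instance_id[8] << 8) | instance_id[9]);
}

/*
 * Claim the preferred domain if it is free. On a collision, fall back
 * to the lowest free domain starting at 1 (assumed policy).
 * Returns the claimed domain, or -1 if every domain is taken.
 */
static int32_t claim_domain(struct domain_tracker *t, uint16_t preferred)
{
	assert(t != NULL);

	if (!t->in_use[preferred]) {
		t->in_use[preferred] = true;
		return preferred;
	}

	for (uint32_t d = 1; d < DOMAIN_COUNT; d++) {
		if (!t->in_use[d]) {
			t->in_use[d] = true;
			return (int32_t)d;
		}
	}

	return -1;
}
```

With this sketch, two devices whose instance IDs share bytes 8 and 9 no longer end up with the same domain number: the first keeps the preferred value and the second is given a free one instead of failing to load.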