On Fri, 2021-03-05 at 12:13 -0500, Andrey Grodzovsky wrote:
> 
> On 2021-03-05 11:08 a.m., Sergei Miroshnichenko wrote:
> > On Thu, 2021-03-04 at 14:49 -0500, Andrey Grodzovsky wrote:
> > > + linux-pci
> > > 
> > > On 2021-02-26 1:44 a.m., Sergei Miroshnichenko wrote:
> > > > On Thu, 2021-02-25 at 13:28 -0500, Andrey Grodzovsky wrote:
> > > > > On 2021-02-25 2:00 a.m., Sergei Miroshnichenko wrote:
> > > > > > On Wed, 2021-02-24 at 17:51 -0500, Andrey Grodzovsky wrote:
> > > > > > > On 2021-02-24 1:23 p.m., Sergei Miroshnichenko wrote:
> > > > > > > > ...
> > > > > > > Are you saying that even without hot-plugging, while both the
> > > > > > > nvme and the AMD card are present right from boot, you still
> > > > > > > get BARs moving and MMIO ranges reassigned for the NVMe's BARs,
> > > > > > > just because the amdgpu driver will start a resize of the AMD
> > > > > > > card's BARs, and this will trigger a move of the NVMe's BARs to
> > > > > > > allow the AMD card's BARs to cover the full range of video RAM?
> > > > > > Yes. Unconditionally, because it is unknown beforehand if NVMe's
> > > > > > BAR movement will help. In this particular case BAR movement is
> > > > > > not needed, but is done anyway.
> > > > > > 
> > > > > > BARs are not moved one by one, but the kernel releases all the
> > > > > > releasable ones, and then recalculates a new BAR layout to fit
> > > > > > them all. The kernel's algorithm is different from the BIOS's, so
> > > > > > the NVMe has appeared at a new place.
> > > > > > 
> > > > > > This is triggered by the following:
> > > > > >  - at boot, if the BIOS hadn't assigned every BAR;
> > > > > >  - during pci_resize_resource();
> > > > > >  - during pci_rescan_bus() -- after a pciehp event or a manual
> > > > > >    rescan via sysfs.
> > > > > 
> > > > > By a manual rescan via sysfs you mean something like this --
> > > > > 'echo 1 > /sys/bus/pci/drivers/amdgpu/0000\:0c\:00.0/remove &&
> > > > > echo 1 > /sys/bus/pci/rescan'? I am looking into how to most
> > > > > reliably trigger the PCI code to call my callbacks even without
> > > > > having an external PCI cage for the GPU (will take me some time
> > > > > to get it).
> > > > 
> > > > Yeah, this is our way to go when a device can't be physically
> > > > removed or unpowered remotely. With just a bit shorter path:
> > > > 
> > > >    sudo sh -c 'echo 1 > /sys/bus/pci/devices/0000\:0c\:00.0/remove'
> > > >    sudo sh -c 'echo 1 > /sys/bus/pci/rescan'
> > > > 
> > > > Or, just the second command (rescan) is enough: a BAR movement
> > > > attempt will be triggered even if there were no changes in the PCI
> > > > topology.
> > > > 
> > > > Serge
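
(Since pci_resize_resource() keeps coming up as one of the triggers,
here is a rough, untested sketch of how a driver can resize its BAR 0,
loosely modeled on what amdgpu does in amdgpu_device_resize_fb_bar().
The function name is made up, error handling is stripped, and this is
not the actual amdgpu code.)

  /* Sketch only: resize BAR 0 so it covers all of VRAM. */
  #include <linux/pci.h>
  #include <linux/log2.h>

  static int demo_resize_vram_bar(struct pci_dev *pdev, u64 vram_size)
  {
          /* Resizable BAR encoding: 0 = 1 MiB, 1 = 2 MiB, ...
           * (assumes vram_size >= 1 MiB) */
          int rbar_size = order_base_2(vram_size >> 20);
          u16 cmd;
          int r;

          /* Don't decode memory while the BAR is being resized/moved */
          pci_read_config_word(pdev, PCI_COMMAND, &cmd);
          pci_write_config_word(pdev, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY);

          /*
           * Hand the BAR back to the PCI core; a real driver may have to
           * release other BARs of the device as well (amdgpu also releases
           * its doorbell BAR at this point).
           */
          pci_release_resource(pdev, 0);

          r = pci_resize_resource(pdev, 0, rbar_size);

          /*
           * Reassign the released BARs; with the movable-BARs patches this
           * is where BARs of other devices may move, too.
           */
          pci_assign_unassigned_bus_resources(pdev->bus);

          /* Restore the command register whether or not the resize worked */
          pci_write_config_word(pdev, PCI_COMMAND, cmd);

          return r;
  }
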
> > > 
> > > Hi Sergei
> > > 
> > > Here is a link to the initial implementation on top of your tree
> > > (movable_bars_v9.1) -
> > > https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=05d6abceed650181bb7fe0a49884a26e378b908e
> > > I am able to pass one re-scan cycle and can use the card afterwards
> > > (see log1.log).
> > > But, according to your prints, only BAR 5, which is the register BAR,
> > > was updated (amdgpu 0000:0b:00.0: BAR 5 updated: 0xfcc00000 ->
> > > 0xfc100000), while I am interested in testing a BAR 0 (graphics RAM)
> > > move, since this is where most of the complexity is. Is there a way
> > > to hack your code to force this?
> > 
> > Hi Andrey,
> > 
> > Regarding the amdgpu's BAR 0 remaining in its place: it seems this is
> > because of the fixed BARs starting from fc600000. The kernel tends to
> > group the BARs close to each other, making a bridge window as compact
> > as possible. So BAR 0 had occupied the closest "comfortable" slot
> > 0xe0000000-0xefffffff, with the resulting bridge window of bus 00
> > covering all the BARs:
> > 
> >    pci_bus 0000:00: resource 10 [mem 0xe0000000-0xfec2ffff window]
> > 
> > I'll let you know if I get an idea how to rearrange that manually.
> > 
> > Two GPUs can actually swap their places.
> 
> What do you mean?

I was thinking: when the scenario of a PCI rescan with two GPUs (as was
described below) starts working, BAR 0 of GPU0 can take the place of
BAR 0 of GPU1 after the first rescan.

> > What can also make a BAR movable is rmmod'ing its driver. It could be
> > some hack from within a tmux, like:
> > 
> >    rmmod igb; \
> >    rmmod xhci_hcd; \
> >    rmmod ahci; \
> >    echo 1 > /sys/bus/pci/rescan; \
> >    modprobe igb; \
> >    modprobe xhci_hcd; \
> >    modprobe ahci
> 
> But should I also rmmod amdgpu? Or should modprobing the other drivers
> back (hopefully) cause a BAR 0 move in the AMD graphics card?

You have already made the amdgpu movable, so no need to rmmod it -- just
those with fixed BARs:

  xhci_hcd 0000:0c:00.3: BAR 0: assigned fixed [mem 0xfc600000-0xfc6fffff 64bit]
  igb 0000:07:00.0: BAR 0: assigned fixed [mem 0xfc900000-0xfc91ffff]
  igb 0000:07:00.0: BAR 3: assigned fixed [mem 0xfc920000-0xfc923fff]
  ahci 0000:02:00.1: BAR 6: assigned fixed [mem 0xfcb00000-0xfcb7ffff pref]
  ahci 0000:02:00.1: BAR 5: assigned fixed [mem 0xfcb80000-0xfcb9ffff]
  xhci_hcd 0000:02:00.0: BAR 0: assigned fixed [mem 0xfcba0000-0xfcba7fff 64bit]
  xhci_hcd 0000:05:00.0: BAR 0: assigned fixed [mem 0xfca00000-0xfca07fff 64bit]
  ahci 0000:0d:00.2: BAR 5: assigned fixed [mem 0xfce08000-0xfce08fff]

The expected result is that they all move closer to the start of the PCI
address space.

> > I think pci_release_resource() should not be in
> > amdgpu_device_unmap_mmio() -- the patched kernel will do that itself
> > for BARs for which amdgpu_device_bar_fixed() returns false. Even more
> > -- the kernel will ensure that all BARs which were working before are
> > reassigned properly, so it needs them to be assigned before the
> > procedure.
> > 
> > The same for pci_assign_unassigned_bus_resources() in
> > amdgpu_device_remap_mmio(): this callback is invoked from
> > pci_rescan_bus() after pci_assign_unassigned_root_bus_resources().
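
(A side note, to make the "fixed vs. movable" distinction concrete:
a driver opts its BARs into being movable by implementing the rescan
hooks in its struct pci_driver; drivers that don't -- like the
xhci_hcd/igb/ahci instances listed above -- keep their BARs fixed.
Below is a minimal sketch with made-up "demo" names; for the exact
field names and prototypes take the patches themselves as the
reference, not this snippet.)

  /* Sketch only: opting a PCI driver into movable BARs. */
  #include <linux/pci.h>

  static void demo_rescan_prepare(struct pci_dev *pdev)
  {
          /* Quiesce the HW, unmap BARs, release the regions
           * (fleshed out in the P.S. at the end of this mail). */
  }

  static void demo_rescan_done(struct pci_dev *pdev)
  {
          /* Re-request the regions, remap the BARs, resume the HW. */
  }

  static bool demo_bar_fixed(struct pci_dev *pdev, int resno)
  {
          /* Return true only for BARs that must never move at runtime. */
          return false;
  }

  static struct pci_driver demo_pci_driver = {
          .name           = "demo",
          /* .id_table, .probe, .remove, ... as in any PCI driver */
          .rescan_prepare = demo_rescan_prepare,  /* patchset-specific field */
          .rescan_done    = demo_rescan_done,     /* patchset-specific field */
          .bar_fixed      = demo_bar_fixed,       /* patchset-specific field */
  };
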
> This seems to me to contradict your documentation (see
> https://github.com/YADRO-KNS/linux/commit/5bc12ba7c74f1c19c11db29b4807bd32acfac2c2
> step 1), although step 2 also seems to contradict step 1 with regard to
> BAR release - so now I am a bit confused. Also, looking at
> nvme_dev_unmap - it calls pci_release_mem_regions(). Symmetrical
> acquisition happens in nvme_dev_map.

Ah, there is a difference between pci_release_region() and
pci_release_resource(), so subtle that I had to refresh my memory. You
are right, this has to be explained in the documentation!

  $ sudo cat /proc/iomem
  ...
  f0000000-fcffffff : PCI Bus 0000:00           -- root bus resource
    ...
    fcf00000-fcffffff : PCI Bus 0000:01         -- bridge window
      fcf00000-fcf03fff : 0000:01:00.0          -- pci resource (BAR)
        fcf00000-fcf03fff : nvme                -- pci region (reserved by a
                                                   driver, has its name)

So nvme_dev_unmap() uses pci_release_region() to tell the PCI core that
the BAR is not used by the driver anymore -- this actually should be
called in every rescan_prepare().

But pci_release_resource() tells the PCI subsystem that the BAR is
"released" from the device and has to be assigned an address again
before it can be used, and it makes pci_resource_start(pdev,
released_barno) invalid. Why the quotes: pci_release_resource() doesn't
turn off the BAR and doesn't write the registers -- this happens later.

I thought at first that pci_release_resource() is not safe in a
rescan_prepare(), but then double-checked and found it's fine, just not
needed, as the kernel will do it anyway. (A minimal sketch of how a
rescan_prepare()/rescan_done() pair could handle this is at the end of
this mail.)

And pci_bus_check_bars_assigned(), which compares the bitmasks of
successfully assigned BARs, is called *before* the hook.

> > > When testing with 2 graphics cards and triggering a rescan, a hard
> > > hang of the system happens during rescan_prepare of the second card
> > > when stopping the HW (see log2.log) - I don't understand why this
> > > would happen, as each of them passes fine when tested standalone,
> > > and there should be no interdependence between them as far as I
> > > know. Do you have any idea?
> 
> > What happens with two GPUs is unclear to me as well, nothing looks
> > suspicious.
> > 
> > Serge
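
P.S. The promised sketch of a rescan_prepare()/rescan_done() pair,
fleshing out the stubs from the earlier snippet and following the
pci_release_region() vs. pci_release_resource() distinction above. The
"demo" names are made up, error handling is omitted, and this is not
the actual amdgpu or nvme code; check the patches for the exact hook
prototypes.

  #include <linux/pci.h>

  struct demo_device {
          void __iomem *regs;     /* mapping of BAR 0 */
  };

  static void demo_rescan_prepare(struct pci_dev *pdev)
  {
          struct demo_device *ddev = pci_get_drvdata(pdev);

          /* ... quiesce the HW: stop DMA, disable interrupts ... */

          pci_iounmap(pdev, ddev->regs); /* the old address may become stale */
          ddev->regs = NULL;

          /*
           * Tell the PCI core the driver no longer uses the region (the
           * named line under the BAR in /proc/iomem goes away).  Do NOT
           * call pci_release_resource() here: the patched kernel releases
           * and reassigns every movable BAR by itself.
           */
          pci_release_region(pdev, 0);
  }

  static void demo_rescan_done(struct pci_dev *pdev)
  {
          struct demo_device *ddev = pci_get_drvdata(pdev);

          /* By now the BAR may live at a different address */
          if (pci_request_region(pdev, 0, "demo"))
                  return;         /* a real driver needs error handling */

          /* Remap at whatever pci_resource_start(pdev, 0) is now */
          ddev->regs = pci_iomap(pdev, 0, 0);

          /* ... reprogram the HW for the new addresses and resume ... */
  }
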