Re: Question about supporting AMD eGPU hot plug case

Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx> · Fri, 5 Mar 2021 12:13:43 -0500

On 2021-03-05 11:08 a.m., Sergei Miroshnichenko wrote:
On Thu, 2021-03-04 at 14:49 -0500, Andrey Grodzovsky wrote:
+ linux-pci

On 2021-02-26 1:44 a.m., Sergei Miroshnichenko wrote:
On Thu, 2021-02-25 at 13:28 -0500, Andrey Grodzovsky wrote:
On 2021-02-25 2:00 a.m., Sergei Miroshnichenko wrote:
On Wed, 2021-02-24 at 17:51 -0500, Andrey Grodzovsky wrote:
On 2021-02-24 1:23 p.m., Sergei Miroshnichenko wrote:
...
Are you saying that even without hot-plugging, while both the NVMe and
the AMD card are present right from boot, you still get BARs moving and
MMIO ranges reassigned for the NVMe BARs, just because the amdgpu driver
starts resizing the AMD card's BARs, and this triggers the NVMe BARs to
move so that the AMD card's BARs can cover the full range of video RAM?
Yes. Unconditionally, because it is not known beforehand whether moving
the NVMe's BAR will help. In this particular case the BAR movement is
not needed, but it is done anyway.

BARs are not moved one by one; instead, the kernel releases all the
releasable ones and then recalculates a new BAR layout that fits them
all. The kernel's algorithm is different from the BIOS's, so the NVMe
has appeared at a new place.

This is triggered by the following:
- at boot, if the BIOS has not assigned every BAR;
- during pci_resize_resource();
- during pci_rescan_bus() -- after a pciehp event or a manual rescan
  via sysfs.

By manual via sysfs, do you mean something like 'echo 1 >
/sys/bus/pci/drivers/amdgpu/0000\:0c\:00.0/remove && echo 1 >
/sys/bus/pci/rescan'? I am looking into how to most reliably trigger
the PCI code to call my callbacks even without an external PCI cage for
the GPU (it will take me some time to get one).

Yeah, this is our way to go when a device can't be physically removed
or unpowered remotely. With just a slightly shorter path:

    sudo sh -c 'echo 1 > /sys/bus/pci/devices/0000\:0c\:00.0/remove'
    sudo sh -c 'echo 1 > /sys/bus/pci/rescan'

Or just the second command (rescan) is enough: a BAR movement attempt
will be triggered even if there were no changes in the PCI topology.
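
To see which BARs actually moved across such a rescan, one option is to
snapshot the sysfs "resource" files before and after. The following is
only a sketch: snapshot_bars is a hypothetical helper name, and it
assumes the usual sysfs layout where each line of a device's "resource"
file holds "start end flags" and the first six lines are BAR 0 to BAR 5,
with all-zero values for an unassigned BAR.

```shell
#!/bin/sh
# Hypothetical helper: print "<device> <bar-index> <start> <end>" for every
# assigned BAR found under a sysfs-style devices directory.
# The directory argument exists so the function can be pointed at a copy.
snapshot_bars() {
    root="${1:-/sys/bus/pci/devices}"
    for dev in "$root"/*; do
        [ -r "$dev/resource" ] || continue
        idx=0
        # Only the first six entries (BAR 0..5) are of interest here.
        while [ "$idx" -le 5 ] && read -r start end flags; do
            case "$end" in
                0x0000000000000000) ;;  # unassigned BAR, skip it
                *) echo "${dev##*/} $idx $start $end" ;;
            esac
            idx=$((idx + 1))
        done < "$dev/resource"
    done
}
```

Typical use would be something like:

    snapshot_bars > /tmp/bars.before
    sudo sh -c 'echo 1 > /sys/bus/pci/rescan'
    snapshot_bars > /tmp/bars.after
    diff /tmp/bars.before /tmp/bars.after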

Serge

Hi Sergei,

Here is a link to initial implementation on top of your tree
(movable_bars_v9.1) -
https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=05d6abceed650181bb7fe0a49884a26e378b908e
I am able to pass one rescan cycle and can use the card afterwards (see
log1.log). But, according to your prints, only BAR5, which is the
registers BAR, was updated
(amdgpu 0000:0b:00.0: BAR 5 updated: 0xfcc00000 -> 0xfc100000),
while I am interested in testing a BAR0 (graphics RAM) move, since this
is where most of the complexity is. Is there a way to hack your code to
force this?
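
For checking which BARs moved in a given log, a small filter over the
"BAR N updated" prints may help. This is a sketch only: moved_bars is a
hypothetical name, and it assumes each message sits on one line in the
same format as the print quoted above.

```shell
#!/bin/sh
# Hypothetical filter: read kernel log text on stdin and print
# "<device> <bar-index> <old-address> <new-address>" for each line of the
# form "<driver> <device>: BAR <n> updated: <old> -> <new>".
moved_bars() {
    sed -n 's/.* \([0-9a-f:.]*\): BAR \([0-9]*\) updated: \(0x[0-9a-f]*\) -> \(0x[0-9a-f]*\).*/\1 \2 \3 \4/p'
}
```

For example, "moved_bars < log1.log" or "dmesg | moved_bars"; a BAR0
move would then show up as a line with index 0.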

Hi Andrey,

Regarding amdgpu's BAR0 remaining in place: it seems this is because of
the fixed BARs starting from 0xfc600000. The kernel tends to group the
BARs close to each other, making a bridge window as compact as
possible. So BAR0 has occupied the closest "comfortable" slot,
0xe0000000-0xefffffff, with the resulting bridge window of bus 00
covering all the BARs:

     pci_bus 0000:00: resource 10 [mem 0xe0000000-0xfec2ffff window]

I'll let you know if I get an idea how to rearrange that manually.
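
The address arithmetic behind this can be checked with plain shell
arithmetic. The snippet is a sketch; range_mib and contains are
hypothetical helper names, and the addresses are the ones quoted above.

```shell
#!/bin/sh
# Size in MiB of an inclusive address range [$1, $2].
range_mib() { echo $(( ($2 - $1 + 1) / 1048576 )); }

# Succeeds if the range [$3, $4] lies entirely inside [$1, $2].
contains() { [ $(($3)) -ge $(($1)) ] && [ $(($4)) -le $(($2)) ]; }

# The BAR0 slot 0xe0000000-0xefffffff is 256 MiB.
range_mib 0xe0000000 0xefffffff

# The bus 00 window covers both BAR0 and the fixed BARs from 0xfc600000.
contains 0xe0000000 0xfec2ffff 0xe0000000 0xefffffff && echo "BAR0 is inside the window"
contains 0xe0000000 0xfec2ffff 0xfc600000 0xfc600000 && echo "0xfc600000 is inside the window"
```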

Two GPUs can actually swap their places.

What do you mean ?

Another thing that can make a BAR movable is rmmod'ing its driver. It
could be some hack from within tmux, like:

   rmmod igb; \
   rmmod xhci_hcd; \
   rmmod ahci; \
   echo 1 > /sys/bus/pci/rescan; \
   modprobe igb; \
   modprobe xhci_hcd; \
   modprobe ahci

But should I also rmmod amdgpu? Or should modprobing the other drivers
back (hopefully) cause BAR0 to move in the AMD graphics card?

I think pci_release_resource() should not be in
amdgpu_device_unmap_mmio() -- the patched kernel will do that itself
for the BARs for which amdgpu_device_bar_fixed() returns false.
Moreover, the kernel will ensure that all BARs which were working
before are reassigned properly, so it needs them to be assigned before
the procedure.
The same goes for pci_assign_unassigned_bus_resources() in
amdgpu_device_remap_mmio(): this callback is invoked from
pci_rescan_bus() after pci_assign_unassigned_root_bus_resources().

This seems to me to contradict your documentation (see
https://github.com/YADRO-KNS/linux/commit/5bc12ba7c74f1c19c11db29b4807bd32acfac2c2
step 1), although step 2 also seems to contradict step 1 with regard to
BAR release, so now I am a bit confused. Also, looking at
nvme_dev_unmap: it calls pci_release_mem_regions. The symmetrical
acquisition happens in nvme_dev_map.

Andrey

When testing with 2 graphics cards and triggering a rescan, a hard hang
of the system happens during rescan_prepare of the second card when
stopping the HW (see log2.log). I don't understand why this would
happen, as each of them passes fine when tested standalone and there
should be no interdependence between them as far as I know.
Do you have any idea?

What happens with two GPUs is unclear to me as well; nothing looks
suspicious.

Serge