Re: Question about supporting AMD eGPU hot plug case

On Thu, 2021-03-04 at 14:49 -0500, Andrey Grodzovsky wrote:
> + linux-pci
> 
> On 2021-02-26 1:44 a.m., Sergei Miroshnichenko wrote:
> > On Thu, 2021-02-25 at 13:28 -0500, Andrey Grodzovsky wrote:
> > > On 2021-02-25 2:00 a.m., Sergei Miroshnichenko wrote:
> > > > On Wed, 2021-02-24 at 17:51 -0500, Andrey Grodzovsky wrote:
> > > > > On 2021-02-24 1:23 p.m., Sergei Miroshnichenko wrote:
> > > > > > ...
> > > > > Are you saying that even without hot-plugging, while both the
> > > > > NVMe and the AMD card are present right from boot, you still
> > > > > get BARs moving and MMIO ranges reassigned for the NVMe BARs,
> > > > > just because the amdgpu driver will start a resize of the AMD
> > > > > card's BARs and this will trigger a move of the NVMe's BARs to
> > > > > allow the AMD card's BARs to cover the full range of video RAM?
> > > > Yes. Unconditionally, because it is unknown beforehand whether
> > > > moving the NVMe's BARs will help. In this particular case the
> > > > BAR movement is not needed, but it is done anyway.
> > > > 
> > > > BARs are not moved one by one: the kernel releases all the
> > > > releasable ones and then recalculates a new BAR layout to fit
> > > > them all. The kernel's algorithm is different from the BIOS's,
> > > > so the NVMe has appeared at a new place.
> > > > 
> > > > This is triggered by the following:
> > > > - at boot, if the BIOS hasn't assigned every BAR;
> > > > - during pci_resize_resource();
> > > > - during pci_rescan_bus(), after a pciehp event or a manual
> > > >   rescan via sysfs.
> > > 
> > > By a manual rescan via sysfs you mean something like this: 'echo 1 >
> > > /sys/bus/pci/drivers/amdgpu/0000\:0c\:00.0/remove && echo 1 >
> > > /sys/bus/pci/rescan'? I am looking into how to most reliably
> > > trigger the PCI code to call my callbacks even without an external
> > > PCI cage for the GPU (it will take me some time to get one).
> > 
> > Yeah, this is our way to go when a device can't be physically
> > removed or unpowered remotely. With just a slightly shorter path:
> > 
> >    sudo sh -c 'echo 1 > /sys/bus/pci/devices/0000\:0c\:00.0/remove'
> >    sudo sh -c 'echo 1 > /sys/bus/pci/rescan'
> > 
> > Or just the second command (rescan) is enough: a BAR movement
> > attempt will be triggered even if there were no changes in the PCI
> > topology.
> > 
> > Serge
> > 
> 
> Hi Sergei,
> 
> Here is a link to an initial implementation on top of your tree
> (movable_bars_v9.1):
> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=05d6abceed650181bb7fe0a49884a26e378b908e
> I am able to pass one rescan cycle and can use the card afterwards
> (see log1.log).
> But according to your prints, only BAR5, which is the registers BAR,
> was updated (amdgpu 0000:0b:00.0: BAR 5 updated: 0xfcc00000 ->
> 0xfc100000), while I am interested in testing a BAR0 (graphics RAM)
> move, since that is where most of the complexity is. Is there a way
> to hack your code to force this?

Hi Andrey,

Regarding the amdgpu's BAR0 remaining in its place: it seems this is
because of the fixed BARs starting at 0xfc600000. The kernel tends to
group the BARs close to each other, making a bridge window as compact
as possible. So BAR0 has occupied the closest "comfortable" slot,
0xe0000000-0xefffffff, with the resulting bridge window of bus 00
covering all the BARs:

    pci_bus 0000:00: resource 10 [mem 0xe0000000-0xfec2ffff window]

I'll let you know if I come up with an idea of how to rearrange that
manually.

Two GPUs can actually swap their places.

Another thing that can make a BAR movable is rmmod'ing its driver. It
could be done with a hack from within a tmux session, like:

  rmmod igb; \
  rmmod xhci_hcd; \
  rmmod ahci; \
  echo 1 > /sys/bus/pci/rescan; \
  modprobe igb; \
  modprobe xhci_hcd; \
  modprobe ahci

I think pci_release_resource() should not be in
amdgpu_device_unmap_mmio(): the patched kernel will do that itself for
the BARs for which amdgpu_device_bar_fixed() returns false. Moreover,
the kernel will ensure that all BARs which were working before are
reassigned properly, so it needs them to still be assigned before the
procedure starts.
The same goes for pci_assign_unassigned_bus_resources() in
amdgpu_device_remap_mmio(): this callback is invoked from
pci_rescan_bus() after pci_assign_unassigned_root_bus_resources().
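
To illustrate, here is a rough sketch, not a drop-in diff: the
callback names follow the movable_bars branch, the amdgpu_* helpers
are the ones from your patch, and their argument handling is
simplified here. The point is just that the driver side needs no
explicit release/reassign calls of its own:

  /* Rough sketch only: callback names follow the movable_bars branch,
   * the amdgpu_* helpers are the ones from your patch, and their
   * argument handling is simplified.
   */
  static bool amdgpu_pci_bar_fixed(struct pci_dev *pdev, int resno)
  {
      /* Report only the BARs that really must not move; everything
       * else is released and reassigned by the PCI core itself.
       */
      return amdgpu_device_bar_fixed(pdev, resno);
  }

  static void amdgpu_pci_rescan_prepare(struct pci_dev *pdev)
  {
      /* Quiesce the HW and unmap MMIO, but no pci_release_resource()
       * here: the core releases every movable BAR and needs it to
       * still be assigned at this point.
       */
      amdgpu_device_unmap_mmio(pdev);
  }

  static void amdgpu_pci_rescan_done(struct pci_dev *pdev)
  {
      /* No pci_assign_unassigned_bus_resources() either:
       * pci_rescan_bus() has already reassigned the BARs before this
       * callback runs; just re-ioremap and restart the HW.
       */
      amdgpu_device_remap_mmio(pdev);
  }

  static struct pci_driver amdgpu_kms_pci_driver = {
      /* ... existing fields ... */
      .bar_fixed      = amdgpu_pci_bar_fixed,
      .rescan_prepare = amdgpu_pci_rescan_prepare,
      .rescan_done    = amdgpu_pci_rescan_done,
  };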

> When testing with 2 graphics cards and triggering a rescan, a hard
> hang of the system happens during rescan_prepare of the second card
> when stopping the HW (see log2.log). I don't understand why this
> would happen, as each of them passes fine when tested standalone,
> and there should be no interdependence between them as far as I
> know. Do you have any idea?

What happens with two GPUs is unclear to me as well; nothing looks
suspicious.

Serge



