Adding the PCI mailing list per Bjorn's advice (sorry, skipped the address
last time).
On 2021-03-02 12:22 p.m., Christian König wrote:
Hi Andrey,
Can you elaborate more on why such a lock is needed at the global level,
rather than a per-driver (per-device, more specifically) local lock
acquired in rescan_prepare (before the BAR shuffling starts) and released
in rescan_done (after the BAR shuffling is done)?
Because this adds inter-device dependencies.
E.g. when we have devices A, B and C and each takes its own lock, we
end up with 3 locks.
If those devices have interdependencies of their own, for example
because of DMA-buf, we will end up in quite a locking mess.
Apart from that, if we don't always keep the order A, B, C but instead
end up with A, C, B, lockdep will certainly start to complain.
Regards,
Christian.
But as far as I understand, those locks are agnostic to each other - each
lock is taken only by its respective device and not by others. So I
don't see a deadlock danger here, but I do suspect lockdep might throw a
false warning...
Andrey
Am 02.03.21 um 18:16 schrieb Andrey Grodzovsky:
Per Bjorn's advice, adding the PCI mailing list.
Christian, please reply on top of this mail and not on top of the
earlier, original one.
Andrey
On 2021-03-02 11:49 a.m., Bjorn Helgaas wrote:
Please add linux-pci@xxxxxxxxxxxxxxx to the cc list. There's a lot of
useful context in this thread and it needs to be archived.
On Tue, Mar 02, 2021 at 11:37:25AM -0500, Andrey Grodzovsky wrote:
Adding Sergei and Bjorn since you are suggesting some PCI related
changes.
On 2021-03-02 6:28 a.m., Christian König wrote:
Yeah, I've been thinking about that as well for quite a while now.
I'm not 100% sure what Sergei does in his patch set will work for
us. He basically tells the driver to stop any processing, then shuffles
around the BARs, and then tells it to start again.
Because of the necessity to reprogram bridges there isn't really
another way to do this, but we would need something like taking a lock,
returning, and dropping the lock later on. If you do this for multiple
drivers at the same time, this results in a locking nightmare.
What we need instead is an upper-level PCI-layer rw-lock which we can
take on the read side in the driver, similar to our internal reset
lock.
Can you elaborate more on why such a lock is needed at the global level,
rather than a per-driver (per-device, more specifically) local lock
acquired in rescan_prepare (before the BAR shuffling starts) and released
in rescan_done (after the BAR shuffling is done)?
Andrey
Wiring up changing the mapping is the smaller problem in that picture.
Christian.
Am 01.03.21 um 22:14 schrieb Andrey Grodzovsky:
Hi, I started looking into an actual implementation for this. As a
reference point I am using Sergei's NVMe implementation (see
https://github.com/YADRO-KNS/linux/commit/58c375df086502538706ee23e8ef7c8bb5a4178f)
and our own amdgpu_device_resize_fb_bar. Our use case being more
complicated than NVMe's raises some questions and thoughts:

Before the PCI core moves the BARs (VRAM and registers) we have to
stop the HW and SW and then unmap all MMIO-mapped entities in the
driver. This includes registers of all types and VRAM-placed BOs with
CPU visibility. My concern is how to prevent accesses to all MMIO
ranges between iounmap in rescan_prepare and ioremap in rescan_done.
For register accesses we are back to the same issue we had with device
unplug and GPU reset that we are discussing with Denis now. So here, at
least as a first solution just to confirm the feature works, I can
introduce low-level RW locking (in the register accessors) to block
access until all register-access driver pointers are remapped after the
BAR move.

What concerns me more is how to handle the BOs. In rescan_prepare:
collect all CPU-accessible BOs placed in VRAM (I can use an
amdgpu-specific list like the one I currently use for device unplug),
lock them to prevent any further use, unpin them (and unmap). In
rescan_done: remap them back and unlock them for further use. Also,
there is a difference between kernel BOs, where iounmap/ioremap should
indeed be enough, and user-space mapped BOs, where it seems that, as in
hot unplug, we need in rescan_prepare to only unmap them, in between
drop new page faults with VM_FAULT_RETRY, and after rescan_done allow
page faults to go through to refill all the user page tables with the
updated MMIO addresses.
Let me know your thoughts.
Andrey
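For the user-space mapped BOs, the fault-path decision described above could be simulated like this. All names here are hypothetical stand-ins for the real amdgpu fault handler, and the real code would need proper synchronization with the rescan path rather than a plain flag.

```c
/* Simulation of the page-fault policy for user-mapped BOs during a BAR
 * move: while the move is in flight the handler returns a retry result
 * instead of filling page tables (the kernel analog is returning
 * VM_FAULT_RETRY); once rescan_done has run, faults go through and
 * refill the user page tables at the new MMIO address. */
#include <stdbool.h>

enum fault_result { FAULT_OK, FAULT_RETRY };

static bool bars_moving; /* set in rescan_prepare, cleared in rescan_done */

enum fault_result bo_fault(void)
{
    if (bars_moving)
        return FAULT_RETRY;   /* drop the fault; caller faults again later */
    /* ...insert PFNs for the (possibly relocated) MMIO range here... */
    return FAULT_OK;
}

void rescan_prepare(void) { bars_moving = true;  /* + zap user mappings */ }
void rescan_done(void)    { bars_moving = false; /* faults refill PTEs  */ }
```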
On 2021-02-26 4:03 p.m., Andrey Grodzovsky wrote:
On 2021-02-26 1:44 a.m., Sergei Miroshnichenko wrote:
On Thu, 2021-02-25 at 13:28 -0500, Andrey Grodzovsky wrote:
On 2021-02-25 2:00 a.m., Sergei Miroshnichenko wrote:
On Wed, 2021-02-24 at 17:51 -0500, Andrey Grodzovsky wrote:
On 2021-02-24 1:23 p.m., Sergei Miroshnichenko wrote:
...
Are you saying that even without hot-plugging, while both the NVMe and
AMD cards are present right from boot, you still get BARs moving and
MMIO ranges reassigned for the NVMe BARs, just because the amdgpu driver
will start a resize of the AMD card's BARs and this will trigger the
NVMe's BARs to move to allow the AMD card's BARs to cover the full range
of video RAM?
Yes. Unconditionally, because it is unknown beforehand whether NVMe's
BAR movement will help. In this particular case the BAR movement is not
needed, but it is done anyway.
BARs are not moved one by one; the kernel releases all the releasable
ones and then recalculates a new BAR layout to fit them all. The
kernel's algorithm is different from the BIOS's, so the NVMe has ended
up at a new place.
This is triggered by the following:
- at boot, if the BIOS hasn't assigned every BAR;
- during pci_resize_resource();
- during pci_rescan_bus() -- after a pciehp event or a manual rescan
via sysfs.
By a manual rescan via sysfs you mean something like this: 'echo 1 >
/sys/bus/pci/drivers/amdgpu/0000\:0c\:00.0/remove && echo 1 >
/sys/bus/pci/rescan'? I am looking into how to most reliably trigger
the PCI code to call my callbacks even without having an external PCI
cage for the GPU (it will take me some time to get one).
Yeah, this is our way to go when a device can't be physically removed
or unpowered remotely, with just a slightly shorter path:
sudo sh -c 'echo 1 > /sys/bus/pci/devices/0000\:0c\:00.0/remove'
sudo sh -c 'echo 1 > /sys/bus/pci/rescan'
Or just the second command (rescan) alone is enough: a BAR movement
attempt will be triggered even if there were no changes in the PCI
topology.
Serge
Ok, I'm able to trigger the stub callback calls after doing a rescan
from sysfs. I also see the BARs moving:
[ 1860.968036] amdgpu 0000:09:00.0: BAR 0: assigned [mem
0xc0000000-0xcfffffff 64bit pref]
[ 1860.968050] amdgpu 0000:09:00.0: BAR 0 updated: 0xe0000000 ->
0xc0000000
[ 1860.968054] amdgpu 0000:09:00.0: BAR 2: assigned [mem
0xd0000000-0xd01fffff 64bit pref]
[ 1860.968068] amdgpu 0000:09:00.0: BAR 2 updated: 0xf0000000 ->
0xd0000000
[ 1860.968071] amdgpu 0000:09:00.0: BAR 5: assigned [mem
0xf0100000-0xf013ffff]
[ 1860.968078] amdgpu 0000:09:00.0: BAR 5 updated: 0xfce00000 ->
0xf0100000
As expected, after the move the device becomes unresponsive, since
unmap/ioremap wasn't done.
Andrey