On 2021-03-05 3:36 a.m., Christian König wrote:
Hi Andrey,
Am 03.03.21 um 17:05 schrieb Andrey Grodzovsky:
Ping
Andrey
On 2021-03-02 12:48 p.m., Andrey Grodzovsky wrote:
Adding PCI mailing list per Bjorn's advise (sorry, skipped the
address last time).
On 2021-03-02 12:22 p.m., Christian König wrote:
Hi Andrey,
Can you elaborate more why you need such a lock on a global level
and not
enough per driver (per device more specifically) local lock
acquired in
rescan_preapare (before BARs shuffling starts) and released in
rescan_done
(after BARs shuffling done) ?
because this adds inter device dependencies.
E.g. when we have device A, B and C and each is taking it's own lock
we end up with 3 locks.
If those device have inter dependencies on their own, for example
because of DMA-buf we will end up in quite a locking mess.
Apart from that if we don't always keep the order A, B, C but
instead end up with A, C, B lockdep will certainly start to complain.
Regards,
Christian.
But as far as I understand those locks are agnostic to each other -
each lock is taken only by it's respective device and not by others.
Well they should be agnostic to each other, but are they really? I mean
we could have inter device lock dependencies because of DMA-buf or good
knows what.
So I don't see deadlock danger here but, I do suspect lockdep might
throw a false warning...
Yeah and disabling or ignoring the lockdep warning could cause problems
because we don't see issues caused by inter device lock dependencies any
more.
I just have the feeling that this would cause a lot of problems.
Christian.
Sergei, can you please address Christian's suggestion bellow - quote
"
What we need instead is a upper level PCI layer rw-lock which we
can take on the read side in the driver
"
Andrey
Andrey
Am 02.03.21 um 18:16 schrieb Andrey Grodzovsky:
Per Bjorn's advise adding PCI mailing list.
Christian, please reply on top of this mail and not on top the
earlier, original one.
Andrey
On 2021-03-02 11:49 a.m., Bjorn Helgaas wrote:
Please add linux-pci@xxxxxxxxxxxxxxx to the cc list. There's a
lot of
useful context in this thread and it needs to be archived.
On Tue, Mar 02, 2021 at 11:37:25AM -0500, Andrey Grodzovsky wrote:
Adding Sergei and Bjorn since you are suggesting some PCI related
changes.
On 2021-03-02 6:28 a.m., Christian König wrote:
Yeah, I'm thinking about that as well for quite a while now.
I'm not 100% sure what Sergei does in his patch set will work
for us. He
basically tells the driver to stop any processing then shuffle
around
the BARs and then tells him to start again.
Because of the necessity to reprogram bridges there isn't much
other way
how to do this, but we would need something like taking a lock,
returning and dropping the lock later on. If you do this for
multiple
drivers at the same time this results in a locking nightmare.
What we need instead is a upper level PCI layer rw-lock which we
can
take on the read side in the driver, similar to our internal
reset lock.
Can you elaborate more why you need such a lock on a global level
and not
enough per driver (per device more specifically) local lock
acquired in
rescan_preapare (before BARs shuffling starts) and released in
rescan_done
(after BARs shuffling done) ?
Andrey
Wiring up changing the mapping is the smaller problem in that
picture.
Christian.
Am 01.03.21 um 22:14 schrieb Andrey Grodzovsky:
Hi, I started looking into actual implementation for this, as
reference point I am using Sege's NVME implementations (see
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FYADRO-KNS%2Flinux%2Fcommit%2F58c375df086502538706ee23e8ef7c8bb5a4178f&data=04%7C01%7Candrey.grodzovsky%40amd.com%7C44bc68a68feb4b71836c08d8dd9b306a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637503005898302093%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=nuTptAjGewG2T7tXOQr2h%2Bdr42lyg3m1%2Bfv0mJzd8tw%3D&reserved=0)
and our own amdgpu_device_resize_fb_bar. Our use case being more
complicated then NVME's arises some questions or thoughts:
Before the PCI core will move the BARs (VRAM and registers) we
have
to stop the HW and SW and then unmap all MMIO mapped entitles
in the
driver. This includes, registers of all types and VRAM placed BOs
with CPU visibility. My concern is how to prevent accesses to all
MMIO ranges between iounmap in rescan_preapare and ioremap in
rescan_done. For register accesses we are again back to same issue
we had with device unplug and GPU reset we are discussing with
Denis
now. So here at least as first solution just to confirm the
feature
works I can introduce low level (in register accessors) RW locking
to avoid access until all register access driver pointers are
remapped post BARs move. What more concerns me is how to - In
rescan_preapare: collect all CPU accessible BOs placed in VRAM (i
can use amdgpu specific list like I currently use for device
unplug), lock them to prevent any further use of them, unpin them
(and unmap). In rescan_preapare: remap them back and unlock
them for
further use. Also, there is a difference between kernel BOs where
indeed iounmap, ioremap should be enough and user space mapped BOs
where it seems like in hot unplug, we need in rescan_preapare to
only unamp them, in between drop new page faults with
VM_FAULT_RETRY
and post rescan_done allow page faults to go through to refill all
the user page tables with updated MMIO addresses.
Let me know your thoughts.
Andrey
On 2021-02-26 4:03 p.m., Andrey Grodzovsky wrote:
On 2021-02-26 1:44 a.m., Sergei Miroshnichenko wrote:
On Thu, 2021-02-25 at 13:28 -0500, Andrey Grodzovsky wrote:
On 2021-02-25 2:00 a.m., Sergei Miroshnichenko wrote:
On Wed, 2021-02-24 at 17:51 -0500, Andrey Grodzovsky wrote:
On 2021-02-24 1:23 p.m., Sergei Miroshnichenko wrote:
...
Are you saying that even without hot-plugging, while both
nvme
and
AMD
card are present
right from boot, you still get BARs moving and MMIO ranges
reassigned
for NVME BARs
just because amdgpu driver will start resize of AMD card
BARs and
this
will trigger NVMEs BARs move to
allow AMD card BARs to cover full range of VIDEO RAM ?
Yes. Unconditionally, because it is unknown beforehand if
NVMe's
BAR
movement will help. In this particular case BAR movement is
not
needed,
but is done anyway.
BARs are not moved one by one, but the kernel releases all the
releasable ones, and then recalculates a new BAR layout to
fit them
all. Kernel's algorithm is different from BIOS's, so NVME has
appeared
at a new place.
This is triggered by following:
- at boot, if BIOS had assigned not every BAR;
- during pci_resize_resource();
- during pci_rescan_bus() -- after a pciehp event or a
manual via
sysfs.
By manual via sysfs you mean something like this - 'echo 1 >
/sys/bus/pci/drivers/amdgpu/0000\:0c\:00.0/remove && echo 1 >
/sys/bus/pci/rescan ' ? I am looking into how most reliably
trigger
PCI
code to call my callbacks even without having external PCI
cage for
GPU
(will take me some time to get it).
Yeah, this is our way to go when a device can't be physically
removed
or unpowered remotely. With just a bit shorter path:
sudo sh -c 'echo 1 >
/sys/bus/pci/devices/0000\:0c\:00.0/remove'
sudo sh -c 'echo 1 > /sys/bus/pci/rescan'
Or, just a second command (rescan) is enough: a BAR movement
attempt
will be triggered even if there were no changes in PCI topology.
Serge
Ok, able to trigger stub callbacks call after doing rescan from
sysfs. Also I see BARs movement
[ 1860.968036] amdgpu 0000:09:00.0: BAR 0: assigned [mem
0xc0000000-0xcfffffff 64bit pref]
[ 1860.968050] amdgpu 0000:09:00.0: BAR 0 updated: 0xe0000000 ->
0xc0000000
[ 1860.968054] amdgpu 0000:09:00.0: BAR 2: assigned [mem
0xd0000000-0xd01fffff 64bit pref]
[ 1860.968068] amdgpu 0000:09:00.0: BAR 2 updated: 0xf0000000 ->
0xd0000000
[ 1860.968071] amdgpu 0000:09:00.0: BAR 5: assigned [mem
0xf0100000-0xf013ffff]
[ 1860.968078] amdgpu 0000:09:00.0: BAR 5 updated: 0xfce00000 ->
0xf0100000
As expected, after the move the device becomes unresponsive as
unmap/ioremap wasn't done.
Andrey