Re: Question about supporting AMD eGPU hot plug case


 





On 15.03.21 at 17:21, Andrey Grodzovsky wrote:

On 2021-03-15 12:10 p.m., Christian König wrote:
On 12.03.21 at 16:34, Andrey Grodzovsky wrote:


On 2021-03-12 4:03 a.m., Christian König wrote:
On 11.03.21 at 23:40, Andrey Grodzovsky wrote:
[SNIP]
The expected result is they all move closer to the start of PCI address
space.


Ok, I updated as you described. I also removed the PCI config command that stops address decoding and restarts it later, since I noticed the PCI core does this itself
when needed.
I have now also tested with a graphical desktop enabled while submitting
3D draw commands, and under this scenario everything still seems to
work. Again, this all needs to be tested with a VRAM BAR move, since then
I believe I will see more issues, such as handling of MMIO-mapped VRAM objects (like the GART table). If you have an AMD card, maybe you could give it a try as well. In the meantime I will add support for ioremapping those VRAM objects.

Andrey

Just an update: I added support for unmapping/remapping of all VRAM
objects, both user-space mmapped and kernel ioremapped. It seems to work
OK but, again, without forcing the VRAM BAR to move I can't be sure.
Alex, Christian - take a look when you have some time and give me some
initial feedback on the amdgpu side.

The code is at https://cgit.freedesktop.org/~agrodzov/linux/log/?h=yadro%2Fpcie_hotplug%2Fmovable_bars_v9.1

Mhm, that lets userspace busy-retry until the BAR movement is done.

Not sure if that can't livelock somehow.

Christian.

In my testing it didn't, but I can instead route them to some
global static dummy page while the BARs are moving and then, when everything
is done, just invalidate the device address space again and let the
page faults fill in valid PFNs again.

Well, that won't work, because the reads/writes which are done in the meantime do need to wait for the BAR to be available again.

So waiting for the BAR move to finish is correct, but what we should do is use a lock instead of SRCU, because with a lock lockdep complains when we do something nasty.

Christian.


Spinlock, I assume? We can't sleep there - it's an interrupt.

Mhm, the BAR movement is in interrupt context?

Well, that is rather bad. I was hoping to rename the GPU reset rw_sem to device_access rw_sem and then use the same lock for both (it's essentially the same problem).

But if we need to move the BAR in atomic/interrupt context, that makes things a bit more complicated.

Christian.


Andrey




Andrey



Andrey





