Re: Understanding P2P DMA related errors

On 2022-10-04 17:23, Ramesh Errabolu wrote:
> Could I request some help understanding the following PCIe P2P-related error?
> 
>     [72.896624] amdgpu 0000:67:00.0: cannot be used for peer-to-peer DMA
>     as the client and provider (0000:19:00.0) do not share an upstream
>     bridge or whitelisted host bridge
> 
> 
> *System Information*:
> 
>   * The kernel is tagged as 5.14.21
>   * The last entries in the whitelist are {PCI_VENDOR_ID_INTEL, 0x2030, 0}
>     and the same form for device IDs 0x2031, 0x2032, 0x2033 and 0x2020
>       o p2pdma.c:
>         <https://elixir.bootlin.com/linux/v5.14.21/source/drivers/pci/p2pdma.c>
>   * The lspci entry on this system that appears to refer to the root
>     complex is:
>       o fe:00.3 Host bridge [0600]: Intel Corporation Device [8086:0998]
>       o I found it with the command below, but I am not sure the command
>         is correct. Could you confirm?
>       o *sudo lspci -nn | grep -C 1 -i host*
>       o If the command above is not correct, how can I correctly get the
>         root complex device's ID?
> 
> I tried to work out whether the two AMD devices are connected to two
> different root complexes. Looking at the PCIe device tree, that does not
> seem to be the case, but perhaps I am not interpreting the tree
> correctly. A short fragment is included below:
> 
> 
>     +-[0000:e2]-+-00.0  Intel Corporation Device 09a2
>     |           +-00.1  Intel Corporation Device 09a4
>     |           +-00.2  Intel Corporation Device 09a3
>     |           +-00.4  Intel Corporation Device 0998
>     |           \-02.0-[e3-e5]----00.0-[e4-e5]----00.0-[e5]----00.0  Advanced Micro Devices, Inc. [AMD/ATI]
> 
>     I am reading this as follows:
> 
>       o Device E2:02.0, an Intel PCI bridge, is connected to domain 0000
>       o Device E3:00.0, an AMD PCI bridge, is connected to Intel PCI
>         bridge E2:02.0
>       o Device E4:00.0, an AMD PCI bridge, is connected to AMD PCI
>         bridge E3:00.0
>       o Device E5:00.0, a display controller, is connected to AMD PCI
>         bridge E4:00.0
> 
> Per my reading, in the above tree the devices *E2:02.0* (*8086:347A*)
> and *E2:00.0* (*8086:09A2*) are not connected to each other directly;
> rather, they should be considered PEERs / SIBLINGs. Downstream of
> E2:02.0 is the AMD device E5:00.0 (*1002:740F*). In this reading, the
> AMD device is not connected to the root complex device. A similar
> pattern is seen with the other AMD devices: all of them connect to the
> domain (*0000*) via different buses, and none of those paths passes
> through a root complex device. *Is my reading WRONG*? It is also not
> clear to me how adding the device *8086:09A2* to the whitelist would
> help, since the packets do not go through that device.
> 
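
Whether two devices share an upstream bridge can be sketched in a few lines. The C below is a minimal userspace model, not kernel code: `struct pci_node` and `common_upstream()` are hypothetical names, and each node records only its upstream bridge. Devices that sit directly on a root bus (such as the root port e2:02.0 in the fragment) are assumed to have no parent. Walking up from one device and checking each step against the other device's path yields the nearest shared device, or NULL when the paths only meet above the root buses, which roughly corresponds to the "do not share an upstream bridge" part of the error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a PCIe tree: each node points at its
 * upstream bridge; devices sitting directly on a root bus (such as the
 * root port e2:02.0) have parent == NULL. */
struct pci_node {
    const char *bdf;               /* e.g. "0000:e5:00.0" */
    const struct pci_node *parent; /* upstream bridge, NULL on a root bus */
};

/* Walk up from a (including a itself) and, for each step, walk up from b.
 * The first node found on both paths is the nearest common upstream
 * device. NULL means the paths never meet below the root buses. */
const struct pci_node *common_upstream(const struct pci_node *a,
                                       const struct pci_node *b)
{
    assert(a != NULL && b != NULL);
    for (const struct pci_node *x = a; x != NULL; x = x->parent)
        for (const struct pci_node *y = b; y != NULL; y = y->parent)
            if (x == y)
                return x;
    return NULL;
}
```

For instance, modeling the fragment's chain e2:02.0, e3:00.0, e4:00.0, e5:00.0 and placing a second GPU under a different root port makes `common_upstream()` return NULL, even if both root ports belong to the same host bridge. That is the case in which, per the error message, only a whitelisted host bridge can allow the transfer.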

Hmm, this looks like a really new Ice Lake system. It doesn't even have
proper PCI database entries yet. The topology seems a bit unusual, but
topologies have been getting ever stranger with each new generation.

09a2 looks like the host bridge device ID. I'd probably try adding that
to the whitelist and see what happens.

Logan
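
On the question of how whitelisting 8086:09A2 can help when no packets pass through it: as far as can be read from the v5.14 drivers/pci/p2pdma.c linked above, the whitelist is not matched against devices on the data path. For each device's host bridge, the code appears to take the first device on the root bus, accept it only if it sits at slot/function 00.0, and compare that device's vendor/device ID with the table. In the fragment, e2:00.0 is 8086:09a2. The sketch below models that lookup in userspace, under those assumptions; `struct pci_id`, `host_bridge_whitelisted()` and `host_bridges_allow_p2p()` are hypothetical names, and the flag only imitates the kernel's `REQ_SAME_HOST_BRIDGE`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model: these names mimic, but are not, kernel code. */
#define PCI_VENDOR_ID_INTEL 0x8086
#define REQ_SAME_HOST_BRIDGE (1u << 0)  /* imitates the kernel flag */
#define DEVFN(slot, fn) ((((slot) & 0x1fu) << 3) | ((fn) & 0x07u))

struct pci_id {
    unsigned short vendor;
    unsigned short device;
    unsigned int devfn;       /* slot/function on the root bus */
};

struct whitelist_entry {
    unsigned short vendor;
    unsigned short device;
    unsigned int flags;
};

/* The SkyLake-E entries as listed in the message; the real table has more. */
const struct whitelist_entry skylake_e_entries[] = {
    {PCI_VENDOR_ID_INTEL, 0x2030, 0},
    {PCI_VENDOR_ID_INTEL, 0x2031, 0},
    {PCI_VENDOR_ID_INTEL, 0x2032, 0},
    {PCI_VENDOR_ID_INTEL, 0x2033, 0},
    {PCI_VENDOR_ID_INTEL, 0x2020, 0},
    {0, 0, 0}                 /* terminator: vendor == 0 */
};

/* 'root' is the first device on a host bridge's root bus (NULL if none).
 * Only a device at 00.0 stands in for the host bridge; its ID is looked
 * up in the table regardless of whether P2P traffic ever passes it. */
bool host_bridge_whitelisted(const struct pci_id *root,
                             const struct whitelist_entry *table,
                             bool same_host_bridge)
{
    if (root == NULL || root->devfn != DEVFN(0, 0))
        return false;

    for (const struct whitelist_entry *e = table; e->vendor != 0; e++) {
        if (e->vendor != root->vendor || e->device != root->device)
            continue;
        /* Flagged entries only allow P2P below a single host bridge. */
        if ((e->flags & REQ_SAME_HOST_BRIDGE) && !same_host_bridge)
            return false;
        return true;
    }
    return false;
}

/* Fallback when client and provider share no upstream bridge: a shared
 * host bridge is checked once; two different ones must both match. */
bool host_bridges_allow_p2p(const struct pci_id *root_a,
                            const struct pci_id *root_b,
                            const struct whitelist_entry *table)
{
    assert(table != NULL);
    if (root_a == root_b)
        return host_bridge_whitelisted(root_a, table, true);
    return host_bridge_whitelisted(root_a, table, false) &&
           host_bridge_whitelisted(root_b, table, false);
}
```

In this model, 8086:09a2 at e2:00.0 is rejected by the SkyLake-E entries alone and accepted once `{PCI_VENDOR_ID_INTEL, 0x09a2, 0}` is added to the table. A function at 00.4, such as the 0998 device, would be rejected even if listed, because only 00.0 is consulted. When client and provider sit under different host bridges, both host bridges' 00.0 devices must match.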




