Re: [RFC 0/7] Peer-direct memory

On 18/02/2016 16:44, Stephen Bates wrote:
> Sagi
> 
>> CC'ing sbates who played with this stuff at some point...
> 
> Thanks for inviting me to this party Sagi ;-). Here are some comments and responses based on our experiences. Apologies in advance for the list format:
> 
> 1. As it stands in 4.5-rc4, devm_memremap_pages will not work with iomem. Logan (cc'ed here) and I (mostly Logan) developed the ability to do that in an out-of-tree patch for memremap.c. We also developed a simple example driver for a PCIe device that exposes DRAM on the card via a BAR. We used this code to provide some feedback to Dan (e.g. [1]-[3]). At this time we are preparing an RFC to extend devm_memremap_pages for IO memory. We hope to have that ready soon, but there is no guarantee our approach is acceptable to the community. My hope is that it will be a good starting point for moving forward...
I'd be happy to see your RFC when you are ready. I see in the thread 
of [3] that you are using write-combining. Do you think your patchset 
will also be suitable for uncacheable memory?

> 2. The two good things about Peer-Direct are that it works and it is here today. That said, I do think an approach based on ZONE_DEVICE is more general and a preferred way to allow IO devices to communicate with each other. The question is: can we find such an approach that is acceptable to the community? As noted in point 1, I hope the coming RFC will initiate a discussion. I have also requested attendance at LSF/MM to discuss this topic (among others).
> 
> 3. As of now the section alignment requirement is somewhat relaxed. I quote from [4]. 
> 
> "I could loosen the restriction a bit to allow one unaligned mapping
> per section.  However, if another mapping request came along that
> tried to map a free part of the section it would fail because the code
> depends on a  "1 dev_pagemap per section" relationship.  Seems an ok
> compromise to me..."
> 
> This is implemented in 4.5-rc4 (see memremap.c line 315).

I don't think that's enough for our purposes. We have devices with 
rather small BARs (32MB) and multiple PFs that all need to expose their 
BAR to peer to peer access. One can expect these PFs will be assigned 
adjacent addresses and they will break the "one dev_pagemap per 
section" rule.
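
To illustrate the concern with made-up numbers, here is a minimal
user-space sketch. It assumes x86_64 sparsemem, where SECTION_SIZE_BITS
is 27 (128MB sections). The helper names and the BAR addresses used
with it are hypothetical and are not taken from memremap.c or from any
real device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: x86_64 sparsemem, SECTION_SIZE_BITS == 27 (128MB sections). */
#define SECTION_SIZE_BITS 27

/* Hypothetical helper: index of the section containing a physical address. */
static uint64_t section_nr(uint64_t phys)
{
    return phys >> SECTION_SIZE_BITS;
}

/*
 * Hypothetical helper: true if the two BARs touch at least one common
 * section, i.e. they would need two dev_pagemaps in the same section.
 */
static bool bars_share_section(uint64_t a_start, uint64_t a_size,
                               uint64_t b_start, uint64_t b_size)
{
    assert(a_size > 0 && b_size > 0);

    uint64_t a_first = section_nr(a_start);
    uint64_t a_last  = section_nr(a_start + a_size - 1);
    uint64_t b_first = section_nr(b_start);
    uint64_t b_last  = section_nr(b_start + b_size - 1);

    /* Two inclusive ranges of section numbers overlap. */
    return a_first <= b_last && b_first <= a_last;
}
```

For example, two 32MB BARs placed back to back at 0xf0000000 and
0xf2000000 both fall in section 30 under this assumption, so
bars_share_section() returns true for them. Only the first mapping
could succeed under a "one dev_pagemap per section" rule.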

> 4. The out-of-tree patch we did allows one to register the device memory as IO memory. However, we were only concerned with DRAM exposed on the BAR and so were not affected by the "i/o side effects" issues. Someone would need to think about how this applies to IOMEM that does have side effects when accessed.
With this RFC, we map parts of the HCA BAR that were mmapped to a process 
(both uncacheable and write-combining) and map them to a peer device 
(another HCA). As long as the kernel doesn't do anything else with 
these pages, and leaves them to be controlled by the user-space 
application and/or the peer device, I don't see a problem with mapping
IO memory with side effects. However, I'm not an expert here, and I'd
be happy to hear what others think about this.

> 5. I concur with Sagi's comment below that one approach we can use to inform 3rd party device drivers about vanishing memory regions is via mmu_notifiers. However, this needs to be fleshed out and tied into the relevant driver(s).
> 
> 6. In full disclosure, my main interest in this ties in to NVM Express devices, which can act as DMA masters and expose regions of IOMEM at the same time (via CMBs). I want to be able to tie these devices together with other IO devices (like RDMA NICs, FPGA and GPGPU based offload engines, other NVMe devices and storage adaptors) in a peer-2-peer fashion and may not always have an RDMA device in the mix...
I understand.

Regards,
Haggai
