Hi, This is summary of 4th RDMA microconference co-located with Linux Plumbers Conference 2019. We would like to thank you for all presenters and attendees of our RDMA track, it is you who made this event so successful. Special thanks goes to Doug Ledford who volunteered to take notes and Jason Gunthorpe who helped to run this event smoothly. The original etherpad is located at [2] and below you will find the copy of those notes: ------------------------------------------------------------------------------------------------ 1. GUP and ZONE_DEVICE pages. [3] Jason Gunthorpe, John Hubbard and Don Dutile * Make the interface to use p2p mechanism be via sysfs. (PCI???). * Try to kill PTE flag for dev memory to make it easier to support on things like s390. * s390 will have mapping issues, arm/x86/PowerPC should be fine. * Looking to map partial BARs so they can be partitioned between different users. * Total BAR space could exceed 1TB in some scenarios (lots of GPUs in an HPC machine with persistent memory, etc.). * Initially use struct page element but try to remove it later. * Unlikely to be able to remove struct page, so maybe make it less painful by doing something like forcing all zone mappings to use hugepages ioctl no, sysfs yes. * PCI SIG talking about peer-2-peer too. * Distance might not be the best function name for the pci p2p checking function. * Conceptually, looking for new type of page fault, DMA fault, that will make a page visible to DMA even if we don’t care if it’s visible to the CPU GUP API makes really weak promise, no one could possibly think that it’s that weak, so everyone assumed it was stronger they were wrong. * It really is that weak wrappers around the GUP flags? 17+ flags currently, combinational matrix is extreme, some internal only flags can be abused by callers. * Possible to set "opposite" GUP flags. * Most (if not all) out of core code (drivers) get_user_pages users need same flags. 2. RDMA, File Systems, and DAX. [4] Ira Weiny * There was a bug in previous versions of patch set. It’s fixed. * New file_pin object to track relationship between mmaped files and DMA mappings to the underlying pages. * If owners of lease tries to do something that requires changes to the file layout: deadlock of application (current patch set, but not settled). * Write lease/fallocate/downgrade to read/unbreakable lease - fix race issue with fallocate and lease chicken and egg problem. 3. Discussion about IBNBD/IBTRS, upstreaming and action items. [5] Jinpu Wang, Danil Kipnis * IBTRS is standalone transfer engine that can be used with any ULP. * IBTRS only uses RDMA_WRITE with IMM and so is limited to fabrics that support this. * Server does not unmap after write from client so data can change when the server is flushing to disk. * Need to think about transfer model as the current one appears to be vulnerable to a nefarious kernel module. * It is worth to consider to unite 4 kernel modules to be 2 kernel * modules. One responsible for transfer (server + client) and another is responsible for block operations. * Security concern should be cleared first before in-depth review. * No objections to see IBTRS in kernel, but needs to be renamed to something more general, because it works on many fabrics and not only IB. 5. Improving RDMA performance through the use of contiguous memory and larger pages for files. [6] Christopher Lameter * The main problem is that contiguous physical memory being limited resource in real life systems. The difference in system performance so visible that it is worth to reboot servers every couple of days (depend on workload). * The reason to it, existence of unmovable pages. * HugePages help, but pinned objects over time end up breaking up the huge pages and eventually system flows down Need movable objects: dentry and inode are the big culprits. * Typical use case used to trigger degradation is copying both very large and very small files on the same machine. * Attempts to allocate unmovable pages in specific place causes to situations where system experiences OOM despite being enough memory. * x86 has 4K page size, while PowerPC has 64K. The bigger page size gives better performance, but wastes more memory for small objects. 4. Shared IB objects. [7] Yuval Shaia * There was lively discussion between various models of sharing objects, through file description, or uverbs context, or PD. * People would like to stick to the file handle model so you share the file handle and get everything you need as being simplest approach. * Is the security model resolved? Right now, the model assumes trusted processes are allowed to share only. * Simple (FD) model creates challenge to properly release HW objects after main process exits and leaves HW objects which were in use by itself and not by shared processes. * Refcount needs to be in the API to track when the shared object is freeable * API requires shared memory first, then import PD and import MR. This model (as opposed to sharing the fd of the in context), allows for safe cleanup on process death without interfering with other users of the shared PD/MR. Thanks [1] https://linuxplumbersconf.org/event/4/sessions/64/#20190911 [2] https://etherpad.net/p/LPC2019_RDMA [3] https://www.linuxplumbersconf.org/event/4/contributions/369/ [4] https://linuxplumbersconf.org/event/4/contributions/368/ [5] https://linuxplumbersconf.org/event/4/contributions/367/ [6] https://linuxplumbersconf.org/event/4/contributions/371/ [7] https://www.linuxplumbersconf.org/event/4/contributions/371/