On Fri, Oct 13, 2017 at 10:01:04AM -0700, Dan Williams wrote:
> On Fri, Oct 13, 2017 at 9:38 AM, Jason Gunthorpe
> <jgunthorpe@xxxxxxxxxxxxxxxxxxxx> wrote:
> > On Fri, Oct 13, 2017 at 08:14:55AM -0700, Dan Williams wrote:
> >
> >> scheme specific to RDMA which seems like a waste to me when we can
> >> generically signal an event on the fd for any event that effects any
> >> of the vma's on the file. The FL_LAYOUT lease impacts the entire file,
> >> so as far as I can see delaying the notification until MR-init is too
> >> late, too granular, and too RDMA specific.
> >
> > But for RDMA a FD is not what we care about - we want the MR handle so
> > the app knows which MR needs fixing.
>
> I'd rather put the onus on userspace to remember where it used a
> MAP_DIRECT mapping and be aware that all the mappings of that file are
> subject to a lease break. Sure, we could build up a pile of kernel
> infrastructure to notify on a per-MR basis, but I think that would
> only be worth it if leases were range based. As it is, the entire file
> is covered by a lease instance and all MRs that might reference that
> file get one notification. That said, we can always arrange for a
> per-driver callback at lease-break time so that it can do something
> above and beyond the default notification.

I don't think that really represents how lots of apps actually use
RDMA.

RDMA is often buried down in the software stack (eg in a MPI), and by
the time a mapping gets used for RDMA transfer the link between the
FD, mmap and the MR is totally opaque.

Having a MR specific notification means the low level RDMA libraries
have a chance to deal with everything for the app.

Eg consider a HPC app using MPI that uses some DAX aware library to
get DAX backed mmap's. It then passes memory in those mmaps to the MPI
library to do transfers. The MPI creates the MR on demand.

So, who should be responsible for MR coherency? Today we say the MPI
is responsible.
But we can't really expect the MPI to hook SIGIO and somehow try to
reverse engineer what MRs are impacted from a FD that may not even
still be open.

I think, if you want to build a uAPI for notification of MR lease
break, then you need to show how it fits into the above software model:
 - How it can be hidden in a RDMA specific library
 - How lease break can be done hitlessly, so the library user never
   needs to know it is happening or see failed/missed transfers
 - Whatever fast path checking is needed does not kill performance

Jason