On Thu, 2019-04-25 at 21:38 -0400, Jerome Glisse wrote:
> I see that there are still empty spots in the LSF/MM schedule, so I would
> like to have a discussion on allowing direct block mapping of a file for
> devices (NIC, GPU, FPGA, ...). This is an mm, fs and block discussion,
> though the mm side is pretty light, ie only adding 2 callbacks to
> vm_operations_struct:
>
>     int (*device_map)(struct vm_area_struct *vma,
>                       struct device *importer,
>                       struct dma_buf **bufp,
>                       unsigned long start,
>                       unsigned long end,
>                       unsigned flags,
>                       dma_addr_t *pa);
>
>     // Some flags I can think of:
>     DEVICE_MAP_FLAG_PIN               // ie return a dma_buf object
>     DEVICE_MAP_FLAG_WRITE             // importer wants to be able to write
>     DEVICE_MAP_FLAG_SUPPORT_ATOMIC_OP // importer wants to do atomic
>                                       // operations on the mapping
>
>     void (*device_unmap)(struct vm_area_struct *vma,
>                          struct device *importer,
>                          unsigned long start,
>                          unsigned long end,
>                          dma_addr_t *pa);
>
> Each filesystem could add these callbacks and decide whether or not to
> allow the importer to directly map blocks. The filesystem can use whatever
> logic it wants to make that decision. For instance, if there are pages in
> the page cache for the range, it can say no and the device falls back to
> main memory. The filesystem can also update its internal data structures
> to keep track of direct block mappings.
>
> If the filesystem decides to allow the direct block mapping, it forwards
> the request to the block device, which can itself decide to forbid the
> direct mapping for any reason: for instance, running out of BAR space,
> peer to peer between the block device and the importer device not being
> supported, or the block device not wanting to allow writable peer
> mappings ...
>
> So the event flow is:
>   1  program mmaps a file (and never intends to access it with the CPU)
>   2  program tries to access the mmap from a device A
>   3  device A driver sees the device_map callback on the vma and calls it
>   4a on success, device A driver programs the device with the mapped dma
>      address
>   4b on failure, device A driver falls back to faulting so that it can
>      use pages from the page cache
>
> This API assumes that the importer supports mmu notifiers, and thus that
> the fs can invalidate a device mapping at _any_ time by sending an mmu
> notifier to all mappings of the file (for a given range in the file or
> for the whole file). Obviously you want to minimize disruption and thus
> only invalidate when necessary.
>
> The dma_buf parameter can be used to add pinning support for filesystems
> that wish to support that case too. Here the mapping lifetime gets
> disconnected from the vma and is transferred to the dma_buf allocated by
> the filesystem. Again, the filesystem can decide to say no, as pinning
> blocks has drastic consequences for the filesystem and the block device.
>
> This has some similarities to the hmmap and caching topic (which is about
> mapping blocks directly to the CPU AFAIU), but device mapping can cut
> some corners: for instance, some devices can forgo atomic operations on
> such a mapping and thus can work over PCIe, while the CPU cannot do
> atomics to a PCIe BAR.
>
> Also, this API can be used to allow peer to peer access between devices
> when the vma is a mmap of a device file and thus the vm_operations_struct
> comes from some exporter device driver. So the same 2
> vm_operations_struct callbacks can be used in more cases than what I just
> described here.
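To make sure I'm reading the intended event flow right, here is a rough
importer-side sketch, assuming the device_map/device_unmap callbacks and the
flags land on vm_operations_struct exactly as quoted above. Everything
prefixed with my_ below is a hypothetical placeholder for existing importer
driver code, not a real kernel API.

    /*
     * Rough sketch of steps 2-4 from the importer driver's point of view.
     * Assumes the proposed ->device_map() callback and DEVICE_MAP_FLAG_WRITE
     * exist as quoted above; my_device_program_dma() and
     * my_device_fault_fallback() are hypothetical stand-ins.
     */
    #include <linux/mm.h>
    #include <linux/device.h>
    #include <linux/dma-buf.h>

    struct my_device {                  /* hypothetical importer driver state */
            struct device *dev;
    };

    static int my_device_program_dma(struct my_device *mydev,
                                     unsigned long start, unsigned long end,
                                     dma_addr_t pa);
    static int my_device_fault_fallback(struct my_device *mydev,
                                        struct vm_area_struct *vma,
                                        unsigned long start, unsigned long end);

    static int my_device_map_vma_range(struct my_device *mydev,
                                       struct vm_area_struct *vma,
                                       unsigned long start, unsigned long end)
    {
            dma_addr_t pa;
            int ret = -EOPNOTSUPP;

            /* Step 3: the exporter (fs or device driver) may offer device_map. */
            if (vma->vm_ops && vma->vm_ops->device_map)
                    ret = vma->vm_ops->device_map(vma, mydev->dev,
                                                  NULL, /* no pin, rely on mmu notifier */
                                                  start, end,
                                                  DEVICE_MAP_FLAG_WRITE, &pa);

            if (!ret) {
                    /* Step 4a: program the device with the mapped dma address. */
                    return my_device_program_dma(mydev, start, end, pa);
            }

            /* Step 4b: exporter said no, fall back to faulting pages from the
             * page cache (gup/hmm), as the driver would do today. */
            return my_device_fault_fallback(mydev, vma, start, end);
    }

Presumably, when the fs invalidates through the mmu notifier (or when the
importer is simply done), the driver tears down its device mapping and calls
->device_unmap() on the same range.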
>
> So I would like to gather people's feedback on the general approach and a
> few things like:
>   - Do block devices need to be able to invalidate such mappings too?
>     It is easy for the fs to invalidate, as it can walk file mappings,
>     but the block device does not know about files.
>
>   - Do we want to provide some generic implementation to share across
>     fs?
>
>   - Maybe some shared helpers for block devices that could track the
>     file corresponding to a peer mapping?

I'm interested in being a part of this discussion.

> Cheers,
> Jérôme
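To make the generic-implementation question a bit more concrete, here is a
rough sketch of what a shared fs-side helper could look like, following the
page-cache check described in the proposal. filemap_range_has_page() is an
existing helper; generic_file_device_map() and blkdev_device_map() are
hypothetical names, and the real policy would of course stay per-filesystem.

    /*
     * Hypothetical shared helper a filesystem could plug into the proposed
     * ->device_map() callback.  blkdev_device_map() stands in for whatever
     * block-layer hook would forward the request to the underlying device.
     */
    #include <linux/fs.h>
    #include <linux/mm.h>
    #include <linux/pagemap.h>
    #include <linux/dma-buf.h>

    static int blkdev_device_map(struct file *file, struct device *importer,
                                 struct dma_buf **bufp, loff_t first,
                                 loff_t last, unsigned flags, dma_addr_t *pa);

    static int generic_file_device_map(struct vm_area_struct *vma,
                                       struct device *importer,
                                       struct dma_buf **bufp,
                                       unsigned long start, unsigned long end,
                                       unsigned flags, dma_addr_t *pa)
    {
            struct file *file = vma->vm_file;
            struct address_space *mapping = file->f_mapping;
            loff_t first = ((loff_t)vma->vm_pgoff << PAGE_SHIFT) +
                           (start - vma->vm_start);
            loff_t last = first + (end - start) - 1;

            /* If there are pages in the page cache for the range, say no and
             * let the importer fall back to main memory. */
            if (filemap_range_has_page(mapping, first, last))
                    return -EBUSY;

            /* Otherwise forward to the block device, which may still refuse
             * (out of BAR space, no peer-to-peer path, read-only policy, ...). */
            return blkdev_device_map(file, importer, bufp, first, last,
                                     flags, pa);
    }

A helper like this would also have to record the mapping somewhere so the fs
can send the mmu notifier invalidation before truncate, hole punch or
writeback touches those blocks, which is where the per-fs bookkeeping
mentioned above comes in.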