On Thu, Apr 25, 2019 at 09:38:14PM -0400, Jerome Glisse wrote:
> I see that they are still empty spot in LSF/MM schedule so i would like to
> have a discussion on allowing direct block mapping of file for devices (nic,
> gpu, fpga, ...). This is mm, fs and block discussion, thought the mm side
> is pretty light ie only adding 2 callback to vm_operations_struct:

The filesystem already has infrastructure for the bits it needs to
provide. They are called file layout leases (how many times do I
have to keep telling people this!), and what you do with the lease
for the LBA range the filesystem maps for you is then something you
can negotiate with the underlying block device.

i.e. go look at how xfs_pnfs.c works to hand out block mappings to
remote pNFS clients so they can directly access the underlying
storage. Basically, anyone wanting to map blocks needs a file layout
lease and then needs to manage the filesystem state over that range
via these methods in struct export_operations:

	int (*get_uuid)(struct super_block *sb, u8 *buf, u32 *len, u64 *offset);
	int (*map_blocks)(struct inode *inode, loff_t offset,
			  u64 len, struct iomap *iomap,
			  bool write, u32 *device_generation);
	int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
			     int nr_iomaps, struct iattr *iattr);

Basically, before you read/write data, you map the blocks. If you've
written data, then you need to commit the blocks (i.e. tell the fs
they've been written to).

The iomap will give you a contiguous LBA range and the block device
it belongs to, and you can then use that to do whatever smart DMA
stuff you need to do through the block device directly.

If the filesystem wants the space back (e.g. because of a truncate)
then the lease will be revoked. The client must then finish off its
outstanding operations, commit them and release the lease. To access
the file range again, it must renew the lease and remap the file
through ->map_blocks....
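
The lifecycle above (lease, map, I/O, commit, revoke, release, renew
and remap) can be sketched as a small userspace mock. This is only an
illustration under stated assumptions, not kernel code: every mock_*
name, the LEASE_* states, the error values and the "blocks move by
1MiB after a recall" behaviour are hypothetical, and the real
->map_blocks/->commit_blocks methods take the arguments listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mock only: every name below is hypothetical, not kernel API. */

enum mock_lease_state { LEASE_NONE, LEASE_HELD, LEASE_RECALLED };
enum { MOCK_OK = 0, MOCK_ENOLEASE = -1, MOCK_EBUSY = -2 };

/* Stand-in for the struct iomap handed back by ->map_blocks. */
struct mock_extent {
    uint64_t file_offset;   /* byte offset within the file */
    uint64_t device_addr;   /* byte address on the block device */
    uint64_t length;        /* length of the contiguous range */
};

/* Stand-in for the filesystem's view of one file. */
struct mock_file {
    uint64_t base_addr;     /* where the file currently lives on the device */
    uint64_t size;          /* file size as known to the filesystem */
    enum mock_lease_state lease;
};

static int mock_lease_get(struct mock_file *f)
{
    assert(f);
    if (f->lease != LEASE_NONE)
        return MOCK_EBUSY;      /* already held, or a recall is in progress */
    f->lease = LEASE_HELD;
    return MOCK_OK;
}

/* Mapping is only allowed while the lease is held and not recalled. */
static int mock_map_blocks(struct mock_file *f, uint64_t off, uint64_t len,
                           struct mock_extent *ext)
{
    assert(f && ext);
    if (f->lease != LEASE_HELD)
        return MOCK_ENOLEASE;
    ext->file_offset = off;
    ext->device_addr = f->base_addr + off;
    ext->length = len;
    return MOCK_OK;
}

/* Committing outstanding writes is still allowed during a recall. */
static int mock_commit_blocks(struct mock_file *f, const struct mock_extent *ext)
{
    assert(f && ext);
    if (f->lease == LEASE_NONE)
        return MOCK_ENOLEASE;
    if (ext->file_offset + ext->length > f->size)
        f->size = ext->file_offset + ext->length;
    return MOCK_OK;
}

/* The filesystem wants the space back (e.g. a truncate is pending). */
static void mock_lease_recall(struct mock_file *f)
{
    assert(f);
    if (f->lease == LEASE_HELD)
        f->lease = LEASE_RECALLED;
}

static void mock_lease_release(struct mock_file *f)
{
    assert(f);
    if (f->lease == LEASE_RECALLED)
        f->base_addr += 1u << 20;   /* pretend the filesystem moved the blocks */
    f->lease = LEASE_NONE;
}

/* Walks the whole lifecycle once; returns 0 if every step behaved as expected. */
int mock_demo(void)
{
    struct mock_file f = { .base_addr = 4096, .size = 0, .lease = LEASE_NONE };
    struct mock_extent ext, rejected;

    if (mock_lease_get(&f) || mock_map_blocks(&f, 0, 8192, &ext))
        return 1;
    /* ... the peer device would DMA to ext.device_addr here ... */

    mock_lease_recall(&f);
    if (mock_map_blocks(&f, 8192, 4096, &rejected) != MOCK_ENOLEASE)
        return 2;                       /* no new mappings once recalled */
    if (mock_commit_blocks(&f, &ext))   /* finish and commit outstanding I/O */
        return 3;
    mock_lease_release(&f);

    /* The old mapping is stale now: renew the lease and remap. */
    if (mock_lease_get(&f) || mock_map_blocks(&f, 0, 8192, &ext))
        return 4;
    return ext.device_addr == 4096 + (1u << 20) ? 0 : 5;
}
```

The point of the mock is the ordering: a mapping is only valid while
the lease is held, a recall still lets outstanding writes be committed,
and any address obtained before the release must be thrown away and
looked up again afterwards.
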

> So i would like to gather people feedback on general approach and few things
> like:
>     - Do block device need to be able to invalidate such mapping too ?
>
>       It is easy for fs the to invalidate as it can walk file mappings
>       but block device do not know about file.

If you need the block device to invalidate filesystem level
information, then your model is all wrong.

>     - Do we want to provide some generic implementation to share accross
>       fs ?

We already have a generic interface; filesystems other than XFS will
need to implement it.

>     - Maybe some share helpers for block devices that could track file
>       corresponding to peer mapping ?

If the application hasn't supplied the peer with the file it needs
to access, get a lease from and then map an LBA range out of, then
you are doing it all wrong.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx