Re: [LSF/MM TOPIC] Direct block mapping through fs for device

On Thu, 2019-04-25 at 21:38 -0400, Jerome Glisse wrote:
> I see that there are still empty spots in the LSF/MM schedule, so I
> would like to have a discussion on allowing direct block mapping of
> files for devices (NIC, GPU, FPGA, ...). This is an mm, fs and block
> discussion, though the mm side is pretty light, i.e. only adding 2
> callbacks to vm_operations_struct:
> 
>     int (*device_map)(struct vm_area_struct *vma,
>                       struct device *importer,
>                       struct dma_buf **bufp,
>                       unsigned long start,
>                       unsigned long end,
>                       unsigned flags,
>                       dma_addr_t *pa);
> 
>     // Some flags I can think of:
>     DEVICE_MAP_FLAG_PIN               // i.e. return a dma_buf object
>     DEVICE_MAP_FLAG_WRITE             // importer wants to be able to write
>     DEVICE_MAP_FLAG_SUPPORT_ATOMIC_OP // importer wants to do atomic
>                                       // operations on the mapping
> 
>     void (*device_unmap)(struct vm_area_struct *vma,
>                          struct device *importer,
>                          unsigned long start,
>                          unsigned long end,
>                          dma_addr_t *pa);
> 
> Each filesystem could add these callbacks and decide whether or not
> to allow the importer to directly map blocks. The filesystem can use
> whatever logic it wants to make that decision. For instance, if there
> are pages in the page cache for the range, it can say no and the
> device would fall back to main memory. The filesystem can also update
> its internal data structures to keep track of direct block mappings.
> 
> If the filesystem decides to allow the direct block mapping, it
> forwards the request to the block device, which can itself decide to
> forbid the direct mapping, again for any reason: for instance running
> out of BAR space, peer to peer between the block device and the
> importer device not being supported, or the block device not wanting
> to allow writeable peer mappings ...
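
To make the two-level decision above concrete, here is a rough user-space sketch of a filesystem that refuses when the range is in the page cache and otherwise forwards to a block device that can refuse for its own reasons. Every name in it (the mock_ types, MOCK_MAP_FLAG_WRITE, the error codes chosen) is a hypothetical stand-in, not part of the proposal or of any kernel API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical user-space stand-ins, not kernel APIs. */
typedef uint64_t mock_dma_addr_t;

#define MOCK_MAP_FLAG_WRITE (1u << 0)
#define MOCK_PAGE_SIZE      4096u

struct mock_bdev {
    size_t bar_pages_free;      /* BAR space left for peer mappings */
    bool p2p_supported;         /* can the importer reach this device? */
    bool allow_peer_write;      /* policy: writeable peer mappings allowed? */
    mock_dma_addr_t bar_base;   /* pretend bus address of the BAR */
};

struct mock_file {
    bool range_in_page_cache;   /* any cached pages in the requested range? */
    size_t direct_maps;         /* fs-side count of direct mappings */
    struct mock_bdev *bdev;
};

/* Block device side: may still forbid the mapping. */
static int mock_bdev_map(struct mock_bdev *bdev, size_t npages,
                         unsigned flags, mock_dma_addr_t *pa)
{
    if (!bdev->p2p_supported)
        return -EOPNOTSUPP;
    if ((flags & MOCK_MAP_FLAG_WRITE) && !bdev->allow_peer_write)
        return -EPERM;
    if (npages > bdev->bar_pages_free)
        return -ENOSPC;

    bdev->bar_pages_free -= npages;
    for (size_t i = 0; i < npages; i++)
        pa[i] = bdev->bar_base + (mock_dma_addr_t)i * MOCK_PAGE_SIZE;
    return 0;
}

/* Filesystem side: decide first, then forward to the block device. */
static int mock_fs_device_map(struct mock_file *f, size_t npages,
                              unsigned flags, mock_dma_addr_t *pa)
{
    int ret;

    if (f->range_in_page_cache)
        return -EBUSY;          /* importer falls back to main memory */

    ret = mock_bdev_map(f->bdev, npages, flags, pa);
    if (ret == 0)
        f->direct_maps++;       /* keep track of the direct mapping */
    return ret;
}
```

Any non-zero return here simply means "use the page cache instead"; the specific errno values are only illustrative.
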
> 
> 
> So the event flow is:
>     1  program mmaps a file (and never intends to access it with the
>        CPU)
>     2  program tries to access the mmap from a device A
>     3  device A driver sees the device_map callback on the vma and
>        calls it
>     4a on success, device A driver programs the device with the mapped
>        dma addresses
>     4b on failure, device A driver falls back to faulting so that it
>        can use pages from the page cache
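
As a reading aid for steps 3, 4a and 4b, a minimal user-space sketch of the importer driver side might look like the following. The mock_ structures, the simplified callback signature, and the fake DMA addresses are all assumptions made up for illustration; only the try-then-fall-back shape comes from the flow above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; nothing here is a kernel API. */
typedef uint64_t mock_dma_addr_t;
#define MOCK_PAGE_SIZE 4096ul

struct mock_vma;

struct mock_vm_ops {
    /* May be NULL when the exporter offers no direct mapping. */
    int (*device_map)(struct mock_vma *vma, unsigned long start,
                      unsigned long end, unsigned flags,
                      mock_dma_addr_t *pa);
};

struct mock_vma {
    const struct mock_vm_ops *ops;
};

enum mock_backing { MOCK_BACKING_DIRECT, MOCK_BACKING_PAGE_CACHE };

/* Stand-in for the regular fault path: pretend main-memory address. */
static mock_dma_addr_t mock_fault_page(unsigned long addr)
{
    return 0x80000000ull + addr;
}

/* Example exporter that accepts: hands out pretend BAR addresses. */
static int mock_map_ok(struct mock_vma *vma, unsigned long start,
                       unsigned long end, unsigned flags,
                       mock_dma_addr_t *pa)
{
    size_t i = 0;

    (void)vma;
    (void)flags;
    for (unsigned long a = start; a < end; a += MOCK_PAGE_SIZE)
        pa[i++] = 0x1000000ull + a;
    return 0;
}

/* Example exporter that refuses, e.g. pages are in the page cache. */
static int mock_map_refuse(struct mock_vma *vma, unsigned long start,
                           unsigned long end, unsigned flags,
                           mock_dma_addr_t *pa)
{
    (void)vma; (void)start; (void)end; (void)flags; (void)pa;
    return -EBUSY;
}

/* Importer driver: try device_map (3), use it (4a) or fall back (4b). */
static enum mock_backing mock_driver_map(struct mock_vma *vma,
                                         unsigned long start,
                                         unsigned long end,
                                         unsigned flags,
                                         mock_dma_addr_t *pa)
{
    size_t i = 0;

    if (vma->ops && vma->ops->device_map &&
        vma->ops->device_map(vma, start, end, flags, pa) == 0)
        return MOCK_BACKING_DIRECT;

    for (unsigned long a = start; a < end; a += MOCK_PAGE_SIZE)
        pa[i++] = mock_fault_page(a);
    return MOCK_BACKING_PAGE_CACHE;
}
```

A vma without the callback takes the same fallback path as one whose callback refuses, so existing filesystems would need no change to keep working.
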
> 
> This API assumes that the importer supports mmu notifiers and thus
> that the fs can invalidate device mappings at _any_ time by sending
> an mmu notifier to all mappings of the file (for a given range in the
> file or for the whole file). Obviously you want to minimize
> disruption and thus only invalidate when necessary.
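
The range-based invalidation described above can be pictured with a small user-space sketch: the fs walks the mappings it knows for a file and notifies only those that overlap the range. The list layout, mock_importer_invalidate() and every other name are hypothetical; a real implementation would go through the mmu notifier machinery rather than a flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; nothing here is a kernel API. */
struct mock_mapping {
    unsigned long start, end;   /* file byte range [start, end) */
    bool valid;                 /* still programmed into the device? */
    struct mock_mapping *next;
};

struct mock_mapped_file {
    struct mock_mapping *mappings;  /* all direct mappings of this file */
};

/* Stand-in for the importer's notifier callback: drop the mapping. */
static void mock_importer_invalidate(struct mock_mapping *m)
{
    m->valid = false;
}

/*
 * Invalidate only the mappings that overlap [start, end), leaving the
 * others alone to minimize disruption. Returns how many were notified.
 */
static size_t mock_fs_invalidate_range(struct mock_mapped_file *f,
                                       unsigned long start,
                                       unsigned long end)
{
    size_t notified = 0;

    for (struct mock_mapping *m = f->mappings; m; m = m->next) {
        if (!m->valid)
            continue;
        if (m->start < end && start < m->end) {
            mock_importer_invalidate(m);
            notified++;
        }
    }
    return notified;
}
```

Invalidating the whole file is the same call with a range covering every mapping.
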
> 
> The dma_buf parameter can be used to add pinning support for
> filesystems that wish to support that case too. Here the mapping
> lifetime gets disconnected from the vma and is transferred to the
> dma_buf allocated by the filesystem. Again, the filesystem can decide
> to say no, as pinning blocks has drastic consequences for the
> filesystem and the block device.
> 
> 
> This has some similarities to the hmmap and caching topic (which is
> mapping blocks directly to the CPU, AFAIU), but device mapping can
> cut some corners. For instance, some devices can forgo atomic
> operations on such mappings and thus can work over PCIe, while the
> CPU cannot do atomics to a PCIe BAR.
> 
> Also, this API can be used to allow peer to peer access between
> devices when the vma is an mmap of a device file and thus the
> vm_operations_struct comes from some exporter device driver. So the
> same 2 vm_operations_struct callbacks can be used in more cases than
> what I just described here.
> 
> 
> So I would like to gather people's feedback on the general approach
> and a few things like:
>     - Do block devices need to be able to invalidate such mappings
>       too?
> 
>       It is easy for the fs to invalidate as it can walk file
>       mappings, but block devices do not know about files.
> 
>     - Do we want to provide some generic implementation to share
>       across fs?
> 
>     - Maybe some shared helpers for block devices that could track
>       the files corresponding to peer mappings?

I'm interested in being a part of this discussion.

> 
> 
> Cheers,
> Jérôme



