On Tue, Jul 16, 2024 at 02:07:20PM +0200, Christian König wrote:
> On 16.07.24 at 11:31, Daniel Vetter wrote:
> > On Tue, Jul 16, 2024 at 10:48:40AM +0800, Huan Yang wrote:
> > > I just researched the udmabuf. Please correct me if I'm wrong.
> > > 
> > > On 2024/7/15 20:32, Christian König wrote:
> > > > On 15.07.24 at 11:11, Daniel Vetter wrote:
> > > > > On Thu, Jul 11, 2024 at 11:00:02AM +0200, Christian König wrote:
> > > > > > On 11.07.24 at 09:42, Huan Yang wrote:
> > > > > > > Some users may need to load a file into a dma-buf. The current
> > > > > > > way is:
> > > > > > > 1. allocate a dma-buf, get the dma-buf fd
> > > > > > > 2. mmap the dma-buf fd to get a vaddr
> > > > > > > 3. read(file_fd, vaddr, fsz)
> > > > > > > This is too heavy if fsz reaches GBs.
> > > > > > You need to describe a bit more why that is too heavy. I can only
> > > > > > assume you need to save memory bandwidth and avoid the extra copy
> > > > > > with the CPU.
> > > > > > 
> > > > > > > This patch implements a feature called
> > > > > > > DMA_HEAP_IOCTL_ALLOC_READ_FILE. The user needs to offer the
> > > > > > > file_fd they want to load into the dma-buf; in return, it
> > > > > > > promises that if you get a dma-buf fd, it will contain the file
> > > > > > > content.
> > > > > > Interesting idea, that has at least more potential than trying to
> > > > > > enable direct I/O on mmap()ed DMA-bufs.
> > > > > > 
> > > > > > The approach with the new IOCTL might not work because it is a
> > > > > > very specialized use case.
> > > > > > 
> > > > > > But IIRC there was a copy_file_range callback in the
> > > > > > file_operations structure you could use for that. I'm just not
> > > > > > sure when and how that's used with the copy_file_range() system
> > > > > > call.
> > > > > I'm not sure any of those help, because internally they're all
> > > > > still based on struct page (or maybe in the future on folios). And
> > > > > that's the thing dma-buf can't give you, at least without peeking
> > > > > behind the curtain.
> > > > > 
> > > > > I think an entirely different option would be malloc+udmabuf. That
> > > > > essentially handles the impedance mismatch between direct I/O and
> > > > > dma-buf on the dma-buf side. The downside is that it'll make the
> > > > > permanently pinned memory accounting and tracking issues even more
> > > > > apparent, but I guess eventually we do need to sort that one out.
> > > > Oh, very good idea!
> > > > Just one minor correction: it's not malloc+udmabuf, but rather
> > > > memfd_create()+udmabuf.
> > 
> > Hm right, it's memfd_create() + mmap(memfd) + udmabuf.
> > 
> > > > And you need to complete your direct I/O before creating the
> > > > udmabuf, since that reference will prevent direct I/O from working.
> > > udmabuf will pin all pages, so once the fd is returned, direct I/O
> > > can't be triggered (same as dmabuf). So, the read must complete before
> > > pinning.
> > 
> > Why does pinning prevent direct I/O? I haven't tested, but I'd expect
> > the rdma folks would be really annoyed if that's the case ...
> 
> Pinning (or rather taking another page reference) prevents writes from
> using direct I/O, because writes try to find all references and make them
> read-only so that nobody modifies the content while the write is done.

Where do you see that happen? That's counter to my understanding of what
pin_user_pages() does, which is what direct I/O uses ...

> As far as I know the same approach is used for NUMA migration and
> replacing small pages with big ones in THP. But for the read case here it
> should still work.

Yeah, an elevated refcount breaks migration, but that's entirely different
from the direct I/O use-case. Count me somewhat confused.
-Sima
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch