On Thu, Aug 01, 2024 at 10:53:45AM +0800, Huan Yang wrote:
>
> On 2024/8/1 4:46, Daniel Vetter wrote:
> > On Tue, Jul 30, 2024 at 08:04:04PM +0800, Huan Yang wrote:
> > > On 2024/7/30 17:05, Huan Yang wrote:
> > > > On 2024/7/30 16:56, Daniel Vetter wrote:
> > > > > On Tue, Jul 30, 2024 at 03:57:44PM +0800, Huan Yang wrote:
> > > > > > UDMA-BUF step:
> > > > > >   1. memfd_create
> > > > > >   2. open file(buffer/direct)
> > > > > >   3. udmabuf create
> > > > > >   4. mmap memfd
> > > > > >   5. read file into memfd vaddr
> > > > > Yeah this is really slow and the worst way to do it. You absolutely want
> > > > > to start _all_ the io before you start creating the dma-buf, ideally with
> > > > > everything running in parallel. But just starting the direct I/O with
> > > > > async and then creating the udmabuf should be a lot faster and avoid
> > > > That's great. Let me rephrase that, and please correct me if I'm wrong.
> > > >
> > > > UDMA-BUF step:
> > > >   1. memfd_create
> > > >   2. mmap memfd
> > > >   3. open file(buffer/direct)
> > > >   4. start thread to async read
> > > >   5. udmabuf create
> > > >
> > > > With this, can improve
> > > I just tested it. The steps are:
> > >
> > > UDMA-BUF step:
> > >   1. memfd_create
> > >   2. mmap memfd
> > >   3. open file(buffer/direct)
> > >   4. start thread to async read
> > >   5. udmabuf create
> > >   6. join wait
> > >
> > > Reading a 3G file took 1,527,103,431ns for all steps, which is great.
> > Ok that's almost the throughput of your patch set, which I think is close
> > enough. The remaining difference is probably just the mmap overhead, not
> > sure whether/how we can do direct i/o to an fd directly ... in principle
> > it's possible for any file that uses the standard pagecache.
>
> Yes. For mmap, IMO, we now get all the folios and pin them. That means all
> the pfns are already known when the udmabuf is created.
>
> So I think mmap with page faults does not help save memory, but it does
> increase the mmap access cost (maybe it saves a little page-table memory).
>
> I want to offer a patchset that removes it and makes the code more suitable
> for folio operations (and removes the unpin list). It will also contain some
> fix patches.
>
> I'll send it once my testing shows it's good.
>
>
> About fd operations for direct I/O, maybe use sendfile or copy_file_range?
>
> sendfile is based on a pipe buffer; its performance was low when I tested it.
>
> copy_file_range can't work because the files are not on the same file system.
>
> So I can't find another way to do it. Can someone give some suggestions?

Yeah direct I/O to pagecache without an mmap might be too niche to be
supported. Maybe io_uring has something, but I guess as unlikely as
anything else.
-Sima
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
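
For reference, the reordered sequence discussed above (memfd_create, mmap,
open, start the read in a thread, udmabuf create, join) could look roughly
like the userspace C below. This is only a sketch under assumptions, not
code from this thread or from the patch set: load_via_memfd(), read_worker()
and struct read_job are made-up names, the read is a plain buffered pread()
rather than O_DIRECT, and size is assumed to be a multiple of the page size
and within the udmabuf size limit. If /dev/udmabuf cannot be opened, the
sketch still loads the data and just reports -1 for the dma-buf fd.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/udmabuf.h>
#include <pthread.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical state handed to the reader thread (not from the thread). */
struct read_job {
    int fd;       /* source file */
    char *dst;    /* mmap'ed memfd memory */
    size_t size;  /* bytes to read */
    int ok;       /* set to 1 once everything was read */
};

/* Step 4: read the whole file into the memfd mapping. */
static void *read_worker(void *arg)
{
    struct read_job *job = arg;
    size_t done = 0;

    while (done < job->size) {
        ssize_t n = pread(job->fd, job->dst + done, job->size - done,
                          (off_t)done);
        if (n <= 0)
            return NULL; /* error or short file: ok stays 0 */
        done += (size_t)n;
    }
    job->ok = 1;
    return NULL;
}

/*
 * Returns the memfd mapping holding the file data, or NULL on failure.
 * *dmabuf_fd receives the udmabuf fd, or -1 if /dev/udmabuf is unusable.
 */
void *load_via_memfd(const char *path, size_t size, int *dmabuf_fd)
{
    struct read_job job = { .fd = -1, .size = size };
    pthread_t tid;
    void *map = MAP_FAILED;
    int memfd, devfd;

    *dmabuf_fd = -1;
    /* udmabuf needs a page-aligned size. */
    assert(size > 0 && size % (size_t)sysconf(_SC_PAGESIZE) == 0);

    /* 1. memfd_create; udmabuf insists on F_SEAL_SHRINK. */
    memfd = memfd_create("udmabuf-src", MFD_ALLOW_SEALING);
    if (memfd < 0)
        return NULL;
    if (ftruncate(memfd, (off_t)size) < 0 ||
        fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK) < 0)
        goto fail;

    /* 2. mmap memfd */
    map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, memfd, 0);
    if (map == MAP_FAILED)
        goto fail;

    /* 3. open file (buffered here; O_DIRECT is the other option) */
    job.fd = open(path, O_RDONLY);
    job.dst = map;
    if (job.fd < 0)
        goto fail;

    /* 4. start the read before touching udmabuf */
    if (pthread_create(&tid, NULL, read_worker, &job) != 0)
        goto fail;

    /* 5. udmabuf create, overlapping with the read */
    devfd = open("/dev/udmabuf", O_RDWR);
    if (devfd >= 0) {
        struct udmabuf_create create = {
            .memfd = memfd,
            .flags = UDMABUF_FLAGS_CLOEXEC,
            .offset = 0,
            .size = size,
        };
        *dmabuf_fd = ioctl(devfd, UDMABUF_CREATE, &create);
        close(devfd);
    }

    /* 6. join wait */
    pthread_join(tid, NULL);
    if (job.ok) {
        close(job.fd);
        close(memfd); /* the mapping and the udmabuf keep the pages alive */
        return map;
    }
    if (*dmabuf_fd >= 0)
        close(*dmabuf_fd);
    *dmabuf_fd = -1;
fail:
    if (job.fd >= 0)
        close(job.fd);
    if (map != MAP_FAILED)
        munmap(map, size);
    close(memfd);
    return NULL;
}
```

A caller would do something like "int dfd; void *p = load_via_memfd(path,
size, &dfd);" and then hand dfd to the importing driver. Splitting the read
across several threads, or submitting it through io_uring, would be the
obvious next thing to try, but neither is measured here.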