Re: [PATCH v2 0/5] Introduce DMA_HEAP_ALLOC_AND_READ_FILE heap flag

On 2024/8/1 4:46, Daniel Vetter wrote:
On Tue, Jul 30, 2024 at 08:04:04PM +0800, Huan Yang wrote:
On 2024/7/30 17:05, Huan Yang wrote:
On 2024/7/30 16:56, Daniel Vetter wrote:

On Tue, Jul 30, 2024 at 03:57:44PM +0800, Huan Yang wrote:
UDMA-BUF step:
    1. memfd_create
    2. open file(buffer/direct)
    3. udmabuf create
    4. mmap memfd
    5. read file into memfd vaddr
Yeah this is really slow and the worst way to do it. You absolutely want
to start _all_ the io before you start creating the dma-buf, ideally with
everything running in parallel. But just starting the direct I/O with
async and then creating the udmabuf should be a lot faster and avoid
That's great. Let me rephrase that, and please correct me if I'm wrong.

UDMA-BUF step:
   1. memfd_create
   2. mmap memfd
   3. open file(buffer/direct)
   4. start thread to async read
   5. udmabuf create

With this, we can improve things. I just tested it. The steps are:

UDMA-BUF step:
   1. memfd_create
   2. mmap memfd
   3. open file(buffer/direct)
   4. start thread to async read
   5. udmabuf create

   6. join wait

Reading the whole 3G file this way cost 1,527,103,431 ns. That's great.
Ok that's almost the throughput of your patch set, which I think is close
enough. The remaining difference is probably just the mmap overhead, not
sure whether/how we can do direct i/o to an fd directly ... in principle
it's possible for any file that uses the standard pagecache.

Yes. As for mmap: IMO, since we already get and pin all the folios, every PFN is known at udmabuf creation time.

So I think faulting pages in on mmap access doesn't help save memory, but does increase the mmap access cost (at most it saves a little page-table memory).

I want to offer a patchset that removes it, makes the code operate on folios more directly (and removes the unpin list), and contains some fix patches.

I'll send it once my testing looks good.


For direct I/O on an fd, maybe use sendfile or copy_file_range?

sendfile is based on pipe buffers; it showed low performance when I tested it.

copy_file_range can't be used because the files are not on the same file system.

So I can't find another way to do it. Can anyone offer suggestions?

-Sima



