On 20/11/2024 16:44, David Hildenbrand wrote:
If the problem is the "pagecache" overhead, then yes, it will be a harder nut to crack. But maybe there is some low-hanging fruit to optimize? Finding the main cause of the added overhead would be interesting.
Agreed, knowing the exact root cause would be really nice.
Can you compare uffdio_copy() when using anonymous memory vs. shmem? That's likely the best we could currently achieve with guest_memfd.
Yeah, I was doing that too. In my setup shmem was ~28% slower than anon, while guest_memfd was ~34% slower. The variance of the data was quite high, so the difference between the two may well be just noise. In other words, I'd be much happier if we could bring guest_memfd (or even shmem) performance closer to anon/private than if we just equalised guest_memfd with shmem (which are probably already pretty close).
There is the tools/testing/selftests/mm/uffd-stress benchmark; not sure if that is of any help. It segfaults for me right now with a (likely) division by 0.
Thanks for the pointer, will take a look!
Cheers, David / dhildenb