Re: [RFC 1/7] mm: Add new vma flag VM_LOCAL_CPU

On Thu, Mar 15, 2018 at 4:27 PM, Boaz Harrosh <boazh@xxxxxxxxxx> wrote:
> On 15/03/18 10:47, Miklos Szeredi wrote:

>> With your scheme it's like:
>>
>> - get_user_pages
>> - map pages into server address space
>> - send request to server
>> - server does direct-io read from network/disk fd into mapped buffer
>> - server sends reply
>> - done
>>
>> This could be changed to
>> - get_user_pages
>> - insert pages into pipe
>> - send request to server
>> - server "reverse splices" buffers from  pipe to network/disk fd
>
> This can never translate properly. Even a simple file on disk
> is linear for the app (an unaligned buffer) but is scattered
> across multiple blocks on disk. Yes, perhaps networking can
> somewhat work if you prepend/append the headers you need.
> And you impose direct-IO semantics on everything, especially the
> APP. With my system you can do zero copy with any kind of
> application.

I lost you there, sorry.

How will your scheme deal with alignment issues better than my scheme?
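
For concreteness, the alignment issue itself is easy to show in
isolation.  A toy example (plain userspace C, 4K block size assumed,
nothing zufs-specific): a range that is one linear buffer to the app
decomposes into per-block segments on disk.

#include <stdio.h>

#define BLOCK_SIZE 4096UL

/* Print the block-aligned segments an (offset, len) file range
 * splits into. */
static void show_segments(unsigned long file_off, unsigned long len)
{
	while (len > 0) {
		unsigned long in_block = file_off % BLOCK_SIZE;
		unsigned long chunk = BLOCK_SIZE - in_block;

		if (chunk > len)
			chunk = len;
		printf("block %lu: offset %lu, %lu bytes\n",
		       file_off / BLOCK_SIZE, in_block, chunk);
		file_off += chunk;
		len -= chunk;
	}
}

int main(void)
{
	/* An unaligned 10000-byte range at offset 1000 spans three blocks. */
	show_segments(1000, 10000);
	return 0;
}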

> And this assumes networking or some device, which means going back
> to the Kernel. Under ZUFS rules you must then return -ASYNC to the
> zuf and complete in a background ASYNC thread. This is an order of
> magnitude higher latency than what I showed here.

Indeed.

> And what about the SYNC copy from Server to APP? With a pipe you
> are forcing me to go back to the Kernel to execute the copy, which
> means two more crossings. This will double the round trips.

If you are trying to minimize the round trips, why not cache the
mapping in the kernel?  That way you don't necessarily have to go to
userspace at all.  With readahead logic, the server will be able to
preload the mapping before the reads happen, and you basically get
the same speed an in-kernel fs would.
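
To illustrate just the preload half of that (a sketch, assuming the
server learns upcoming ranges from its own request stream; the zufs
mapping preload would be fs-specific, and posix_fadvise(2) on a
backing file is merely an analogous existing primitive):

#include <fcntl.h>

/* Ask the kernel to pull (off, len) of 'fd' into the page cache
 * before the client's reads arrive.  Returns 0 or an errno value. */
static int preload_range(int fd, off_t off, off_t len)
{
	return posix_fadvise(fd, off, len, POSIX_FADV_WILLNEED);
}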

Also, I don't quite understand how you are planning to generalize
beyond the pmem case.  The interface is ready for that, sure.  But
what about caching?  Will that be done in the server?  Does that
make sense?  The kernel already has a page cache for that purpose,
and a userspace cache won't ever be as good as the kernel cache.

Thanks,
Miklos


