On 03/15/2018 02:42 PM, Miklos Szeredi wrote:
> Ideally most of the complexity would be in the page cache. Not sure
> how ready it is to handle pmem pages?
>
> The general case (non-pmem) will always have to be handled
> differently; you've just stated that it's much less latency sensitive
> and needs async handling. Basing the design on just trying to make
> it use the same mechanism (userspace copy) is flawed in my opinion,
> since it's suboptimal for either case.
>
> Thanks,
> Miklos
OK, so I have been thinking hard about all this, and I am changing my
mind and agreeing with all that was said.
I want the usFS plugin to have all the different options, and an easy
way to tell the Kernel which mode to use.
Let me summarize all the options:
1. Sync, userspace copy directly to app-buffers (current implementation)
2. Async block device operation (non-pmem)
zuf owns all devices, pmem and non-pmem, at mount time and provides
very efficient access to both. In the hard disk / SSD case, as part of
an IO call the server returns -EWOULD_BLOCK and issues a scatter_gather
call through zuf in the background.
The memory target for the IO can be pmem, user-buffers directly (DIO),
or transient server buffers.
On completion an up-call is made to ZUF to complete the IO operation
and release the waiting application.
3. Splice and R-splice
This covers the case where the IO target is not a block device but an
external path such as network / rdma / some other non-block device.
Zuf already holds an internal object describing the IO context,
including the GUP app buffers. This internal object can be made the
memory target of a splice operation.
4. Get-io_map type operation (currently implemented for mmap)
The zus-FS returns a set of dpp_t(s) to the Kernel and the Kernel does
the memcpy to app buffers. The Server also specifies whether those
buffers should be cached in a per-inode radix-tree (xarray); if so, on
the next access to the same range the Kernel does the copy and never
dispatches to user-space.
In this mode the Server can also revoke a cached mapping when needed.
5. Use of VFS page-cache
For a very slow backing device the FS requests the regular VFS
page-cache.
On the read/write_pages() vector, zuf uses option 1 above to read into
the page-cache instead of directly into app-buffers. Only cache misses
dispatch back to user-space.
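To make the hit/miss/revoke flow of option 4 concrete, here is a rough user-space sketch. It is not zuf code: `struct cached_map`, `struct inode_maps` and the `maps_*()` helpers are hypothetical names, a small fixed array stands in for the per-inode xarray, and a plain pointer stands in for whatever memory a dpp_t refers to.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAP_SLOTS 8	/* assumption: tiny fixed table instead of an xarray */

/* One cached mapping: file range [offset, offset + len) backed by addr. */
struct cached_map {
	uint64_t offset;
	size_t len;
	const char *addr;	/* stands in for the memory behind a dpp_t */
	int valid;
};

/* Hypothetical per-inode cache of mappings handed out by the Server. */
struct inode_maps {
	struct cached_map slot[MAP_SLOTS];
};

/* Server asked for this mapping to be cached. Returns 0, or -1 if full. */
static int maps_insert(struct inode_maps *m, uint64_t off, size_t len,
		       const char *addr)
{
	for (int i = 0; i < MAP_SLOTS; i++) {
		if (!m->slot[i].valid) {
			m->slot[i] = (struct cached_map){ off, len, addr, 1 };
			return 0;
		}
	}
	return -1;
}

/* Server revokes every cached mapping that overlaps [off, off + len). */
static void maps_revoke(struct inode_maps *m, uint64_t off, size_t len)
{
	for (int i = 0; i < MAP_SLOTS; i++) {
		struct cached_map *c = &m->slot[i];

		if (c->valid && off < c->offset + c->len &&
		    c->offset < off + len)
			c->valid = 0;
	}
}

/*
 * Read path: returns 1 if the range was fully covered by one cached
 * mapping and was copied into dst, 0 if the caller has to dispatch
 * the request to user-space.
 */
static int maps_read(const struct inode_maps *m, uint64_t off, size_t len,
		     char *dst)
{
	for (int i = 0; i < MAP_SLOTS; i++) {
		const struct cached_map *c = &m->slot[i];

		if (c->valid && off >= c->offset &&
		    off + len <= c->offset + c->len) {
			memcpy(dst, c->addr + (off - c->offset), len);
			return 1;
		}
	}
	return 0;
}
```

The first `maps_read()` on a range misses and would go to the Server; after `maps_insert()` the same range is served by the copy alone, until `maps_revoke()` invalidates it again. A real implementation would also need locking against a concurrent revoke, which is left out here.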
Have I forgotten anything?
This way the zus-FS is in control and can do the "right thing" depending
on target device and FS characteristics. The interface gives us a rich
set of tools to use.
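As an illustration only, the choice between the five options could look something like the sketch below. None of these names exist in zuf today: `enum zuf_io_mode`, `struct zuf_target` and `zuf_pick_mode()` are made up for this mail, and the policy in the function is just one plausible reading of the list above, not a decided design.

```c
#include <stdbool.h>

/* Hypothetical mode IDs; the numbering follows the list above. */
enum zuf_io_mode {
	ZUF_IO_SYNC_COPY = 1,	/* 1. sync userspace copy to app-buffers */
	ZUF_IO_ASYNC_BDEV,	/* 2. async block device operation */
	ZUF_IO_SPLICE,		/* 3. splice / R-splice */
	ZUF_IO_MAP,		/* 4. get-io_map, Kernel does the copy */
	ZUF_IO_PAGE_CACHE,	/* 5. regular VFS page-cache */
};

/* Hypothetical description of what the FS knows about the IO target. */
struct zuf_target {
	bool is_pmem;		/* data lives on pmem */
	bool is_block_dev;	/* hard disk / SSD behind a block device */
	bool very_slow;		/* slow enough that page-cache pays off */
	bool stable_mapping;	/* returned dpp_t ranges can be cached */
};

/* One possible policy (an assumption): the FS picks the mode per target. */
static enum zuf_io_mode zuf_pick_mode(const struct zuf_target *t)
{
	if (!t->is_pmem && !t->is_block_dev)
		return ZUF_IO_SPLICE;		/* network / rdma / other */
	if (t->very_slow)
		return ZUF_IO_PAGE_CACHE;
	if (!t->is_pmem)
		return ZUF_IO_ASYNC_BDEV;
	if (t->stable_mapping)
		return ZUF_IO_MAP;
	return ZUF_IO_SYNC_COPY;
}
```

The point is only that the decision lives in the FS, and the Kernel side gets back a single mode value it can switch on.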
Hope that answers your concerns.
Boaz