On Tue, Feb 19, 2019 at 12:51 PM Boaz harrosh <boaz@xxxxxxxxxxxxx> wrote:
>
> +4. An FS operation like create or WRITE/READ and so on arrives from application
> +   via VFS. Eventually an Operation is dispatched to zus:
> +   ▪ A special per-operation descriptor is filled up with all parameters.
> +   ▪ A current CPU channel is grabbed. the operation descriptor is put on
> +     that channel (ZT). Including get_user_pages or Kernel-pages associated
> +     with this OPT.
> +   ▪ The ZT is awaken, app thread put to sleep.
> +   ▪ In ZT context pages are mapped to that ZT-vma. This is so we are sure
> +     the map is only on a single core. And no other core's TLB is affected.
> +     (This here is the all performance secret)

I still don't get it. You say mapping the page into the server address
space is the performance secret. I say it's a security issue as well as
being perfectly useless, except for special cases.

So let's see. There's the pmem case, which seems to be what this is
mostly about. So we take a pmem filesystem and e.g. a read(2) syscall.
There's no zero copy to talk about in that case, since data needs to be
copied from pmem to the application buffer. If we want to minimize
memory copies, then it will be a single memory copy from pmem to the
app buffer.

Your implementation chooses to do this copy in the userland server, via
the application buffer mapped into its address space. But the memory
copy could just as well be done by the kernel, from one virtual address
to another; the kernel just has to juggle with physical page lookups,
but it does not have to establish a new mapping, which makes it all the
more secure and performant.

Sure, we've heard arguments about why the above doesn't work if the
data comes from e.g. a network, where the network driver could be
writing data directly to the application buffer. But I've shown how
that could also be done with a new rsplice() interface, again without
having to insert the application page into the server address space.
And no, this doesn't have to involve syscall overhead, if that turns
out to be the limiting factor: mapped pipes might be an interesting new
concept as well.

The only way I see this implementation actually saving a memory copy is
if a transformation is required (which, again, already implies at least
a single memory copy), such as compression or encryption. But even that
could be delegated to the kernel, since it has plenty of such
transformations implemented; only an interface is required to tell the
kernel which one to apply to the buffer.

Thanks,
Miklos