Re: [RFC][CFT] handling Rerror without copy_from_iter_full()

Dominique Martinet <asmadeus@xxxxxxxxxxxxx> · Sat, 11 Jun 2022 22:27:54 +0900

Al Viro wrote on Wed, Jun 08, 2022 at 05:41:46PM +0000:
> 	As it is, p9_client_zc_rpc()/p9_check_zc_errors() is playing fast
> and loose with copy_from_iter_full().
> 
> 	Background: reading from file is done by sending Tread request.
> Response consists of fixed-sized header (including the amount of data
> actually read) followed by the data itself.
> 
> 	For zero-copy case we arrange the things so that the first 11
> bytes of reply go into the fixed-sized buffer, with the rest going
> straight into the pages we want to read into.

Ugh... zc really needs something like direct data placement NFS/RDMA has
been doing... (But that's just not possible without extending the
protocol)

> 	What makes the things inconvenient is that sglist describing
> what should go where has to be set *before* the reply arrives.  As
> the result, if reply is an error, the things get interesting.  Success
> is
> 	size[4] Rread tag[2] count[4] data[count]
> For error layout varies depending upon the protocol variant -
> in original 9P and 9P2000 it's
> 	size[4] Rerror tag[2] len[2] error[len]
> in 9P2000.U
> 	size[4] Rerror tag[2] len[2] error[len] errno[4]
> in 9P2000.L
> 	size[4] Rlerror tag[2] errno[4]
> 
> The last case is nice and simple - we have an 11-byte response that fits
> into the fixed-sized buffer we hoped to get an Rread into.  In other
> two, though, we get a variable-length string spill into the pages
> we'd prepared for the data to be read.

That makes me wonder just how much use we get for the legacy
protocols -- I guess we do have some but all the filesystem-y
implementations that I would expect to be main users for large
IOs/zc are 9P2000.L as far as I know -- especially considering
virtio is pretty much limited to qemu? Are there other 9p virtio
servers?

So would it make sense to just say "not .L? tough luck, no zc",
or am I just being lazy?

> Had that been in fixed-sized buffer (which is actually 4K), we would've
> dealt with that the same way we handle non-zerocopy case.  However,
> for zerocopy it doesn't end up there, so we need to copy it from
> those pages.
> 
> The trouble is, by the time we get around to that, the references to
> pages in question are already dropped.  As the result, p9_zc_check_errors()
> tries to get the data using copy_from_iter_full().  Unfortunately, the
> iov_iter it's trying to read from might *NOT* be capable of that.
> It is, after all, a data destination, not data source.  In particular,
> if it's an ITER_PIPE one, copy_from_iter_full() will simply fail.

Silly question, in case of a pipe we'll have written something we
shouldn't have, or is it not gone yet until we actually finish the IO
with iov_iter_advance?
(my understanding is that reader won't have anything to read until
someone else does the advance, and that someone else will have
overwritten the data with their own content so no garbage wlll be read)

> The thing is, in ->zc_request() itself we have those pages.  There it
> would be a simple matter of memcpy_from_page() into the fixed-sized
> buffer and it isn't hard to recognize the (rare) case when such
> copying is needed.  That way we get rid of p9_zc_check_errors() entirely
> - p9_check_errors() can be used instead both for zero-copy and non-zero-copy
> cases.
> 
> Do you see any problems with the variant below?

... With that said though your approach here definitely looks better
than what we currently have -- my main issue is that truncating is fine
for the original 9p2000 but for .U you'd be losing the ecode that we
currently trust for being errno-compatible.
It's *probably* not a problem in practice, but preserving that errno
would theorically make us look for the page where the last few bytes
went to and overwrite the end of the string with it but that's starting
to be ugly.

Anyway even not doing that is probably better than reading from
something we no longer own; but I'm still thinking just refusing non-.L
variants to zc calls is a better decision long term.

-- 
Dominique