Al Viro wrote on Wed, Jun 08, 2022 at 05:41:46PM +0000: > As it is, p9_client_zc_rpc()/p9_check_zc_errors() is playing fast > and loose with copy_from_iter_full(). > > Background: reading from file is done by sending Tread request. > Response consists of fixed-sized header (including the amount of data > actually read) followed by the data itself. > > For zero-copy case we arrange the things so that the first 11 > bytes of reply go into the fixed-sized buffer, with the rest going > straight into the pages we want to read into. Ugh... zc really needs something like direct data placement NFS/RDMA has been doing... (But that's just not possible without extending the protocol) > What makes the things inconvenient is that sglist describing > what should go where has to be set *before* the reply arrives. As > the result, if reply is an error, the things get interesting. Success > is > size[4] Rread tag[2] count[4] data[count] > For error layout varies depending upon the protocol variant - > in original 9P and 9P2000 it's > size[4] Rerror tag[2] len[2] error[len] > in 9P2000.U > size[4] Rerror tag[2] len[2] error[len] errno[4] > in 9P2000.L > size[4] Rlerror tag[2] errno[4] > > The last case is nice and simple - we have an 11-byte response that fits > into the fixed-sized buffer we hoped to get an Rread into. In other > two, though, we get a variable-length string spill into the pages > we'd prepared for the data to be read. That makes me wonder just how much use we get for the legacy protocols -- I guess we do have some but all the filesystem-y implementations that I would expect to be main users for large IOs/zc are 9P2000.L as far as I know -- especially considering virtio is pretty much limited to qemu? Are there other 9p virtio servers? So would it make sense to just say "not .L? tough luck, no zc", or am I just being lazy? > Had that been in fixed-sized buffer (which is actually 4K), we would've > dealt with that the same way we handle non-zerocopy case. However, > for zerocopy it doesn't end up there, so we need to copy it from > those pages. > > The trouble is, by the time we get around to that, the references to > pages in question are already dropped. As the result, p9_zc_check_errors() > tries to get the data using copy_from_iter_full(). Unfortunately, the > iov_iter it's trying to read from might *NOT* be capable of that. > It is, after all, a data destination, not data source. In particular, > if it's an ITER_PIPE one, copy_from_iter_full() will simply fail. Silly question, in case of a pipe we'll have written something we shouldn't have, or is it not gone yet until we actually finish the IO with iov_iter_advance? (my understanding is that reader won't have anything to read until someone else does the advance, and that someone else will have overwritten the data with their own content so no garbage wlll be read) > The thing is, in ->zc_request() itself we have those pages. There it > would be a simple matter of memcpy_from_page() into the fixed-sized > buffer and it isn't hard to recognize the (rare) case when such > copying is needed. That way we get rid of p9_zc_check_errors() entirely > - p9_check_errors() can be used instead both for zero-copy and non-zero-copy > cases. > > Do you see any problems with the variant below? ... With that said though your approach here definitely looks better than what we currently have -- my main issue is that truncating is fine for the original 9p2000 but for .U you'd be losing the ecode that we currently trust for being errno-compatible. It's *probably* not a problem in practice, but preserving that errno would theorically make us look for the page where the last few bytes went to and overwrite the end of the string with it but that's starting to be ugly. Anyway even not doing that is probably better than reading from something we no longer own; but I'm still thinking just refusing non-.L variants to zc calls is a better decision long term. -- Dominique