One thing I wonder is whether we always want to zero-copy the payload. In the
extreme, do we want to take the overhead of pinning an extra page if we are
only reading/writing a single byte? memcpy is expensive for large packets, but
may actually be more efficient for small ones. Have we done any performance
measurements of this code at various payload sizes versus non-zero-copy? Of
course, that may not really show the impact of pinning the extra pages... If
such a tradeoff does exist, we might choose not to do zero copy for requests
smaller than some size -- and that might alleviate some of the problems with
the legacy protocols (such as the Rerror issue) -- killing two birds with one
stone. In any case, given the implementation it should be really easy to shut
me up with comparison data for zc and non-zc at 1, 64, 128, 256, 512, 1024,
2048, and 4096 byte payloads (without caches enabled, of course).

-eric

On Mon, Feb 7, 2011 at 12:57 AM, Venkateswararao Jujjuri (JV)
<jvrao@xxxxxxxxxxxxxxxxxx> wrote:
> On 2/6/2011 11:21 PM, Venkateswararao Jujjuri (JV) wrote:
>> Signed-off-by: Venkateswararao Jujjuri <jvrao@xxxxxxxxxxxxxxxxxx>
>> ---
>>  net/9p/client.c   |   45 +++++++++++++++++++++++++++++++--------------
>>  net/9p/protocol.c |   38 ++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 69 insertions(+), 14 deletions(-)
>>
>> diff --git a/net/9p/client.c b/net/9p/client.c
>> index a848bca..f939edf 100644
>> --- a/net/9p/client.c
>> +++ b/net/9p/client.c
>> @@ -1270,7 +1270,14 @@ p9_client_read(struct p9_fid *fid, char *data, char __user *udata, u64 offset,
>>  	if (count < rsize)
>>  		rsize = count;
>>
>> -	req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset, rsize);
>> +	if ((clnt->trans_mod->pref & P9_TRANS_PREF_PAYLOAD_MASK) ==
>> +			P9_TRANS_PREF_PAYLOAD_SEP) {
>> +		req = p9_client_rpc(clnt, P9_TREAD, "dqE", fid->fid, offset,
>> +				rsize, data ? data : udata);
>> +	} else {
>> +		req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset,
>> +				rsize);
>> +	}
>>  	if (IS_ERR(req)) {
>>  		err = PTR_ERR(req);
>>  		goto error;
>> @@ -1284,13 +1291,15 @@ p9_client_read(struct p9_fid *fid, char *data, char __user *udata, u64 offset,
>>
>>  	P9_DPRINTK(P9_DEBUG_9P, "<<< RREAD count %d\n", count);
>>
>> -	if (data) {
>> -		memmove(data, dataptr, count);
>> -	} else {
>> -		err = copy_to_user(udata, dataptr, count);
>> -		if (err) {
>> -			err = -EFAULT;
>> -			goto free_and_error;
>> +	if (!req->tc->pbuf_size) {
>> +		if (data) {
>> +			memmove(data, dataptr, count);
>> +		} else {
>> +			err = copy_to_user(udata, dataptr, count);
>> +			if (err) {
>> +				err = -EFAULT;
>> +				goto free_and_error;
>> +			}
>>  		}
>>  	}
>>  	p9_free_req(clnt, req);
>> @@ -1323,12 +1332,20 @@ p9_client_write(struct p9_fid *fid, char *data, const char __user *udata,
>>
>>  	if (count < rsize)
>>  		rsize = count;
>> -	if (data)
>> -		req = p9_client_rpc(clnt, P9_TWRITE, "dqD", fid->fid, offset,
>> -				rsize, data);
>> -	else
>> -		req = p9_client_rpc(clnt, P9_TWRITE, "dqU", fid->fid, offset,
>> -				rsize, udata);
>> +
>> +	if ((clnt->trans_mod->pref & P9_TRANS_PREF_PAYLOAD_MASK) ==
>> +			P9_TRANS_PREF_PAYLOAD_SEP) {
>> +		req = p9_client_rpc(clnt, P9_TWRITE, "dqF", fid->fid, offset,
>> +				rsize, data ? data : udata);
>> +	} else {
>> +		if (data)
>> +			req = p9_client_rpc(clnt, P9_TWRITE, "dqD", fid->fid,
>> +					offset, rsize, data);
>> +		else
>> +			req = p9_client_rpc(clnt, P9_TWRITE, "dqU", fid->fid,
>> +					offset, rsize, udata);
>> +	}
>> +
>>  	if (IS_ERR(req)) {
>>  		err = PTR_ERR(req);
>>  		goto error;
>> diff --git a/net/9p/protocol.c b/net/9p/protocol.c
>> index dfc358f..ea778dd 100644
>> --- a/net/9p/protocol.c
>> +++ b/net/9p/protocol.c
>> @@ -114,6 +114,24 @@ pdu_write_u(struct p9_fcall *pdu, const char __user *udata, size_t size)
>>  	return size - len;
>>  }
>>
>> +static size_t
>> +pdu_write_uw(struct p9_fcall *pdu, const char *udata, size_t size)
>> +{
>> +	size_t len = min(pdu->capacity - pdu->size, size);
>> +	pdu->pbuf = udata;
>> +	pdu->pbuf_size = len;
>> +	return size - len;
>> +}
>> +
>> +static size_t
>> +pdu_write_ur(struct p9_fcall *pdu, const char *udata, size_t size)
>> +{
>> +	size_t len = min(pdu->capacity - pdu->size, size);
>> +	pdu->pbuf = udata;
>> +	pdu->pbuf_size = len;
>> +	return size - len;
>> +}
>> +
>>  /*
>>  	b - int8_t
>>  	w - int16_t
>> @@ -445,6 +463,26 @@ p9pdu_vwritef(struct p9_fcall *pdu, int proto_version, const char *fmt,
>>  				errcode = -EFAULT;
>>  		}
>>  		break;
>> +	case 'E':{
>> +			int32_t count = va_arg(ap, int32_t);
>> +			const char *udata = va_arg(ap, const void *);
>> +			errcode = p9pdu_writef(pdu, proto_version, "d",
>> +					count);
>> +			if (!errcode && pdu_write_ur(pdu, udata,
>> +						count))
>> +				errcode = -EFAULT;
>> +		}
>> +		break;
>> +	case 'F':{
>> +			int32_t count = va_arg(ap, int32_t);
>> +			const char *udata = va_arg(ap, const void *);
>> +			errcode = p9pdu_writef(pdu, proto_version, "d",
>> +					count);
>> +			if (!errcode && pdu_write_uw(pdu, udata,
>> +						count))
>> +				errcode = -EFAULT;
>> +		}
>> +		break;
>>  	case 'U':{
>>  			int32_t count = va_arg(ap, int32_t);
>>  			const char __user *udata =
>
> _______________________________________________
> V9fs-developer mailing list
> V9fs-developer@xxxxxxxxxxxxxxxxxxxxx
> https://lists.sourceforge.net/lists/listinfo/v9fs-developer

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html