> On Jan 23, 2018, at 12:52 PM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote: > > On Mon, Jan 22, 2018 at 05:09:35PM -0800, Chuck Lever wrote: >> >> >>> On Jan 22, 2018, at 2:00 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote: >>> >>> On Wed, Jan 03, 2018 at 03:42:35PM -0500, Chuck Lever wrote: >>>> Move common code in NFSD's symlink arg decoders into a helper. The >>>> immediate benefits include: >>>> >>>> - one fewer data copies on transports that support DDP >>>> - no memory allocation in the NFSv4 XDR decoder >>>> - consistent error checking across all versions >>>> - reduction of code duplication >>>> - support for both legal forms of SYMLINK requests on RDMA >>>> transports for all versions of NFS (in particular, NFSv2, for >>>> completeness) >>>> >>>> In the long term, this helper is an appropriate spot to perform a >>>> per-transport call-out to fill the pathname argument using, say, >>>> RDMA Reads. >>>> >>>> Filling the pathname in the proc function also means that eventually >>>> the incoming filehandle can be interpreted so that filesystem- >>>> specific memory can be allocated as a sink for the pathname >>>> argument, rather than using anonymous pages. >>>> >>>> Wondering why the current code punts a zero-length SYMLINK. Is it >>>> illegal to create a zero-length SYMLINK on Linux? >>> >>> SYMLINK(2) says >>> >>> ENOENT A directory component in linkpath does not exist or is a >>> dangling symbolic link, or target or linkpath is an empty >>> string. >>> >>> That doesn't explain the INVAL, or why this is the right place to be >>> checking it. >> >> RFC 1813: >> >> NFS3ERR_IO >> NFS3ERR_ACCES >> NFS3ERR_EXIST >> NFS3ERR_NOTDIR >> NFS3ERR_NOSPC >> NFS3ERR_ROFS >> NFS3ERR_NAMETOOLONG >> NFS3ERR_DQUOT >> NFS3ERR_STALE >> NFS3ERR_BADHANDLE >> NFS3ERR_NOTSUPP >> NFS3ERR_SERVERFAULT >> >> Interestingly, neither INVAL nor NOENT are valid >> status codes for NFSv3 SYMLINK. NFS3ERR_NOTSUPP >> might be closest, I suppose. >> >> RFC 5661 says explicitly: >> >> If the objname has a length of zero, or if objname does not obey the >> UTF-8 definition, the error NFS4ERR_INVAL will be returned. >> >> And lists these as valid status codes for CREATE(NF4LNK): >> >> | NFS4ERR_ACCESS, NFS4ERR_ATTRNOTSUPP, | >> | NFS4ERR_BADCHAR, NFS4ERR_BADNAME, | >> | NFS4ERR_BADOWNER, NFS4ERR_BADTYPE, | >> | NFS4ERR_BADXDR, NFS4ERR_DEADSESSION, | >> | NFS4ERR_DELAY, NFS4ERR_DQUOT, | >> | NFS4ERR_EXIST, NFS4ERR_FHEXPIRED, | >> | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MLINK, | >> | NFS4ERR_MOVED, NFS4ERR_NAMETOOLONG, | >> | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, | >> | NFS4ERR_NOTDIR, NFS4ERR_OP_NOT_IN_SESSION, | >> | NFS4ERR_PERM, NFS4ERR_REP_TOO_BIG, | >> | NFS4ERR_REP_TOO_BIG_TO_CACHE, | >> | NFS4ERR_REQ_TOO_BIG, | >> | NFS4ERR_RETRY_UNCACHED_REP, NFS4ERR_ROFS, | >> | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | >> | NFS4ERR_TOO_MANY_OPS, | >> | NFS4ERR_UNSAFE_COMPOUND | >> >> >>> I'm a little nervous about the NULL termination in >>> svc_fill_symlink_pathname; how do we know it's safe to write a zero >>> there? I haven't checked it carefully yet. >> >> svc_fill_symlink_pathname grabs a whole fresh page >> from @rqstp. It is safe to write bytes anywhere in >> that page. > > How do we know it's safe to grab another page? It's safe in the v3 case because that's what the v3 decoder currently does. I think you are concerned about the NFSv4 COMPOUND case on TCP, for arbitrarily large compounds? Given that we have 257-ish pages in the rqstp->rq_pages array (ie, that number is bounded) I'm not sure how we can ever be sure there are enough pages for all combinations of NFS operations. > (Could we run out of > pages? I'd probably know the answer if I'd rewritten this code > recently, but I haven't and the details are swapped out....) > > Also I don't think that's true in the first->iov_base == 0 case. This case occurs when an RDMA transport has filled in the first page of arg->pages with the symlink contents. It's also a whole fresh page that was obtained from rqstp->rq_pages. Here, are you concerned about the client sending a COMPOUND over RPC/RDMA that has more than one SYMLINK operation? Our server currently supports only one Read chunk per RPC. > --b. > >> >> >>> --g. >>> >>>> >>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> >>>> --- >>>> fs/nfsd/nfs3proc.c | 10 +++++++ >>>> fs/nfsd/nfs3xdr.c | 51 ++++++++------------------------- >>>> fs/nfsd/nfs4proc.c | 7 +++++ >>>> fs/nfsd/nfs4xdr.c | 10 +++++-- >>>> fs/nfsd/nfsproc.c | 14 +++++---- >>>> fs/nfsd/nfsxdr.c | 49 +++++++++++++++++++------------- >>>> fs/nfsd/xdr.h | 1 + >>>> fs/nfsd/xdr3.h | 1 + >>>> fs/nfsd/xdr4.h | 2 + >>>> include/linux/sunrpc/svc.h | 2 + >>>> net/sunrpc/svc.c | 67 ++++++++++++++++++++++++++++++++++++++++++++ >>>> 11 files changed, 146 insertions(+), 68 deletions(-) >>>> >>>> diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c >>>> index 2dd95eb..6259a4b 100644 >>>> --- a/fs/nfsd/nfs3proc.c >>>> +++ b/fs/nfsd/nfs3proc.c >>>> @@ -283,6 +283,16 @@ >>>> struct nfsd3_diropres *resp = rqstp->rq_resp; >>>> __be32 nfserr; >>>> >>>> + if (argp->tlen == 0) >>>> + RETURN_STATUS(nfserr_inval); >>>> + if (argp->tlen > NFS3_MAXPATHLEN) >>>> + RETURN_STATUS(nfserr_nametoolong); >>>> + >>>> + argp->tname = svc_fill_symlink_pathname(rqstp, &argp->first, >>>> + argp->tlen); >>>> + if (IS_ERR(argp->tname)) >>>> + RETURN_STATUS(nfserrno(PTR_ERR(argp->tname))); >>>> + >>>> dprintk("nfsd: SYMLINK(3) %s %.*s -> %.*s\n", >>>> SVCFH_fmt(&argp->ffh), >>>> argp->flen, argp->fname, >>>> diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c >>>> index 240cdb0e..78b555b 100644 >>>> --- a/fs/nfsd/nfs3xdr.c >>>> +++ b/fs/nfsd/nfs3xdr.c >>>> @@ -452,51 +452,24 @@ void fill_post_wcc(struct svc_fh *fhp) >>>> nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) >>>> { >>>> struct nfsd3_symlinkargs *args = rqstp->rq_argp; >>>> - unsigned int len, avail; >>>> - char *old, *new; >>>> - struct kvec *vec; >>>> + char *base = (char *)p; >>>> + size_t dlen; >>>> >>>> if (!(p = decode_fh(p, &args->ffh)) || >>>> - !(p = decode_filename(p, &args->fname, &args->flen)) >>>> - ) >>>> + !(p = decode_filename(p, &args->fname, &args->flen))) >>>> return 0; >>>> p = decode_sattr3(p, &args->attrs); >>>> >>>> - /* now decode the pathname, which might be larger than the first page. >>>> - * As we have to check for nul's anyway, we copy it into a new page >>>> - * This page appears in the rq_res.pages list, but as pages_len is always >>>> - * 0, it won't get in the way >>>> - */ >>>> - len = ntohl(*p++); >>>> - if (len == 0 || len > NFS3_MAXPATHLEN || len >= PAGE_SIZE) >>>> - return 0; >>>> - args->tname = new = page_address(*(rqstp->rq_next_page++)); >>>> - args->tlen = len; >>>> - /* first copy and check from the first page */ >>>> - old = (char*)p; >>>> - vec = &rqstp->rq_arg.head[0]; >>>> - if ((void *)old > vec->iov_base + vec->iov_len) >>>> - return 0; >>>> - avail = vec->iov_len - (old - (char*)vec->iov_base); >>>> - while (len && avail && *old) { >>>> - *new++ = *old++; >>>> - len--; >>>> - avail--; >>>> - } >>>> - /* now copy next page if there is one */ >>>> - if (len && !avail && rqstp->rq_arg.page_len) { >>>> - avail = min_t(unsigned int, rqstp->rq_arg.page_len, PAGE_SIZE); >>>> - old = page_address(rqstp->rq_arg.pages[0]); >>>> - } >>>> - while (len && avail && *old) { >>>> - *new++ = *old++; >>>> - len--; >>>> - avail--; >>>> - } >>>> - *new = '\0'; >>>> - if (len) >>>> - return 0; >>>> + args->tlen = ntohl(*p++); >>>> >>>> + args->first.iov_base = p; >>>> + args->first.iov_len = rqstp->rq_arg.head[0].iov_len; >>>> + args->first.iov_len -= (char *)p - base; >>>> + >>>> + dlen = args->first.iov_len + rqstp->rq_arg.page_len + >>>> + rqstp->rq_arg.tail[0].iov_len; >>>> + if (dlen < XDR_QUADLEN(args->tlen) << 2) >>>> + return 0; >>>> return 1; >>>> } >>>> >>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c >>>> index 5029b96..36bd1f7 100644 >>>> --- a/fs/nfsd/nfs4proc.c >>>> +++ b/fs/nfsd/nfs4proc.c >>>> @@ -605,6 +605,13 @@ static void gen_boot_verifier(nfs4_verifier *verifier, struct net *net) >>>> >>>> switch (create->cr_type) { >>>> case NF4LNK: >>>> + if (create->cr_datalen > NFS4_MAXPATHLEN) >>>> + return nfserr_nametoolong; >>>> + create->cr_data = >>>> + svc_fill_symlink_pathname(rqstp, &create->cr_first, >>>> + create->cr_datalen); >>>> + if (IS_ERR(create->cr_data)) >>>> + return nfserrno(PTR_ERR(create->cr_data)); >>>> status = nfsd_symlink(rqstp, &cstate->current_fh, >>>> create->cr_name, create->cr_namelen, >>>> create->cr_data, &resfh); >>>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c >>>> index bd25230..d05384e 100644 >>>> --- a/fs/nfsd/nfs4xdr.c >>>> +++ b/fs/nfsd/nfs4xdr.c >>>> @@ -648,6 +648,7 @@ static __be32 nfsd4_decode_bind_conn_to_session(struct nfsd4_compoundargs *argp, >>>> static __be32 >>>> nfsd4_decode_create(struct nfsd4_compoundargs *argp, struct nfsd4_create *create) >>>> { >>>> + struct kvec *head; >>>> DECODE_HEAD; >>>> >>>> READ_BUF(4); >>>> @@ -656,10 +657,13 @@ static __be32 nfsd4_decode_bind_conn_to_session(struct nfsd4_compoundargs *argp, >>>> case NF4LNK: >>>> READ_BUF(4); >>>> create->cr_datalen = be32_to_cpup(p++); >>>> + if (create->cr_datalen == 0) >>>> + return nfserr_inval; >>>> + head = argp->rqstp->rq_arg.head; >>>> + create->cr_first.iov_base = p; >>>> + create->cr_first.iov_len = head->iov_len; >>>> + create->cr_first.iov_len -= (char *)p - (char *)head->iov_base; >>>> READ_BUF(create->cr_datalen); >>>> - create->cr_data = svcxdr_dupstr(argp, p, create->cr_datalen); >>>> - if (!create->cr_data) >>>> - return nfserr_jukebox; >>>> break; >>>> case NF4BLK: >>>> case NF4CHR: >>>> diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c >>>> index 1995ea6..f107f9f 100644 >>>> --- a/fs/nfsd/nfsproc.c >>>> +++ b/fs/nfsd/nfsproc.c >>>> @@ -449,17 +449,19 @@ >>>> struct svc_fh newfh; >>>> __be32 nfserr; >>>> >>>> + if (argp->tlen > NFS_MAXPATHLEN) >>>> + return nfserr_nametoolong; >>>> + >>>> + argp->tname = svc_fill_symlink_pathname(rqstp, &argp->first, >>>> + argp->tlen); >>>> + if (IS_ERR(argp->tname)) >>>> + return nfserrno(PTR_ERR(argp->tname)); >>>> + >>>> dprintk("nfsd: SYMLINK %s %.*s -> %.*s\n", >>>> SVCFH_fmt(&argp->ffh), argp->flen, argp->fname, >>>> argp->tlen, argp->tname); >>>> >>>> fh_init(&newfh, NFS_FHSIZE); >>>> - /* >>>> - * Crazy hack: the request fits in a page, and already-decoded >>>> - * attributes follow argp->tname, so it's safe to just write a >>>> - * null to ensure it's null-terminated: >>>> - */ >>>> - argp->tname[argp->tlen] = '\0'; >>>> nfserr = nfsd_symlink(rqstp, &argp->ffh, argp->fname, argp->flen, >>>> argp->tname, &newfh); >>>> >>>> diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c >>>> index 165e25e..8fcd047 100644 >>>> --- a/fs/nfsd/nfsxdr.c >>>> +++ b/fs/nfsd/nfsxdr.c >>>> @@ -71,22 +71,6 @@ __be32 *nfs2svc_decode_fh(__be32 *p, struct svc_fh *fhp) >>>> } >>>> >>>> static __be32 * >>>> -decode_pathname(__be32 *p, char **namp, unsigned int *lenp) >>>> -{ >>>> - char *name; >>>> - unsigned int i; >>>> - >>>> - if ((p = xdr_decode_string_inplace(p, namp, lenp, NFS_MAXPATHLEN)) != NULL) { >>>> - for (i = 0, name = *namp; i < *lenp; i++, name++) { >>>> - if (*name == '\0') >>>> - return NULL; >>>> - } >>>> - } >>>> - >>>> - return p; >>>> -} >>>> - >>>> -static __be32 * >>>> decode_sattr(__be32 *p, struct iattr *iap) >>>> { >>>> u32 tmp, tmp1; >>>> @@ -383,14 +367,39 @@ __be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *f >>>> nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) >>>> { >>>> struct nfsd_symlinkargs *args = rqstp->rq_argp; >>>> + char *base = (char *)p; >>>> + size_t xdrlen; >>>> >>>> if ( !(p = decode_fh(p, &args->ffh)) >>>> - || !(p = decode_filename(p, &args->fname, &args->flen)) >>>> - || !(p = decode_pathname(p, &args->tname, &args->tlen))) >>>> + || !(p = decode_filename(p, &args->fname, &args->flen))) >>>> return 0; >>>> - p = decode_sattr(p, &args->attrs); >>>> >>>> - return xdr_argsize_check(rqstp, p); >>>> + args->tlen = ntohl(*p++); >>>> + if (args->tlen == 0) >>>> + return 0; >>>> + >>>> + args->first.iov_base = p; >>>> + args->first.iov_len = rqstp->rq_arg.head[0].iov_len; >>>> + args->first.iov_len -= (char *)p - base; >>>> + >>>> + /* This request is never larger than a page. Therefore, >>>> + * transport will deliver either: >>>> + * 1. pathname in the pagelist -> sattr is in the tail. >>>> + * 2. everything in the head buffer -> sattr is in the head. >>>> + */ >>>> + if (rqstp->rq_arg.page_len) { >>>> + if (args->tlen != rqstp->rq_arg.page_len) >>>> + return 0; >>>> + p = rqstp->rq_arg.tail[0].iov_base; >>>> + } else { >>>> + xdrlen = XDR_QUADLEN(args->tlen); >>>> + if (xdrlen > args->first.iov_len - (8 * sizeof(__be32))) >>>> + return 0; >>>> + p += xdrlen; >>>> + } >>>> + decode_sattr(p, &args->attrs); >>>> + >>>> + return 1; >>>> } >>>> >>>> int >>>> diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h >>>> index a765c41..ea7cca3 100644 >>>> --- a/fs/nfsd/xdr.h >>>> +++ b/fs/nfsd/xdr.h >>>> @@ -72,6 +72,7 @@ struct nfsd_symlinkargs { >>>> char * tname; >>>> unsigned int tlen; >>>> struct iattr attrs; >>>> + struct kvec first; >>>> }; >>>> >>>> struct nfsd_readdirargs { >>>> diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h >>>> index deccf7f..2cb29e9 100644 >>>> --- a/fs/nfsd/xdr3.h >>>> +++ b/fs/nfsd/xdr3.h >>>> @@ -90,6 +90,7 @@ struct nfsd3_symlinkargs { >>>> char * tname; >>>> unsigned int tlen; >>>> struct iattr attrs; >>>> + struct kvec first; >>>> }; >>>> >>>> struct nfsd3_readdirargs { >>>> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h >>>> index d56219d..b485cd1 100644 >>>> --- a/fs/nfsd/xdr4.h >>>> +++ b/fs/nfsd/xdr4.h >>>> @@ -110,6 +110,7 @@ struct nfsd4_create { >>>> struct { >>>> u32 datalen; >>>> char *data; >>>> + struct kvec first; >>>> } link; /* NF4LNK */ >>>> struct { >>>> u32 specdata1; >>>> @@ -124,6 +125,7 @@ struct nfsd4_create { >>>> }; >>>> #define cr_datalen u.link.datalen >>>> #define cr_data u.link.data >>>> +#define cr_first u.link.first >>>> #define cr_specdata1 u.dev.specdata1 >>>> #define cr_specdata2 u.dev.specdata2 >>>> >>>> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h >>>> index 238b9ae..fd5846e 100644 >>>> --- a/include/linux/sunrpc/svc.h >>>> +++ b/include/linux/sunrpc/svc.h >>>> @@ -495,6 +495,8 @@ int svc_register(const struct svc_serv *, struct net *, const int, >>>> char * svc_print_addr(struct svc_rqst *, char *, size_t); >>>> unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, >>>> struct kvec *first, size_t total); >>>> +char *svc_fill_symlink_pathname(struct svc_rqst *rqstp, >>>> + struct kvec *first, size_t total); >>>> >>>> #define RPC_MAX_ADDRBUFLEN (63U) >>>> >>>> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c >>>> index 759b668..fc93406 100644 >>>> --- a/net/sunrpc/svc.c >>>> +++ b/net/sunrpc/svc.c >>>> @@ -1578,3 +1578,70 @@ unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, struct kvec *first, >>>> return i; >>>> } >>>> EXPORT_SYMBOL_GPL(svc_fill_write_vector); >>>> + >>>> +/** >>>> + * svc_fill_symlink_pathname - Construct pathname argument for VFS symlink call >>>> + * @rqstp: svc_rqst to operate on >>>> + * @first: buffer containing first section of pathname >>>> + * @total: total length of the pathname argument >>>> + * >>>> + * Returns pointer to a NUL-terminated string, or an ERR_PTR. The buffer is >>>> + * released automatically when @rqstp is recycled. >>>> + */ >>>> +char *svc_fill_symlink_pathname(struct svc_rqst *rqstp, struct kvec *first, >>>> + size_t total) >>>> +{ >>>> + struct xdr_buf *arg = &rqstp->rq_arg; >>>> + struct page **pages; >>>> + char *result; >>>> + >>>> + /* VFS API demands a NUL-terminated pathname. This function >>>> + * uses a page from @rqstp as the pathname buffer, to enable >>>> + * direct placement. Thus the total buffer size is PAGE_SIZE. >>>> + * Space in this buffer for NUL-termination requires that we >>>> + * cap the size of the returned symlink pathname just a >>>> + * little early. >>>> + */ >>>> + if (total > PAGE_SIZE - 1) >>>> + return ERR_PTR(-ENAMETOOLONG); >>>> + >>>> + /* Some types of transport can present the pathname entirely >>>> + * in rq_arg.pages. If not, then copy the pathname into one >>>> + * page. >>>> + */ >>>> + pages = arg->pages; >>>> + WARN_ON_ONCE(arg->page_base != 0); >>>> + if (first->iov_base == 0) { >>>> + result = page_address(*pages); >>>> + result[total] = '\0'; >>>> + } else { >>>> + size_t len, remaining; >>>> + char *dst; >>>> + >>>> + result = page_address(*(rqstp->rq_next_page++)); >>>> + dst = result; >>>> + remaining = total; >>>> + >>>> + len = min_t(size_t, total, first->iov_len); >>>> + memcpy(dst, first->iov_base, len); >>>> + dst += len; >>>> + remaining -= len; >>>> + >>>> + /* No more than one page left */ >>>> + if (remaining) { >>>> + len = min_t(size_t, remaining, PAGE_SIZE); >>>> + memcpy(dst, page_address(*pages), len); >>>> + dst += len; >>>> + } >>>> + >>>> + *dst = '\0'; >>>> + } >>>> + >>>> + /* Sanity check: we don't allow the pathname argument to >>>> + * contain a NUL byte. >>>> + */ >>>> + if (strlen(result) != total) >>>> + return ERR_PTR(-EINVAL); >>>> + return result; >>>> +} >>>> +EXPORT_SYMBOL_GPL(svc_fill_symlink_pathname); >> >> -- >> Chuck Lever -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html