On Jun 14, 2016, at 2:46 PM, J . Bruce Fields wrote:

> On Tue, Jun 14, 2016 at 11:56:20AM -0400, Oleg Drokin wrote:
>>
>> On Jun 14, 2016, at 11:46 AM, J . Bruce Fields wrote:
>>
>>> On Sun, Jun 12, 2016 at 09:26:27PM -0400, Oleg Drokin wrote:
>>>> It used to be the case that state had an rwlock that was locked for write
>>>> by downgrades, but for read for upgrades (opens). Well, the problem is
>>>> if there are two competing opens for the same state, they step on
>>>> each other toes potentially leading to leaking file descriptors
>>>> from the state structure, since access mode is a bitmap only set once.
>>>>
>>>> Extend the holding region around in nfsd4_process_open2() to avoid
>>>> racing entry into nfs4_get_vfs_file().
>>>> Make init_open_stateid() return with locked stateid to be unlocked
>>>> by the caller.
>>>>
>>>> Now this version held up pretty well in my testing for 24 hours.
>>>> It still does not address the situation if during one of the racing
>>>> nfs4_get_vfs_file() calls we are getting an error from one (first?)
>>>> of them. This is to be addressed in a separate patch after having a
>>>> solid reproducer (potentially using some fault injection).
>>>>
>>>> Signed-off-by: Oleg Drokin <green@xxxxxxxxxxxxxx>
>>>> ---
>>>>  fs/nfsd/nfs4state.c | 47 +++++++++++++++++++++++++++--------------------
>>>>  fs/nfsd/state.h     |  2 +-
>>>>  2 files changed, 28 insertions(+), 21 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>>> index f5f82e1..fa5fb5a 100644
>>>> --- a/fs/nfsd/nfs4state.c
>>>> +++ b/fs/nfsd/nfs4state.c
>>>> @@ -3487,6 +3487,10 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp,
>>>>  	struct nfs4_openowner *oo = open->op_openowner;
>>>>  	struct nfs4_ol_stateid *retstp = NULL;
>>>>
>>>> +	/* We are moving these outside of the spinlocks to avoid the warnings */
>>>> +	mutex_init(&stp->st_mutex);
>>>> +	mutex_lock(&stp->st_mutex);
>>>
>>> A mutex_init_locked() primitive might also be convenient here.
>>
>> I know! I would be able to do it under spinlock then without moving this around too.
>>
>> But alas, not only there is not one, mutex documentation states this is disallowed.
>
> You're just talking about this comment?:
>
>  * It is not allowed to initialize an already locked mutex.
>
> That's a weird comment. You're probably right that what they meant was
> something like "It is not allowed to initialize a mutex to locked
> state". But, I don't know, taken literally that comment doesn't make
> sense (how could you even distinguish between an already-locked mutex
> and an uninitialized mutex?), so maybe it'd be worth asking.

I think this is because of the strict ownership tracking or something.
I guess I can ask.

>>> You could also take the two previous lines from the caller into this
>>> function instead of passing in stp, that might simplify the code.
>>> (Haven't checked.)
>>
>> I am not really sure what do you mean here.
>> These lines are moved from further away in this function (well, just the init, anyway).
>>
>> Having half initialisation of stp here and half in the caller sounds kind of strange
>> to me.
>
> I was thinking of something like the following--so init_open_stateid
> hides more of the details of the swapping. Untested. Does it look like
> an improvement to you?
>
> There's got to be a way to make this code a little less convoluted....
>
> --b.
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index fa5fb5aa4847..41b59854c40f 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -3480,13 +3480,15 @@ alloc_init_open_stateowner(unsigned int strhashval, struct nfsd4_open *open,
>  }
>
>  static struct nfs4_ol_stateid *
> -init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp,
> -		struct nfsd4_open *open)
> +init_open_stateid(struct nfs4_file *fp, struct nfsd4_open *open)
>  {
>
>  	struct nfs4_openowner *oo = open->op_openowner;
>  	struct nfs4_ol_stateid *retstp = NULL;
> +	struct nfs4_ol_stateid *stp;
>
> +	stp = open->op_stp;
> +	open->op_stp = NULL;
>  	/* We are moving these outside of the spinlocks to avoid the warnings */
>  	mutex_init(&stp->st_mutex);
>  	mutex_lock(&stp->st_mutex);
> @@ -3512,9 +3514,12 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp,
>  out_unlock:
>  	spin_unlock(&fp->fi_lock);
>  	spin_unlock(&oo->oo_owner.so_client->cl_lock);
> -	if (retstp)
> -		mutex_lock(&retstp->st_mutex);
> -	return retstp;
> +	if (retstp) {
> +		nfs4_put_stid(&stp->st_stid);

So as I am trying to integrate this into my patchset, do we really need this?
We don't if we took the other path and left this one hanging off the struct nfsd4_open
(why do we need to assign it NULL before the search?)
I imagine then we'd save some free/realloc churn as well?

I assume struct nfsd4_open cannot be shared between threads? Otherwise we have bigger
problems at hand like mutex init on a locked mutex from another thread and stuff.

I'll try this theory I guess.

> +		stp = retstp;
> +		mutex_lock(&stp->st_mutex);
> +	}
> +	return stp;
>  }
>
>  /*
> @@ -4310,7 +4315,6 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
>  	struct nfs4_client *cl = open->op_openowner->oo_owner.so_client;
>  	struct nfs4_file *fp = NULL;
>  	struct nfs4_ol_stateid *stp = NULL;
> -	struct nfs4_ol_stateid *swapstp = NULL;
>  	struct nfs4_delegation *dp = NULL;
>  	__be32 status;
>
> @@ -4347,16 +4351,9 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
>  			goto out;
>  		}
>  	} else {
> -		stp = open->op_stp;
> -		open->op_stp = NULL;
> -		/*
> -		 * init_open_stateid() either returns a locked stateid
> -		 * it found, or initializes and locks the new one we passed in
> -		 */
> -		swapstp = init_open_stateid(stp, fp, open);
> -		if (swapstp) {
> -			nfs4_put_stid(&stp->st_stid);
> -			stp = swapstp;
> +		/* stp is returned locked: */
> +		stp = init_open_stateid(fp, open);
> +		if (stp->st_access_bmap == 0) {
>  			status = nfs4_upgrade_open(rqstp, fp, current_fh,
>  						stp, open);
>  			if (status) {
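
For anyone following the thread without the kernel tree at hand, the shape being discussed (initialize and lock the new stateid before taking the list lock, then either publish it or drop it in favour of one already on the list, and return whichever it is with its mutex held) can be sketched in userspace. This is only an illustration under loud assumptions: it uses pthreads instead of kernel mutexes and spinlocks, `struct stateid`, `struct file_state` and `init_or_find_stateid()` are hypothetical names and not nfsd code, a plain free() stands in for nfs4_put_stid(), and entries are never removed from the list, so no refcounting is shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for nfs4_ol_stateid; not a kernel type. */
struct stateid {
	pthread_mutex_t st_mutex;
	int owner;			/* stands in for the open owner */
	unsigned int access_bmap;
	struct stateid *next;
};

/* Hypothetical stand-in for nfs4_file; fi_lock plays the spinlock's role. */
struct file_state {
	pthread_mutex_t fi_lock;
	struct stateid *stateids;
};

/*
 * Takes ownership of *newp (and clears it, like open->op_stp = NULL).
 * Returns the stateid for `owner` with st_mutex held: either the
 * preallocated one, now on the list, or one that was already there.
 */
struct stateid *init_or_find_stateid(struct file_state *fp, int owner,
				     struct stateid **newp)
{
	struct stateid *stp = *newp;
	struct stateid *found = NULL;
	struct stateid *cur;

	assert(stp != NULL);
	*newp = NULL;

	/* Init and lock before the list lock: nobody else can see stp yet. */
	pthread_mutex_init(&stp->st_mutex, NULL);
	pthread_mutex_lock(&stp->st_mutex);
	stp->owner = owner;
	stp->access_bmap = 0;

	pthread_mutex_lock(&fp->fi_lock);
	for (cur = fp->stateids; cur; cur = cur->next) {
		if (cur->owner == owner) {
			found = cur;
			break;
		}
	}
	if (!found) {
		/* Publish the new stateid while it is already locked. */
		stp->next = fp->stateids;
		fp->stateids = stp;
	}
	pthread_mutex_unlock(&fp->fi_lock);

	if (found) {
		/* Lost the race: discard ours and wait for the existing one. */
		pthread_mutex_unlock(&stp->st_mutex);
		pthread_mutex_destroy(&stp->st_mutex);
		free(stp);
		stp = found;
		pthread_mutex_lock(&stp->st_mutex);
	}
	return stp;
}
```

The caller unlocks st_mutex when it is done with the returned stateid. Whether the preallocated object is freed on the "found" path or left hanging off the request for reuse is exactly the open question above; the sketch takes the first option only because it is the simpler one to show.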