Re: [pnfs] [PATCH 1/1] nfs41: resolve some race conditions with queued SEQUENCE operations when unmounting

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Oct 14, 2009 at 2:30 PM, Trond Myklebust
<trond.myklebust@xxxxxxxxxx> wrote:
> On Wed, 2009-10-14 at 15:57 -0700, Alexandros Batsakis wrote:
>> Signed-off-by: Alexandros Batsakis <batsakis@xxxxxxxxxx>
>> ---
>>  fs/nfs/nfs4_fs.h   |    1 +
>>  fs/nfs/nfs4proc.c  |    3 +++
>>  fs/nfs/nfs4state.c |   11 ++++++++---
>>  3 files changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
>> index 6ea07a3..554af98 100644
>> --- a/fs/nfs/nfs4_fs.h
>> +++ b/fs/nfs/nfs4_fs.h
>> @@ -45,6 +45,7 @@ enum nfs4_client_state {
>>       NFS4CLNT_RECLAIM_NOGRACE,
>>       NFS4CLNT_DELEGRETURN,
>>       NFS4CLNT_SESSION_SETUP,
>> +     NFS4CLNT_SESSION_DESTROY,
>>  };
>>
>>  /*
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index c321002..9eb21d2 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -359,6 +359,8 @@ static void nfs41_sequence_done(struct nfs_client *clp,
>>       struct nfs4_slot_table *tbl;
>>       struct nfs4_slot *slot;
>>
>> +     if (!nfs4_has_session(clp))
>> +             goto out;
>>       /*
>>        * sr_status remains 1 if an RPC level error occurred. The server
>>        * may or may not have processed the sequence operation..
>> @@ -4616,6 +4618,7 @@ struct nfs4_session *nfs4_alloc_session(struct nfs_client *clp)
>>
>>  void nfs4_destroy_session(struct nfs4_session *session)
>>  {
>> +     set_bit(NFS4CLNT_SESSION_DESTROY, &session->clp->cl_state);
>>       nfs4_proc_destroy_session(session);
>>       dprintk("%s Destroy backchannel for xprt %p\n",
>>               __func__, session->clp->cl_rpcclient->cl_xprt);
>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>> index 1394dfb..a509b26 100644
>> --- a/fs/nfs/nfs4state.c
>> +++ b/fs/nfs/nfs4state.c
>> @@ -1253,11 +1253,16 @@ static void nfs4_state_manager(struct nfs_client *clp)
>>               }
>>               /* Initialize or reset the session */
>>               if (test_and_clear_bit(NFS4CLNT_SESSION_SETUP, &clp->cl_state)
>> -                && nfs4_has_session(clp)) {
>> +                 && nfs4_has_session(clp)) {
>>                       if (clp->cl_cons_state == NFS_CS_SESSION_INITING)
>>                               status = nfs4_initialize_session(clp);
>> -                     else
>> -                             status = nfs4_reset_session(clp);
>> +                     else {
>> +                             if (test_bit(NFS4CLNT_SESSION_DESTROY,
>> +                                          &clp->cl_state))
>> +                                     status = -EIO;
>> +                             else
>> +                                     status = nfs4_reset_session(clp);
>> +                     }
>>                       if (status) {
>>                               if (status == -NFS4ERR_STALE_CLIENTID)
>>                                       continue;
>
> Wait... Why is all this needed?
>
> AFAICS, nfs_destroy_session() is only called by nfs_free_client() once
> the very last reference to the nfs_client has been released. Now since
> the state manager keeps a reference to the nfs_client all the time while
> it is running (see nfs4_run_state_manager()), we _can't_ have a race
> between it and nfs4_destroy_session().
>

This is correct very and in the normal case it is not a problem.
However, the problems start when the server is down/rebooted/etc and
the client tries to umount.
Then, the (SEQUENCE) requests are _already_ queued and when the server
comes up or the request is cancelled (ctl-c), we see these race
conditions:

a) nfs41_sequence_done() called after destroy_session() that leads to
a NULL pointer dereference
b) a BADSESSION reply to a sequence operation triggers a
reset_session() at the same time with destroy_session() (called by
umount) that leads to another NULL pointer dereference.

I can send you the call traces if you need more info. I understand
that these are pretty rare cases, but they shouldn't lead to a crash.

-alexandros

> _______________________________________________
> pNFS mailing list
> pNFS@xxxxxxxxxxxxx
> http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux