Re: [PATCH] NFS4: Fix v4.0 client state corruption when mount

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6 Nov 2019, at 21:34, zhangxiaoxu (A) wrote:

在 2019/11/7 0:47, Benjamin Coddington 写道:
Hi ZhangXiaoxu,

I'm having a bit of trouble with this fix (which went upstream in
f02f3755dbd14fb935d24b14650fff9ba92243b8).

Since this change, my client calls SETCLIENTID/SETCLIENTID_CONFIRM twice in quick succession on mount, and the second SETCLIENTID_CONFIRM sent by the state
manager can sometimes have the same verifier sent back by the first
SETCLIENTID's response.  I think we're missing a memory barrier somewhere..

nfs40_discover_server_trunking
	nfs4_proc_setclientid # the first time
	
after nfs4_schedule_state_manager, the state manager:
nfs4_run_state_manager
  nfs4_state_manager
    # 'nfs4_alloc_client' init state to NFS4CLNT_LEASE_EXPIRED
    nfs4_reclaim_lease
      nfs4_establish_lease
        nfs4_init_clientid
          nfs4_proc_setclientid # the second time.


But, I do not understand how the client was able to corrupt the state before this patch, and I don't understand how the patch fixes state corruption.

Can anyone enlighten me as to how we were corrupting state here?
when 'nfs4_alloc_client', the client state initialized with 'NFS4CLNT_LEASE_EXPIRED',
So, we should recover it when the client init.

Why? The commit message asserts that if we don't, there will be corruption
of the client's state.   How can the client's state be corrupted?

After the first setclientid, maybe we should clear the 'NFS4CLNT_LEASE_EXPIRED', then
the state manager won't be called it again.

What about just never setting it in nfs4_alloc_client? It has been set there since 286d7d6a0cb, and that commit needed it because it changed the paths so
that the only way to ever send a SETCLIENTID was through state recovery.

Today, we have two paths - the trunking discovery for new clients as well as
state recovery, and all new clients will go through trunking discovery.

Removing the call to kick the state manager thread will fix the problem
that the update to cl_confirm happens after the state manager has already
started doing its own SETCLIENTID dance.

Ben





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux