On 22 Sep 2020, at 17:46, Trond Myklebust wrote:

> On Tue, 2020-09-22 at 17:15 -0400, Trond Myklebust wrote:
>> On Tue, 2020-09-22 at 17:08 -0400, Benjamin Coddington wrote:
>>> On 22 Sep 2020, at 15:48, Trond Myklebust wrote:
>>>
>>>> On Tue, 2020-09-22 at 15:15 -0400, Benjamin Coddington wrote:
>>>>> Since commit 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in
>>>>> CLOSE/OPEN_DOWNGRADE") the following livelock may occur if a CLOSE
>>>>> races with the update of the nfs_state:
>>>>>
>>>>> Process 1           Process 2           Server
>>>>> =========           =========           ========
>>>>> OPEN file
>>>>>                     OPEN file
>>>>>                                         Reply OPEN (1)
>>>>>                                         Reply OPEN (2)
>>>>> Update state (1)
>>>>> CLOSE file (1)
>>>>>                                         Reply OLD_STATEID (1)
>>>>> CLOSE file (2)
>>>>>                                         Reply CLOSE (-1)
>>>>> Update state (2)
>>>>>                     wait for state change
>>>>> OPEN file
>>>>>                     wake
>>>>>                     CLOSE file
>>>>> OPEN file
>>>>>                     wake
>>>>>                     CLOSE file
>>>>> ...
>>>>>                     ...
>>>>>
>>>>> As long as the first process continues updating state, the second
>>>>> process will fail to exit the loop in nfs_set_open_stateid_locked().
>>>>> This livelock has been observed in generic/168.
>>>>>
>>>>> Fix this by detecting the case in nfs_need_update_open_stateid()
>>>>> and then exit the loop if:
>>>>>  - the state is NFS_OPEN_STATE, and
>>>>>  - the stateid sequence is > 1, and
>>>>>  - the stateid doesn't match the current open stateid
>>>>>
>>>>> Fixes: 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in
>>>>> CLOSE/OPEN_DOWNGRADE")
>>>>> Cc: stable@xxxxxxxxxxxxxxx # v5.4+
>>>>> Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx>
>>>>> ---
>>>>>  fs/nfs/nfs4proc.c | 27 +++++++++++++++++----------
>>>>>  1 file changed, 17 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>>> index 6e95c85fe395..9db29a438540 100644
>>>>> --- a/fs/nfs/nfs4proc.c
>>>>> +++ b/fs/nfs/nfs4proc.c
>>>>> @@ -1588,18 +1588,25 @@ static void nfs_test_and_clear_all_open_stateid(struct nfs4_state *state)
>>>>>  static bool nfs_need_update_open_stateid(struct nfs4_state *state,
>>>>>  		const nfs4_stateid *stateid)
>>>>>  {
>>>>> -	if (test_bit(NFS_OPEN_STATE, &state->flags) == 0 ||
>>>>> -	    !nfs4_stateid_match_other(stateid, &state->open_stateid)) {
>>>>> -		if (stateid->seqid == cpu_to_be32(1))
>>>>> +	bool state_matches_other = nfs4_stateid_match_other(stateid, &state->open_stateid);
>>>>> +	bool seqid_one = stateid->seqid == cpu_to_be32(1);
>>>>> +
>>>>> +	if (test_bit(NFS_OPEN_STATE, &state->flags)) {
>>>>> +		/* The common case - we're updating to a new sequence number */
>>>>> +		if (state_matches_other &&
>>>>> +		    nfs4_stateid_is_newer(stateid, &state->open_stateid)) {
>>>>> +			nfs_state_log_out_of_order_open_stateid(state, stateid);
>>>>> +			return true;
>>>>> +		}
>>>>> +	} else {
>>>>> +		/* This is the first OPEN */
>>>>> +		if (!state_matches_other && seqid_one) {
>>>>
>>>> Why are we looking at the value of state_matches_other here?
>>>> If the NFS_OPEN_STATE flag is not set, then the state->open_stateid
>>>> is not initialised, so how does it make sense to look at a
>>>> comparison with the incoming stateid?
>>>
>>> You're right - it doesn't make sense. I failed to clean it up out of
>>> my decision matrix. I'll send a v3 after it gets tested overnight.
>>>
>>> Thanks for the look, and if you have another moment - what do you
>>> think about not bumping the seqid in the CLOSE/OPEN_DOWNGRADE path
>>> on OLD_STATEID if the state's refcount > 1? This would optimize away
>>> a lot of recovery for races like this, but I wonder if it would be
>>> possible to end up with two OPEN_DOWNGRADEs dueling that would never
>>> recover.
>>
>> It would lead to a return of the infinite loop in cases where the
>> client misses a reply to an OPEN or CLOSE due to a soft timeout
>> (which is an important use case for us).

I hadn't thought about that, but doesn't the task return and decrement
the refcount on the nfs4_state?

> In principle, RFC5661 says that the client is supposed to process all
> stateid operations in the order dictated by the seqid. The one problem
> with that mandate is that open-by-filename doesn't allow us to know
> whether or not a seqid bump is outstanding. This is why we have the 5
> second timeout wait in nfs_set_open_stateid_locked().
>
> We could do something similar to that for CLOSE by setting the seqid
> to 0, and then ensuring we process the CLOSE in the order the server
> ends up processing it. Unfortunately, that would require us to replay
> any OPEN calls that got shut down because we used seqid 0 (it would
> also not work for NFSv4.0)... So perhaps the best solution would be to
> instead allow CLOSE to wait for outstanding OPENs to complete, just
> like we do in nfs_set_open_stateid_locked()? Thoughts?

How can you know if there are outstanding OPENs?
I thought that was the whole intention of the CLOSE -> OLD_STATEID
retry bump -- you cannot know whether the server was able to process an
interrupted OPEN.

I had a dumb patch that just caused CLOSE to delay a bit before
retrying if it received OLD_STATEID, but I figured that was vulnerable
to another livelock where another process could just keep doing
open(), close() as well.

Ben