Hi Neil

On Fri, 2019-02-22 at 11:58 +1100, NeilBrown wrote:
> Hi,
> I have a report of an NFSv4.1 state management problem in our 4.4
> kernel which appears to be caused by a race between OPEN and
> OPEN_DOWNGRADE, and I don't think the race is fixed in mainline.
>
> The test program creates multiple threads which open a file O_RDWR
> and write to it, then open it O_RDONLY and read from it.
> A network trace which shows the problem suggests that at a point in
> time where there are some O_RDONLY opens and one O_RDWR open, a new
> O_RDWR open is requested just as the O_RDWR open is being closed.
>
> The close of the O_RDWR open clears state->n_rdwr early, so
> can_open_cached() fails for the new O_RDWR open, and it needs to go
> to the server.
> The open for O_RDWR doesn't increment n_rdwr until after the open
> succeeds, so nfs4_close_prepare() sees
>     n_rdwr == 0
>     n_rdonly > 0
>     NFS_O_RDWR_STATE and NFS_O_RDONLY_STATE set
> which causes it to choose an OPEN_DOWNGRADE.
>
> What we see is that an OPEN/share-all and an OPEN_DOWNGRADE/share-
> read request are sent one after the other without waiting for a
> reply. The OPEN is processed first, then the OPEN_DOWNGRADE,
> resulting in a state that only allows reads. Then a WRITE is
> attempted, which fails. This enters an infinite loop with two
> steps:
>  - a WRITE gets NFS4ERR_OPENMODE
>  - a TEST_STATEID succeeds
>
> Once an OPEN/share-all request has been sent, it isn't really
> correct to send an OPEN_DOWNGRADE/share-read request. However, the
> fact that the OPEN has been sent isn't visible to
> nfs4_close_prepare().
>
> There is an asymmetry between open and close w.r.t. updating the
> n_[mode] counters and setting the NFS_O_[mode]_STATE bits.
>
> For close, the counter is decremented, then the server is told,
> then the state bits are cleared.
> For open, the counter and state bits are both updated after the
> server has replied.
>
> I understand that this is probably because the OPEN could fail, and
> incrementing a counter before we are sure of success seems unwise.
> But doing so would allow us to avoid the incorrect OPEN_DOWNGRADE.
>
> Any suggestions on what a good solution would look like? Does it
> ever make sense for an OPEN request to be concurrent with a CLOSE
> or OPEN_DOWNGRADE? Maybe they should be serialized with each other
> (maybe not as fully serialized as in NFSv4.0, but more than they
> currently are in NFSv4.1).

If the CLOSE/OPEN_DOWNGRADE is processed by the server _after_ the
OPEN, then the stateid presented by the client will have an incorrect
seqid, and so the server should reject the operation with
NFS4ERR_OLD_STATEID.

If, on the other hand, the CLOSE/OPEN_DOWNGRADE is processed before
the OPEN, then the client should either see a new stateid with a new
'other' field (CLOSE), or it will see the same stateid but with a
seqid that does not match what it expects (OPEN_DOWNGRADE). In the
current mainline code, the client will then attempt to wait for any
"missing" seqids to get processed (see
nfs_set_open_stateid_locked()).

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
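
P.S. In case a concrete illustration helps: below is a minimal
userspace sketch of the seqid ordering rule described above. It is
not the actual kernel code (that logic lives in
nfs_set_open_stateid_locked() and friends); the struct, function and
variable names are invented for the example, and only the
seqid/'other' layout follows the protocol.

/* Hypothetical sketch, NOT the Linux kernel implementation. It only
 * models how a client can classify an incoming open stateid update
 * by comparing the 'other' field and the seqid against what it
 * currently holds. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct stateid {
	uint32_t seqid;
	uint8_t  other[12];	/* opaque "other" field */
};

/* Client side: decide what to do with a stateid from a reply. */
static void client_update(struct stateid *cur, const struct stateid *in)
{
	if (memcmp(cur->other, in->other, sizeof(cur->other)) != 0) {
		/* New 'other' field: a fresh open stateid (e.g. the
		 * reply to an OPEN after a CLOSE); replace outright. */
		*cur = *in;
		printf("new stateid, seqid %u\n", cur->seqid);
	} else if (in->seqid == cur->seqid + 1) {
		/* The expected next update: apply it. */
		cur->seqid = in->seqid;
		printf("in-order update to seqid %u\n", cur->seqid);
	} else if ((int32_t)(in->seqid - cur->seqid) > 0) {
		/* A seqid was skipped: some other reply (e.g. the
		 * OPEN_DOWNGRADE) has not been processed yet, so wait
		 * for the missing seqid(s) before applying this one. */
		printf("seqid %u but expected %u: wait for missing reply\n",
		       in->seqid, cur->seqid + 1);
	} else {
		/* Old news: this reply was overtaken by a later
		 * update we already applied; ignore it. */
		printf("stale seqid %u, ignoring\n", in->seqid);
	}
}

int main(void)
{
	struct stateid cur  = { .seqid = 3, .other = "opaque-id-0" };
	struct stateid open = { .seqid = 5, .other = "opaque-id-0" };

	/* The OPEN reply carries seqid 5 while the client still holds
	 * seqid 3: seqid 4 (the OPEN_DOWNGRADE reply) is missing, so
	 * the client must wait for it rather than apply seqid 5. */
	client_update(&cur, &open);
	return 0;
}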