On Wed, 2017-10-25 at 15:37 -0400, Olga Kornievskaia wrote:
> On Wed, Oct 25, 2017 at 1:37 PM, Trond Myklebust
> <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
> > Ben Coddington has noted the following race between OPEN and CLOSE
> > on a single client.
> >
> >     Process 1           Process 2           Server
> >     =========           =========           ======
> >
> > 1)  OPEN file
> > 2)                      OPEN file
> > 3)                                          Process OPEN (1)
> >                                             seqid=1
> > 4)                                          Process OPEN (2)
> >                                             seqid=2
> > 5)                                          Reply OPEN (2)
> > 6)                      Receive reply (2)
> > 7)                      new stateid, seqid=2
> >
> > 8)                      CLOSE file, using
> >                         stateid w/ seqid=2
> > 9)                                          Reply OPEN (1)
> > 10)                                         Process CLOSE (8)
> > 11)                                         Reply CLOSE (8)
> > 12)                                         Forget stateid
> >                                             file closed
> >
> > 13)                     Receive reply (7)
> > 14)                     Forget stateid
> >                         file closed.
> >
> > 15) Receive reply (1).
> > 16) New stateid seqid=1
> >     is really the same
> >     stateid that was
> >     closed.
> >
> > IOW: the reply to the first OPEN is delayed. Since "Process 2" does
> > not wait before closing the file, and it does not cache the closed
> > stateid, then when the delayed reply is finally received, it is
> > treated as setting up a new stateid by the client.
> >
> > The fix is to ensure that the client processes the OPEN and CLOSE
> > calls in the same order in which the server processed them.
> >
> > This commit ensures that we examine the seqid of the stateid
> > returned by OPEN. If it is a new stateid, we assume the seqid
> > must be equal to the value 1, and that each state transition
> > increments the seqid value by 1 (See RFC7530, Section 9.1.4.2,
> > and RFC5661, Section 8.2.2).
> >
> > If the tracker sees that an OPEN returns with a seqid that is
> > greater than the cached seqid + 1, then it bumps a flag to ensure
> > that the caller waits for the RPCs carrying the missing seqids to
> > complete.
>
> Please help me with my confusion:
> I believe the code used to serialize OPENs on the open owner. Then we
> allowed parallel opens. Without parallel opens this wouldn't have
> happened, is that correct?
> Also, is your solution again serializing the
> opens, since it says the caller waits for the missing seqid to
> complete? I read this as the 2nd open waiting for the 1st open to
> complete before proceeding, or is that wrong?

There are a couple of differences.

1. The NFSv4.0 code serialises on the open owner sequence id, meaning
   you can only have one OPEN or CLOSE call on the wire at any time for
   a given user. In other words, multiple processes opening different
   files would find themselves serialised. The NFSv4.1 code (even with
   this patch) allows those processes to operate in fully parallel
   mode.

2. We only wait if we receive replies to OPEN calls for the same file
   in a different order than the server processed them.

IOW: this should not be a huge burden unless you have lots of
processes owned by the same user, all trying to open and close the
same file randomly.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@xxxxxxxxxxxxxxx
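A minimal C sketch of the seqid ordering rule described in the patch text, assuming a single per-file counter where a new stateid starts at seqid 1 and each state transition adds exactly 1. The names `struct open_tracker`, `classify_open_reply()` and `apply_seqid()` are hypothetical and are not the Linux NFS client's actual code; seqid wraparound is ignored.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-file open-state tracker, for illustration only; this
 * is not the Linux NFS client's data structure. Seqid wraparound is
 * ignored.
 */
struct open_tracker {
	uint32_t cached_seqid; /* 0 means no stateid is known yet */
};

enum seqid_action {
	SEQID_APPLY, /* next expected transition: update the cached state */
	SEQID_WAIT,  /* earlier transitions still in flight: wait for them */
	SEQID_STALE, /* already superseded by a later transition: ignore */
};

/*
 * Classify the seqid carried by an OPEN reply. Assumes a new stateid
 * starts at seqid 1 and that every state transition bumps the seqid by
 * exactly 1 (cf. RFC7530 9.1.4.2, RFC5661 8.2.2).
 */
static enum seqid_action classify_open_reply(const struct open_tracker *t,
					     uint32_t seqid)
{
	if (seqid <= t->cached_seqid)
		return SEQID_STALE;
	if (seqid > t->cached_seqid + 1)
		return SEQID_WAIT; /* replies carrying missing seqids not yet seen */
	return SEQID_APPLY;
}

/* Record a transition; only the next seqid in sequence may be applied. */
static void apply_seqid(struct open_tracker *t, uint32_t seqid)
{
	assert(seqid == t->cached_seqid + 1);
	t->cached_seqid = seqid;
}
```

Replaying Ben's scenario against this sketch: the reply carrying seqid=2 arrives while nothing is cached, so `classify_open_reply()` returns `SEQID_WAIT` and Process 2 cannot yet move on to its CLOSE. Once the delayed seqid=1 reply is applied, the seqid=2 reply classifies as `SEQID_APPLY`, so the client sees the transitions in the same order the server made them. Replies that arrive in order never hit the wait path, which matches point 2 above.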