> On Feb 14, 2023, at 8:53 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > On Mon, 2023-02-13 at 22:28 -0500, Trond Myklebust wrote: >> On Mon, 2023-02-13 at 16:49 -0800, Rick Macklem wrote: >>> On Mon, Feb 13, 2023 at 1:14 PM Jeff Layton <jlayton@xxxxxxxxxx> >>> wrote: >>>> >>>> CAUTION: This email originated from outside of the University of >>>> Guelph. Do not click links or open attachments unless you recognize >>>> the sender and know the content is safe. If in doubt, forward >>>> suspicious emails to IThelp@xxxxxxxxxxx >>>> >>>> >>>> The write verifier exists to tell the client when the server may >>>> have >>>> forgotten some unstable writes. The typical way that this happens >>>> is if >>>> the server crashes, but we've also extended nfsd to change it when >>>> there >>>> are writeback errors as well. >>>> >>>> The way it works today though, we call something like vfs_fsync >>>> (e.g. >>>> for a COMMIT call) and if we get back an error, we'll reset the >>>> write >>>> verifier. >>>> >>>> This is non-optimal for a couple of reasons: >>>> >>>> 1/ There could be significant delay between an error being >>>> recorded and the reset. It would be ideal if the write verifier >>>> were to >>>> change as soon as the error was recorded. >>>> >>>> 2/ It's a bit of a waste, in that if we get a writeback error on a >>>> single inode, we'll end up resetting the write verifier for >>>> everything, >>>> even on inodes that may be fine (e.g. on a completely separate fs). >>>> >>> Here's the snippet from RFC8881: >>> The final portion of the result is the field writeverf. This >>> field >>> is the write verifier and is a cookie that the client can use to >>> determine whether a server has changed instance state (e.g., >>> server >>> restart) between a call to WRITE and a subsequent call to either >>> WRITE or COMMIT. This cookie MUST be unchanged during a single >>> instance of the NFSv4.1 server and MUST be unique between >>> instances >>> of the NFSv4.1 server. If the cookie changes, then the client >>> MUST >>> assume that any data written with an UNSTABLE4 value for committed >>> and an old writeverf in the reply has been lost and will need to >>> be >>> recovered. >>> >>> I've always interpreted the writeverf as "per-server" and not "per- >>> file". >>> Although I'll admit the above does not make that crystal clear, it >>> does make >>> it clear that the writeverf applies to a "server instance" and not a >>> file or >>> file system on the server. >>> >>> The FreeBSD client assumes it is "per-server" and re-writes all >>> uncommitted >>> writes for the server, not just ones for the file (or file system) >>> the >>> writeverf is >>> replied with. (I vaguely recall Solaris does the same?) >>> >>> At the very least, I think you should run this past the IETF working >>> group >>> (nfsv4@xxxxxxxx) to see what they say w.r.t. the writeverf being >>> "per-file" vs >>> "per-server". >>> >> >> As I recall, we've already had this discussion on the IETF NFSv4 >> working group mailing list: >> https://mailarchive.ietf.org/arch/msg/nfsv4/99Ow2muMylXKWd9lzi9_BX2LJDY/ >> >> >> That's why I kept it a global in the first place. >> >> Now note that RFC8881 does also clarify in Section 18.3.3 that: >> >> >> The server must vary the value of the write >> verifier at each server event or instantiation that may lead to a >> loss of uncommitted data. Most commonly this occurs when the server >> is restarted; however, other events at the server may result in >> uncommitted data loss as well. >> >> So I feel it is quite OK to use the verifier the way we do now in order >> to signify that a fatal write error has occurred and that clients must >> resend any data that was uncommitted. >> > > Thanks, I missed that discussion. I think you guys have convinced me > that we have to keep this per-server. I won't bother starting a new > thread on it. > > It's a pity. It would have been a lot more elegant as a per-inode thing! > > Chuck, I think that means we'll just want to keep patch #1 in this > series? Regarding patch 1/3: "sizeof(verf)" works as well as "sizeof(*verf) * 2" and is a little more clear to boot. You can redrive a v2 of your patch or I can make one. Up to you. -- Chuck Lever