There is a related question: how could deferring writes slightly (e.g.
to buffer up a full page or a full SMB buffer) corrupt data? AFAIK,
POSIX makes no ordering guarantee when writes from two different
applications overlap -- hence the need for byte-range locks or other
synchronization mechanisms, or flush/fsync.

On Mon, Apr 30, 2012 at 6:28 AM, Jeff Layton <jlayton@xxxxxxxxx> wrote:
> On Fri, 27 Apr 2012 06:56:15 -0400
> Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
>> cifs.ko currently does not follow the CIFS protocol as it should by
>> default. The protocol specifies (paraphrasing) "Thou shalt not cache
>> file data without an oplock." cifs.ko by default *does* cache
>> without one, however, opting for a more NFS-like cache coherency
>> scheme: it caches file data in the pagecache and invalidates it when
>> the attributes suggest the file has changed.
>>
>> Unfortunately, this can and does cause cache coherency problems. The
>> kernel can end up buffering writes for longer than it should. In
>> particular, internal testing at RH with Samba/CTDB has shown cache
>> coherency problems when the clients don't mount with "strictcache":
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=742314
>>
>> Up until now, we've shied away from changing this default, primarily
>> for performance reasons. The code that handled uncached reads and
>> writes was synchronous, and very slow. With the addition of async
>> uncached writes in 3.4 and async uncached reads in 3.5, I think
>> performance is good enough that we ought to change the default to
>> "strictcache" for better adherence to the protocol.
>>
>> This patchset begins the transition by cleaning up the mount options
>> that control the cache behavior and by adding warnings about the
>> impending behavior change. I'm proposing that we follow the kernel
>> convention of warning for two releases about user-visible changes,
>> and then make the switch in 3.7.
>>
>> At the same time, it corrects a problem with the current code: the
>> different cache behaviors are not treated as mutually exclusive even
>> though they should be. If someone specifies (for instance) both
>> "strictcache" and "fsc", the results are poorly defined. The first
>> patch corrects this.
>>
>> If this looks acceptable, I'll also plan to spin up a manpage patch
>> to help document this change.
>>
>
> (cc'ing samba-technical too for those who don't follow linux-cifs...)
>
> I discussed this some via IM with Steve last week, and he has his
> doubts about whether we should do this. I think we ought to bring
> this discussion to the lists, though, so we can try to settle it in
> time for the next kernel merge window.
>
> The (valid) point he brings up is that this will harm performance
> when an application is doing small I/O sizes and does not have an
> oplock. This is true -- reads and writes will be synchronous and will
> match the sizes that come from userspace, so an application doing
> small I/Os will suffer performance-wise when it does not have an
> oplock on an open file.
>
> He also claims that Windows will still buffer up writes when it
> doesn't have an oplock, in an effort to do larger I/Os on the wire.
> I've got a dochelp request in to MS to clarify this point and what
> the protocol mandates, but I don't think the answer will have much
> bearing on what we should do. There are a number of problems with
> keeping "cache=loose" the default...
>
> When we do I/O out of the pagecache, we more or less do it along page
> boundaries. If another client has locked a region in the middle of a
> page, that request will generally fail. We have no way to know what
> region is locked, so we have no way to handle this case.
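
To make that failure mode concrete, here's a minimal sketch of the
byte-range locking in question (path is hypothetical, error handling
pared down). If one client holds a lock like this, a page-aligned
write from another client that overlaps bytes 1000-1099 can be
rejected by the server, and the kernel has no way to narrow the I/O to
avoid the locked range:

/* Sketch: serialize access to a shared region with a POSIX
 * byte-range lock. Path and offsets are hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/mnt/cifs/shared.dat", O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Take a write lock over bytes 1000-1099. F_SETLKW blocks
	 * until any conflicting lock is released. */
	struct flock fl = {
		.l_type   = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start  = 1000,
		.l_len    = 100,
	};
	if (fcntl(fd, F_SETLKW, &fl) < 0) {
		perror("fcntl(F_SETLKW)");
		return 1;
	}

	/* Write only within the locked range... */
	pwrite(fd, "update", 6, 1000);

	/* ...then release the lock. */
	fl.l_type = F_UNLCK;
	fcntl(fd, F_SETLK, &fl);
	close(fd);
	return 0;
}

Since POSIX says nothing about the ordering of overlapping writes from
two processes, a lock like this (or an explicit fsync) is the only
portable way to serialize them -- which is the point above about
needing byte-range locks or flush/fsync in the first place.
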
> We also have quite a few documented cases where using "strictcache"
> has fixed cache coherency problems. Here's one, for instance, but
> there are others:
>
> https://bugzilla.samba.org/show_bug.cgi?id=6174
>
> I think it's important that cifs.ko be safe to use by default, even
> if it's a little slower. We can always tell people "try cache=loose"
> if their workload is slow with "cache=strict", but having defaults
> that can munch your data is really not acceptable.
>
> I guess the question we have to answer is:
>
> "Is it worth risking data corruption in order to eke out better
> performance when a user is doing small I/Os and does not have an
> oplock?"
>
> To me, the answer is an unqualified "no". In most cases, you don't
> have an oplock because another client has the file open. In those
> cases, cache coherency generally trumps performance. In cases where
> it doesn't, a user has the freedom to mount with "cache=loose", with
> the understanding of all that that entails.
>
> Thoughts?
> --
> Jeff Layton <jlayton@xxxxxxxxx>

--
Thanks,

Steve
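
P.S. For anyone who wants to compare the behaviors discussed above,
selecting the cache mode at mount time would look roughly like the
sketch below. Server, share, and credentials are placeholders; real
mounts normally go through mount(8)/mount.cifs, which also resolves
the server address; and the cache= names assume the option syntax
from this patchset:

/* Illustrative only: pick the cifs cache behavior at mount time. */
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
	/* cache=strict: cache file data only while holding an oplock
	 * cache=loose:  NFS-like caching keyed off attribute changes */
	if (mount("//server/share", "/mnt/cifs", "cifs", 0,
	          "username=user,password=pass,cache=strict") < 0) {
		perror("mount");
		return 1;
	}
	return 0;
}

The equivalent command line would be something like:

mount -t cifs //server/share /mnt/cifs -o username=user,cache=strict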