On Fri, Apr 22, 2011 at 9:10 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> On Fri, 22 Apr 2011 09:02:18 -0500
> Steve French <smfrench@xxxxxxxxx> wrote:
>
>> On Fri, Apr 22, 2011 at 6:50 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>> > On Fri, 22 Apr 2011 12:09:20 +0400
>> > Pavel Shilovsky <piastry@xxxxxxxxxxx> wrote:
>> >
>> >> After conversation with Steve, we decided to drop
>> >> filemap_write_and_wait from getattr, because we already do it in
>> >> cifs_file_aio_write. I also think that we should drop it from
>> >> cifs_llseek in this case too. I will repost this patch later (launder
>> >> page operation patch was merged earlier).
>> >>
>> >
>> > I wasn't privy to this discussion, but that makes no sense to me. Just
>> > because we initiated writeout in cifs_file_aio_write, does not mean
>> > that it's complete. If it's not complete then the size returned by the
>> > server may be bogus.
>>
>> What would a local file system do in the case when a write is
>> racing with a getattr? In the case of cifs, when we issue
>> a write, and don't have oplock, we immediately send the
>> write on the network - but AFAIK posix provides no guarantees
>> about ordering if they are issued at the same time.
>>
>
> It's not a problem for a local filesystem as there's only one set of
> metadata to deal with. Even if the writes aren't synced out to disk,
> you still know how big the file is.
>
> With a client/server setup like cifs or nfs, you have to deal with two,
> and when there are buffered writes then there will be discrepancies.
> The simplest way to deal with discrepancies there is to make sure
> that there aren't any by flushing out any buffered writes before
> fetching the attributes.

It may sound simpler but doesn't prevent a write racing in just before
we send the QueryFileInfo on the wire, and it does hurt performance
to redundantly invoke filemap_fdatawrite.
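
As a rough illustration of the two positions in this thread, here is a
toy user-space model in Python. It is not kernel code and does not use
any cifs interface: Server, Client, buffered_write, flush and the
racing_write parameter are hypothetical names invented for the sketch,
and the "race" is simulated sequentially rather than with real
concurrency. It only shows that a size query without a flush can return
a stale size, and that flushing first still leaves a window for a later
write.

```python
class Server:
    """Holds the file size as the server sees it (hypothetical model)."""

    def __init__(self):
        self.size = 0

    def write(self, offset, length):
        # A write past the current end extends the file.
        self.size = max(self.size, offset + length)

    def query_file_info(self):
        return self.size


class Client:
    """Buffers writes locally and keeps its own idea of the file size."""

    def __init__(self, server):
        self.server = server
        self.dirty = []  # buffered (offset, length) writes not yet sent
        self.cached_size = server.query_file_info()

    def buffered_write(self, offset, length):
        self.dirty.append((offset, length))
        self.cached_size = max(self.cached_size, offset + length)

    def flush(self):
        # Stand-in for writing out and waiting on all dirty data.
        for offset, length in self.dirty:
            self.server.write(offset, length)
        self.dirty.clear()

    def getattr(self, flush_first, racing_write=None):
        if flush_first:
            self.flush()
        if racing_write is not None:
            # Simulates a write that lands after the flush but before
            # the size query goes out on the wire.
            self.buffered_write(*racing_write)
        return self.server.query_file_info()


client = Client(Server())
client.buffered_write(0, 4096)

stale = client.getattr(flush_first=False)    # server has not seen the write
flushed = client.getattr(flush_first=True)   # flush makes the sizes agree
raced = client.getattr(flush_first=True, racing_write=(4096, 4096))

print(stale, flushed, raced, client.cached_size)  # 0 4096 4096 8192
```

In the last call the server reports 4096 while the client's own view is
already 8192, so the flush narrowed the discrepancy but did not remove
it.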
If the write extends the file size, I don't mind the extra call, but
even in that case the next revalidate will get the new file size, and
the first revalidate will return the smaller file size that was current
at the time (since the write had only been started, not completed).

--
Thanks,

Steve