On Thu, Aug 15, 2013 at 5:14 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Thu, Aug 15, 2013 at 03:26:09PM -0700, Andy Lutomirski wrote:
>> On Thu, Aug 15, 2013 at 3:18 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> > On Thu, Aug 15, 2013 at 02:43:09PM -0700, Andy Lutomirski wrote:
>> >> On Thu, Aug 15, 2013 at 2:37 PM, Dave Chinner
>> >> <david@xxxxxxxxxxxxx> wrote:
>> >> > On Thu, Aug 15, 2013 at 08:17:18AM -0700, Andy Lutomirski wrote:
>
>> >> In current kernels, this chain of events won't work:
>> >>
>> >> - Server goes down
>> >> - Server comes up
>> >> - Userspace on server calls mmap and writes something
>> >> - Client reconnects and invalidates its cache
>> >> - Userspace on server writes something else *to the same page*
>> >>
>> >> The client will never notice the second write, because it won't update
>> >> any inode state.
>> >
>> > That's wrong. The server wrote the dirty page before the client
>> > reconnected, therefore it got marked clean.
>>
>> Why would it write the dirty page?
>
> Terminology mismatch - you said it "writes something", not "dirties
> the page". So, it's easy to take that as "does writeback" as opposed
> to "dirties memory".

When I say "writes something" I mean literally performs a store to
memory. That is:

    ptr[offset] = value;

In my example, the client will *never* catch up.

>
>> > The second write to the
>> > server page marks it dirty again, causing page_mkwrite to be
>> > called, thereby updating the timestamp/i_version field. So, the NFS
>> > client will notice the second change on the server, and it will
>> > notice it immediately after the second access has occurred, not some
>> > time later when:
>> >
>> >> With my patches, the client will as soon as the
>> >> server starts writeback.
>> >
>> > Your patches introduce a 30+ second window where a file can be dirty
>> > on the server but the NFS server doesn't know about it and can't
>> > tell the clients about it because i_version doesn't get bumped until
>> > writeback.....
>>
>> I claim that there's an infinite window right now, and that 30 seconds
>> is therefore an improvement.
>
> You're talking about after the second change is made. I'm talking
> about the difference in behaviour after the *initial change* is
> made. Your changes will result in the client not doing an
> invalidation because timestamps don't get changed for 30s with your
> patches. That's the problem - the first change of a file needs to
> bump the i_version immediately, not in 30s time.
>
> That's why delaying timestamp updates doesn't fix the scalability
> problem that was reported. It might fix a different problem, but it
> doesn't void the *requirement* that filesystems need to do
> transactional updates during page faults....
>

And this is why I'm unconvinced that your requirement is sensible.
It's attempting to make sure that every mmapped write results in some
kind of FS update, but it actually only results in an FS update
*before* the *first* mmapped write after writeback. It's racy as
hell.

My approach is slow but not racy.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
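
The chain of events argued over above can be sketched as a toy userspace
model. This is only an illustration of the behaviour as described in this
thread, not kernel code: struct toy_inode, toy_store(), toy_writeback() and
toy_second_store_is_missed() are hypothetical names, and the assumption that
the version is bumped only on a clean-to-dirty transition is taken from the
reply, not checked against any particular kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: one inode with a single mapped page. */
struct toy_inode {
    unsigned long i_version; /* change counter a client would compare */
    bool page_dirty;         /* true once a store has dirtied the page */
};

/* Models "ptr[offset] = value" through a shared mapping.
 * Assumption: only the first store to a clean page takes a write
 * fault, and that fault is the only place the version is bumped. */
static void toy_store(struct toy_inode *inode)
{
    assert(inode != NULL);
    if (!inode->page_dirty) {
        inode->i_version++;       /* page_mkwrite-like step */
        inode->page_dirty = true;
    }
    /* Stores to an already-dirty page change no inode state. */
}

/* Models writeback: the page becomes clean, so the next store faults. */
static void toy_writeback(struct toy_inode *inode)
{
    assert(inode != NULL);
    inode->page_dirty = false;
}

/* Replays the scenario: store, client revalidates, store again to the
 * same page. Returns true if the client's cached version still matches
 * the server's, i.e. the second store went unnoticed. */
static bool toy_second_store_is_missed(void)
{
    struct toy_inode inode = { 0, false };
    unsigned long client_seen;

    toy_store(&inode);             /* first store: version bumped */
    client_seen = inode.i_version; /* client reconnects, revalidates */
    toy_store(&inode);             /* second store: no fault, no bump */

    return client_seen == inode.i_version;
}
```

In this model toy_second_store_is_missed() returns true, and the version
only moves again once toy_writeback() has run and another store faults. How
long the real window is depends on when writeback actually happens, which
the model deliberately leaves out.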