On Sat, Nov 25, 2017 at 12:35 PM, David Howells <dhowells@xxxxxxxxxx> wrote: > > Doesn't clear_page_dirty_for_io() write-protect the PTE for the page to be > written out, in which case ->page_mkwrite() will get called again before the > page is redirtied? No, it literally just sets the dirty bit (and does accounting). But I think you may be right that we always write-protect he page when we move the dirty bit from the page tables to the 'struct page' (page_mkclean_one()). However, even when you do that, the page can be writable in other mappings. At least fork(), for example, only clears the dirty bit, doesn't mark it write-protected. So there is some rate-limiting of dirty pages, but I do not believe that we've ever really *serialized* writes. >> (b) can cause some really nasty latency issues > > True, but I think the most common case is a file being opened, written start > to finish and then closed. Actually, the worst-handled thing I've seen is a > shell script appending a bunch of things to a file because ->flush() syncs the > file each time it is closed:-/ > > What would you recommend instead? I'm currently trying and keep track of what > needs to be written so that I only write what's changed to the server, rather > than writing only whole pages. I don't think that what you are doing is necessarily wrong, I'm just pointing out that you may still see mmap'ed pages being modified concurrently with the actual IO, and that this will potentially mean (for example) that things like checksums won't be reliably unless you do the checksum as you copy the data to a network packet or something. Of course, if that kind of inconsistency happens, a later write-back will also happen, and eventually fix it. So the server may see temporarily 'wrong' data, but it won't be final. I just hope that the inconsistency isn't fatal to the afs client or server code. For example, if you retry writes forever when a checksum were to not match the data, that would be bad. And yes, this can be (a) really hard to trigger in practice (b) very hard to debug due to a combination of very specific timing and behavior. so I just wanted to bring this up as a potential issue, not necessarily as a big problem. Linus