Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> > Then why do we have to wait for PG_writeback to complete?
>
> At least for PG_writeback, it's about "the _previous_ dirty write is
> still under way, but - since PG_dirty is set again - the page has been
> dirtied since".
>
> So we have to start _another_ writeback, because while the current
> writeback *might* have written the updated data, that is not at all
> certain or clear.

As I understand it, it's also about serialising writes from the same page
to the same backing store.  We don't want them to end up out-of-order.  I'm
not sure what guarantees, for instance, the block layer gives if two I/O
requests go to the same place.

> I'm not sure what the fscache rules are.

I'm now using PG_fscache in exactly the same way: the previous write to the
cache is still under way.  I don't want to start another DIO write to the
cache for the same pages.  Hence the waits/checks on PG_fscache I've added
anywhere we need to wait/check on PG_writeback (a rough sketch of what I
mean is at the end of this mail).

As I mentioned, I'm looking at the possibility of making PG_dirty and
PG_writeback cover *both* cases and recording the difference elsewhere -
thereby returning PG_private_2 to the VM folks who'd like their bit back.

This means, for instance, that when we read from the server and find we
need to write the data to the cache, we set a note in the aforementioned
elsewhere, mark the page dirty and leave it to writepages() to effect the
write to the cache.

It could get tricky because we have two different places to write to, with
very different characteristics (e.g. a server ~6000km away vs a local SSD),
each with its own queueing, scheduling, bandwidth, etc. - and the local
disk might have to share with the system.

David
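
For illustration, a minimal sketch of the wait/check pattern described
above.  It assumes PG_fscache is just an alias of PG_private_2 and uses the
generic page-flag and wait helpers; the function itself is made up for the
example, not the actual netfs code:

	#include <linux/pagemap.h>
	#include <linux/page-flags.h>

	/* Hypothetical helper: wait for any in-flight I/O on a page before
	 * modifying it or starting new I/O on it.
	 */
	static void example_wait_for_page_io(struct page *page)
	{
		/* The previous writeback of this page to the server may
		 * still be under way; wait so that two writes from the same
		 * page to the same backing store can't end up out-of-order.
		 */
		wait_on_page_writeback(page);

		/* Likewise, a previous DIO write of this page to the local
		 * cache may still be under way.  Assuming PG_fscache aliases
		 * PG_private_2, test and wait on that bit in the same way.
		 */
		if (PagePrivate2(page))		/* i.e. page is in fscache I/O */
			wait_on_page_bit(page, PG_private_2);
	}

In the real code these waits would go wherever the filesystem already has
to wait on or check PG_writeback - write_begin, invalidation, truncation
and the like.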