On Mon, Apr 04, 2016 at 02:22:33PM -0700, Andrew Morton wrote:
> On Mon, 4 Apr 2016 13:13:37 -0400 Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> > Andres Freund observed that his database workload is struggling with
> > the transaction journal creating pressure on frequently read pages.
> >
> > Access patterns like transaction journals frequently write the same
> > pages over and over, but in the majority of cases those pages are
> > never read back. There are no caching benefits to be had for those
> > pages, so activating them and having them put pressure on pages that
> > do benefit from caching is a bad choice.
>
> Read-after-write is a pretty common pattern: temporary files for
> example. What are the opportunities for regressions here?

The read(s) following the write will call mark_page_accessed() and so
promote the pages if their data is in fact repeatedly accessed. That
makes sense, because the writes really don't say anything about the
cache-worthiness of the data: one write followed by one read shouldn't
mean the data benefits strongly from being cached; only multiple reads
should.

What complicates this a little is that when the multiple reads do
happen on write-instantiated pages, those pages may already have aged
somewhat on the LRU in the meantime, whereas fresh-faulting reads
start counting accesses from the head of the LRU right away. If both
have re-use distances shorter than memory, the LRU offset of the
write-instantiated pages could push the second access past eviction.
In that case, they would likely get picked up by refault detection
and promoted after all, so it would cost one more IO, but nothing
permanent. This is also somewhat compensated by the dirty cache
delaying reclaim and giving these pages another round trip anyway -
unless dirty limits cause the pages to be written back before they
reach the LRU tail.

It's really hard to tell whether this would even be an issue, since
it depends on whether a workload matching those parameters exists at
all. A synthetic test doesn't really tell us much about that. I think
all we can do here is decide whether the cache semantics make logical
sense.

One thing I proposed in the thread that would compensate for the LRU
offset of write-instantiated pages is to set PageReferenced on these
pages but never call mark_page_accessed() from the write (rough
sketch appended below). This wouldn't be perfect, because the
distance between write and read does not necessarily predict the
distance between the subsequent reads, but it would mean that the
first read already promotes the pages, whereas repeatedly written
files would never be activated or refault-activate.

Would that make sense? Is there something I'm missing?

> Did you consider providing userspace with a way to hint "this file is
> probably write-then-not-read"?

Yes, but I'm not too confident in that working out :(
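
Something like the below is roughly what I have in mind for the
write-side instantiation path - untested sketch only, not an actual
patch, and the helper name grab_cache_page_for_write() is made up for
illustration:

/*
 * Untested sketch: a write-side page lookup that never calls
 * mark_page_accessed() itself, but starts newly allocated pages out
 * with the referenced bit set. The first real read then promotes the
 * page via mark_page_accessed(); repeated writes alone never
 * activate it. (A real version would retry the lookup on -EEXIST.)
 */
static struct page *grab_cache_page_for_write(struct address_space *mapping,
					      pgoff_t index, gfp_t gfp)
{
	struct page *page;

	page = find_lock_page(mapping, index);
	if (page)
		return page;		/* no access marking on writes */

	page = __page_cache_alloc(gfp);
	if (!page)
		return NULL;

	/* Let the first read's mark_page_accessed() activate the page */
	__SetPageReferenced(page);

	if (add_to_page_cache_lru(page, mapping, index, gfp)) {
		page_cache_release(page);
		return NULL;
	}
	return page;
}

The read side would keep calling mark_page_accessed() as it does
today, so only actual reads drive activation.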