On Thu, Apr 25, 2013 at 09:37:07PM +0300, Alexey Lyahkov wrote:
> Mel,
>
>
> On Apr 25, 2013, at 17:30, Mel Gorman wrote:
>
> > On Wed, Apr 24, 2013 at 10:26:50AM -0400, Theodore Ts'o wrote:
> >> On Tue, Apr 23, 2013 at 03:00:08PM -0700, Andrew Morton wrote:
> >>> That should fix things for now. Although it might be better to just do
> >>>
> >>> mark_page_accessed(page); /* to SetPageReferenced */
> >>> lru_add_drain(); /* to SetPageLRU */
> >>>
> >>> Because a) this was too early to decide that the page is
> >>> super-important and b) the second touch of this page should have a
> >>> mark_page_accessed() in it already.
> >>
> >> The question is: do we really want to put lru_add_drain() into the ext4
> >> file system code? That seems to be pushing some fairly mm-specific
> >> knowledge into file system code. I'll do this if I have to, but
> >> wouldn't it be better if this was pushed into mark_page_accessed(), or
> >> some other new API was exported by the mm subsystem?
> >>
> >
> > I don't think we want to push lru_add_drain() into the ext4 code. It's
> > too specific a piece of knowledge just to work around pagevecs. Before we
> > rework how pagevecs select which LRU to place a page on, can we make sure
> > that fixing that will fix the problem?
> >
> What is "that"? Putting lru_add_drain() in the ext4 code? Sure, that fixes
> the problem with many small reads during a large write.
> Originally I put shake_page() in the ext4 code, but that calls
> lru_add_drain_all(), which is excessive.
>

No, I would prefer that this not be fixed within ext4. I need confirmation
that fixing mark_page_accessed() addresses the performance problem you
encounter. The two-line check for PageLRU() followed by a lru_add_drain() is
meant to test that. That is still not my preferred fix because, even if you
do not encounter higher LRU contention, other workloads would be at risk.
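A minimal user-space model may make the pagevec/LRU interaction discussed above easier to follow. It is a sketch under stated assumptions, not kernel code: struct page, struct lruvec_model, cache_add(), drain() and the three mark_accessed_*() helpers are hypothetical stand-ins, there is one CPU with one batch, and the batch size is illustrative. mark_accessed_old() mirrors the rule that a referenced page is promoted only once PG_lru is set. mark_accessed_drain_first() corresponds roughly to the PageLRU()/lru_add_drain() diagnostic. mark_accessed_deferred() is one reading of "select the LRU at drain time": it assumes any page without PG_lru is still in the local batch and just flags it active, leaving drain() to pick the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PVEC_SIZE 14 /* illustrative batch size (hypothetical model) */

/* Hypothetical model of the page flags involved. */
struct page {
	bool lru;        /* models PG_lru: page is on an LRU list */
	bool active;     /* models PG_active */
	bool referenced; /* models PG_referenced */
};

/* Hypothetical model: two LRU lists (counts only) plus one per-CPU batch. */
struct lruvec_model {
	size_t nr_active, nr_inactive;
	size_t nr;                     /* pages waiting in the batch */
	struct page *batch[PVEC_SIZE];
};

/* Move batched pages onto an LRU; the list is chosen from PG_active here. */
static void drain(struct lruvec_model *m)
{
	for (size_t i = 0; i < m->nr; i++) {
		struct page *p = m->batch[i];
		p->lru = true;
		if (p->active)
			m->nr_active++;
		else
			m->nr_inactive++;
	}
	m->nr = 0;
}

/* Queue a new page; it only gains PG_lru when the batch is drained. */
static void cache_add(struct lruvec_model *m, struct page *p)
{
	assert(!p->lru && m->nr < PVEC_SIZE);
	m->batch[m->nr++] = p;
	if (m->nr == PVEC_SIZE)
		drain(m);
}

/* Old rule: a second touch activates the page only if PG_lru is set. */
static void mark_accessed_old(struct lruvec_model *m, struct page *p)
{
	if (!p->active && p->referenced && p->lru) {
		p->active = true;
		p->referenced = false;
		m->nr_inactive--; /* move from the inactive to the active list */
		m->nr_active++;
	} else if (!p->referenced) {
		p->referenced = true;
	}
}

/* Diagnostic variant: drain first so the page is on an LRU. */
static void mark_accessed_drain_first(struct lruvec_model *m, struct page *p)
{
	if (!p->lru)
		drain(m);
	mark_accessed_old(m, p);
}

/* Drain-time selection: a batched page is only flagged active here. */
static void mark_accessed_deferred(struct lruvec_model *m, struct page *p)
{
	if (!p->lru && !p->active && p->referenced) {
		p->active = true;
		p->referenced = false;
		return;
	}
	mark_accessed_old(m, p);
}
```

Adding a page, touching it twice while it is still batched, and then draining leaves it on the inactive list under mark_accessed_old(), but on the active list under the other two variants. The deferred variant gets there without the extra drain on every access.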
The likely fix will involve converting pagevecs to use a single list and then
selecting which LRU to put a page on at drain time, but I want to know that
it's worthwhile. Using shake_page() in ext4 is certainly overkill.

> > Andrew, can you try the following patch please? Also, is there any chance
> > you can describe in more detail what the workload does?
>
> Lustre OSS node + IOR with a file size twice the OSS memory.
>

OK, there is no way I'll be reproducing that workload. Thanks.

-- 
Mel Gorman
SUSE Labs
-- 
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx.
For more info on Linux MM, see: http://www.linux-mm.org/ .