On Fri, Feb 08, 2019 at 02:49:44PM -0800, Andrew Morton wrote:
> On Fri, 8 Feb 2019 13:50:49 +0100 Jan Kara <jack@xxxxxxx> wrote:
>
> > > > Has anyone done significant testing with Rik's maybe-fix?
> > >
> > > I will give it a spin with bonnie++ today. We'll see what comes out.
> >
> > OK, I did a bonnie++ run with Rik's patch (on top of 4.20 to rule out
> > other differences). This machine does not show such big differences in
> > bonnie++ numbers, but the difference is still clearly visible. The
> > results are (averages of 5 runs):
> >
> >                  Revert            Base              Rik
> > SeqCreate del    78.04 (  0.00%)   98.18 (-25.81%)   90.90 (-16.48%)
> > RandCreate del   87.68 (  0.00%)   95.01 ( -8.36%)   87.66 (  0.03%)
> >
> > 'Revert' is 4.20 with "mm: don't reclaim inodes with many attached
> > pages" and "mm: slowly shrink slabs with a relatively small number of
> > objects" reverted. 'Base' is the kernel without any reverts. 'Rik' is
> > 4.20 with Rik's patch applied.
> >
> > The numbers are the time to do a batch of deletes, so lower is better.
> > You can see that the patch did help somewhat, but it was not enough to
> > close the gap when files are deleted in 'readdir' order.
>
> OK, thanks.
>
> I guess we need a rethink on Roman's fixes. I'll queue the reverts.

Agreed. I still believe that we should let machine-wide memory pressure
clean up any remains of dead cgroups, and Rik's patch is a step in the
right direction. But we need to run some experiments and probably make
some code changes here to guarantee that we don't regress on performance.

> BTW, one thing I don't think has been discussed (or noticed) is the
> effect of "mm: don't reclaim inodes with many attached pages" on 32-bit
> highmem machines. Look at why someone added that code in the first place:
>
> : commit f9a316fa9099053a299851762aedbf12881cff42
> : Author: Andrew Morton <akpm@xxxxxxxxx>
> : Date:   Thu Oct 31 04:09:37 2002 -0800
> :
> :     [PATCH] strip pagecache from to-be-reaped inodes
> :
> :     With large highmem machines and many small cached files it is
> :     possible to encounter ZONE_NORMAL allocation failures. This can
> :     be demonstrated with a large number of one-byte files on a 7G
> :     machine.
> :
> :     All lowmem is filled with icache and all those inodes have a
> :     small amount of highmem pagecache which makes them unfreeable.
> :
> :     The patch strips the pagecache from inodes as they come off the
> :     tail of the inode_unused list.
> :
> :     I play tricks in there, peeking at the head of the inode_unused
> :     list to pick up the inode again after running iput(). The
> :     alternatives seemed to involve more widespread changes.
> :
> :     Or running invalidate_inode_pages() under inode_lock, which
> :     would be a bad thing from a scheduling latency and lock
> :     contention point of view.
>
> I guess I should have added a comment. Doh.

It's a very useful link. Thanks!
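
For anyone who wants to see where that 2002 logic ended up: the stripping
now lives in the inode shrinker's LRU walk, fs/inode.c:inode_lru_isolate().
Below is a trimmed sketch of the pattern -- the i_count/I_REFERENCED checks
and the reclaim-stat accounting are dropped, so treat it as an illustration
rather than the exact current code:

#include <linux/fs.h>
#include <linux/buffer_head.h>
#include <linux/list_lru.h>

static enum lru_status inode_lru_isolate(struct list_head *item,
		struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
{
	struct list_head *freeable = arg;
	struct inode *inode = container_of(item, struct inode, i_lru);

	if (!spin_trylock(&inode->i_lock))
		return LRU_SKIP;

	/*
	 * The inode still has pagecache or buffers attached.  Rather
	 * than skip it -- which is what lets lowmem fill up with
	 * unfreeable icache on a 32-bit highmem box -- pin the inode,
	 * drop the locks, strip the pages, and retry the LRU walk.
	 */
	if (inode_has_buffers(inode) || inode->i_data.nrpages) {
		__iget(inode);
		spin_unlock(&inode->i_lock);
		spin_unlock(lru_lock);
		if (remove_inode_buffers(inode))
			invalidate_mapping_pages(&inode->i_data, 0, -1);
		iput(inode);
		spin_lock(lru_lock);
		return LRU_RETRY;
	}

	/* Nothing attached anymore: the inode can be reaped right away. */
	WARN_ON(inode->i_state & I_NEW);
	inode->i_state |= I_FREEING;
	list_lru_isolate_move(lru, &inode->i_lru, freeable);
	spin_unlock(&inode->i_lock);
	return LRU_REMOVED;
}

If I read it right, "mm: don't reclaim inodes with many attached pages"
rotates inodes with attached pagecache before they ever reach that
LRU_RETRY branch, which is how the old highmem problem can come back.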