On Fri, 8 Feb 2019 13:50:49 +0100 Jan Kara <jack@xxxxxxx> wrote:

> > > Has anyone done significant testing with Rik's maybe-fix?
> >
> > I will give it a spin with bonnie++ today. We'll see what comes out.
>
> OK, I did a bonnie++ run with Rik's patch (on top of 4.20 to rule out other
> differences). This machine does not show so big differences in bonnie++
> numbers but the difference is still clearly visible. The results are
> (averages of 5 runs):
>
>                  Revert             Base                Rik
> SeqCreate del    78.04 (  0.00%)    98.18 ( -25.81%)    90.90 ( -16.48%)
> RandCreate del   87.68 (  0.00%)    95.01 (  -8.36%)    87.66 (   0.03%)
>
> 'Revert' is 4.20 with "mm: don't reclaim inodes with many attached pages"
> and "mm: slowly shrink slabs with a relatively small number of objects"
> reverted. 'Base' is the kernel without any reverts. 'Rik' is a 4.20 with
> Rik's patch applied.
>
> The numbers are time to do a batch of deletes so lower is better. You can see
> that the patch did help somewhat but it was not enough to close the gap
> when files are deleted in 'readdir' order.

OK, thanks.

I guess we need a rethink on Roman's fixes.  I'll queue the reverts.

BTW, one thing I don't think has been discussed (or noticed) is the
effect of "mm: don't reclaim inodes with many attached pages" on 32-bit
highmem machines.  Look at why someone added that code in the first place:

: commit f9a316fa9099053a299851762aedbf12881cff42
: Author: Andrew Morton <akpm@xxxxxxxxx>
: Date:   Thu Oct 31 04:09:37 2002 -0800
:
:     [PATCH] strip pagecache from to-be-reaped inodes
:
:     With large highmem machines and many small cached files it is possible
:     to encounter ZONE_NORMAL allocation failures.  This can be demonstrated
:     with a large number of one-byte files on a 7G machine.
:
:     All lowmem is filled with icache and all those inodes have a small
:     amount of highmem pagecache which makes them unfreeable.
:
:     The patch strips the pagecache from inodes as they come off the tail of
:     the inode_unused list.
:
:     I play tricks in there peeking at the head of the inode_unused list to
:     pick up the inode again after running iput().  The alternatives seemed
:     to involve more widespread changes.
:
:     Or running invalidate_inode_pages() under inode_lock which would be a
:     bad thing from a scheduling latency and lock contention point of view.

I guess I should have added a comment.  Doh.
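
For anyone re-checking the bonnie++ table quoted above: the percentage
columns look like the relative change in delete time against the 'Revert'
kernel, with negative meaning slower.  The thread does not state that
formula, so the sketch below is an assumption, and the helper name
relative_change() is made up for illustration.

```python
# Assumption: percentages in the table are (baseline - value) / baseline,
# with 'Revert' as the baseline. Times are the 5-run averages from the thread.

def relative_change(baseline, value):
    """Percent change versus baseline; negative means more time (slower)."""
    return (baseline - value) / baseline * 100.0

results = {
    "SeqCreate del": {"Revert": 78.04, "Base": 98.18, "Rik": 90.90},
    "RandCreate del": {"Revert": 87.68, "Base": 95.01, "Rik": 87.66},
}

for test, times in results.items():
    baseline = times["Revert"]
    columns = "  ".join(
        f"{kernel} {t:6.2f} ({relative_change(baseline, t):7.2f}%)"
        for kernel, t in times.items()
    )
    print(f"{test:15s} {columns}")
```

Under that assumption the SeqCreate row and the Base column come out as
posted.  The 'Rik' RandCreate figure can differ in the last digit, presumably
because the table shows rounded averages.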