On 12/11/19 1:18 PM, Linus Torvalds wrote:
> On Wed, Dec 11, 2019 at 12:08 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>> $ cat /proc/meminfo | grep -i active
>> Active:           134136 kB
>> Inactive:       28683916 kB
>> Active(anon):      97064 kB
>> Inactive(anon):        4 kB
>> Active(file):      37072 kB
>> Inactive(file): 28683912 kB
>
> Yeah, that should not put pressure on some swap activity. We have 28
> GB of basically free inactive file data, and the VM is doing something
> very very bad if it then doesn't just quickly free it with no real
> drama.
>
> In fact, I don't think it should even trigger kswapd at all, it should
> all be direct reclaim. Of course, some of the mm people hate that with
> a passion, but this does look like a prime example of why it should
> just be done.

For giggles, I ran just a single thread on the file set. We're only
doing about 100K IOPS at that point, yet when the page cache fills,
kswapd still eats 10% cpu. That seems like a lot for something that
slow.

> MM people - mind giving this a look? Jens, if you have that NOACCESS
> flag in a git tree too and a trivial way to recreate your load, that
> would be good for people to be able to just try things out.

I've pushed the NOACCESS thing to my buffered-uncached branch as well,
and fio has a 'noaccess' branch that enables it for pvsync2 (which is
preadv2/pwritev2) and the io_uring engine. Here's what I did to
reproduce:

- Boot the box with 32G of memory.
- On a fast device, create 10x RAM size of files. I used 32 files, each
  10G. Mine are in /data, and they are named file[1-32].
- Run a buffered read workload on those files.
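The file layout in the second step can be scripted. What follows is a minimal sketch, assuming a POSIX shell and a Linux `dd` that accepts `bs=1M`. `make_files` is a hypothetical helper: it is not part of fio or the buffered-uncached branch. It writes zero-filled files rather than sparse ones so reads have to come from the device, and prints a colon-separated `filename=` line suitable for a fio job file.

```shell
#!/bin/sh
# make_files: hypothetical helper, not part of fio or the kernel tree.
# Usage: make_files DIR NFILES SIZE_MB
# Creates DIR/file1 .. DIR/fileNFILES, each SIZE_MB MiB of zero-filled
# (non-sparse) data, then prints the matching fio "filename=" line.
make_files() {
	dir=$1 nfiles=$2 size_mb=$3
	names=""
	mkdir -p "$dir" || return 1
	i=1
	while [ "$i" -le "$nfiles" ]; do
		# Copy real zero blocks instead of creating a sparse file
		dd if=/dev/zero of="$dir/file$i" bs=1M count="$size_mb" 2>/dev/null || return 1
		# Append "fileN", with a ':' separator after the first entry
		names="${names:+$names:}file$i"
		i=$((i + 1))
	done
	echo "filename=$names"
}
```

For the setup described above (32 files of 10G each in /data), that would be `make_files /data 32 10240`.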
For pvsync2, something like:

$ cat job.fio
[test]
ioengine=pvsync2
#uncached=1
#noaccess=1
iodepth=4
bs=4k
group_reporting=1
rw=randread
norandommap
buffered=1
directory=/data
filename=file1:file2:file3:file4:file5:file6:file7:file8:file9:file10:file11:file12:file13:file14:file15:file16:file17:file18:file19:file20:file21:file22:file23:file24:file25:file26:file27:file28:file29:file30:file31:file32

If you want to use more than one thread, add:

numjobs=4

for 4 threads. Uncomment the 'uncached=1' and/or 'noaccess=1' lines to
enable RWF_UNCACHED and/or RWF_NOACCESS.

--
Jens Axboe