On 12/11/19 10:56 AM, Jens Axboe wrote:
>> But I think most of the regular IO call chains come through
>> "mark_page_accessed()". So _that_ is the part you want to avoid (and
>> maybe the workingset code). And that should be fairly straightforward,
>> I think.
>
> Sure, I can give that a go and see how that behaves.

Before doing that, I ran a streamed read test instead of just random
reads, and the behavior is roughly the same. kswapd consumes a bit less
CPU, but it's still very active once the page cache has been filled.

For specifics on the setup, I deliberately boot the box with 32G of RAM,
and the dataset is 320G. My initial tests were with 1 320G file, but
Johannes complained about that, so I went to 32 10G files instead.
That's what I'm currently using.

For the random test case, the top of the kswapd profile is:

+   33.49%  kswapd0  [kernel.vmlinux]  [k] xas_create
+    7.93%  kswapd0  [kernel.vmlinux]  [k] __isolate_lru_page
+    7.18%  kswapd0  [kernel.vmlinux]  [k] unlock_page
+    5.90%  kswapd0  [kernel.vmlinux]  [k] free_pcppages_bulk
+    5.64%  kswapd0  [kernel.vmlinux]  [k] _raw_spin_lock_irqsave
+    5.57%  kswapd0  [kernel.vmlinux]  [k] shrink_page_list
+    3.48%  kswapd0  [kernel.vmlinux]  [k] __remove_mapping
+    3.35%  kswapd0  [kernel.vmlinux]  [k] isolate_lru_pages
+    3.14%  kswapd0  [kernel.vmlinux]  [k] __delete_from_page_cache

Next I ran without calling mark_page_accessed() to see if that makes a
difference. See the patch below; I just applied it on top of this
patchset and added a new RWF_NOACCESS flag for it, for ease of testing.
I verified that we are indeed skipping the mark_page_accessed() call in
generic_file_buffered_read().

I can't tell a difference in the results: there's no discernible
difference between calling mark_page_accessed() and not calling it.
Behavior seems about the same, in terms of pre and post page cache full,
and kswapd still churns a lot once the page cache is filled up.

-- 
Jens Axboe