On 12/11/19 12:34 PM, Jens Axboe wrote:
> On 12/11/19 10:56 AM, Jens Axboe wrote:
>>> But I think most of the regular IO call chains come through
>>> "mark_page_accessed()". So _that_ is the part you want to avoid (and
>>> maybe the workingset code). And that should be fairly straightforward,
>>> I think.
>>
>> Sure, I can give that a go and see how that behaves.
>
> Before doing that, I ran a streamed read test instead of just random
> reads, and the behavior is roughly the same. kswapd consumes a bit less
> CPU, but it's still very active once the page cache has been filled. For
> specifics on the setup, I deliberately boot the box with 32G of RAM, and
> the dataset is 320G. My initial tests were with 1 320G file, but
> Johannes complained about that so I went to 32 10G files instead. That's
> what I'm currently using.
>
> For the random test case, top of profile for kswapd is:
>
> +   33.49%  kswapd0  [kernel.vmlinux]  [k] xas_create
> +    7.93%  kswapd0  [kernel.vmlinux]  [k] __isolate_lru_page
> +    7.18%  kswapd0  [kernel.vmlinux]  [k] unlock_page
> +    5.90%  kswapd0  [kernel.vmlinux]  [k] free_pcppages_bulk
> +    5.64%  kswapd0  [kernel.vmlinux]  [k] _raw_spin_lock_irqsave
> +    5.57%  kswapd0  [kernel.vmlinux]  [k] shrink_page_list
> +    3.48%  kswapd0  [kernel.vmlinux]  [k] __remove_mapping
> +    3.35%  kswapd0  [kernel.vmlinux]  [k] isolate_lru_pages
> +    3.14%  kswapd0  [kernel.vmlinux]  [k] __delete_from_page_cache

Here's the profile for the !mark_page_accessed() run, and it looks very
much the same:

+   32.84%  kswapd0  [kernel.vmlinux]  [k] xas_create
+    8.05%  kswapd0  [kernel.vmlinux]  [k] unlock_page
+    7.68%  kswapd0  [kernel.vmlinux]  [k] __isolate_lru_page
+    6.08%  kswapd0  [kernel.vmlinux]  [k] free_pcppages_bulk
+    5.96%  kswapd0  [kernel.vmlinux]  [k] _raw_spin_lock_irqsave
+    5.56%  kswapd0  [kernel.vmlinux]  [k] shrink_page_list
+    4.02%  kswapd0  [kernel.vmlinux]  [k] __remove_mapping
+    3.70%  kswapd0  [kernel.vmlinux]  [k] __delete_from_page_cache
+    3.55%  kswapd0  [kernel.vmlinux]  [k] isolate_lru_pages
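P.S. The exact job used for these runs isn't included in this message.
Purely as an illustration of the workload, a stripped-down C version of
the buffered random read case might look like the below; the file names
(file.0 .. file.31), the 4k block size, and the single-threaded loop are
placeholders, not details from the actual test. The listings above are
plain perf report output; a similar view can be collected with something
like "perf record -g -p $(pidof kswapd0) -- sleep 30" while the load is
running.

/*
 * Illustrative sketch only, not the actual test driver: random 4k
 * buffered reads spread over 32 10G files, so the working set is 10x
 * the 32G of RAM the box was booted with.
 */
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NR_FILES	32
#define FILE_SIZE	(10ULL * 1024 * 1024 * 1024)	/* 10G per file */
#define BS		4096				/* assumed block size */

int main(void)
{
	char buf[BS], name[64];
	int fds[NR_FILES];
	int i;

	for (i = 0; i < NR_FILES; i++) {
		snprintf(name, sizeof(name), "file.%d", i);
		/* plain buffered open, no O_DIRECT */
		fds[i] = open(name, O_RDONLY);
		if (fds[i] < 0) {
			perror("open");
			return 1;
		}
	}

	/* read random 4k blocks until interrupted */
	for (;;) {
		int f = rand() % NR_FILES;
		off_t off = ((off_t)rand() % (FILE_SIZE / BS)) * BS;

		if (pread(fds[f], buf, BS, off) < 0) {
			perror("pread");
			return 1;
		}
	}
	return 0;
}

-- 
Jens Axboe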