On 14.01.2019 02:12, Baptiste Lepers wrote:
> On Sat, Jan 12, 2019 at 4:53 AM Daniel Jordan
> <daniel.m.jordan@xxxxxxxxxx> wrote:
>>
>> On Fri, Jan 11, 2019 at 02:59:38PM +0100, Michal Hocko wrote:
>>> On Fri 11-01-19 16:52:17, Baptiste Lepers wrote:
>>>> Hello,
>>>>
>>>> We have a performance issue with the page cache. One of our workloads
>>>> spends more than 50% of its time in the lru_lock taken by
>>>> shrink_inactive_list in mm/vmscan.c.
>>>
>>> Who does contend on the lock? Are there direct reclaimers or is it
>>> solely kswapd with paths that are faulting the new page cache in?
>>
>> Yes, and could you please post your performance data showing the time in
>> lru_lock? Whatever you have is fine, but using perf with -g would give
>> callstacks and help answer Michal's question about who's contending.
>
> Thanks for the quick answer.
>
> The time spent in the lru_lock is mainly due to direct reclaimers
> (reading an mmap'ed page that causes some readahead to happen). We have
> tried to play with the readahead values, but it doesn't change performance
> a lot. We have disabled swap on the machine, so kswapd doesn't run.
>
> Our programs run in memory cgroups, but I don't think that the issue
> directly comes from cgroups (I might be wrong though).
>
> Here is the callchain that I have using perf report --no-children
> (paste here: https://pastebin.com/151x4QhR ):
>
> 44.30%  swapper    [kernel.vmlinux]  [k] intel_idle
> # The machine is idle mainly because it waits on that lru_lock,
> # which is the 2nd function in the report:
> 10.98%  testradix  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
>    |--10.33%--_raw_spin_lock_irq
>    |          |
>    |           --10.12%--shrink_inactive_list
>    |                     shrink_node_memcg
>    |                     shrink_node
>    |                     do_try_to_free_pages
>    |                     try_to_free_mem_cgroup_pages
>    |                     try_charge
>    |                     mem_cgroup_try_charge
>    |                     __add_to_page_cache_locked
>    |                     add_to_page_cache_lru
>    |                     |
>    |                     |--5.39%--ext4_mpage_readpages
>    |                     |          ext4_readpages
>    |                     |          __do_page_cache_readahead
>    |                     |          |
>    |                     |           --5.37%--ondemand_readahead
>    |                     |                     page_cache_async_readahead

Does MADV_RANDOM make the trace better or worse?

>    |                     |                     filemap_fault
>    |                     |                     ext4_filemap_fault
>    |                     |                     __do_fault
>    |                     |                     handle_pte_fault
>    |                     |                     __handle_mm_fault
>    |                     |                     handle_mm_fault
>    |                     |                     __do_page_fault
>    |                     |                     do_page_fault
>    |                     |                     page_fault
>    |                     |                     |
>    |                     |                     |--4.23%-- <our app>
>
> Thanks,
>
> Baptiste.
>
>>
>> Happy to help profile and debug offline.
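
In case it helps, here is roughly what I mean by trying MADV_RANDOM: a
minimal userspace sketch (the file path and the surrounding code are made
up for illustration, not taken from your workload) that maps the file and
hints random access. With the hint set, filemap_fault should fault pages
in one at a time instead of going through the async readahead path shown
in your trace, so fewer pages get added to the LRU per fault.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/path/to/datafile";   /* placeholder path */

        int fd = open(path, O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        struct stat st;
        if (fstat(fd, &st) < 0) {
            perror("fstat");
            return 1;
        }

        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /*
         * Hint random access on the mapping: each page fault should then
         * bring in a single page rather than a readahead window, which
         * should mean less page-cache growth and less direct reclaim
         * under the memcg limit (to be confirmed by the perf trace).
         */
        if (madvise(p, st.st_size, MADV_RANDOM) < 0)
            perror("madvise(MADV_RANDOM)");

        /* ... random reads through p[] as in the real workload ... */

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }

If the accesses go through read() on a file descriptor rather than the
mapping, posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM) should be the
equivalent per-fd hint.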