Re: [RFC PATCH v1 00/13] lru_lock scalability

Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> · Tue, 13 Feb 2018 16:07:19 -0500

On 02/08/2018 06:36 PM, Andrew Morton wrote:
> On Wed, 31 Jan 2018 18:04:00 -0500 daniel.m.jordan@xxxxxxxxxx wrote:
>
>> lru_lock, a per-node* spinlock that protects an LRU list, is one of the
>> hottest locks in the kernel.  On some workloads on large machines, it
>> shows up at the top of lock_stat.
>
> Do you have details on which callsites are causing the problem?  That
> would permit us to consider other approaches, perhaps.

Sure, there are two paths where we're seeing contention.

In the first one, a pagevec's worth of anonymous pages are added to 
various LRUs when the per-cpu pagevec fills up:

  /* take an anonymous page fault, eventually end up at... */
  handle_pte_fault
    do_anonymous_page
      lru_cache_add_active_or_unevictable
        lru_cache_add
          __lru_cache_add
            __pagevec_lru_add
              pagevec_lru_move_fn
                /* contend on lru_lock */
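
To make the batching in this path concrete, here is a rough userland
sketch, not the kernel code.  Every name in it (toy_lru, toy_pagevec,
TOY_PAGEVEC_SIZE, ...) is made up for illustration, the batch size is
arbitrary, and the "lock" is just a flag plus a counter so the number
of acquisitions can be observed:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGEVEC_SIZE 4	/* arbitrary; not the kernel's value */

struct toy_lru {
	int locked;				/* stand-in for lru_lock */
	unsigned long lock_acquisitions;	/* how often it was taken */
	unsigned long nr_pages;			/* pages on the toy LRU */
};

struct toy_pagevec {			/* stand-in for a per-cpu pagevec */
	size_t nr;
	int pages[TOY_PAGEVEC_SIZE];
};

static void toy_lru_lock(struct toy_lru *lru)
{
	assert(!lru->locked);		/* single-threaded toy: no nesting */
	lru->locked = 1;
	lru->lock_acquisitions++;
}

static void toy_lru_unlock(struct toy_lru *lru)
{
	lru->locked = 0;
}

/* Move every buffered page to the LRU under one hold of the lock. */
static void toy_pagevec_drain(struct toy_pagevec *pvec, struct toy_lru *lru)
{
	if (!pvec->nr)
		return;
	toy_lru_lock(lru);
	lru->nr_pages += pvec->nr;
	toy_lru_unlock(lru);
	pvec->nr = 0;
}

/* Buffer one page; the lock is only touched when the buffer fills up. */
static void toy_lru_cache_add(struct toy_pagevec *pvec, struct toy_lru *lru,
			      int page)
{
	pvec->pages[pvec->nr++] = page;
	if (pvec->nr == TOY_PAGEVEC_SIZE)
		toy_pagevec_drain(pvec, lru);
}
```

In this toy, adding ten pages takes the lock twice rather than ten
times.  The batching cuts the number of acquisitions, but every CPU
that fills its buffer still has to take the same lock.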

In the second, one or more pages are removed from an LRU under one hold 
of lru_lock:

  // userland calls munmap or exit, eventually end up at...
  zap_pte_range
    __tlb_remove_page // returns true because we eventually hit
                      // MAX_GATHER_BATCH_COUNT in tlb_next_batch
    tlb_flush_mmu_free
      free_pages_and_swap_cache
        release_pages
          /* contend on lru_lock */
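
The removal side can be sketched the same way.  This is a simplified
userland model only loosely patterned on the path above, assuming the
lock is kept across consecutive pages from the same node and switched
only when the node changes.  All names (toy_node, toy_page,
toy_release_pages) are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_NODES 2	/* arbitrary number of NUMA nodes for the toy */

struct toy_node {
	int locked;				/* stand-in for lru_lock */
	unsigned long lock_acquisitions;
	unsigned long nr_lru_pages;
};

struct toy_page {
	int node;	/* index of the node whose LRU holds the page */
	int on_lru;
};

static void toy_node_lock(struct toy_node *node)
{
	assert(!node->locked);
	node->locked = 1;
	node->lock_acquisitions++;
}

static void toy_node_unlock(struct toy_node *node)
{
	node->locked = 0;
}

/* Take pages off their LRUs, keeping one lock hold per run of same-node pages. */
static void toy_release_pages(struct toy_node *nodes, struct toy_page *pages,
			      size_t nr)
{
	struct toy_node *locked = NULL;

	for (size_t i = 0; i < nr; i++) {
		struct toy_page *page = &pages[i];
		struct toy_node *node = &nodes[page->node];

		if (!page->on_lru)
			continue;
		if (locked != node) {	/* switch locks only on a node change */
			if (locked)
				toy_node_unlock(locked);
			toy_node_lock(node);
			locked = node;
		}
		page->on_lru = 0;
		node->nr_lru_pages--;
	}
	if (locked)
		toy_node_unlock(locked);
}
```

With three pages on node 0 followed by two on node 1, each toy lock is
taken once.  If the nodes alternate from page to page, the sketch
degrades toward one acquisition per page.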

For broader context, we've run decision support benchmarks where
lru_lock (and zone->lock) show long wait times.  But we're not the only
ones seeing this, judging by these kernel comments:

mm/vmscan.c:
 * zone_lru_lock is heavily contended.  Some of the functions that
 * shrink the lists perform better by taking out a batch of pages
 * and working on them outside the LRU lock.
 *
 * For pagecache intensive workloads, this function is the hottest
 * spot in the kernel (apart from copy_*_user functions).
...
static unsigned long isolate_lru_pages(unsigned long nr_to_scan,

include/linux/mmzone.h:
 * zone->lock and the [pgdat->lru_lock] are two of the hottest locks in the kernel.
 * So add a wild amount of padding here to ensure that they fall into separate
 * cachelines. ...
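
The idea in that comment can be shown with a small C11 sketch.  This is
not the kernel's structure or its padding mechanism: toy_zone and
TOY_CACHELINE are invented for illustration, and the 64-byte line size
is an assumption:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define TOY_CACHELINE 64	/* assumed cacheline size in bytes */

struct toy_zone {
	alignas(TOY_CACHELINE) int lock;	/* stand-in for zone->lock */
	long free_pages;
	alignas(TOY_CACHELINE) int lru_lock;	/* stand-in for lru_lock */
	long lru_pages;
};

/* Index of the cacheline a field lands on, relative to the struct start. */
static size_t toy_cacheline_of(size_t offset)
{
	return offset / TOY_CACHELINE;
}
```

Comparing toy_cacheline_of(offsetof(struct toy_zone, lock)) with the
same expression for lru_lock shows the two fields on different lines,
so a CPU spinning on one lock does not bounce the line that holds the
other.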

Anyway, if you're seeing this lock in your workloads, I'm interested in 
hearing what you're running so we can get more real world data on this.
