[patch 048/159] mm: swap: check if swap backing device is congested or not

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx>
Subject: mm: swap: check if swap backing device is congested or not

Swap readahead would read in a few pages regardless if the underlying
device is busy or not.  It may incur long waiting time if the device is
congested, and it may also exacerbate the congestion.

Use inode_read_congested() to check if the underlying device is busy or
not like what file page readahead does.  Get inode from swap_info_struct. 
Although we can add inode information in swap_address_space
(address_space->host), it may lead some unexpected side effect, i.e.  it
may break mapping_cap_account_dirty().  Using inode from swap_info_struct
seems simple and good enough.

Just does the check in vma_cluster_readahead() since swap_vma_readahead()
is just used for non-rotational device which much less likely has
congestion than traditional HDD.

Although swap slots may be consecutive on swap partition, it still may be
fragmented on swap file.  This check would help to reduce excessive stall
for such case.

The test with page_fault1 of will-it-scale (sometimes tracing may just
show runtest.py that is the wrapper script of page_fault1), which
basically launches NR_CPU threads to generate 128MB anonymous pages for
each thread, on my virtual machine with congested HDD shows long tail
latency is reduced significantly.

Without the patch
 page_fault1_thr-1490  [023]   129.311706: funcgraph_entry:      #57377.796 us |  do_swap_page();
 page_fault1_thr-1490  [023]   129.369103: funcgraph_entry:        5.642us   |  do_swap_page();
 page_fault1_thr-1490  [023]   129.369119: funcgraph_entry:      #1289.592 us |  do_swap_page();
 page_fault1_thr-1490  [023]   129.370411: funcgraph_entry:        4.957us   |  do_swap_page();
 page_fault1_thr-1490  [023]   129.370419: funcgraph_entry:        1.940us   |  do_swap_page();
 page_fault1_thr-1490  [023]   129.378847: funcgraph_entry:      #1411.385 us |  do_swap_page();
 page_fault1_thr-1490  [023]   129.380262: funcgraph_entry:        3.916us   |  do_swap_page();
 page_fault1_thr-1490  [023]   129.380275: funcgraph_entry:      #4287.751 us |  do_swap_page();

With the patch
      runtest.py-1417  [020]   301.925911: funcgraph_entry:      #9870.146 us |  do_swap_page();
      runtest.py-1417  [020]   301.935785: funcgraph_entry:        9.802us   |  do_swap_page();
      runtest.py-1417  [020]   301.935799: funcgraph_entry:        3.551us   |  do_swap_page();
      runtest.py-1417  [020]   301.935806: funcgraph_entry:        2.142us   |  do_swap_page();
      runtest.py-1417  [020]   301.935853: funcgraph_entry:        6.938us   |  do_swap_page();
      runtest.py-1417  [020]   301.935864: funcgraph_entry:        3.765us   |  do_swap_page();
      runtest.py-1417  [020]   301.935871: funcgraph_entry:        3.600us   |  do_swap_page();
      runtest.py-1417  [020]   301.935878: funcgraph_entry:        7.202us   |  do_swap_page();

[akpm@xxxxxxxxxxxxxxxxxxxx: code cleanup]
[yang.shi@xxxxxxxxxxxxxxxxx: add comment]
  Link: http://lkml.kernel.org/r/bbc7bda7-62d0-df1a-23ef-d369e865bdca@xxxxxxxxxxxxxxxxx
Link: http://lkml.kernel.org/r/1546543673-108536-1-git-send-email-yang.shi@xxxxxxxxxxxxxxxxx
Signed-off-by: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx>
Acked-by: Tim Chen <tim.c.chen@xxxxxxxxx>
Reviewed-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Huang Ying <ying.huang@xxxxxxxxx>
Cc: Minchan Kim <minchan@xxxxxxxxxx>
Cc: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx
Cc: Jens Axboe <axboe@xxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/swap_state.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/mm/swap_state.c~mm-swap-check-if-swap-backing-device-is-congested-or-not
+++ a/mm/swap_state.c
@@ -543,6 +543,13 @@ struct page *swap_cluster_readahead(swp_
 	if (!mask)
 		goto skip;
 
+	/* Test swap type to make sure the dereference is safe */
+	if (likely(si->flags & (SWP_BLKDEV | SWP_FS))) {
+		struct inode *inode = si->swap_file->f_mapping->host;
+		if (inode_read_congested(inode))
+			goto skip;
+	}
+
 	do_poll = false;
 	/* Read a page_cluster sized and aligned cluster around offset. */
 	start_offset = offset & ~mask;
_



[Index of Archives]     [Kernel Archive]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux