Re: [PATCH 11/11] mm: Isolate pages for immediate reclaim on their own LRU

Mel Gorman <mgorman@xxxxxxx> · Fri, 16 Dec 2011 16:07:28 +0000

On Fri, Dec 16, 2011 at 04:17:31PM +0100, Johannes Weiner wrote:
> On Wed, Dec 14, 2011 at 03:41:33PM +0000, Mel Gorman wrote:
> > It was observed that scan rates from direct reclaim during tests
> > writing to both fast and slow storage were extraordinarily high. The
> > problem was that while pages were being marked for immediate reclaim
> > when writeback completed, the same pages were being encountered over
> > and over again during LRU scanning.
> > 
> > This patch isolates file-backed pages that are to be reclaimed when
> > clean on their own LRU list.
> 
> Excuse me if I sound like a broken record, but have those observations
> of high scan rates persisted with the per-zone dirty limits patchset?
> 

Unfortunately I wasn't testing that series. The focus of this series
was primarily on THP-related stalls incurred by compaction which
did not have a dependency on that series. Even with dirty balancing,
similar stalls would be observed once dirty pages were in the zone
at all.

> In my tests with pzd, the scan rates went down considerably together
> with the immediate reclaim / vmscan writes.
> 

I probably should know but what is pzd?

> Our dirty limits are pretty low - if reclaim keeps shuffling through
> dirty pages, where are the 80% reclaimable pages?!  To me, this sounds
> like the unfair distribution of dirty pages among zones again.  Is
> there are a different explanation that I missed?
> 

The alternative explanation is that the 20% dirty pages are all
long-lived, at the end of the highest zone which is always scanned first
so we continually have to scan over these dirty pages for prolonged
periods of time. 

> PS: It also seems a bit out of place in this series...?

Without the last path, the System CPU time was stupidly high. In part,
this is because we are no longer calling ->writepage from direct
reclaim. If we were, the CPU usage would be far lower but it would
be a lot slower too. It seemed remiss to leave system CPU usage that
high without some explanation or patch dealing with it.

The following replaces this patch with your series. dirtybalance-v7r1 is
yours.

                   3.1.0-vanilla         rc5-vanilla       freemore-v6r1        isolate-v6r1   dirtybalance-v7r1
System Time         1.22 (    0.00%)   13.89 (-1040.72%)   46.40 (-3709.20%)    4.44 ( -264.37%)   43.05 (-3434.81%)
+/-                 0.06 (    0.00%)   22.82 (-37635.56%)    3.84 (-6249.44%)    6.48 (-10618.92%)    4.04 (-6581.33%)
User Time           0.06 (    0.00%)    0.06 (   -6.90%)    0.05 (   17.24%)    0.05 (   13.79%)    0.05 (   20.69%)
+/-                 0.01 (    0.00%)    0.01 (   33.33%)    0.01 (   33.33%)    0.01 (   39.14%)    0.01 (   -1.84%)
Elapsed Time     10445.54 (    0.00%) 2249.92 (   78.46%)   70.06 (   99.33%)   16.59 (   99.84%)   73.71 (   99.29%)
+/-               643.98 (    0.00%)  811.62 (  -26.03%)   10.02 (   98.44%)    7.03 (   98.91%)   17.90 (   97.22%)
THP Active         15.60 (    0.00%)   35.20 (  225.64%)   65.00 (  416.67%)   70.80 (  453.85%)  102.60 (  657.69%)
+/-                18.48 (    0.00%)   51.29 (  277.59%)   15.99 (   86.52%)   37.91 (  205.18%)   26.06 (  141.02%)
Fault Alloc       121.80 (    0.00%)   76.60 (   62.89%)  155.40 (  127.59%)  181.20 (  148.77%)  214.80 (  176.35%)
+/-                73.51 (    0.00%)   61.11 (   83.12%)   34.89 (   47.46%)   31.88 (   43.36%)   53.21 (   72.39%)
Fault Fallback    881.20 (    0.00%)  926.60 (   -5.15%)  847.60 (    3.81%)  822.00 (    6.72%)  788.40 (   10.53%)
+/-                73.51 (    0.00%)   61.26 (   16.67%)   34.89 (   52.54%)   31.65 (   56.94%)   53.41 (   27.35%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds)       3540.88   1945.37    716.04     64.97    715.04
Total Elapsed Time (seconds)              52417.33  11425.90    501.02    230.95    549.64

Your series does help the System CPU time begining it from 46.4 seconds
to 43.05 seconds. That is within the noise but towards the edge of
one standard deviation. With such a small reduction, elapsed time was
not helped. However, it did help THP allocation success rates - still
within the noise but again at the edge of the noise which indicates
a solid improvement.

MMTests Statistics: vmstat
Page Ins                                  3257266139  1111844061    17263623    10901575    20870385
Page Outs                                   81054922    30364312     3626530     3657687     3665499
Swap Ins                                        3294        2851        6560        4964        6598
Swap Outs                                     390073      528094      620197      790912      604228
Direct pages scanned                      1077581700  3024951463  1764930052   115140570  1796314840
Kswapd pages scanned                        34826043     7112868     2131265     1686942     2093637
Kswapd pages reclaimed                      28950067     4911036     1246044      966475     1319662
Direct pages reclaimed                     805148398   280167837     3623473     2215044     4182274
Kswapd efficiency                                83%         69%         58%         57%         63%
Kswapd velocity                              664.399     622.521    4253.852    7304.360    3809.106
Direct efficiency                                74%          9%          0%          1%          0%
Direct velocity                            20557.737  264745.137 3522673.849  498551.938 3268166.145
Percentage direct scans                          96%         99%         99%         98%         99%
Page writes by reclaim                        722646      529174      620319      791018      604368
Page writes file                              332573        1080         122         106         140
Page writes anon                              390073      528094      620197      790912      604228
Page reclaim immediate                             0  2552514720  1635858848   111281140  1661416934
Page rescued immediate                             0           0           0       87848           0
Slabs scanned                                  23552       23552        9216        8192        8192
Direct inode steals                              231           0           0           0           0
Kswapd inode steals                                0           0           0           0           0
Kswapd skipped wait                            28076         786           0          61           1
THP fault alloc                                  609         383         753         906        1074
THP collapse alloc                                12           6           0           0           0
THP splits                                       536         211         456         593         561
THP fault fallback                              4406        4633        4263        4110        3942
THP collapse fail                                120         127           0           0           0
Compaction stalls                               1810         728         623         779         869
Compaction success                               196          53          60          80          99
Compaction failures                             1614         675         563         699         770
Compaction pages moved                        193158       53545      243185      333457      409585
Compaction move failure                         9952        9396       16424       23676       30668

The direct page scanned figure with your patch is still very high
unfortunately.

Overall, I would say that your series is not a replacement for the last
patch in this series. 

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>