RE: swapcache size oddness

> From: Hugh Dickins [mailto:hughd@xxxxxxxxxx]
> Subject: Re: swapcache size oddness

Hi Hugh --

Thanks for your, as usual, quick and thorough response!

> On Fri, 27 Apr 2012, Dan Magenheimer wrote:
> 
> > In continuing digging through the swap code (with the
> > overall objective of improving zcache policy), I was
> > looking at the size of the swapcache.
> >
> > My understanding was that the swapcache is simply a
> > buffer cache for pages that are actively in the process
> > of being swapped in or swapped out.
> 
> It's that part of the pagecache for pages on swap.
> 
> Once written out, as with other pagecache pages written out under
> reclaim, we do expect to reclaim them fairly soon (they're moved to
> the bottom of the inactive list).  But when read back in, we read a
> cluster at a time, hoping to pick up some more useful pages while the
> disk head is there (though of course it may be a headless disk).  We
> don't disassociate those from swap until they're dirtied (or swap
> looks fullish), why should we?

OK.  Yes, I forgot about the pages that are swapped in
"speculatively" rather than on demand.  This will certainly
result in an increase in the size of the swapcache (especially
with Rik's recent change that increases the average effective
cluster size).
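
(For anyone reading along in the archives, the "speculative" part is
roughly the loop below, my simplified sketch of what I believe
swapin_readahead() in mm/swap_state.c does, with the window
alignment, swap-header and error handling details left out:)

        /* sketch only: one major fault pulls in ~2^page_cluster nearby
         * swap slots, and each page read this way lands in the swapcache */
        unsigned long nr = 1UL << page_cluster; /* /proc/sys/vm/page_cluster, default 3 */
        unsigned long off;

        for (off = start_offset; off < start_offset + nr; off++)
                read_swap_cache_async(swp_entry(swp_type(entry), off),
                                      gfp_mask, vma, addr);

So a burst of swapins can quickly populate the swapcache with clean
pages that are still associated with their swap slots.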

> > And keeping pages
> > around in the swapcache is inefficient because every
> > process access to a page in the swapcache causes a
> > minor page fault.
> 
> What's inefficient about that?  A minor fault is much less
> costly than the major fault of reading them back from disk.

Yes, but a minor fault is still much more costly than an ordinary
memory read/write.  I guess I was under the mistaken assumption that
a page in the swapcache could never be directly accessed, because the
page table would always have it marked as non-present in order to
avoid races between concurrent process accesses and the I/O.  But I
think I see now how that is avoided (at least for non-shared-memory
pages).
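
(For the record, what I was missing: the swap fault path checks the
swapcache before it ever considers I/O.  Roughly, and heavily
simplified from do_swap_page() in mm/memory.c with the locking and
error handling dropped:)

        /* sketch only, not the real code */
        page = lookup_swap_cache(entry);
        if (page) {
                /* still in RAM: just re-establish the pte (minor fault) */
        } else {
                /* really gone: read it (and its neighbours) back from
                 * the swap device (major fault) */
                page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
                                        vma, address);
        }

So a swapcache page that is still mapped costs nothing extra to
access; the minor fault only happens once reclaim has torn down the
pte.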

> > So I was surprised to see that, under a memory intensive
> > workload, the swapcache can grow quite large.  I have
> > seen it grow to almost half of the size of RAM.
> 
> Nothing wrong with that, so long as they can be freed and
> used for better purpose when needed.

Because of my mistaken assumption above, I thought a page in the
swap cache was "worse" for system performance than a normal
anonymous page.

So really the primary difference between an anonymous page that is
NOT in the swap cache and an anonymous page that IS in the swap cache
is that the latter already has a slot reserved on the swap disk.
(There are flags and mapping differences too, of course.)
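
(Concretely, the flag and mapping differences seem to amount to this;
quoting from memory, so treat it as approximate:)

        /* a swapcache page carries its reserved slot around with it */
        if (PageSwapCache(page)) {
                swp_entry_t entry = { .val = page_private(page) };
                /* `entry' names the slot already reserved on the swap
                 * device, and page_mapping(page) resolves to
                 * swapper_space rather than NULL */
        }

whereas an anonymous page outside the swap cache has no slot reserved
until reclaim calls add_to_swap() on it.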

> > Digging into this oddity, I re-discovered the definition
> > for "vm_swap_full()" which, in scan_swap_map() is a
> > pre-condition for calling __try_to_reclaim_swap().
> > But vm_swap_full() compares how much free swap space
> > there is "on disk", with the total swap space available
> > "on disk" with no regard to how much RAM there is.
> > So on my system, which is running with 1GB RAM and
> > 10GB swap, I think this is the reason that swapcache
> > is growing so large.
> >
> > Am I misunderstanding something?  Or is this code
> > making some (possibly false) assumptions about how
> > swap is/should be sized relative to RAM?  Or maybe the
> > size of swapcache is harmless as long as it doesn't
> > approach total "on disk" size?
> 
> The size of swapcache is harmless: we break those pages' association
> with swap once a better use for the page comes up.  But the size of
> swapcache does (of course) represent a duplication of what's on swap.
> 
> As swap becomes full, that duplication becomes wasteful: we may need
> some of the swap already in memory for saving other pages; so break
> the association, freeing the swap for reuse but keeping the page
> (but now it's no longer swapcache).
> 
> That's what the vm_swap_full() tests are about: choosing to free swap
> when it's duplicated in memory, once it's becoming a scarce resource.
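
(For anyone reading along later: the test in question is, quoting
include/linux/swap.h from memory so treat it as approximate,

        /* swap counts as "full" once more than half of the total swap
         * space is in use; both counters are in units of swap-device
         * pages, so the amount of RAM never enters into it */
        #define vm_swap_full() (nr_swap_pages*2 < total_swap_pages)

which is exactly the "no regard to how much RAM there is" behavior I
was asking about; with 1GB RAM and 10GB swap it takes a long time to
trigger.)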

Got it.  Thanks!

Dan

