Hi Andrew,
One common use case where this is really helpful is data analytics. Assume that you regularly analyze some chunk of data, say one month's worth, and you run SQL queries or MapReduce jobs on it. Let's also assume you want to serve the current month's data from memory.
Going with an example, let's say data for March takes 60% of total memory. You run queries over that data, and it gets pulled into the active list. Come next month, you want to query April's data (which again takes 60% of memory). Since analytic queries walk over the data sequentially, April's data never becomes active, doesn't stay in memory, and you're stuck serving queries from disk.
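To make the March/April scenario concrete, here is a toy simulation of a two-list (active/inactive) page cache. It is only a sketch under stated assumptions, not kernel code: the class name `TwoListCache`, the 100-page capacity, the 60-page months, and the rule that the active list is capped at half of memory are all illustrative choices, and the model reflects the behavior without the patch.

```python
from collections import OrderedDict


class TwoListCache:
    """Toy active/inactive page cache (hypothetical model, not kernel code)."""

    def __init__(self, capacity, max_active_fraction=0.5):
        self.capacity = capacity
        # Assumption: the active list may hold at most this many pages.
        self.max_active = int(capacity * max_active_fraction)
        self.active = OrderedDict()    # oldest entry first
        self.inactive = OrderedDict()  # oldest entry first

    def access(self, page):
        """Return True on a cache hit, False on a miss (read from disk)."""
        if page in self.active:
            self.active.move_to_end(page)
            return True
        if page in self.inactive:
            # Second reference while still cached: promote to active.
            del self.inactive[page]
            self.active[page] = True
            # Demote the oldest active pages if the active list is too big.
            while len(self.active) > self.max_active:
                old, _ = self.active.popitem(last=False)
                self.inactive[old] = True
            return True
        # Miss: new pages start on the inactive list.
        self.inactive[page] = True
        # Under memory pressure, evict only from the inactive tail.
        while len(self.active) + len(self.inactive) > self.capacity:
            self.inactive.popitem(last=False)
        return False


def scan(cache, pages):
    """Sequentially read all pages once and return the hit rate."""
    hits = sum(cache.access(p) for p in pages)
    return hits / len(pages)


def run_scenario(capacity=100, month_pages=60, passes=3):
    """Scan March repeatedly, then April; return the last hit rate of each."""
    cache = TwoListCache(capacity)
    march = [("march", i) for i in range(month_pages)]
    april = [("april", i) for i in range(month_pages)]
    march_rate = april_rate = 0.0
    for _ in range(passes):
        march_rate = scan(cache, march)
    for _ in range(passes):
        april_rate = scan(cache, april)
    return march_rate, april_rate


if __name__ == "__main__":
    march_rate, april_rate = run_scenario()
    print(f"March hit rate on last pass: {march_rate:.0%}")
    print(f"April hit rate on last pass: {april_rate:.0%}")
```

In this model March ends up fully cached, while every April scan misses: March's pages keep the active list full, the remaining inactive list is smaller than April's 60 pages, and each April page is evicted before the next sequential pass reaches it again.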
To overcome this issue, you could regularly drop the page cache, or advise customers to provision clusters whose cumulative memory is 2x the working set. Neither is ideal. My understanding is that this patch resolves this issue, but then again my knowledge of the Linux memory manager is pretty limited. So please call me out if I'm off here.
Thanks,
Ozgun
On Fri, Aug 9, 2013 at 3:53 PM, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
On Tue, 6 Aug 2013 18:44:01 -0400 Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
Looks nice. The lack of testing results is conspicuous ;)
> This series solves the problem by maintaining a history of pages
> evicted from the inactive list, enabling the VM to tell streaming IO
> from thrashing and rebalance the page cache lists when appropriate.
It only really solves the problem in the case where
size-of-inactive-list < size-of-working-set < size-of-total-memory
yes? In fact less than that, because the active list presumably
doesn't get shrunk to zero (how far *can* it go?). I wonder how many
workloads fit into those constraints in the real world.