On Sun, Apr 28, 2013 at 12:05:47AM +0000, Nick White wrote:
> > what use does this information have?
>
> There are two main ways I'd find this data (as distinct from this
> format) useful:
>
> Some applications would benefit from knowing which files are cheaper
> to access. A good example would be a database's query planner, when
> deciding whether to use an index or just sequentially scan a table.
> If the table's blocks were resident in memory but the index's
> weren't, then it might be faster just to scan the table.

Sounds like a severe case of premature optimisation to me. Indeed,
most databases use direct IO, so there aren't any cached pages in
kernel memory, and hence nothing you do here will tell you anything
about which query method is best.

> While mmap / mincore'ing the files would provide this information
> for a specific file, when the size of the files you're interested
> in exceeds the address space available (admittedly unlikely on
> 64-bit machines, but easy on 32-bit machines) you'd have to start
> processing the files in chunks; this would take much longer and so
> increase the accuracy problems you highlight.

And that points out the silliness of attempting to use "what is
cached" as a method of determining the best algorithm to use - it
simply doesn't scale up. Further, if you optimise towards whatever
method gives the best physical IO patterns you'll end up with the
most robust and consistently performing solution. There's nothing
more irritating than a database that randomly changes performance on
the same workload for no obvious reason....

> This scenario actually highlights an algorithmic problem with my
> solution - it loops through the inodes of each (block-device)
> super-block, querying if any of their pages are resident.

Well, yes. Think of a machine with a couple of TB of RAM and tens of
millions of cached inodes....
> It'd be far more efficient to look through the resident pages, and
> see which inodes they pointed at (if any), possibly by walking
> through the memory zones (like /proc/zoneinfo), iterating over the
> per_cpu_pages and mapping them to inodes (if applicable) via
> page->mapping->host?

That doesn't make the TB-of-page-cache case any better - it's just
as gross as your current patch....

> The other use-case I had in mind was when profiling existing
> processes that either use memory-mapping or otherwise rely on the
> kernel to cache the data they frequently use.

Go google for the recent hot data tracking patch series.

> I understand your concerns, but I believe more transparency around
> what the page cache is doing would be useful due to its
> significant impact on a system's performance.

You don't need to scan the page cache to understand what it is doing.
strace will tell you the IO your application is doing, blktrace will
tell you the IO that the page cache is doing, various tracepoints
will tell you what pages are being reclaimed, etc. If this isn't
sufficient for you to understand what your application is doing, and
you really need fine-grained, custom information about what is cached
in the page cache, then perhaps systemtap would be a better solution
for your purposes.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
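[Editor's aside: the tools named in the reply are typically driven
along the following lines. This is an illustrative sketch only - the
device name and $PID are placeholders, and blktrace and the ftrace
interface require root privileges on a live system.]

```sh
# 1. What IO is the application issuing? (syscall level)
strace -f -e trace=read,write,pread64,pwrite64,mmap -p "$PID"

# 2. What IO actually reaches the block device, i.e. what the page
#    cache is doing on the application's behalf?
blktrace -d /dev/sda -o - | blkparse -i -

# 3. Which pages are being reclaimed? (vmscan tracepoints via ftrace)
echo 1 > /sys/kernel/debug/tracing/events/vmscan/enable
cat /sys/kernel/debug/tracing/trace_pipe
```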