Re: [patch 1/2] mm: fincore()

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Fri, 15 Feb 2013 15:42:35 -0800

On Fri, 15 Feb 2013 18:13:04 -0500
Johannes Weiner <hannes@xxxxxxxxxxx> wrote:

> On Fri, Feb 15, 2013 at 01:27:38PM -0800, Andrew Morton wrote:
> > On Fri, 15 Feb 2013 01:34:50 -0500
> > Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > 
> > > + * The status is returned in a vector of bytes.  The least significant
> > > + * bit of each byte is 1 if the referenced page is in memory, otherwise
> > > + * it is zero.
> > 
> > Also, this is going to be dreadfully inefficient for some obvious cases.
> > 
> > We could address that by returning the info in some more efficient
> > representation.  That will be run-length encoded in some fashion.
> > 
> > The obvious way would be to populate an array of
> > 
> > struct page_status {
> > 	u32 present:1;
> > 	u32 count:31;
> > };
> > 
> > or whatever.
> 
> I'm having a hard time seeing how this could be extended to more
> status bits without stifling the optimization too much.

See other email: add a syscall arg which specifies the boolean status
which we're searching for.

>  If we just
> add more status bits to one page_status, the likelihood of long runs
> where all bits are in agreement decreases.  But as the optimization
> becomes less and less effective, we are stuck with an interface that
> is more PITA than just using mmap and mincore again.
> 
> The user has to supply a worst-case-sized vector with one struct
> page_status per page in the range, but the per-page item will be
> bigger than with the byte vector because of the additional run length
> variable.

Yes, we'd need to tell the kernel how much storage is available for the
structures.
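
To make the storage question concrete, here is a rough userspace model of
the run-length idea.  It is only a sketch and not kernel code:
encode_runs(), its nr_vec capacity argument and the covered out-parameter
are hypothetical names made up for illustration, and the fixed-width
types stand in for the kernel's u32.  The encoder stops once the caller's
array is full and reports how many pages it got through, so the caller
can resume from there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout from the proposal quoted above: one entry per run of pages. */
struct page_status {
	uint32_t present:1;	/* 1 if every page in the run is in core */
	uint32_t count:31;	/* number of pages in the run */
};

/* Largest run length that fits in the 31-bit count field. */
#define RUN_MAX ((1u << 31) - 1)

/*
 * Hypothetical userspace model: encode resident[0..nr_pages) into at
 * most nr_vec runs.  Only the least significant bit of each input byte
 * is examined.  Returns the number of runs written and stores the
 * number of pages they cover in *covered.
 */
static size_t encode_runs(const unsigned char *resident, size_t nr_pages,
			  struct page_status *vec, size_t nr_vec,
			  size_t *covered)
{
	size_t page = 0, runs = 0;

	assert(resident && vec && covered);

	while (page < nr_pages && runs < nr_vec) {
		unsigned int present = resident[page] & 1;
		uint32_t count = 0;

		/* Extend the run while the status bit stays the same. */
		while (page < nr_pages && count < RUN_MAX &&
		       (resident[page] & 1) == present) {
			count++;
			page++;
		}
		vec[runs].present = present;
		vec[runs].count = count;
		runs++;
	}
	*covered = page;
	return runs;
}
```

If covered comes back smaller than nr_pages, the caller ran out of
entries and has to call again starting at that page, which is the
one-syscall-per-page worst case when nr_vec is 1.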

> However, one struct page_status per run leaves you with a worst case
> of one syscall per page in the range.

Yes.

> I dunno.  The byte vector might not be optimal, but its worst cases
> seem more attractive, and it is just as extensible and dead simple to use.

But I think "which pages from this 4TB file are in core" will not be an
uncommon usage, and writing a gig of memory to find three pages is just
awful.
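
The arithmetic behind that, as a back-of-the-envelope sketch.  It assumes
4k pages and 4-byte run entries like the struct page_status quoted
earlier.  The helper names byte_vector_size() and max_runs() are made up
for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes the byte-per-page format writes for a file of file_size bytes. */
static uint64_t byte_vector_size(uint64_t file_size, uint64_t page_size)
{
	assert(page_size > 0);
	/* One status byte per page, rounding a partial last page up. */
	return (file_size + page_size - 1) / page_size;
}

/*
 * Upper bound on the number of runs when nr_resident pages are in core:
 * each isolated resident page splits the range into absent/present/absent.
 */
static uint64_t max_runs(uint64_t nr_resident)
{
	return 2 * nr_resident + 1;
}
```

With a 4TB file, byte_vector_size(4ULL << 40, 4096) is 2^30, i.e. 1GB of
status bytes.  Three resident pages give max_runs(3) = 7 runs, which is
28 bytes at 4 bytes per entry.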

I wonder what the most common usage would be (one should know this
before merging the syscall :)).  I guess "is this relatively-small
range of the file in core" and/or "which pages from this
relatively-small range of the file will I need to read", etc.

The syscall should handle the common usages very well.  But it
shouldn't handle uncommon usages very badly!