Re: [PATCH 0/4] pagecache scanning with /proc/kpagecache

On Thu, May 22, 2014 at 01:36:32PM +0300, Kirill A. Shutemov wrote:
> On Thu, May 22, 2014 at 01:50:22PM +0400, Konstantin Khlebnikov wrote:
> > On Thu, May 22, 2014 at 6:33 AM, Andrew Morton
> > <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > On Wed, 21 May 2014 22:19:55 -0400 Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> wrote:
> > >
> > >> > A much nicer interface would be for us to (finally!) implement
> > >> > fincore(), perhaps with an enhanced per-present-page payload which
> > >> > presents the info which you need (although we don't actually know what
> > >> > that info is!).
> > >>
> > >> page/pfn of each page slot and its page cache tag as shown in patch 4/4.
> > >>
> > >> > This would require open() - it appears to be a requirement that the
> > >> > caller not open the file, but no reason was given for this.
> > >> >
> > >> > Requiring open() would address some of the obvious security concerns,
> > >> > but it will still be possible for processes to poke around and get some
> > >> > understanding of the behaviour of other processes.  Careful attention
> > >> > should be paid to this aspect of any such patchset.
> > >>
> > >> Sorry if I missed your point, but this interface defines fixed mapping
> > >> between file position in /proc/kpagecache and in-file page offset of
> > >> the target file. So we do not need to use seq_file mechanism, that's
> > >> why open() is not defined and default one is used.
> > >> The same thing is true for /proc/{kpagecount,kpageflags}, from which
> > >> I copied/pasted some basic code.
> > >
> > > I think you did miss my point ;) Please do a web search for fincore -
> > > it's a syscall similar to mincore(), only it queries pagecache:
> > > fincore(int fd, loff_t offset, ...).  In its simplest form it queries
> > > just for present/absent, but we could increase the query payload to
> > > incorporate additional per-page info.
> > >
> > > It would take a lot of thought and discussion to nail down the
> > > fincore() interface (we've already tried a couple of times).  But
> > > unfortunately, fincore() is probably going to be implemented one day
> > > and it will (or at least could) make /proc/kpagecache obsolete.
> > >
> > 
> > It seems fincore() might also obsolete /proc/kpageflags and /proc/pid/pagemap,
> > because it might be implemented for /dev/mem and /proc/pid/mem as well
> > as for normal files.
> 
> > Something like this:
> > int fincore(int fd, u64 *kpf, u64 *pfn, size_t length, off_t offset)
> 
> As always with new syscalls flags are missing ;)
> 
> u64 for kpf doesn't sound future proof enough. What about this:
> 
> int fincore(int fd, size_t length, off_t offset,
> 	unsigned long flags, void *records);
> 
> Format of records is defined by what user asks in flags. Like:
> 
>  - FINCORE_PFN: records are 64-bit each with pfn;
>  - FINCORE_PAGE_FLAGS: records are 64-bit each with flags;

I hope that the flags we get from this mode contain pagecache tag info
as well as KPF_*.

>  - FINCORE_PFN | FINCORE_PAGE_FLAGS: records are 128-bit each with pfns
>    followed by flags (or vice versa);
> 
> New flags can extend the format if we would want to expose more info.
> 
> Comments?

Maybe a mincore()-like bitmap mode (FINCORE_BMAP) would also be helpful for
callers who want a minimum memory footprint?

Anyway I like this extensible interface you're suggesting.
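
To make the record layout concrete, here is a rough sketch of how a caller
might size the records buffer under this scheme. The flag values and the
helper names (fincore_record_bits, fincore_buffer_bytes) are made up for
illustration, and treating FINCORE_BMAP as exclusive of the other flags is
my assumption; none of this is an agreed interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values: the thread names these flags but assigns no numbers. */
#define FINCORE_BMAP        0x1UL
#define FINCORE_PFN         0x2UL
#define FINCORE_PAGE_FLAGS  0x4UL

/*
 * Bits occupied by one per-page record for a given flag combination.
 * Returns 0 for an empty or unsupported combination.
 */
static size_t fincore_record_bits(unsigned long flags)
{
	size_t bits = 0;

	if (flags == FINCORE_BMAP)
		return 1;	/* mincore()-like mode: one bit per page */
	if (flags & FINCORE_BMAP)
		return 0;	/* assumption: bitmap mode cannot be combined */
	if (flags & ~(FINCORE_PFN | FINCORE_PAGE_FLAGS))
		return 0;	/* unknown flag */

	if (flags & FINCORE_PFN)
		bits += 64;	/* 64-bit pfn */
	if (flags & FINCORE_PAGE_FLAGS)
		bits += 64;	/* 64-bit page flags */
	return bits;
}

/* Bytes the caller would need to allocate to describe npages pages. */
static size_t fincore_buffer_bytes(unsigned long flags, size_t npages)
{
	size_t bits = fincore_record_bits(flags);

	if (bits == 0)
		return 0;
	if (bits == 1)
		return (npages + 7) / 8;	/* round the bitmap up to whole bytes */

	assert(bits % 8 == 0);
	return npages * (bits / 8);
}
```

With these assumptions, 1024 pages cost 128 bytes in bitmap mode and 16384
bytes with FINCORE_PFN | FINCORE_PAGE_FLAGS, which is why the bitmap mode
looks attractive for callers who only need present/absent.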

> BTW, is everybody happy with the mincore() interface? We report 1 there if
> pte is present, but it doesn't really say much about the page for cases
> like zero page...

According to the mincore(2) man page,
  mincore() returns a vector that indicates whether pages of the calling
  process's virtual memory are resident in core (RAM), and so will not
  cause a disk access (page fault) if referenced.  ...

so we can assume that callers want to predict whether they will take
page faults. But that depends on whether the access is a read or a write.
So I think the current mincore() is not enough to make this prediction
precisely for privately mapped pages that are shared copy-on-write
(including the zero page and KSM pages).
Maybe we need a new syscall to solve this problem.
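
A small userspace sketch of the read/write asymmetry, using only the
existing mincore(2) call. The helper names (page_resident,
resident_after_read) are mine. It assumes a Linux system where a read
fault on a private anonymous mapping is normally satisfied by the shared
zero page; that is typical but depends on architecture and configuration.

```c
#define _DEFAULT_SOURCE		/* for mincore() and MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if the page at addr is reported resident, 0 if not, -1 on error. */
static int page_resident(void *addr)
{
	unsigned char vec = 0;
	long psz = sysconf(_SC_PAGESIZE);

	assert(psz > 0);
	if (mincore(addr, (size_t)psz, &vec) != 0)
		return -1;
	return vec & 1;		/* only the least significant bit is defined */
}

/*
 * Map one private anonymous page, read from it without ever writing,
 * and return what mincore() reports for it afterwards.
 */
static int resident_after_read(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	volatile unsigned char *p;
	unsigned char v;
	int ret;

	p = mmap(NULL, (size_t)psz, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	v = p[0];		/* read fault: a page gets mapped, nothing written */
	(void)v;

	ret = page_resident((void *)p);
	munmap((void *)p, (size_t)psz);
	return ret;
}
```

Here resident_after_read() is expected to report the page as resident,
yet the first write to that page would still fault in order to get a
private copy. The single bit from mincore() cannot express that.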

Thanks,
Naoya

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .