Re: [ATTEND][LSF/VM TOPIC] deterministic cgroup charging using file path

On Mon, 28 Jun 2010 22:31:03 -0700
Greg Thelen <gthelen@xxxxxxxxxx> wrote:

> On Sun, Jun 27, 2010 at 7:03 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> > On Fri, 25 Jun 2010 13:43:45 -0700
> > Greg Thelen <gthelen@xxxxxxxxxx> wrote:

> >> /dev/cgroup/cg1/cg11  # T1: want memory.limit = 30MB
> >> /dev/cgroup/cg1/cg12  # T2: want memory.limit = 100MB
> >> /dev/cgroup/cg1       # want memory.limit = 1GB + 30MB + 100MB
> >>
> >> I have implemented a prototype that allows a file system hierarchy to be
> >> charged to a particular cgroup using a new bind mount option:
> >> + mount -t cgroup none /cgroup -o memory
> >> + mount --bind /tmp/db /tmp/db -o cgroup=/dev/cgroup/cg1
> >>
> >> Any accesses to files within /tmp/db are charged to /dev/cgroup/cg1.  Accesses to
> >> other files behave normally - they charge the cgroup of the current task.
> >>
> >
> > Interesting, but I want to use madvise() etc. for this kind of job, rather than
> > deep hooks into the kernel.
> >
> > madvise(addr, size, MEMORY_RECHARGE_THIS_PAGES_TO_ME);
> >
> > Then, you can write a command as:
> >
> >  file_recharge [path name] [cgroup]
> >  - this command moves a file's cache to the specified cgroup.
> >
> > A daemon program which uses this command + inotify will give us much more
> > flexible control of file cache in memcg. Do you have some requirement
> > that this move-charge shouldn't be done in a lazy manner?
> >
> > Status:
> > We have code for move-charge and inotify, but no code for the new madvise.
> >
> >
> > Thanks,
> > -Kame
> 
> This is an interesting approach.  I like the idea of minimizing kernel
> changes.  I want to make sure I understand the idea using terms from
> my above example.
> 
> 1. The daemon establishes inotify watches on /tmp/db and all
> subdirectories to catch any accesses.
> 
> 2. If cg11(T1) is the first process to mmap a portion of a /tmp/db
> file (pages_1) then cg11 will be charged.  T1 will not use madvise()
> because cg11 does not want to be charged.  cg11 will be temporarily
> charged for pages_1.
> 
yes.

> 3. inotify() will inform the proposed daemon that T1 opened /tmp/db,
> so the daemon will use file_recharge, which runs the following within
> the cg1 cgroup:
> - fd = open("/tmp/db/.../path_to_file")
> - fstat(fd, &st); va = mmap(NULL, st.st_size, ..., fd, 0)
> - madvise(va, st.st_size, MEMORY_RECHARGE_THIS_PAGES_TO_ME).  This
> will move the charge of pages_1 from cg11 to cg1.
> 
> Did I state this correctly?
> 
yes.


> I am concerned that the follow-on step does not move the pages to cg1:
> 4. T1 then touches more /tmp/db pages (pages_2) using the same mmap.
> This charges cg11.  I assume that inotify() would not notify the
> daemon for this case because the file is still open. 
you're right.

> So the pages will not be moved to cg1.  Or are you suggesting
> that inotify() be enhanced to advertise charge events?

IIUC, now, inotify() doesn't support mmap. But it has read/write notification.
So, let's think about mmapped pages.

For an easy implementation, I suggest file_recharge should map the whole file
and move all of its cached pages. But maybe this is an answer you want.
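
The map-the-whole-file step could look roughly like the C sketch below. It
covers only the mapping and a mincore() residency count, not the recharge
itself, because no recharge advice for madvise() exists in the kernel. The
helper name count_cached_pages is made up for illustration.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map the whole file read-only and return how many of
 * its pages mincore() reports as resident, or -1 on error. */
long count_cached_pages(const char *path)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap() rejects zero-length maps */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    void *va = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (va == MAP_FAILED)
        return -1;

    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(npages);
    long resident = -1;
    if (vec && mincore(va, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1; /* low bit set means page is resident */
    }

    /* A recharge call on [va, va + len) would go here if the kernel
     * had such an interface. */
    free(vec);
    munmap(va, len);
    return resident;
}
```

A file with zero resident pages has nothing to move, so a daemon could skip it
cheaply before doing any further work.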

If I were to write an _easy_ daemon, it would do...

==
  register inotify and add watches.
  The watches will see IN_OPEN and IN_DELETE_SELF.

  run 2 threads.

Thread1:
  while(1) {
      read() // check events from inotify.
      maintain opened-file information.
  }

Thread2:
  while (1) {
      check opened-file information.
      select a file // you may implement some scheduling here.
      open(),
      mmap(),
      mincore() .... checks whether the file is cached.
      madvise()
      // if you want, touch pages and add Access bit to them.
      close(),

      sleep if necessary.
 }
==
A batch-style cron job instead of sleep would not be too bad for usual use.
But we may need some interface to implement a cleverer algorithm.
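
Thread1 above could be sketched in C roughly as follows, assuming a single
watched directory and a simple in-memory list as the "opened-file
information". The names open_set, watch_opens and drain_events are
hypothetical. IN_DELETE_SELF is included in the watch mask but not acted upon
in this sketch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical "opened-file information": names seen in IN_OPEN events. */
struct open_set {
    char **names;
    size_t count;
};

/* Returns 1 if the name was added, 0 if already present, -1 on error. */
static int open_set_add(struct open_set *s, const char *name)
{
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return 0;
    char **grown = realloc(s->names, (s->count + 1) * sizeof *grown);
    if (!grown)
        return -1;
    s->names = grown;
    char *copy = strdup(name);
    if (!copy)
        return -1;
    s->names[s->count++] = copy;
    return 1;
}

/* Watch one directory for opens; returns a non-blocking inotify fd or -1. */
int watch_opens(const char *dir)
{
    int ifd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
    if (ifd < 0)
        return -1;
    if (inotify_add_watch(ifd, dir, IN_OPEN | IN_DELETE_SELF) < 0) {
        perror("inotify_add_watch");
        close(ifd);
        return -1;
    }
    return ifd;
}

/* Read all pending events and record newly opened file names.
 * Returns the number of names added, or -1 on error. */
int drain_events(int ifd, struct open_set *s)
{
    /* The union keeps the buffer aligned for struct inotify_event. */
    union { char raw[4096]; struct inotify_event ev; } buf;
    int added = 0;

    for (;;) {
        ssize_t n = read(ifd, buf.raw, sizeof buf.raw);
        if (n < 0)
            return errno == EAGAIN ? added : -1;
        if (n == 0)
            return added;
        for (char *p = buf.raw; p < buf.raw + n; ) {
            struct inotify_event *ev = (struct inotify_event *)p;
            /* len == 0 means the event is for the directory itself. */
            if ((ev->mask & IN_OPEN) && ev->len > 0) {
                int r = open_set_add(s, ev->name);
                if (r < 0)
                    return -1;
                added += r;
            }
            p += sizeof *ev + ev->len;
        }
    }
}
```

inotify watches are not recursive, so a real daemon would need one watch per
subdirectory of the tree it cares about.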


> If the number of directories within /tmp/db is large, then inotify()
> may be expensive.  I don't think this is a problem.
> 
> Another worry I have is that if for some reason the daemon is started
> after the job, or if the daemon crashes and is restarted, then files
> may have been opened and charged to cg11 without the inotify watches
> being set up.
yes.

> The daemon would have problems finding the pages that were
> charged to cg11 and need to be moved to cg1.  The daemon could scan
> the open file table of T1, but any files that are no longer opened may
> be charged to cg11 with no way for the daemon to find them.
> 

Thread1 above can maintain an "opened-file" database.
Or you can run a recovery script that opens the files listed in
/proc/<xxxx>/fd of the processes, to trigger OPEN events.
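
Such a recovery pass might look like the C sketch below. It assumes the
process's open files are still reachable by path, and the function name
reopen_files_under is invented for illustration. It walks /proc/<pid>/fd,
resolves each symlink, and re-opens every target under a given path prefix so
that an inotify watcher sees an open event.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: re-open every file that `pid` has open under
 * `prefix` (given without a trailing slash). Returns the number of
 * files re-opened, or -1 if /proc/<pid>/fd cannot be read. */
int reopen_files_under(pid_t pid, const char *prefix)
{
    char fd_dir[64];
    snprintf(fd_dir, sizeof fd_dir, "/proc/%ld/fd", (long)pid);

    DIR *d = opendir(fd_dir);
    if (!d)
        return -1;

    size_t plen = strlen(prefix);
    int reopened = 0;
    struct dirent *de;

    while ((de = readdir(d)) != NULL) {
        if (de->d_name[0] == '.')
            continue;                       /* skip "." and ".." */

        char lnk[PATH_MAX], target[PATH_MAX];
        snprintf(lnk, sizeof lnk, "%s/%s", fd_dir, de->d_name);
        ssize_t n = readlink(lnk, target, sizeof target - 1);
        if (n < 0)
            continue;                       /* fd closed in the meantime */
        target[n] = '\0';

        /* Match the prefix on a path-component boundary only, so that
         * /tmp/db does not also match /tmp/db2. */
        if (strncmp(target, prefix, plen) != 0)
            continue;
        if (target[plen] != '/' && target[plen] != '\0')
            continue;

        /* Opening by path is what generates the event for a watcher.
         * Deleted files and sockets fail to open and are skipped. */
        int fd = open(target, O_RDONLY | O_NONBLOCK);
        if (fd >= 0) {
            close(fd);
            reopened++;
        }
    }
    closedir(d);
    return reopened;
}
```

This only finds files that are still open. Files that were opened, charged and
closed before the daemon started remain invisible to it, which is the gap
raised above.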

But yes, some in-kernel approach may be required, such as a new interface to
memcg rather than madvise.

/memory.move_file_caches
- when you open this file and pass a file descriptor to it via write()/ioctl(),
  all in-memory pages of that file will be moved to this cgroup.

Hmm...we may be able to add an interface to report the last page-cache update time.
(Because access-time updates tend to be disabled at mount time....)

Thanks,
-Kame
