Re: [PATCH] add some drop_caches documentation and info messsge

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Tue, 23 Oct 2012 16:45:46 -0700

On Fri, 12 Oct 2012 14:57:08 +0200
Michal Hocko <mhocko@xxxxxxx> wrote:

> Hi,
> I would like to resurrect the following patch from Dave. It was last
> posted at https://lkml.org/lkml/2010/9/16/250 and there didn't seem to
> be any strong opposition.
> Kosaki was worried about possible excessive logging when somebody drops
> caches too often (but then he said he didn't have a strong opinion
> on that), but I would say the opposite. If somebody does that, then I
> would really like to know about it from the log when supporting a system,
> because it almost certainly means that there is something fishy going on.
> It is also worth mentioning that only root can write to drop_caches, so
> this is not a flooding attack vector.
> I am bringing this up again because it can be really helpful when
> chasing strange performance issues which (surprise surprise) turn out to
> be caused by caches being dropped artificially because the admin thinks
> this would help...
> 
> I have just refreshed the original patch on top of the current mm tree
> but I could live with KERN_INFO as well if people think that KERN_NOTICE
> is too hysterical.
> ---
> From 1f4058be9b089bc9d43d71bc63989335d7637d8d Mon Sep 17 00:00:00 2001
> From: Dave Hansen <dave@xxxxxxxxxxxxxxxxxx>
> Date: Fri, 12 Oct 2012 14:30:54 +0200
> Subject: [PATCH] add some drop_caches documentation and info messsge
> 
> There is plenty of anecdotal evidence and a load of blog posts
> suggesting that using "drop_caches" periodically keeps your system
> running in "tip top shape".  Perhaps adding some kernel
> documentation will increase the amount of accurate data on its use.
> 
> If we are not shrinking caches effectively, then we have real bugs.
> Using drop_caches will simply mask the bugs and make them harder
> to find, but certainly does not fix them, nor is it an appropriate
> "workaround" to limit the size of the caches.
> 
> It's a great debugging tool, and is really handy for doing things
> like repeatable benchmark runs.  So, add a bit more documentation
> about it, and add a little KERN_NOTICE.  It should help developers
> who are chasing down reclaim-related bugs.
> 
> ...
>
> +		printk(KERN_NOTICE "%s (%d): dropped kernel caches: %d\n",
> +			current->comm, task_pid_nr(current), sysctl_drop_caches);

urgh.  Are we really sure we want to do this?  The system operators who
are actually using this thing will hate us :(

More friendly alternatives might be:

- Taint the kernel.  But that will only become apparent with an oops
  trace or similar.

- Add a drop_caches counter and make that available in /proc/vmstat,
  show_mem() output and perhaps other places.

I suspect the /proc/vmstat counter will suffice - if someone is having
vm issues, we'll be seeing their /proc/vmstat at some stage and if the
drop_caches counter is high, that's enough to get suspicious?
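
For illustration only, here is a minimal userspace sketch of how such a counter could be checked once it exists. It assumes the usual /proc/vmstat layout of one "name value" pair per line; the counter name "drop_caches" and the helper vmstat_lookup() are hypothetical, since this thread only proposes the counter and does not name it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one counter in /proc/vmstat-style text, i.e. one
 * "name value" pair per line.
 *
 * Returns 0 and stores the value in *out if the counter is found,
 * or -1 if it is missing (or parsing stops on a malformed line).
 *
 * Hypothetical helper: neither this function nor a counter called
 * "drop_caches" is defined by the patch under discussion.
 */
static int vmstat_lookup(FILE *fp, const char *name, unsigned long long *out)
{
	char key[64];
	unsigned long long val;

	assert(fp != NULL && name != NULL && out != NULL);

	/* Each successful fscanf() consumes exactly one "name value" pair. */
	while (fscanf(fp, "%63s %llu", key, &val) == 2) {
		if (strcmp(key, name) == 0) {
			*out = val;
			return 0;
		}
	}
	return -1;
}
```

A caller would pass in a stream opened with fopen("/proc/vmstat", "r") and treat a large value as a hint to ask who is writing to drop_caches. On a kernel that does not export such a counter the lookup simply returns -1.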
