On Fri, Nov 2, 2018 at 3:59 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> Forgot to answer this:
>
> On 10/31/18 3:53 PM, Marinko Catovic wrote:
> > Well caching of any operations with find/du is not necessary imho
> > anyway, since walking over all these millions of files in that time
> > period is really not worth caching at all - if there is a way you
> > mentioned to limit the commands there, that would be great.
> > Also I want to mention that these operations were in use with 3.x
> > kernels as well, for years, with absolutely zero issues.
>
> Yep, something had to change at some point. Possibly the
> reclaim/compaction loop. Probably not the way dentries/inodes are being
> cached though.
>
> > 2 > drop_caches right after that is something I considered, I just had
> > some bad experience with this, since I tried it around 5:00 AM in the
> > first place to give it enough spare time to finish, since
> > sync; echo 2 > drop_caches can take some time, hence my question about
> > lowering the limits in mm/vmscan.c, void drop_slab_node(int nid)
> >
> > I could do this effectively right after find/du at 07:45, just hoping
> > that this is finished soon enough - in one worst case it took over 2
> > hours (from 05:00 AM to 07:00 AM), since the host was busy during that
> > time with find/du, never having freed enough caches to continue, hence
>
> Dropping caches while find/du is still running would be
> counter-productive. If done after it's already finished, it shouldn't be
> so disruptive.
>
> > my question to let it stop earlier with the modification of
> > drop_slab_node ... it was just an idea, nevermind if you believe that
> > it was a bad one :)
>
> Finding a universally "correct" threshold could easily be impossible. I
> guess the proper solution would be to drop the while loop and
> restructure the shrinking so that it would do a single pass through all
> objects.

Well, after a few weeks to make sure, the results look very promising:
there have been no issues at all since setting up the cgroup with the
memory limit.

The workaround is a good idea anyway, since it keeps the nightly
processes from eating up all the caches/buffers, which become useless by
the morning anyway, so performance even improved a bit - although the
underlying issue is of course not fixed by it. Since other people will
sooner or later be affected as well imho, hopefully you'll figure out a
fix soon.
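For reference, the setup looks roughly like this (just a sketch - the
group name A and the /cgpath mount point are the ones I use below, and
the 4G limit is only a placeholder, not the actual value):

  # create the memory-limited group under the cgroup v1 memory
  # controller mounted at /cgpath, and give it a limit
  mkdir /cgpath/A
  echo 4G > /cgpath/A/memory.limit_in_bytes

  # the shell that runs the nightly jobs adds itself to the group, so
  # find/du and everything else it spawns inherit the limit
  echo $$ > /cgpath/A/tasks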
Nevertheless I also ran into a new problem there. While writing the PID
into the tasks file (echo $$ > ../tasks), or directly from C with
fprintf(tasks_fp, "%d", getpid()), works very well, I had problems with
daemons that I wanted to start from within that cgroup-controlled
binary (e.g. an SQL server): the SQL server gets killed as soon as the
memory limit is exceeded.

I would not like to set memory.limit_in_bytes to something huge like
30G just to be safe; I'd rather handle it with a wrapper script, for
example:

1) the cgroup-controlled instance starts the wrapper script
2) the wrapper script removes its own PID from the tasks list, so it is
   no longer controlled by the cgroup
3) it then starts whatever needs to continue running normally, without
   the memory restriction

(a rough sketch of this is at the end of this mail)

Currently I fail at step 2: echo $PID > tasks writes into the file and
adds the PID, but how would one remove the wrapper script's PID from it
again?

I came up with:

  cat /cgpath/A/tasks | sed "/$$/d" | cat > /cgpath/A/tasks

which produces the list without the current PID, but writing it back
fails with "cat: write error: Invalid argument", since tasks is not a
regular file.
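To make that concrete, the wrapper I have in mind would look roughly
like this (only a sketch; the SQL server start command is just an
example, and the commented step 2 is exactly the part I am missing):

  #!/bin/sh
  # started from inside the cgroup-controlled instance (step 1)

  # step 2: remove this script's own PID ($$) from /cgpath/A/tasks
  # here, so that everything started below no longer counts against
  # the memory limit - this is the part I do not know how to do

  # step 3: start whatever should keep running without the limit
  /etc/init.d/mysql start    # just an example for the SQL server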