Re: Caching/buffers become useless after some time

Marinko Catovic <marinko.catovic@xxxxxxxxx> · Thu, 1 Nov 2018 23:46:27 +0100

On Thu, 1 Nov 2018 at 14:23, Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Wed 31-10-18 20:21:42, Marinko Catovic wrote:
> > On Wed, 31 Oct 2018 at 18:01, Michal Hocko <mhocko@xxxxxxxx> wrote:
> > >
> > > On Wed 31-10-18 15:53:44, Marinko Catovic wrote:
> > > [...]
> > > > Well, caching the results of find/du operations is not necessary imho
> > > > anyway, since walking over all these millions of files in that time
> > > > period is really not worth caching at all. If there is a way, as you
> > > > mentioned, to limit the commands there, that would be great.
> > >
> > > One possible way would be to run this find/du workload inside a memory
> > > cgroup with the high limit set to something reasonable (that will likely
> > > require some tuning). I am not 100% sure that will behave for a
> > > metadata-mostly workload with almost no pagecache to reclaim, so it might
> > > turn out this will result in other issues. But it is definitely worth
> > > trying.
> >
> > hm, how would that be possible..? every user has its own UID, and the
> > group cannot be a factor either, since this memory restriction would
> > then apply to all users. find/du are running as UID 0 to have access to
> > everyone's data.
>
> I thought you had dedicated script(s) to do all the stats. All you
> need is to run those script(s) within a memory cgroup.

yes, that is the case - the scripts are running as root, since as
mentioned all users have their own UIDs and specific groups, so to have
access one needs root privileges.
My question was how to limit this using cgroups, since afaik limits
there apply to given UIDs/GIDs.
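
For reference, cgroup membership is tracked per process rather than per
UID/GID, so a script running as root can still be confined. Below is a
minimal sketch of the suggestion above, assuming a cgroup v2 hierarchy
mounted at /sys/fs/cgroup with the memory controller enabled for child
groups. The group name "statswalk", the 2 GiB limit and both function
names are hypothetical placeholders, not anything from this thread.

```python
import os
from pathlib import Path


def confine_to_memcg(cgroup_root, name, high_bytes, pid=None):
    """Create <cgroup_root>/<name>, set memory.high, and move a process in.

    Assumes cgroup v2 with the memory controller enabled in the parent
    (cgroup.subtree_control). Needs root on a real cgroup filesystem.
    """
    group = Path(cgroup_root) / name
    group.mkdir(exist_ok=True)
    # memory.high is the throttling limit: above it the group is
    # reclaimed aggressively, but it is not OOM-killed for crossing it.
    (group / "memory.high").write_text(f"{high_bytes}\n")
    # Membership is by PID; children forked afterwards inherit it.
    target = os.getpid() if pid is None else pid
    (group / "cgroup.procs").write_text(f"{target}\n")
    return group


def run_confined(argv, cgroup_root="/sys/fs/cgroup", name="statswalk",
                 high_bytes=2 * 1024**3):
    """Move the current process into the group, then exec the command."""
    confine_to_memcg(cgroup_root, name, high_bytes)
    os.execvp(argv[0], argv)  # replaces this process; find/du inherit the group
```

Calling run_confined() with the argv of the existing stats script would
run it, and every find/du it spawns, under that limit. The right value
for memory.high would need tuning, as noted above.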

> > so what is the conclusion from this issue now btw? is it something
> > that will be changed/fixed at any time?
>
> It is likely that you are triggering pathological memory fragmentation
> with a lot of unmovable objects that prevent it from being resolved. That
> leads to memory over-reclaim in order to make forward progress. A hard
> nut to crack, but something that is definitely on the radar to be solved
> eventually. So far we have been quite lucky not to trigger it that
> badly.

good to hear :)

> > As I understand it, everyone would have this issue when walking
> > extensively over files; basically any `cloud`, shared hosting or
> > storage system should experience it, true?
>
> Not really. You also need a high demand for high-order allocations,
> which require contiguous physical memory. Maybe there is something in
> your workload triggering this particular pattern.

I would not even know what triggers it, nor what it has to do with
high order; I'm just running find/du, nothing special I'd say.
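
For context, an "order n" allocation asks for 2^n physically contiguous
pages, and /proc/buddyinfo lists, per node and zone, how many free
blocks of each order are currently available. A minimal sketch for
inspecting that, assuming the usual "Node N, zone NAME c0 c1 ..." line
layout; the function names are hypothetical:

```python
def parse_buddyinfo(text):
    """Map (node, zone) -> list of free block counts, indexed by order."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: Node 0, zone Normal <count order 0> <count order 1> ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(c) for c in parts[4:]]
    return zones


def free_pages_at_or_above(counts, order):
    """Free pages held in blocks of at least the given order."""
    # A free block of order o spans 2**o pages.
    return sum(c << o for o, c in enumerate(counts) if o >= order)


def read_buddyinfo(path="/proc/buddyinfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_buddyinfo(f.read())
```

If the counts in the higher-order columns stay near zero while plenty of
order-0 pages are free, that would be consistent with the fragmentation
described above, though it does not show which allocations are asking
for the high orders.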