On Thu 22-08-19 08:33:40, Yang Shi wrote:
> 
> 
> On 8/22/19 1:04 AM, Michal Hocko wrote:
> > On Thu 22-08-19 01:55:25, Yang Shi wrote:
[...]
> > > And, they seem very common with common workloads when THP is
> > > enabled. A simple run of the MariaDB test from mmtests with THP
> > > enabled ("always") shows it can generate over fifteen thousand
> > > deferred split THPs (around 30G accumulated over a one-hour run,
> > > 75% of my VM's 40G of memory). It looks worth accounting for in
> > > MemAvailable.
> > 
> > OK, this makes sense. But your above numbers are really worrying.
> > Accumulating such a large amount of pages that are likely not going to
> > be used is really bad. They are essentially blocking any higher order
> > allocations and also pushing the system towards more memory pressure.
> 
> That is an accumulated number; during the run of the test, some of them
> were already freed by the shrinker. IOW, it should not reach that much
> at any given time.

Then the above description is highly misleading. What is the actual peak
number of lingering THPs waiting for memory pressure?

> > IIUC deferred splitting is mostly a workaround for nasty locking issues
> > during splitting, right? This is not really an optimization to cache
> > THPs for reuse or something like that. What is the reason this is not
> > done from a worker context? At least THPs which would be freed
> > completely sound like a good candidate for kworker tear down, no?
> 
> Yes, deferred split THP was introduced to avoid locking issues, according
> to the documentation. Memcg awareness would help trigger the shrinker
> more often.
> 
> I think it could be done in a worker context, but when to trigger the
> worker is a subtle problem.

Why? What is the problem with triggering it after unmapping a batch worth
of THPs?
-- 
Michal Hocko
SUSE Labs
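
To make the "kick a worker after a batch of unmaps" idea concrete, here is a
minimal, untested sketch. queue_deferred_split(), deferred_split_workfn() and
DEFERRED_SPLIT_BATCH are made-up names; the single global queue, the pin taken
at queue time and the fixed batch size are simplifications of the sketch, not
how the in-tree deferred split code works (that keeps per-node queues and
relies on the shrinker plus the compound page destructor).

	/*
	 * Illustrative sketch only, not the in-tree mm/huge_memory.c code:
	 * kick a kworker once a batch worth of THPs has been queued for
	 * deferred splitting, instead of waiting for the shrinker to run
	 * under memory pressure.
	 */
	#include <linux/huge_mm.h>
	#include <linux/list.h>
	#include <linux/mm.h>
	#include <linux/pagemap.h>
	#include <linux/spinlock.h>
	#include <linux/workqueue.h>

	#define DEFERRED_SPLIT_BATCH	16	/* arbitrary batch size */

	static LIST_HEAD(split_queue);
	static DEFINE_SPINLOCK(split_queue_lock);
	static unsigned long split_queue_len;

	static void deferred_split_workfn(struct work_struct *work);
	static DECLARE_WORK(deferred_split_work, deferred_split_workfn);

	/*
	 * Hypothetical hook, called where deferred_split_huge_page() is
	 * called today, i.e. when the last rmap of the THP goes away.
	 * Assumes the deferred_list node was initialized when the THP was
	 * allocated, as prep_transhuge_page() does in-tree.
	 */
	static void queue_deferred_split(struct page *page)
	{
		unsigned long flags;
		bool kick = false;

		spin_lock_irqsave(&split_queue_lock, flags);
		if (list_empty(page_deferred_list(page))) {
			get_page(page);	/* keep the THP until the worker runs */
			list_add_tail(page_deferred_list(page), &split_queue);
			if (++split_queue_len >= DEFERRED_SPLIT_BATCH)
				kick = true;
		}
		spin_unlock_irqrestore(&split_queue_lock, flags);

		/* Kick the worker once a batch worth of THPs is pending. */
		if (kick)
			schedule_work(&deferred_split_work);
	}

	static void deferred_split_workfn(struct work_struct *work)
	{
		struct page *page, *next;
		unsigned long flags;
		LIST_HEAD(list);

		/* Grab the whole batch and drop the lock before splitting. */
		spin_lock_irqsave(&split_queue_lock, flags);
		list_splice_init(&split_queue, &list);
		split_queue_len = 0;
		spin_unlock_irqrestore(&split_queue_lock, flags);

		/* deferred_list lives in the second tail page of the THP. */
		list_for_each_entry_safe(page, next, &list, deferred_list) {
			struct page *head = compound_head(page);

			list_del_init(page_deferred_list(head));
			if (trylock_page(head)) {
				/* Frees the sub-pages that are fully unmapped. */
				split_huge_page(head);
				unlock_page(head);
			}
			put_page(head);
		}
	}

With something along these lines, fully unmapped THPs would be torn down a
batch at a time from kworker context rather than lingering until the shrinker
fires; whether the reference taken at queue time and the batch threshold are
acceptable is exactly the kind of question the thread is raising.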