On Sat 21-10-17 08:51:04, Balbir Singh wrote: > On Fri, Oct 20, 2017 at 9:25 AM, Shakeel Butt <shakeelb@xxxxxxxxxx> wrote: > > lru_add_drain_all() is not required by mlock() and it will drain > > everything that has been cached at the time mlock is called. And > > that is not really related to the memory which will be faulted in > > (and cached) and mlocked by the syscall itself. > > > > Without lru_add_drain_all() the mlocked pages can remain on pagevecs > > and be moved to evictable LRUs. However they will eventually be moved > > back to unevictable LRU by reclaim. So, we can safely remove > > lru_add_drain_all() from mlock syscall. Also there is no need for > > local lru_add_drain() as it will be called deep inside __mm_populate() > > (in follow_page_pte()). > > > > On larger machines the overhead of lru_add_drain_all() in mlock() can > > be significant when mlocking data already in memory. We have observed > > high latency in mlock() due to lru_add_drain_all() when the users > > were mlocking in memory tmpfs files. > > > > Signed-off-by: Shakeel Butt <shakeelb@xxxxxxxxxx> > > --- > > I'm afraid I still don't fully understand the impact in terms of numbers and > statistics as seen from inside a cgroup. I really fail to see why there would be anything cgroup specific here. > My understanding is that we'll slowly > see the unreclaimable stats go up as we drain the pvec's across CPU's Not really. Draining is a bit tricky. Anonymous PF (gup) use lru_cache_add_active_or_unevictable so we bypass the LRU cache on mlocked pages altogether. Filemap faults go via cache and __pagevec_lru_add_fn to flush a full cache is not mlock aware. But gup (follow_page_pte) path tries to move existing and mapped pages to the unevictable LRU list. So yes we can see lazy mlock pages on evictable LRU but reclaim will get them to the unevictable list when needed. This should be mostly reduced to file mappings. But I haven't checked the code recently and mlock is quite tricky so I might misremember. In any case lru_add_drain_all is quite tangent to all this AFAICS. > I understand the optimization and I can see why lru_add_drain_all() is > expensive. not only it is expensive it is paying price for previous caching which might not be directly related to the mlock syscall. > Acked-by: Balbir Singh <bsingharora@xxxxxxxxx> > > Balbir Singh. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>