On Mon, 5 Dec 2011 23:36:34 -0800 (PST) Hugh Dickins <hughd@xxxxxxxxxx> wrote: > On Tue, 6 Dec 2011, KAMEZAWA Hiroyuki wrote: > > On Mon, 5 Dec 2011 16:13:06 -0800 (PST) > > Hugh Dickins <hughd@xxxxxxxxxx> wrote: > > > > > > Ying and I found PageCgroupAcctLRU very hard to grasp, even despite > > > the comments Hannes added to explain it. > > > > Now, I don't think it's difficult. It seems no file system codes > > add pages to LRU before add_to_page_cache() (I checked.) > > So, what we need to care is only swap-cache. In swap-cache path, > > we can do slow work. > > I've been reluctant to add more special code for SwapCache: > it may or may not be a good idea. Hannes also noted a FUSE > case which requires the before-commit-after handling swap was > using (for memcg-zone lru locking we've merged them into commit). > I think we need a fix for FUSE. In past, FUSE/splice used add_to_page_cache() but not it uses replace_page_cache(). So, we need another care. (I posted a patch.) > > > > > In moving the LRU locking > > > from zone to memcg, we needed to depend upon pc->mem_cgroup: that > > > was difficult while the interpretation of pc->mem_cgroup depended > > > upon two flags also; and very tricky when pages were liable to shift > > > underneath you from one LRU to another, as flags came and went. > > > So we already eliminated PageCgroupAcctLRU here. > > > > > > > Okay, Hm, do you see performance improvement by moving locks ? > > I was expecting someone to ask that question! I'm not up-to-date > on it, it's one of the things I have to get help to gather before > sending in the patch series. > > I believe the answer is that we saw some improvement on some tests, > but not so much as to make a hugely compelling case for the change. > But by that time we'd invested a lot of testing in the memcg locking, > and little in the original zone locking, so went with the memcg > locking anyway. > > We'll get more results and hope to show a stronger case for it now. > But our results will probably have to be based on in-house kernels, > with a lot of the "infrastructure" mods already in place, to allow > an easy build-time switch between zone locking and memcg locking. > > That won't be such a fair test if the "infrastructure" mods are > themselves detrimental (I believe not). It would be better to > compare, say, 3.2.0-next against 3.2.0-next plus our patches - > but my own (quad) machines for testing upstream kernels won't > be big enough to show much of interest. I'm rather hoping > someone will be interested enough to try on something beefier. > Hmm, at first glance at the patch, it seems far complicated than I expected and added much checks and hooks to lru path... > > > > > > However, I've hardly begun splitting the changes up into a series: > > > had intended to do so last week, but day followed day... If you'd > > > like to see the unpolished uncommented rollup, I can post that. > > > > > > > please. > > Anyway, I'll post my own again as output even if I stop my work there. > > Okay, here it is: my usual mix of cleanup and functional changes. > There's work by Ying and others in here - will apportion authorship > more fairly when splitting. If you're looking through it at all, > the place to start would be memcontrol.c's lock_page_lru_irqsave(). > Thank you. This seems inetersting patch. Hmm...what I think of now is.. In most case, pages are newly allocated and charged ,and then, added to LRU. pc->mem_cgroup never changes while pages are on LRU. I have a fix for corner cases as to do 1. lock lru 2. remove-page-from-lru 3. overwrite pc->mem_cgroup 4. add page to lru again 5. unlock lru And blindly believe pc->mem_cgroup regardless of PCG_USED bit at LRU handling. Hm, per-zone-per-memcg lru locking is much easier if - we igonore PCG_USED bit at lru handling - we never overwrite pc->mem_cgroup if the page is on LRU. - if page may be added to LRU by pagevec etc.. while we overwrite pc->mem_cgroup, we always take lru_lock. This is our corner case. isn't it ? I posted a series of patch. I'm glad if you give me a quick review. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>