On Mon, 23 Feb 2015, Kirill A. Shutemov wrote: > On Fri, Feb 20, 2015 at 07:51:16PM -0800, Hugh Dickins wrote: > > Though debug kernels have a VM_BUG_ON to help protect from misaccounting > > lru_size, non-debug kernels are liable to wrap it around: and then the > > vast unsigned long size draws page reclaim into a loop of repeatedly > > doing nothing on an empty list, without even a cond_resched(). > > > > That soft lockup looks confusingly like an over-busy reclaim scenario, > > with lots of contention on the lruvec lock in shrink_inactive_list(): > > yet has a totally different origin. > > > > Help differentiate with a custom warning in mem_cgroup_update_lru_size(), > > even in non-debug kernels; and reset the size to avoid the lockup. But > > the particular bug which suggested this change was mine alone, and since > > fixed. > > Do we need this kind of check for !MEMCG kernels? I hope we don't: I hope that the MEMCG case can be a good enough canary to catch the issues for !MEMCG too. I thought that the !MEMCG stats were maintained in such a different (per-cpu) way, whose batching would defeat such checks without imposing unwelcome overhead - or am I wrong on that? > > > Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx> > > --- > > include/linux/mm_inline.h | 2 +- > > mm/memcontrol.c | 24 ++++++++++++++++++++---- > > 2 files changed, 21 insertions(+), 5 deletions(-) > > > > --- thpfs.orig/include/linux/mm_inline.h 2013-11-03 15:41:51.000000000 -0800 > > +++ thpfs/include/linux/mm_inline.h 2015-02-20 19:33:25.928096883 -0800 > > @@ -35,8 +35,8 @@ static __always_inline void del_page_fro > > struct lruvec *lruvec, enum lru_list lru) > > { > > int nr_pages = hpage_nr_pages(page); > > - mem_cgroup_update_lru_size(lruvec, lru, -nr_pages); > > list_del(&page->lru); > > + mem_cgroup_update_lru_size(lruvec, lru, -nr_pages); > > __mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru, -nr_pages); > > } > > > > --- thpfs.orig/mm/memcontrol.c 2015-02-08 18:54:22.000000000 -0800 > > +++ thpfs/mm/memcontrol.c 2015-02-20 19:33:25.928096883 -0800 > > @@ -1296,22 +1296,38 @@ out: > > * @lru: index of lru list the page is sitting on > > * @nr_pages: positive when adding or negative when removing > > * > > - * This function must be called when a page is added to or removed from an > > - * lru list. > > + * This function must be called under lruvec lock, just before a page is added > > + * to or just after a page is removed from an lru list (that ordering being so > > + * as to allow it to check that lru_size 0 is consistent with list_empty). > > */ > > void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru, > > int nr_pages) > > { > > struct mem_cgroup_per_zone *mz; > > unsigned long *lru_size; > > + long size; > > + bool empty; > > > > if (mem_cgroup_disabled()) > > return; > > > > mz = container_of(lruvec, struct mem_cgroup_per_zone, lruvec); > > lru_size = mz->lru_size + lru; > > - *lru_size += nr_pages; > > - VM_BUG_ON((long)(*lru_size) < 0); > > + empty = list_empty(lruvec->lists + lru); > > + > > + if (nr_pages < 0) > > + *lru_size += nr_pages; > > + > > + size = *lru_size; > > + if (WARN(size < 0 || empty != !size, > > + "mem_cgroup_update_lru_size(%p, %d, %d): lru_size %ld but %sempty\n", > > + lruvec, lru, nr_pages, size, empty ? "" : "not ")) { > > Formatting can be unscrewed this way: > > if (WARN(size < 0 || empty != !size, > "%s(%p, %d, %d): lru_size %ld but %sempty\n", > __func__, lruvec, lru, nr_pages, size, empty ? "" : "not ")) { Indeed, thanks. Greg Thelen had made the same suggestion for a different reason, I just didn't get to incorporate it this time around, but will do better next time. I don't expect to be reposting the whole series very soon, unless someone asks for it: I think migration and recovery ought to be supported before reposting. But I might send a mini-series of this and the other preparatory patches, maybe. > > > + VM_BUG_ON(1); > > + *lru_size = 0; > > + } > > + > > + if (nr_pages > 0) > > + *lru_size += nr_pages; > > } > > > > bool mem_cgroup_is_descendant(struct mem_cgroup *memcg, struct mem_cgroup *root) > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Please read the FAQ at http://www.tux.org/lkml/ > > -- > Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>