On Wed, Jul 03, 2013 at 09:24:03PM +1000, Dave Chinner wrote:
> On Tue, Jul 02, 2013 at 02:44:27PM +0200, Michal Hocko wrote:
> > On Tue 02-07-13 22:19:47, Dave Chinner wrote:
> > [...]
> > > Ok, so it's been leaked from a dispose list somehow. Thanks for the
> > > info, Michal, it's time to go look at the code....
> >
> > OK, just in case we will need it, I am keeping the machine in this state
> > for now. So we still can play with crash and check all the juicy
> > internals.
>
> My current suspect is the LRU_RETRY code. I don't think what it is
> doing is at all valid - list_for_each_safe() is not safe if you drop
> the lock that protects the list. i.e. there is nothing that protects
> the stored next pointer from being removed from the list by someone
> else. Hence what I think is occurring is this:
>
> 	thread 1			thread 2
> 	lock(lru)
> 	list_for_each_safe(lru)		lock(lru)
> 	  isolate			......
> 	    lock(i_lock)
> 	    has buffers
> 	      __iget
> 	    unlock(i_lock)
> 	unlock(lru)
> 	.....				(gets lru lock)
> 					list_for_each_safe(lru)
> 					  walks all the inodes
> 					  finds inode being isolated by other thread
> 					  isolate
> 					    i_count > 0
> 					      list_del_init(i_lru)
> 					      return LRU_REMOVED;
> 					  moves to next inode, inode that
> 					  other thread has stored as next
> 					  isolate
> 					    i_state |= I_FREEING
> 					    list_move(dispose_list)
> 					    return LRU_REMOVED
> 					....
> 					unlock(lru)
> 	lock(lru)
> 	return LRU_RETRY;
> 	if (!first_pass)
> 	....
> 	--nr_to_scan
> 	(loop again using next, which has already been removed from the
> 	LRU by the other thread!)
> 	isolate
> 	  lock(i_lock)
> 	  if (i_state & ~I_REFERENCED)
> 	    list_del_init(i_lru)	<<<<< inode is on dispose list!
> 					<<<<< inode is now isolated, with I_FREEING set
> 	  return LRU_REMOVED;
>
> That fits the corpse left on your machine, Michal. One thread has
> moved the inode to a dispose list, the other thread thinks it is
> still on the LRU and should be removed, and removes it.
>
> This also explains the lru item count going negative - the same item
> is being removed from the lru twice.
> So it seems like all the problems you've been seeing are caused by
> this one problem....
>
> Patch below that should fix this.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>
> list_lru: fix broken LRU_RETRY behaviour
>
> From: Dave Chinner <dchinner@xxxxxxxxxx>
>
> The LRU_RETRY code assumes that the list traversal status is unchanged
> after we have dropped and regained the list lock. Unfortunately, this
> is not a valid assumption, and that can lead to racing traversals
> isolating objects that the other traversal expects to be the next item
> on the list.
>
> This is causing problems with the inode cache shrinker isolation,
> with races resulting in an inode on a dispose list being "isolated"
> because a racing traversal still thinks it is on the LRU. The inode
> is then never reclaimed and that causes hangs if a subsequent lookup
> on that inode occurs.
>
> Fix it by always restarting the list walk on an LRU_RETRY return from
> the isolate callback. Avoid the possibility of livelocks the current
> code was trying to avoid by always decrementing the nr_to_walk
> counter on retries so that even if we keep hitting the same item on
> the list we'll eventually stop trying to walk and exit out of the
> situation causing the problem.
>
> Reported-by: Michal Hocko <mhocko@xxxxxxx>
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---
>  mm/list_lru.c | 29 ++++++++++++-----------------
>  1 file changed, 12 insertions(+), 17 deletions(-)
>
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index dc71659..7246791 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -71,19 +71,19 @@ list_lru_walk_node(struct list_lru *lru, int nid, list_lru_walk_cb isolate,
>  	struct list_lru_node *nlru = &lru->node[nid];
>  	struct list_head *item, *n;
>  	unsigned long isolated = 0;
> -	/*
> -	 * If we don't keep state of at which pass we are, we can loop at
> -	 * LRU_RETRY, since we have no guarantees that the caller will be able
> -	 * to do something other than retry on the next pass. We handle this by
> -	 * allowing at most one retry per object. This should not be altered
> -	 * by any condition other than LRU_RETRY.
> -	 */
> -	bool first_pass = true;
>
>  	spin_lock(&nlru->lock);
>  restart:
>  	list_for_each_safe(item, n, &nlru->list) {
>  		enum lru_status ret;
> +
> +		/*
> +		 * decrement nr_to_walk first so that we don't livelock if we
> +		 * get stuck on large numbers of LRU_RETRY items
> +		 */
> +		if (--(*nr_to_walk) == 0)
> +			break;
> +
>  		ret = isolate(item, &nlru->lock, cb_arg);
>  		switch (ret) {
>  		case LRU_REMOVED:
> @@ -98,19 +98,14 @@ restart:
>  		case LRU_SKIP:
>  			break;
>  		case LRU_RETRY:
> -			if (!first_pass) {
> -				first_pass = true;
> -				break;
> -			}
> -			first_pass = false;
> +			/*
> +			 * The lru lock has been dropped, our list traversal is
> +			 * now invalid and so we have to restart from scratch.
> +			 */
>  			goto restart;
>  		default:
>  			BUG();
>  		}
> -
> -		if ((*nr_to_walk)-- == 0)
> -			break;
> -
>  	}

This patch makes perfect sense to me, along with your description.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>