On Sat, Oct 07, 2017 at 06:06:51PM +0100, Al Viro wrote:
> On Sat, Oct 07, 2017 at 02:56:40PM +0300, Vladimir Davydov wrote:
> > Hello,
> >
> > On Fri, Oct 06, 2017 at 11:06:04AM +0200, Michal Hocko wrote:
> > > On Fri 06-10-17 16:59:18, Jia-Ju Bai wrote:
> > > > According to fs/super.c, the kernel may sleep under a spinlock.
> > > > The function call path is:
> > > > put_super (acquire the spinlock)
> > > >   __put_super
> > > >     destroy_super
> > > >       list_lru_destroy
> > > >         list_lru_unregister
> > > >           mutex_lock --> may sleep
> > > >         memcg_get_cache_ids
> > > >           down_read --> may sleep
> > > >
> > > > This bug is found by my static analysis tool and my code review.
> >
> > This is a false positive: by the time we get to destroy_super(), the
> > lru lists have already been destroyed - see deactivate_locked_super() -
> > so list_lru_destroy() will return right away without attempting to
> > take any locks.  That's why there are no lockdep warnings regarding
> > this issue.
> >
> > I think we can move list_lru_destroy() to destroy_super_work() to
> > suppress this warning.  Not sure if it's really worth the trouble
> > though.
>
> It's a bit trickier than that: for the callers of destroy_super() that
> run before the superblock becomes reachable via shared data structures,
> that list_lru_destroy() is *not* a no-op, but those callers are not
> under spinlocks.
>
> Locking in mm/list_lru.c looks excessive, but then I'm not well
> familiar with that code.

It *is* excessive.

1) Co-allocate struct list_lru and the array of struct list_lru_node
hanging off it.  Turn all existing variables and struct members of that
type into pointers.  init would allocate and return a pointer, destroy
would free (and leave it to callers to clear their pointers, of
course).  (Rough sketch at the end of this mail.)

2) Store the value of memcg_nr_cache_ids as of the last creation or
resize in struct list_lru.  Pass that through to
memcg_destroy_list_lru_node().  Result: no need for
memcg_get_cache_ids() in list_lru_destroy().

3) Add a static struct list_lru *currently_resized, protected by
list_lrus_mutex.  NULL when memcg_update_all_list_lrus() is not
running; points to the list_lru currently being resized when it is.

4) Have list_lru_destroy() check (under list_lrus_mutex) whether it's
being asked to kill the currently resized one.  If it is, do

	victim->list.prev->next = victim->list.next;
	victim->list.next->prev = victim->list.prev;
	victim->list.prev = NULL;

and bugger off; otherwise act as now (sketch of the destroy side below
as well).  Turn the loop in memcg_update_all_list_lrus() into

	mutex_lock(&list_lrus_mutex);
	lru = list_lrus.next;
	while (lru != &list_lrus) {
		currently_resized = list_entry(lru, struct list_lru, list);
		mutex_unlock(&list_lrus_mutex);
		ret = memcg_update_list_lru(lru, old_size, new_size);
		mutex_lock(&list_lrus_mutex);
		if (unlikely(!lru->prev)) {
			lru = lru->next;
			/* free currently_resized as list_lru_destroy() would have */
			continue;
		}
		if (ret)
			goto fail;
		lru = lru->next;
	}
	currently_resized = NULL;
	mutex_unlock(&list_lrus_mutex);

5) Replace list_lrus_mutex with a spinlock.

At that point list_lru_destroy() becomes non-blocking.  I think it
should work, but that's from several hours of looking through that
code, so I might be missing something subtle here...
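
FWIW, here's roughly the shape I have in mind for (1) and (2).  Sketch
only - the constructor name, the flexible-array layout and the
signatures are illustrative, not the current mm/list_lru.c API:

	/* one allocation: header plus the per-node array hanging off it */
	struct list_lru {
		struct list_head	list;		/* on list_lrus */
		int			nr_memcg_ids;	/* memcg_nr_cache_ids as of
							 * the last init/resize; for (2) */
		struct list_lru_node	node[];
	};

	/* hypothetical replacement for list_lru_init(): allocate and
	 * return a pointer instead of filling in an embedded struct */
	struct list_lru *list_lru_create(void)
	{
		struct list_lru *lru;

		lru = kzalloc(sizeof(*lru) +
			      nr_node_ids * sizeof(struct list_lru_node),
			      GFP_KERNEL);
		if (!lru)
			return NULL;
		lru->nr_memcg_ids = memcg_nr_cache_ids;
		/* ... per-node init, add to list_lrus under the lock ... */
		return lru;
	}

Callers keep a struct list_lru * instead of an embedded struct, destroy
frees the whole thing with one kfree(), and it's up to each caller to
clear its own pointer afterwards.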
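
And the destroy side of (4), with the same caveats (list_lrus_mutex is
still a mutex here; it becomes a spinlock after (5)):

	void list_lru_destroy(struct list_lru *lru)
	{
		mutex_lock(&list_lrus_mutex);
		if (lru == currently_resized) {
			/*
			 * The resizer is working on this one right now;
			 * unlink it by hand and leave the freeing to
			 * memcg_update_all_list_lrus(), which will notice
			 * the cleared ->list.prev.
			 */
			lru->list.prev->next = lru->list.next;
			lru->list.next->prev = lru->list.prev;
			lru->list.prev = NULL;
			mutex_unlock(&list_lrus_mutex);
			return;
		}
		list_del(&lru->list);
		mutex_unlock(&list_lrus_mutex);
		/* free per-node/per-memcg parts using lru->nr_memcg_ids... */
		kfree(lru);
	}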