On Wed, Apr 09, 2014 at 07:02:30PM +0400, Vladimir Davydov wrote: > After the memcg is offlined, we mark its kmem caches that cannot be > deleted right now due to pending objects as dead by setting the > memcg_cache_params::dead flag, so that memcg_release_pages will schedule > cache destruction (memcg_cache_params::destroy) as soon as the last slab > of the cache is freed (memcg_cache_params::nr_pages drops to zero). > > I guess the idea was to destroy the caches as soon as possible, i.e. > immediately after freeing the last object. However, it just doesn't work > that way, because kmem caches always preserve some pages for the sake of > performance, so that nr_pages never gets to zero unless the cache is > shrunk explicitly using kmem_cache_shrink. Of course, we could account > the total number of objects on the cache or check if all the slabs > allocated for the cache are empty on kmem_cache_free and schedule > destruction if so, but that would be too costly. > > Thus we have a piece of code that works only when we explicitly call > kmem_cache_shrink, but complicates the whole picture a lot. Moreover, > it's racy in fact. For instance, kmem_cache_shrink may free the last > slab and thus schedule cache destruction before it finishes checking > that the cache is empty, which can lead to use-after-free. > > So I propose to remove this async cache destruction from > memcg_release_pages, and check if the cache is empty explicitly after > calling kmem_cache_shrink instead. This will simplify things a lot w/o > introducing any functional changes. > > And regarding dead memcg caches (i.e. those that are left hanging around > after memcg offline for they have objects), I suppose we should reap > them either periodically or on vmpressure as Glauber suggested > initially. I'm going to implement this later. memcg_release_pages() can be called after cgroup destruction, and thus it *must* ensure that the now-empty cache is destroyed - or we'll leak it. There is no excuse to downgrade to periodic reaping when we already directly hook into the event that makes the cache empty. If slab needs to hold on to the cache for slightly longer than the final memcg_release_pages(), then it should grab a refcount to it. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>