On 19.03.2018 14:06, Kirill Tkhai wrote: > On 17.03.2018 00:39, Dave Chinner wrote: >> On Fri, Mar 16, 2018 at 11:55:30AM +0300, Kirill Tkhai wrote: >>> On 16.03.2018 02:03, Dave Chinner wrote: >>>> On Thu, Mar 15, 2018 at 10:28:43PM +0300, Kirill Tkhai wrote: >>>>> On 15.03.2018 20:49, Michal Hocko wrote: >>>>>> On Thu 15-03-18 18:01:34, Kirill Tkhai wrote: >>>>>>> xfs_reclaim_inodes_count(XFS_M(sb)) does not care about memcg. >>>>>>> So, it's called for memcg reclaim too, e.g. this list is shrinked >>>>>>> disproportionality to another lists. >>>>>>> >>>>>>> This looks confusing, so I'm reporting about this. >>>>>>> Consider this patch as RFC. >>>>>> >>>>>> Could you be more specific about the problem you are trying to solve? >>>>>> Because we do skip shrinkers which are not memcg aware by >>>>>> shrink_slab: >>>>>> /* >>>>>> * If kernel memory accounting is disabled, we ignore >>>>>> * SHRINKER_MEMCG_AWARE flag and call all shrinkers >>>>>> * passing NULL for memcg. >>>>>> */ >>>>>> if (memcg_kmem_enabled() && >>>>>> !!memcg != !!(shrinker->flags & SHRINKER_MEMCG_AWARE)) >>>>>> continue; >>>>>> >>>>>> Or am I missing something? >>>>> >>>>> sb->s_op->nr_cached_objects is a sub-method of generic super_cache_count(). >>>>> super_cache_count() is owned and only called by superblock's shrinker, >>>>> which does have SHRINKER_MEMCG_AWARE flag. >>>> >>>> Xfs inodes are accounted to memcgs when they are allocated. All the >>>> memcg reclaim stuff is done at the VFS inode cache level - all the >>>> XFS inode cache shrinker does is clean up inodes that are not >>>> referenced by the VFS inode cache because the memcg aware reclaim >>>> has already freed them. >>>> >>>> i.e. what the XFS inode cache is doing is perfectly reasonable - >>>> memcg aware inode reclaim is occuring at the VFS level, but on XFS >>>> that does not free the inodes as they are still referenced >>>> internally by XFS. However, once the inode is removed from the VFS >>>> LRU, all memcg information about the inode is destroyed, so there's >>>> nothing in the XFS layers that cares about memcgs. >>> >>> So, after inode is removed from LRU, memory still remains accounted >>> to memcg till the time they are actually freed. I personally don't >>> care, just to mention. >>> >>>> Hence when the XFS inode shrinker then called to run a >>>> garbage collection pass on unreferenced inodes - the inodes that >>>> are now unreferenced in the memcg due to the VFS inode shrinker pass >>>> - it frees inodes regardless of the memcg context it was called from >>>> because that information is no longer present in the inode cache. >>>> Hence we just ignore memcgs at this level. >>> >>> But xfs_fs_free_cached_objects() returns number of these freed object >>> as result to super_cache_scan(), so shrinker interprets them as related >>> to a memcg, while they may be related to another memcg. This introduces >>> a disproportion relatively to another shrinkers called to memcg. >> >> In what way? All memcgs see tha same values from the backing cache >> and so try to do the same amount of scanning work. The shrinker >> accounting simply doesn't care where the objects are scanned from, >> as long as it comes from the same place as the calculation of the >> number of objects in the cache it's about to scan. > > shrinker->count_objects() result is used to count number of objects, > do_shrink_slab() should shrink during the call: > > freeable = shrinker->count_objects(shrinker, shrinkctl); > > Then shrinker takes part of this value: > > delta = freeable >> priority; > delta *= 4; > do_div(delta, shrinker->seeks); > > This is a number, the shrinker tries to shrink during the call. > Let the priority is DEF_PRIORITY. Then, shrinker try to shrink > freeable*4/(shrinker->seeks * 2^DEF_PRIORITY) enteties from every > shrinker. Equal percent of every memcg shrinker. > > When XFS shows global number of cached objects in count_objects(), > shrinker also tryes to shrink the same percent of global objects, > as for other memcg shrinkers. So, when you have small memcg > with 128Mb memory allowed and small number of tasks related to it, > you may meet 1Gb of cached objects, which were queued by another > big cgroup. So, the small cgroup may shrink number of objects of > size more, then its own. It's not fair. That's all I'm worry in > this message. > >> The memcg accounting, OTOH, is completely divorced from the >> shrinker, so if it's still got too much slab memory accounted to it, >> it will simply run the shrinker again and do some more memory >> reclaim. > > This message is not about OOM, it's about imbalance. See above. > >> XFS does this for IO efficiency purposes, not because it's ideal >> behaviour for memcgs. If we start doing exact memcg-only inode >> reclaim, we're going back to the bad old days where inode reclaim >> causes really nasty small random write storms that essentially >> starve the storage subsystem of all other IO for seconds at a time. >> That is a *far worse problem* from a system performance POV than >> having the memcg memory reclaim run a couple more shrinker passes >> and do a little more work.... > > Ok, this is a point. > >>> Is there a problem? This is what my patch about. >> >> You've described some theoretical issue, but not described any user >> visible or performance impacting behaviour that users actually care >> about. What reproducable, observable behaviour does it fix/change? > > Strange question. Do you fix problems only when you meet a reproducable > BUG()? This is the kernel, and many problem may be racy. But this > does not mean, it's prohibited to discuss about them. > >>>> So, is there a problem you are actually trying to fix, or is this >>>> simply a "I don't understand how the superblock shrinkers work, >>>> please explain" patch? >>> >>> I work on some shrinker changes, and just want to understand they don't >>> touch anything. >> >> Oh, goody, another person about to learn the hard way that shrinkers >> are far more varied and complex than page reclaim :P > > It may be a surprise, but I have to say, that all memcg-aware shrinkers > are based on list_lrus. XFS is exception. So, your words are not about > shrinkers in general, just about XFS. Just to clarify. I personally do not care about this problem. Consider this as RFC/possible error report. If you are not interested in this, let's stop the discussion. Kirill -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html