On 12.07.2018 14:13, Kirill Tkhai wrote:
> On 03.07.2018 20:32, Kirill Tkhai wrote:
>> On 03.07.2018 20:00, Shakeel Butt wrote:
>>> On Tue, Jul 3, 2018 at 9:17 AM Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> wrote:
>>>>
>>>> Hi, Shakeel,
>>>>
>>>> On 03.07.2018 18:46, Shakeel Butt wrote:
>>>>> On Tue, Jul 3, 2018 at 8:27 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> On Tue, Jul 03, 2018 at 06:09:05PM +0300, Kirill Tkhai wrote:
>>>>>>> +++ b/mm/vmscan.c
>>>>>>> @@ -169,6 +169,49 @@ unsigned long vm_total_pages;
>>>>>>>  static LIST_HEAD(shrinker_list);
>>>>>>>  static DECLARE_RWSEM(shrinker_rwsem);
>>>>>>>
>>>>>>> +#ifdef CONFIG_MEMCG_KMEM
>>>>>>> +static DEFINE_IDR(shrinker_idr);
>>>>>>> +static int shrinker_nr_max;
>>>>>>
>>>>>> So ... we've now got a list_head (shrinker_list) which contains all of
>>>>>> the shrinkers, plus a shrinker_idr which contains the memcg-aware
>>>>>> shrinkers?
>>>>>>
>>>>>> Why not replace the shrinker_list with the shrinker_idr?  It's only
>>>>>> used twice in vmscan.c:
>>>>>>
>>>>>>         void register_shrinker_prepared(struct shrinker *shrinker)
>>>>>>         {
>>>>>>                 down_write(&shrinker_rwsem);
>>>>>>                 list_add_tail(&shrinker->list, &shrinker_list);
>>>>>>                 up_write(&shrinker_rwsem);
>>>>>>         }
>>>>>>
>>>>>>         list_for_each_entry(shrinker, &shrinker_list, list) {
>>>>>>         ...
>>>>>>
>>>>>> The first is simply idr_alloc() and the second is
>>>>>>
>>>>>>         idr_for_each_entry(&shrinker_idr, shrinker, id) {
>>>>>>
>>>>>> I understand there's a difference between allocating the shrinker's ID
>>>>>> and adding it to the list.  You can do this by calling idr_alloc() with
>>>>>> NULL as the pointer, and then using idr_replace() when you want to add
>>>>>> the shrinker to the list.  idr_for_each_entry() skips over NULL entries.
>>>>>>
>>>>>> This will actually reduce the size of each shrinker and be more
>>>>>> cache-efficient when calling the shrinkers.  I think we can also get
>>>>>> rid of the shrinker_rwsem eventually, but let's leave it for now.
>>>>>
>>>>> Can you explain how you envision shrinker_rwsem can be removed? I am
>>>>> very much interested in doing that.
>>>>
>>>> Have you tried to play some games with SRCU? It looks like we just need
>>>> to teach count_objects() and scan_objects() to work with semi-destructed
>>>> shrinkers. Though, this will make it impossible to introduce shrinkers
>>>> which do synchronize_srcu() in scan_objects(), for example. Not sure
>>>> anyone will actually use that, so it may be acceptable as a limitation.
>>>>
>>>
>>> Hi Kirill, I tried SRCU and the discussion is at
>>> https://lore.kernel.org/lkml/20171117173521.GA21692@xxxxxxxxxxxxx/T/#u
>>>
>>> Paul E. McKenney suggested enabling SRCU unconditionally. So, to use
>>> SRCU for shrinkers, we first have to push unconditional SRCU.
>>
>> The first time I read this, I thought it was about some new
>> srcu_read_lock() without an argument, and that SRCU would need to be
>> reworked in some huge way. Thank god, it was just a misreading :)
>>
>>> Tetsuo had another lockless solution which was a bit involved but does
>>> not depend on SRCU.
>>
>> Ok, I see the refcounters suggestion. Thanks for the link, Shakeel!
>
> Just returning to this theme. Since both of the suggested ways involve
> SRCU-style synchronization, it may be better just to use a percpu rwsem,
> which gives the same functionality out of the box.
>
> register/unregister_shrinker() will use two rw semaphores:
>
> register_shrinker()
> {
>         down_write(&shrinker_rwsem);
>         idr_alloc();
>         up_write(&shrinker_rwsem);
> }
>
> unregister_shrinker()
> {
>         percpu_down_write(&percpu_shrinker_rwsem);
>         down_write(&shrinker_rwsem);
>         idr_remove();
>         up_write(&shrinker_rwsem);
>         percpu_up_write(&percpu_shrinker_rwsem);
> }
>
> shrink_slab()
> {
>         percpu_down_read(&percpu_shrinker_rwsem);
>         rcu_read_lock();
>         shrinker = idr_find();
>         rcu_read_unlock();
>
>         do_shrink_slab(shrinker);
>         percpu_up_read(&percpu_shrinker_rwsem);
> }
>
> 1) Here is a trick to make register_shrinker() not take the percpu
>    semaphore, i.e., not wait for RCU synchronization. This just makes
>    register_shrinker() faster. So we introduce two semaphores instead
>    of one: shrinker_rwsem to protect the IDR, and percpu_shrinker_rwsem.
>
> 2) rcu_read_lock() -- to synchronize idr_find() with idr_alloc().
>    Not sure we really need this; it may be that a lockless idr_find()
>    is OK in parallel with allocation of a new ID. Parallel removal is
>    not possible because of the percpu rwsem.
>
> 3) Places which are performance-critical for unregister_shrinker() speed
>    (e.g., deactivate_locked_super(), as we want umount() to be fast)
>    may just call it delayed from a work item:
>
> diff --git a/fs/super.c b/fs/super.c
> index 13647d4fd262..b4a98cb00166 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -324,19 +324,7 @@ void deactivate_locked_super(struct super_block *s)
>  	struct file_system_type *fs = s->s_type;
>  	if (atomic_dec_and_test(&s->s_active)) {
>  		cleancache_invalidate_fs(s);
> -		unregister_shrinker(&s->s_shrink);
> -		fs->kill_sb(s);
> -
> -		/*
> -		 * Since list_lru_destroy() may sleep, we cannot call it from
> -		 * put_super(), where we hold the sb_lock. Therefore we destroy
> -		 * the lru lists right now.
> -		 */
> -		list_lru_destroy(&s->s_dentry_lru);
> -		list_lru_destroy(&s->s_inode_lru);
> -
> -		put_filesystem(fs);
> -		put_super(s);
> +		schedule_delayed_deactivate_super(s);
>  	} else {
>  		up_write(&s->s_umount);
>  	}

s/shrinker_rwsem/shrinker_mutex/ (it is only ever taken for write in the
sketch above, so a plain mutex is enough)
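
For completeness, Matthew's idr_alloc(NULL) + idr_replace() trick from the
thread above might look roughly like this. Just an untested sketch: the
prealloc/register split and the shrinker->id field are assumptions of the
sketch, and error paths are abbreviated:

/*
 * Two-phase registration: reserve an id with a NULL pointer, then make
 * the shrinker visible with idr_replace().  idr_for_each_entry() skips
 * NULL entries, so a reserved-but-unregistered shrinker is never called.
 */
int prealloc_shrinker(struct shrinker *shrinker)
{
        int id;

        down_write(&shrinker_rwsem);
        /* Reserve an id; the slot stays NULL until registration. */
        id = idr_alloc(&shrinker_idr, NULL, 0, 0, GFP_KERNEL);
        up_write(&shrinker_rwsem);
        if (id < 0)
                return id;
        shrinker->id = id;
        return 0;
}

void register_shrinker_prepared(struct shrinker *shrinker)
{
        down_write(&shrinker_rwsem);
        /* Publish the shrinker; iteration now sees it. */
        idr_replace(&shrinker_idr, shrinker, shrinker->id);
        up_write(&shrinker_rwsem);
}

shrink_slab() would then iterate with
idr_for_each_entry(&shrinker_idr, shrinker, id) instead of walking
shrinker_list.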
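
And a rough idea of what schedule_delayed_deactivate_super() from point 3
could be. Neither the function nor the s_deactivate_work field exist
today; both are assumptions of this sketch, and the s_umount locking is
hand-waved:

/*
 * Hypothetical helper: move the slow tail of deactivate_locked_super()
 * (most notably unregister_shrinker()) off the umount path.  Assumes a
 * struct work_struct s_deactivate_work added to struct super_block.
 */
static void deactivate_super_workfn(struct work_struct *work)
{
        struct super_block *s = container_of(work, struct super_block,
                                             s_deactivate_work);
        struct file_system_type *fs = s->s_type;

        unregister_shrinker(&s->s_shrink);
        fs->kill_sb(s);

        /*
         * Since list_lru_destroy() may sleep, we cannot call it from
         * put_super(), where we hold the sb_lock.  Therefore we destroy
         * the lru lists right now.
         */
        list_lru_destroy(&s->s_dentry_lru);
        list_lru_destroy(&s->s_inode_lru);

        put_filesystem(fs);
        put_super(s);
}

static void schedule_delayed_deactivate_super(struct super_block *s)
{
        /* One-shot work; the sb stays alive until put_super() above. */
        INIT_WORK(&s->s_deactivate_work, deactivate_super_workfn);
        schedule_work(&s->s_deactivate_work);
}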