Re: [PATCH v4 5/7] mm: rework non-root kmem_cache lifecycle management

Waiman Long <longman@xxxxxxxxxx> · Tue, 21 May 2019 14:39:50 -0400





  
  
    On 5/14/19 8:06 PM, Shakeel Butt wrote:

    
    
      
        @@ -2651,20 +2652,35 @@ struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep)
        struct mem_cgroup *memcg;
        struct kmem_cache *memcg_cachep;
        int kmemcg_id;
+       struct memcg_cache_array *arr;

        VM_BUG_ON(!is_root_cache(cachep));

        if (memcg_kmem_bypass())
                return cachep;

-       memcg = get_mem_cgroup_from_current();
+       rcu_read_lock();
+
+       if (unlikely(current->active_memcg))
+               memcg = current->active_memcg;
+       else
+               memcg = mem_cgroup_from_task(current);
+
+       if (!memcg || memcg == root_mem_cgroup)
+               goto out_unlock;
+
        kmemcg_id = READ_ONCE(memcg->kmemcg_id);
        if (kmemcg_id < 0)
-               goto out;
+               goto out_unlock;

-       memcg_cachep = cache_from_memcg_idx(cachep, kmemcg_id);
-       if (likely(memcg_cachep))
-               return memcg_cachep;
+       arr = rcu_dereference(cachep->memcg_params.memcg_caches);
+
+       /*
+        * Make sure we will access the up-to-date value. The code updating
+        * memcg_caches issues a write barrier to match this (see
+        * memcg_create_kmem_cache()).
+        */
+       memcg_cachep = READ_ONCE(arr->entries[kmemcg_id]);

        /*
         * If we are in a safe context (can wait, and not in interrupt
@@ -2677,10 +2693,20 @@ struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep)
         * memcg_create_kmem_cache, this means no further allocation
         * could happen with the slab_mutex held. So it's better to
         * defer everything.
+        *
+        * If the memcg is dying or memcg_cache is about to be released,
+        * don't bother creating new kmem_caches. Because memcg_cachep
+        * is ZEROed as the fist step of kmem offlining, we don't need
+        * percpu_ref_tryget() here. css_tryget_online() check in

      
      *percpu_ref_tryget_live()


      
        +        * memcg_schedule_kmem_cache_create() will prevent us from
+        * creation of a new kmem_cache.
         */
-       memcg_schedule_kmem_cache_create(memcg, cachep);
-out:
-       css_put(&memcg->css);
+       if (unlikely(!memcg_cachep))
+               memcg_schedule_kmem_cache_create(memcg, cachep);
+       else if (percpu_ref_tryget(&memcg_cachep->memcg_params.refcnt))
+               cachep = memcg_cachep;
+out_unlock:
+       rcu_read_lock();
      
    
    There is one more bug that causes the kernel to panic on bootup
      when I turned on debugging options.
    [   49.871437] =============================

      [   49.875452] WARNING: suspicious RCU usage

      [   49.879476] 5.2.0-rc1.bz1699202_memcg_test+ #2 Not tainted

      [   49.884967] -----------------------------

      [   49.888991] include/linux/rcupdate.h:268 Illegal context switch
      in RCU read-side critical section!

      [   49.897950]

      [   49.897950] other info that might help us debug this:

      [   49.897950]

      [   49.905958]

      [   49.905958] rcu_scheduler_active = 2, debug_locks = 1

      [   49.912492] 3 locks held by systemd/1:

      [   49.916252]  #0: 00000000633673c5
      (&type->i_mutex_dir_key#5){.+.+}, at: lookup_slow+0x42/0x70

      [   49.924788]  #1: 0000000029fa8c75 (rcu_read_lock){....}, at:
      memcg_kmem_get_cache+0x12b/0x910

      [   49.933316]  #2: 0000000029fa8c75 (rcu_read_lock){....}, at:
      memcg_kmem_get_cache+0x3da/0x910

    
    It should be "rcu_read_unlock();" at the end.

    
    -Longman