Re: [PATCH 2/2] cgroup: Use separate work structs on css release path

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/26/22 11:15, Tejun Heo wrote:
Hello, Michal.

On Thu, May 26, 2022 at 11:56:34AM +0200, Michal Koutný wrote:
// ref=A: initial state
kill_css()
   css_get // ref+=F == A+F: fuse
   percpu_ref_kill_and_confirm
     __percpu_ref_switch_to_atomic
       percpu_ref_get
         // ref += 1 == A+F+1: atomic mode, self-protection
     percpu_ref_put
       // ref -= 1 == A+F: kill the base reference
   [via rcu]
   percpu_ref_switch_to_atomic_rcu
     percpu_ref_call_confirm_rcu
       css_killed_ref_fn == refcnt.confirm_switch
         queue_work(css->destroy_work)        (1)
                                                      [via css->destroy_work]
                                                      css_killed_work_fn == wq.func
                                                        offline_css() // needs fuse
                                                        css_put // ref -= F == A: de-fuse
       percpu_ref_put
         // ref -= 1 == A-1: remove self-protection
         css_release                                   // A <= 1 -> 2nd queue_work explodes!

I'm not sure I'm following it but it's perfectly fine to re-use the work
item at this point. The work item actually can be re-cycled from the very
beginning of the work function. The only thing we need to make sure is that
we don't css_put() prematurely to avoid it being freed while we're using it.

For the sharing to be a problem, we should be queueing the release work item
while the destroy instance is still pending, and if that is the case, it
doesn't really matter whether we use two separate work items or not. We're
already broken and would just be shifting the problem to explode elsewhere.

The only possibility that I can think of is that somehow we're ending up
with an extra css_put() somewhere thus triggering the release path
prematurely. If that's the case, we'll prolly need to trace get/puts to find
out who's causing the ref imbalance.

Hi Michal,
As far as I can see we are trying to test the same thing suggested by Tejun.
I just sent a test request to try this:
https://github.com/tstruk/linux/commit/master

Let me know if you have any more tests to run and I will hold off until
you are done.

--
Thanks,
Tadeusz



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [Monitors]

  Powered by Linux