Re: Regarding doing away with refkeeper in locks xlator


 




On 06/06/2014 12:11 AM, Anand Avati wrote:

On Thu, Jun 5, 2014 at 10:46 AM, Krutika Dhananjay <kdhananj@xxxxxxxxxx> wrote:
To summarize, the real "problems" are:

- Deref of pl_inode->refkeeper as an inode_t in the cleanup logger. It
is an internal artifact of pl_update_refkeeper()'s working, and nobody else
is supposed to "touch" it.
For this, the solution I have in mind is to:
a. have a placeholder for the gfid in the pl_inode_t object,
b. remember the gfid of the inode at the time of pl_inode_t creation in pl_ctx_get(), and
c. print pl_inode->gfid in the log message in pl_{inodelk,entrylk}_log_cleanup().
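Points (a)-(c) could look roughly like the sketch below. This is illustrative only: the real pl_inode_t, pl_ctx_get(), and the xlator's uuid helpers (e.g. uuid_utoa()) differ; the struct slice and formatting helper here are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for a 16-byte gfid. */
typedef unsigned char pl_gfid_t[16];

/* Hypothetical slice of pl_inode_t: a copy of the gfid is taken at
 * creation time, so cleanup logging never needs to dereference
 * pl_inode->refkeeper. */
typedef struct {
        pl_gfid_t gfid;
        /* ... lock lists, mutex, refkeeper, etc. elided ... */
} pl_inode_t;

/* (b) remember the gfid when the pl_inode_t is created. */
static void
pl_inode_init (pl_inode_t *pl_inode, const pl_gfid_t gfid)
{
        memcpy (pl_inode->gfid, gfid, sizeof (pl_gfid_t));
}

/* (c) format the stored gfid for the cleanup log message
 * (simplified uuid_utoa-style formatting). */
static void
pl_gfid_str (const pl_inode_t *pl, char *buf, size_t len)
{
        const unsigned char *g = pl->gfid;

        snprintf (buf, len,
                  "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
                  "%02x%02x%02x%02x%02x%02x",
                  g[0], g[1], g[2], g[3], g[4], g[5], g[6], g[7],
                  g[8], g[9], g[10], g[11], g[12], g[13], g[14], g[15]);
}
```

Since the gfid is copied by value at creation time, the cleanup logger needs nothing from the inode_t itself.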

OK.

- Missing call to pl_update_refkeeper() in client_disconnect_cbk(). This
can result in a potential leak of an inode ref (not a leak if the same
inode sees locking activity from another client in the future).
As far as updating refkeeper during DISCONNECT is concerned, Pranith and I did have discussions at length but could
not find a single safe place in the cleanup functions to call pl_update_refkeeper() that does not lead to illegal memory access,
whether before or after the call to __pl_inodelk_unref() in the third for loop.

Case #1 explained:
=============

<snip>
list_for_each_entry_safe() {
...
...
pl_update_refkeeper(inode);
pthread_mutex_lock (&pl_inode->mutex);
     __pl_inodelk_unref(l);
pthread_mutex_unlock (&pl_inode->mutex);
}
</snip>

This causes the last unref in pl_update_refkeeper() to implicitly call pl_forget(), where the pl_inode_t object is destroyed while it is
still needed: pl_inode->mutex must be acquired before unrefing the lock object.

Case #2 explained:
=============
<snip>
list_for_each_entry_safe() {
...
...
pthread_mutex_lock (&pl_inode->mutex);
     __pl_inodelk_unref(l);
pthread_mutex_unlock (&pl_inode->mutex);
pl_update_refkeeper(inode);
}
</snip>

After the first for loop is processed, the domain's lists could have been emptied. At this point, a second thread could update refkeeper,
triggering the last unref on the inode and a call to pl_forget() (where the pl_inode_t is freed); the DISCONNECT thread would then perform illegal
memory access when it makes its own call to pl_update_refkeeper().

So the solution Pranith and I came up with involves making sure that the inode_t object stays alive for as long as there is at least one lock on it:

1. refkeeper will no longer be used for inodelks and entrylks (but will continue to be used for fcntl locks).
2. Each pl_inode_t object will also host an inode_t member, which is initialised during the call to pl_inode_get() in pl_common_{inodelk,entrylk}().
3. Every time a lock is granted/blocked, pl_inode->inode is ref'd (in pl_inode_setlk(), after successful locking).
4. Every time a lock is unlocked, pl_inode->inode is unref'd.
5. In the disconnect codepath, as part of "released" list processing, the inodes associated with the client are unref'd in the same loop, right after every lock object is unref'd.
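The invariant behind steps 1-5 — the inode stays alive as long as at least one inodelk/entrylk holds a ref on it — can be modelled with a toy refcount. All names here are illustrative stand-ins, not the actual xlator API; inode_unref() returning 0 models the last unref that would trigger pl_forget().

```c
#include <assert.h>

/* Toy inode with an explicit refcount. */
typedef struct {
        int refcount;
} inode_t;

static inode_t *
inode_ref (inode_t *i)
{
        i->refcount++;
        return i;
}

static int
inode_unref (inode_t *i)
{
        return --i->refcount;   /* 0 => pl_forget() would run */
}

typedef struct {
        inode_t *inode;         /* step 2: pl_inode_t holds the inode */
        int      nlocks;        /* granted + blocked locks on this inode */
} pl_inode_t;

/* Step 3: every granted/blocked lock takes one ref on the inode. */
static void
lock_added (pl_inode_t *pl)
{
        inode_ref (pl->inode);
        pl->nlocks++;
}

/* Steps 4-5: every unlock (or disconnect cleanup) drops one ref. */
static int
lock_removed (pl_inode_t *pl)
{
        pl->nlocks--;
        return inode_unref (pl->inode);
}
```

As long as nlocks > 0, the inode's refcount is strictly above whatever baseline refs other holders keep, so pl_forget() cannot fire while locks still reference the pl_inode.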

With all this, there is one remaining problem: a lock object can transiently be on both the blocked and the granted list when the unlocking thread
races with the disconnecting thread. This is best explained with the following example:
<example>
Consider an inode I on which there is a granted lock G and a blocked lock B, from clients C1 and C2 respectively. With this approach, at this point the number of refs taken
by the locks xlator on I would be 2.
C1 unlocks G, at which point I's refcount becomes 1.
Now, as part of granting other blocked locks in the unlock codepath, B is put on the granted list.
Now B's client, C2, sends a DISCONNECT event, which causes I to be unref'd again. This being the last unref, pl_forget() is called on I, causing its pl_inode to be destroyed.
At this point the unlocking thread could try to lock pl_inode->mutex in grant_blocked_{inode,entry}_locks() (whereas pl_inode has already been destroyed), leading to a crash.
</example>

The problem described in the example can be fixed by making sure grant_blocked_{inode,entry}_locks() takes a ref on the inode as its very first step and unrefs it just before returning.
This fixes the illegal memory access.
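The shape of that fix might look like the sketch below. It is a simplified model, not the actual xlator code: inode_t, pl_inode_t, and grant_blocked_inode_locks() are stubbed, and the list manipulation is elided. The point is only the ordering: pin the inode before touching pl_inode->mutex, drop the pin last.

```c
#include <assert.h>
#include <pthread.h>

/* Toy inode: refcount plus a flag recording whether the last unref
 * (i.e. the point where pl_forget() would run) has happened. */
typedef struct {
        int refcount;
        int destroyed;
} inode_t;

static void
inode_ref (inode_t *i)
{
        i->refcount++;
}

static void
inode_unref (inode_t *i)
{
        if (--i->refcount == 0)
                i->destroyed = 1;       /* pl_forget() would run here */
}

typedef struct {
        pthread_mutex_t mutex;
        inode_t        *inode;
} pl_inode_t;

/* Sketch of the proposed fix: ref the inode first thing and unref it
 * just before returning, so a concurrent DISCONNECT's last unref
 * cannot free pl_inode while we still need pl_inode->mutex. */
static void
grant_blocked_inode_locks (pl_inode_t *pl_inode)
{
        inode_t *inode = pl_inode->inode;

        inode_ref (inode);              /* pin the inode */

        pthread_mutex_lock (&pl_inode->mutex);
        /* ... move grantable locks from the blocked list to the
         * granted list ... */
        pthread_mutex_unlock (&pl_inode->mutex);

        /* ... unwind/grant the collected locks ... */

        inode_unref (inode);            /* drop our pin */
}
```

With the pin held, even if the disconnecting thread drops its last lock ref mid-function, the refcount stays above zero until grant_blocked_inode_locks() itself returns.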

This sounds a bit complicated. I think there is a much simpler solution:

- First, make update_refkeeper() check for blocked locks (which I mentioned as "optional" previously)

- Make grant_blocked_locks() double up and do the job of update_refkeeper() internally.

Something which looks like this:

diff --git a/xlators/features/locks/src/common.c b/xlators/features/locks/src/common.c
index f6c71c1..38df385 100644
--- a/xlators/features/locks/src/common.c
+++ b/xlators/features/locks/src/common.c
@@ -126,8 +126,14 @@ __pl_inode_is_empty (pl_inode_t *pl_inode)
                 if (!list_empty (&dom->entrylk_list))
                         is_empty = 0;
 
+                if (!list_empty (&dom->blocked_entrylks))
+                        is_empty = 0;
+
                 if (!list_empty (&dom->inodelk_list))
                         is_empty = 0;
+
+                if (!list_empty (&dom->blocked_inodelks))
+                        is_empty = 0;
         }
 
         return is_empty;
@@ -944,12 +950,18 @@ grant_blocked_locks (xlator_t *this, pl_inode_t *pl_inode)
         struct list_head granted_list;
         posix_lock_t     *tmp = NULL;
         posix_lock_t     *lock = NULL;
+       inode_t *unref = NULL;
 
         INIT_LIST_HEAD (&granted_list);
 
         pthread_mutex_lock (&pl_inode->mutex);
         {
                 __grant_blocked_locks (this, pl_inode, &granted_list);
+
+               if (__pl_inode_is_empty (pl_inode) && pl_inode->refkeeper) {
+                       unref = pl_inode->refkeeper;
+                       pl_inode->refkeeper = NULL;
+               }
         }
         pthread_mutex_unlock (&pl_inode->mutex);
 
@@ -965,6 +977,9 @@ grant_blocked_locks (xlator_t *this, pl_inode_t *pl_inode)
                 GF_FREE (lock);
         }
 
+       if (unref)
+               inode_unref (unref);
+
         return;
 }


This should make pl_disconnect_cbk() pretty much race-free w.r.t. refkeeper. Thoughts?
Let's say C1 is doing pl_inodelk_client_cleanup(). After the second for loop (all granted and blocked locks are out of the domain), if an unlock of the final granted lock on that inode from client C2 completes, refkeeper would be set to NULL and unref'd, leading to zero refs on that inode, i.e. pl_forget() will also happen. In the third for loop, pl_inode is already freed, which leads to freed-memory access and a crash.

Pranith



_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-devel

