Re: [PATCH v4 00/32] Lockfile correctness and refactoring

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 09/07/2014 04:21 PM, Torsten Bögershausen wrote:
> 
> On 2014-09-06 09.50, Michael Haggerty wrote:
>> Sorry for the long delay since v3. This version mostly cleans up a
>> couple more places where the lockfile object was left in an
>> ill-defined state. 
> No problem with the delay.
> The most important question is if we do the lk->active handling right.
> Set it to false as seen as possible, and to true as late as possible,
> then die() cleanly.
> So the ->acive handling looks (more or less, please see below) and
> deserves another critical review, may be.
> 
> Instead of commenting each patch, I collected a mixture of small questions
> and possible suggestions into a diff file.
> 
> diff --git a/lockfile.c b/lockfile.c
> index e54d260..7f27ea2 100644
> --- a/lockfile.c
> +++ b/lockfile.c
> @@ -18,6 +18,10 @@
>   *    Usually, if $FILENAME is a symlink, then it is resolved, and the
>   *    file ultimately pointed to is the one that is locked and later
>   *    replaced.  However, if LOCK_NODEREF is used, then $FILENAME
> +LOCK_NODEREF can be read either as
> +LOCK_NODE_REF or LOCK_NO_DEREF
> +should we change it ?
> +

Good idea.

>   *    itself is locked and later replaced, even if it is a symlink.
>   *
>   * 2. Write the new file contents to the lockfile.
> @@ -46,9 +50,18 @@
>   *   state:
>   *   - the lockfile exists
>   *   - active is set
> - *   - filename holds the filename of the lockfile
> + *   - filename holds the filename of the lockfile in a strbuf

I don't think this is necessary. The point of this list is to describe
the state machine, not the contents of the lock_file structure, so this
detail is only a distraction. And even if a reader is confused, the
compiler will warn if he tries to use the strbuf as if it were a string.

>   *   - fd holds a file descriptor open for writing to the lockfile
>   *   - owner holds the PID of the process that locked the file
> +question: Why do we need the PID here ?
> +Do we open a lock file and do a fork() ?
> +And if yes, the child gets a new PID, what happens when the
> +child gets a signal ?
> +Who "owns" the lockfile, the parent, the child, both ?
> +The child has access to all data, the fd is open and can be used,
> +why do we not allow a rollback, when the child dies ?

Good questions. I will add an explanation of the purpose of the pid in
this docstring.

>   *
>   * - Locked, lockfile closed (after close_lock_file()).  Same as the
>   *   previous state, except that the lockfile is closed and fd is -1.
> @@ -57,7 +70,7 @@
>   *   rollback_lock_file(), or a failed attempt to lock).  In this
>   *   state:
>   *   - active is unset
> - *   - filename is the empty string (usually, though there are
> + *   - filename is an empty string buffer (usually, though there are
>   *     transitory states in which this condition doesn't hold)
>   *   - fd is -1
>   *   - the object is left registered in the lock_file_list, and
> @@ -68,7 +81,7 @@
>  
>  static struct lock_file *volatile lock_file_list;
>  
> -static void remove_lock_file(void)
> +static void remove_lock_files(void)

Good idea. I will rename these two functions.

>  {
>      pid_t me = getpid();
>  
> @@ -79,9 +92,9 @@ static void remove_lock_file(void)
>      }
>  }
>  
> -static void remove_lock_file_on_signal(int signo)
> +static void remove_lock_files_on_signal(int signo)
>  {
> -    remove_lock_file();
> +    remove_lock_files();
>      sigchain_pop(signo);
>      raise(signo);
>  }
> @@ -93,7 +106,7 @@ static void remove_lock_file_on_signal(int signo)
>   * "/", if any).  If path is empty or the root directory ("/"), set
>   * path to the empty string.
>   */
> -static void trim_last_path_elm(struct strbuf *path)
> +static void trim_last_path_elem(struct strbuf *path)

I agree that the old name was bad. I will make it even more explicit:
trim_last_path_component().

>  {
>      int i = path->len;
>  
> @@ -143,7 +156,7 @@ static void resolve_symlink(struct strbuf *path)
>               * link is a relative path, so replace the
>               * last element of p with it.
>               */
> -            trim_last_path_elm(path);
> +            trim_last_path_elem(path);
>  
>          strbuf_addbuf(path, &link);
>      }
> @@ -153,13 +166,16 @@ static void resolve_symlink(struct strbuf *path)
>  /* Make sure errno contains a meaningful value on error */
>  static int lock_file(struct lock_file *lk, const char *path, int flags)
>  {
> +    struct stat st;
> +    int mode = 0666;
>      if (!lock_file_list) {
>          /* One-time initialization */
> -        sigchain_push_common(remove_lock_file_on_signal);
> -        atexit(remove_lock_file);
> +        sigchain_push_common(remove_lock_files_on_signal);
> +        atexit(remove_lock_files);
>      }
>  
> -    assert(!lk->active);
> +  if (lk->active)
> +        die("lk->active %s", path);

OK, but I will use die("BUG:...") since this would be an indication of a
bug in git.

>      if (!lk->on_list) {
>          /* Initialize *lk and add it to lock_file_list: */
> @@ -167,16 +183,25 @@ static int lock_file(struct lock_file *lk, const char *path, int flags)
>          lk->active = 0;
>          lk->owner = 0;
>          lk->on_list = 1;
> -        strbuf_init(&lk->filename, 0);
> +        strbuf_init(&lk->filename, strlen(path) + LOCK_SUFFIX_LEN);

OK.

>          lk->next = lock_file_list;
>          lock_file_list = lk;
>      }
>  
> +    strbuf_reset(&lk->filename); /* Better to be save */

I agree, but this would be a bug so I'd rather die("BUG:") in this case.

>      strbuf_addstr(&lk->filename, path);
>      if (!(flags & LOCK_NODEREF))
>          resolve_symlink(&lk->filename);
>      strbuf_addstr(&lk->filename, LOCK_SUFFIX);
> -    lk->fd = open(lk->filename.buf, O_RDWR | O_CREAT | O_EXCL, 0666);
> +    /*
> +     * adjust_shared_perm() will widen permissions if needed,
> +     * otherwise keep permissions restrictive
> +     *
> +     */
> +    if (!stat(path, &st))
> +        mode = st.st_mode & 07777;
> +
> +    lk->fd = open(lk->filename.buf, O_RDWR | O_CREAT | O_EXCL, mode);

I think this change is separate from the changes made in this series,
and needs its own justification (i.e., it's not immediately obvious to
me whether the change is an improvement). Would you mind submitting it
as a separate patch?

>      if (lk->fd < 0) {
>          strbuf_reset(&lk->filename);
>          return -1;
> @@ -268,7 +293,7 @@ int close_lock_file(struct lock_file *lk)
>      return close(fd);
>  }
>  
> -int reopen_lock_file(struct lock_file *lk)
> +int reopen_lock_file_UNUSED_CAN_IT_BE_REMOVED(struct lock_file *lk)

Junio added this function recently, without any callers, in 93dcaea2. I
assume he has some diabolical plan for it. Junio?

>  {
>      if (0 <= lk->fd)
>          die(_("BUG: reopen a lockfile that is still open"));
> @@ -283,7 +308,7 @@ int commit_lock_file_to(struct lock_file *lk, const char *path)
>      int save_errno;
>  
>      if (!lk->active)
> -        die("BUG: attempt to commit unlocked object");
> +        die("BUG: attempt to commit unlocked object %s", path);

OK.

>  
>      if (lk->fd >= 0 && close_lock_file(lk))
>          goto rollback;
> @@ -325,10 +350,12 @@ void rollback_lock_file(struct lock_file *lk)
>  {
>      if (!lk->active)
>          return;
> +    lk->active = 0; /* We are going to de-activate,
> +                       so active is no longer valid already here ? */

I think the question is: what should happen if a signal arrives at this
moment? If we set active=0 here, then the signal handler wouldn't clean
up this lockfile at all. But we haven't removed it yet, either. So the
lockfile would get left behind.

So how much longer can we leave active set here?

>  
>      if (lk->fd >= 0)
>          close_lock_file(lk);

close_lock_file() is careful to clear lk->fd *before* closing it, so it
shouldn't be possible for the file to get closed twice [1]. Therefore I
think it is OK to leave lk->active set during the call to close_lock_file().

>      unlink_or_warn(lk->filename.buf);

This line is a bit dangerous. If we call unlink_or_warn() with
lk->active set, then there is a chance that unlink_and_warn() will be
called a second time by a signal handler that runs after this line but
before lk->active is cleared. The first call would delete the lockfile
that we created, but the second call (which might be delayed for a bit
while our signal handler works its way through the lockfile linked-list)
might end up deleting a lockfile that another process created between
our two calls. That would be a bad outcome, but it requires a double
coincidence: our process has to receive a signal at this pessimal moment
*and* another process has to create a lockfile after that event but
before our signal handler has run. And for *real* damage to occur, a
*third* process has to incorrectly acquire the *same* lockfile because
of its incorrect deletion by our process, and it has to make a change
that conflicts with the change that the second process is trying to make.

On the other hand, if we clear lk->active *before* calling
unlink_or_warn(), then the danger is that we receive a signal and never
clean up the lockfile. The result is perhaps not as bad as deleting
another process's lockfile, but it seems to me that it is far more
likely: it can happen regardless of whether another process is racing
with us, let alone a third process is trying to acquire the same lock.

Can anybody think of a reasonable way to make this 100% safe?

If not, I think the code as written is safer than your proposed change.

> -    lk->active = 0;
> +    //lk->active = 0;
>      strbuf_reset(&lk->filename);
>  }
> 
> 

Thanks very much for your feedback!

Michael

[1] Another question (and I think somebody else brought it up) is
whether close_lock_file() should actually close the file *before*
clearing lk->fd. I will address that question elsewhere.

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]