Re: [PATCH] fs/super.c: Fix lock/unlock imbalance in sget_fc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Al,

Certainly, I never checked grab_super. Lesson learned.

Thanks a lot for taking the time to write this master class. I really appreciate it. :)

--
Gustavo

a reproducer), we have some other crap going on and need to investigate
that, but even in that case, the patch is wrong]

As for how to investigate that kind of thing...  Look:

The code in question is
                         if (fc->user_ns != old->s_user_ns) {
                                 spin_unlock(&sb_lock);
                                 if (s) {
                                         up_write(&s->s_umount);
                                         destroy_unused_super(s);
                                 }
                                 return ERR_PTR(-EBUSY);
                         }
                         if (!grab_super(old))
                                 goto retry;
                         if (s) {
                                 up_write(&s->s_umount);
                                 destroy_unused_super(s);
                                 s = NULL;
                         }
                         return old;

Your hypothesis is that we can get to that return old; with sb_lock
held.  That would almost certainly be a bad thing, since elsewhere
in the same function we have
         spin_unlock(&sb_lock);
         get_filesystem(s->s_type);
         register_shrinker(&s->s_shrink);
         return s;
which appears to return an object with sb_lock dropped, with no obvious
way for a caller to tell one from another.  Even if such a way existed
(it does, actually), that kind of calling conventions would be highly
bug-prone.

The next question is, when would we get to that return old; with
sb_lock held?  We do, apparently, hold it before the if (fc->...)
above (again, strictly speaking not proven yet, but that's the
most likely assumption).  So if grab_super(old) returns true and
we are holding sb_lock, either we do have a problem, or something
subtle is going on.

The obvious next target of examination is grab_super().  Which comes
with
/**
  *      grab_super - acquire an active reference
  *      @s: reference we are trying to make active
  *
  *      Tries to acquire an active reference.  grab_super() is used when we
  *      had just found a superblock in super_blocks or fs_type->fs_supers
  *      and want to turn it into a full-blown active reference.  grab_super()
                                                                  ^^^^^^^^^^^^
  *      is called with sb_lock held and drops it.  Returns 1 in case of
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  *      success, 0 if we had failed (superblock contents was already dead or
  *      dying when grab_super() had been called).  Note that this is only
  *      called for superblocks not in rundown mode (== ones still on ->fs_supers
  *      of their type), so increment of ->s_count is OK here.
  */
and looking for references to sb_lock yields the underscored sentence.  Now,
if that is true (which is not guaranteed - comments can become stale), we do
not need to drop sb_lock after the call of grab_super() - it's already been
dropped by grab_super() itself.

And looking at the actual code we have
static int grab_super(struct super_block *s) __releases(sb_lock)
{
         s->s_count++;
         spin_unlock(&sb_lock);
	^^^^^^^^^^^^^^^^^^^^^---------- dropped, indeed
         down_write(&s->s_umount);
         if ((s->s_flags & SB_BORN) && atomic_inc_not_zero(&s->s_active)) {
                 put_super(s);
                 return 1;
         }
         up_write(&s->s_umount);
         put_super(s);
         return 0;
}
... and not regained, unless put_super() does something fishy.
static void put_super(struct super_block *sb)
{
         spin_lock(&sb_lock);
         __put_super(sb);
         spin_unlock(&sb_lock);
}
OK, put_super() definitely returns with sb_lock not held, and therefore so does
grab_super().  In other words, the comment does match the reality and trying
to drop sb_lock right after the call of grab_super() would be 100% wrong.

That disproves your hypothesis.  For the sake of completeness, let's finish the
analysis of sget_fc() wrt sb_lock.
struct super_block *sget_fc(struct fs_context *fc,
                             int (*test)(struct super_block *, struct fs_context *),
                             int (*set)(struct super_block *, struct fs_context *))
{
         if (!(fc->sb_flags & SB_KERNMOUNT) &&
             fc->purpose != FS_CONTEXT_FOR_SUBMOUNT) {
                 /* Don't allow mounting unless the caller has CAP_SYS_ADMIN
                  * over the namespace.
                  */
                 if (!(fc->fs_type->fs_flags & FS_USERNS_MOUNT) &&
                     !capable(CAP_SYS_ADMIN))
                         return ERR_PTR(-EPERM);
                 else if (!ns_capable(fc->user_ns, CAP_SYS_ADMIN))
                         return ERR_PTR(-EPERM);
         }
retry:
         spin_lock(&sb_lock);

OK, we definitely do not want to call that with sb_lock held - doing so would
either return ERR_PTR(-EPERM) or deadlock.  So the calling conventions include
"caller is not holding sb_lock".  If so, everything up to retry: should be
executed without sb_lock held, and subsequent code is with sb_lock held.
         if (test) {
                 hlist_for_each_entry(old, &fc->fs_type->fs_supers, s_instances) {
                         if (!test(old, fc))
                                 continue;
'test' callback should be callable with sb_lock held.  Note that
at least in case when it returns false it must not have dropped
sb_lock - the list we are walking is protected by sb_lock.
                         if (fc->user_ns != old->s_user_ns) {
                                 spin_unlock(&sb_lock);
... in which case we also want sb_lock not dropped by test() since we
drop it ourselves.
                                 if (s) {
                                         up_write(&s->s_umount);
                                         destroy_unused_super(s);
destroy_unused_super() is called without sb_lock here and examination
shows that it doesn't touch sb_lock itself.
                                 }
                                 return ERR_PTR(-EBUSY);
                         }
... and in this case we also want sb_lock not dropped by test() either,
since grab_super() will drop it.
                         if (!grab_super(old))
                                 goto retry;
we either go back to 'retry:' with sb_lock not held (same as in the
case of reaching retry: without goto) or coninue to
                         if (s) {
                                 up_write(&s->s_umount);
                                 destroy_unused_super(s);
same as above, called without sb_lock, doesn't touch it.
                                 s = NULL;
                         }
                         return old;
... and we return without sb_lock held.
                 }
         }

Here (after the if (test) body) we do hold sb_lock
         if (!s) {
                 spin_unlock(&sb_lock);
drop and call alloc_super(), which doesn't touch sb_lock
                 s = alloc_super(fc);
                 if (!s)
                         return ERR_PTR(-ENOMEM);
                 goto retry;
... and either return without sb_lock or go back to retry:, with
the same conditions as on other paths leading there.  Incidentally,
since alloc_super() very clearly blocks (GFP_USER kzalloc the very
first thing in there), the calling conventions for sget_fc() include
not just "must not be holding sb_lock" but "must not be holding any
spinlock".
         }
         s->s_fs_info = fc->s_fs_info;
         err = set(s, fc);
OK, so 'set()' is also called under sb_lock.
         if (err) {
                 s->s_fs_info = NULL;
                 spin_unlock(&sb_lock);
... and at least in case of error must not drop it, since we'd do that
ourselves.
                 up_write(&s->s_umount);
                 destroy_unused_super(s);
                 return ERR_PTR(err);
same as in earlier cases
         }
         fc->s_fs_info = NULL;
         s->s_type = fc->fs_type;
         strlcpy(s->s_id, s->s_type->name, sizeof(s->s_id));
         list_add_tail(&s->s_list, &super_blocks);
         hlist_add_head(&s->s_instances, &s->s_type->fs_supers);
         spin_unlock(&sb_lock);

OK, so 'set()' must not drop sb_lock in any cases.  And from that point
on sb_lock is not held (neither get_filesystem() nor register_shrinker()
touch it)

         get_filesystem(s->s_type);
         register_shrinker(&s->s_shrink);
         return s;
}

So we arrive to the following:

* sget_fc() must not be called with any spinlocks (sb_lock included) held.
* in all cases it returns with sb_lock not held.
* test() and set() callbacks are always called under sb_lock and should not
drop it.

Looking at the shape of that code strengthens the last one to "even
drop-and-retake is not allowed".  With that kind of loop over hlist, dropping
and retaking sb_lock in test() might blow up.  And as for set() callback,
we clearly don't want to create a new instance when an existing one would
satisfy the test() predicate.  And dropping/retaking sb_lock would've
allowed another caller to come and insert a new instance while our
set() has not been holding sb_lock, ending up with just that once we
return and get to hlist_add_head() there.

In other words,
	* called without any spinlocks held
	* returns with no spinlocks held
	* callbacks are always called under sb_lock and must not touch it.

Verifying that callers (and all possible callbacks) satisfy those rules
is left as an exercise for reader...




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux