Re: [PATCH v2 04/17] ovl: decode connected upper dir file handles

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 17, 2018 at 5:34 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> On Wed, Jan 17, 2018 at 5:42 PM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>> On Wed, Jan 17, 2018 at 12:18 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>>> On Mon, Jan 15, 2018 at 4:56 PM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>>>>
>>> [...]
>>>> >>
>>>> >> So, a working algorithm would be going up to the first connected
>>>> >> parent or root, lock parent, lookup name and restart.  Not guaranteed
>>>> >> to finish, since not protected against always racing with renames.
>>>> >> Can we take s_vfs_rename_sem on ovl to prevent that?
>>>> >>
>>>> >
>>>> > Sounds like a simple and good enough solution.
>>>> > Do we really need the locking of parent and restart connect if
>>>> > we take s_vfs_rename_sem around ovl_lookup_real()?
>>>>
>>>> No, but s_vfs_rename_sem is a really heavyweight solution, we should
>>>> do better than that for decoding a file handle.
>>>>
>>>> And we probably don't need anything else, since rename on ancestor
>>>> means renamed dir is connected, and hopefully not evicted from the
>>>> cache until we repeat the walk up.
>>>>
>>>> So need to lock parent, lookup ovl dentry, verify we got the same
>>>> upper, if not retry icache lookup.
>>>>
>>>> Not sure we need to worry about that "hopefully".  Hopefully not.
>>>>
>>>
>>> Something like this??
>>>
>>> This is just the raw fix to patch 4/17 without the icache lookup
>>> that is added by later patches.
>>>
>>> I added rename_lock seqlock around backwalk to connected ancestor
>>> and take_dentry_name_snapshot() for the stability of real name
>>> during overlay lookup.
>>>
>>> I considered also storing OVL_I(d_inode(connected))->version
>>> inside seqlock and comparing it to version in case lookup of child
>>> failed. This could help us distinguish between overlay rename and
>>> underlying rename (overlay dir version did not change) and return
>>> ESTALE instead of restarting lookup in the latter case.
>>> Wasn't sure if that was a good idea and what we loose if we leave it out.
>>>
>>> I tested this code, but only with upper file handles of course
>>> (xfstest generic/467).
>>>
>>> Please let me know what you think.
>>>
>>> Thanks,
>>> Amir.
>>>
>>> ================================================================
>>>
>>> From 337543c3fcdf9323d3720d17ab6fc13e287bbec1 Mon Sep 17 00:00:00 2001
>>> From: Amir Goldstein <amir73il@xxxxxxxxx>
>>> Date: Thu, 28 Dec 2017 18:36:16 +0200
>>> Subject: [PATCH v3 4/17] ovl: decode connected upper dir file handles
>>>
>>> Until this change, we decoded upper file handles by instantiating an
>>> overlay dentry from the real upper dentry. This is sufficient to handle
>>> pure upper files, but insufficient to handle merge/impure dirs.
>>>
>>> To that end, if decoded real upper dir is connected and hashed, we
>>> lookup an overlay dentry with the same path as the real upper dir.
>>> If decoded real upper is non-dir, we instantiate a disconnected overlay
>>> dentry as before this change.
>>>
>>> Because ovl_fh_to_dentry() returns a connected overlay dir dentry,
>>> exportfs never needs to call get_parent() and get_name() to reconnect an
>>> upper overlay dir. Because connectable non-dir file handles are not
>>> supported, exportfs will not be able to use fh_to_parent() and get_name()
>>> methods to reconnect a disconnected non-dir to its parent. Therefore, the
>>> methods get_parent() and get_name() are implemented just to print out a
>>> sanity warning and the method fh_to_parent() is implemented to warn the
>>> user that using the 'subtree_check' exportfs option is not supported.
>>>
>>> An alternative approach could have been to implement instantiating of
>>> an overlay directory inode from origin/index and implement get_parent()
>>> and get_name() by calling into underlying fs operations and them
>>> instantiating the overlay parent dir.
>>>
>>> The reasons for not choosing the get_parent() approach were:
>>> - Obtaining a disconnected overlay dir dentry would requires a
>>>   delicate re-factoring of ovl_lookup() to get a dentry with overlay
>>>   parent info. It was preferred to avoid doing that re-factoring unless
>>>   it was proven worthy.
>>> - Going down the path of disconnected dir would mean that the (non
>>>   trivial) code path of d_splice_alias() could be traveled and that
>>>   meant writing more tests and introduces race cases that are very hard
>>>   to hit on purpose. Taking the path of connecting overlay dentry by
>>>   forward lookup is therefore the safe and boring way to avoid surprises.
>>>
>>> The culprit of the chosen "connected overlay dentry" approach:
>>> - We need to take special care to rename of ancestors while connecting
>>>   the overlay dentry by real dentry path. These subtleties are usually
>>>   handled by generic exportfs and VFS code.
>>>
>>> Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
>>> ---
>>>  fs/overlayfs/export.c | 215 +++++++++++++++++++++++++++++++++++++++++++++++++-
>>>  1 file changed, 214 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
>>> index 557c29928e98..35f37a72d55e 100644
>>> --- a/fs/overlayfs/export.c
>>> +++ b/fs/overlayfs/export.c
>>> @@ -130,6 +130,188 @@ static struct dentry *ovl_obtain_alias(struct
>>> super_block *sb,
>>>         return dentry;
>>>  }
>>>
>>> +/*
>>> + * Lookup a child overlay dentry whose real dentry is @real.
>>> + * If @real is on upper layer, we lookup a child overlay dentry with the same
>>> + * name as the real dentry. Otherwise, we need to consult index for lookup.
>>> + */
>>> +static struct dentry *ovl_lookup_real_one(struct dentry *parent,
>>> +                                         struct dentry *real,
>>> +                                         struct ovl_layer *layer)
>>> +{
>>> +       struct dentry *this;
>>> +       struct name_snapshot name;
>>> +       int err;
>>> +
>>> +       /* TODO: use index when looking up by lower real dentry */
>>> +       if (layer->idx)
>>> +               return ERR_PTR(-EACCES);
>>> +
>>> +       /*
>>> +        * Lookup overlay dentry by real name. The parent mutex protects us
>>> +        * from racing with overlay rename. If the overlay dentry that is
>>> +        * above real has already been moved to a different parent, then this
>>> +        * lookup will fail to find a child dentry whose real dentry is @real
>>> +        * and we will have to restart the lookup of real path from the top.
>>> +        *
>>> +        * We also need to take a snapshot of real dentry name to protect us
>>> +        * from racing with underlying layer rename. In this case, we don't
>>> +        * care about returning ESTALE, only from referencing a free name
>>> +        * pointer.
>>> +        *
>>> +        * TODO: try to lookup the renamed overlay dentry in inode cache by
>>> +        *       real inode.
>>> +        */
>>> +       inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
>>> +       take_dentry_name_snapshot(&name, real);
>>
>> No need to snapshot, just check if parent hasn't changed after
>> locking.  If parent is same, then name is guaranteed to be stable.
>>
>
> I don't understand.
> We are not holding a lock on real parent, only on overlay parent.
> What makes the real name stable?
> The snapshot is not to protect from racing with overlay rename.
> The snapshot is for protecting from race with real rename, just to
> make sure we don't dereference a stale name pointer.

Ah,okay.

>
>> This also means, that only ESTALE is possible after this.   And ESTALE
>> is fatal, no need to retry after that.
>
> OK. I will return ECHILD for parent that has changed
> and will restart only on ECHILD.
>
>>
>>> +       this = lookup_one_len(name.name, parent, strlen(name.name));
>>> +       err = PTR_ERR(this);
>>> +       if (IS_ERR(this)) {
>>> +               goto fail;
>>> +       } else if (!this || !this->d_inode) {
>>> +               dput(this);
>>> +               err = -ENOENT;
>>> +               goto fail;
>>> +       } else if (ovl_dentry_upper(this) != real) {
>>> +               dput(this);
>>> +               err = -ESTALE;
>>> +               goto fail;
>>> +       }
>>> +
>>> +out:
>>> +       release_dentry_name_snapshot(&name);
>>> +       inode_unlock(d_inode(parent));
>>> +       return this;
>>> +
>>> +fail:
>>> +       pr_warn_ratelimited("overlayfs: failed to lookup one by real
>>> (%pd2, layer=%d, parent=%pd2, err=%i)\n",
>>> +                           real, layer->idx, parent, err);
>>> +       this = ERR_PTR(err);
>>> +       goto out;
>>> +}
>>> +
>>> +/*
>>> + * Lookup an overlay dentry whose real dentry is @real.
>>> + * If @real is on upper layer, we lookup a child overlay dentry with the same
>>> + * path the real dentry. Otherwise, we need to consult index for lookup.
>>> + */
>>> +static struct dentry *ovl_lookup_real(struct super_block *sb,
>>> +                                     struct dentry *real,
>>> +                                     struct ovl_layer *layer)
>>> +{
>>> +       struct dentry *connected;
>>> +       unsigned int seq;
>>> +       int err = 0;
>>> +
>>> +       /* TODO: use index when looking up by lower real dentry */
>>> +       if (layer->idx)
>>> +               return ERR_PTR(-EACCES);
>>> +
>>> +       connected = dget(sb->s_root);
>>> +       while (!err) {
>>> +               struct dentry *next, *this;
>>> +               struct dentry *parent = NULL;
>>> +               struct dentry *real_connected = layer->mnt->mnt_root;
>>> +
>>> +               if (real_connected == real)
>>> +                       break;
>>
>> This loop will never finish, since real_connected is mnt_root now.
>> Would be nice if there was a guaranteed way to finish this without
>> icache lookup, but I don't see how.
>>
>
> That's a bug.
> The correct code is:
>
> struct dentry *real_connected = ovl_dentry_upper(connected);

No, that's still broken, because if something gets renamed below
"connected", we'll fall off root.  So need to stop at real_connected
AND at root of overlay.

>
>
>>> +
>>> +               /*
>>> +                * Find the topmost dentry not yet connected. Taking rename_lock
>>> +                * so at least we don't race with rename when walking back to
>>> +                * 'real_connected'.
>>> +                */
>>> +               seq = read_seqbegin(&rename_lock);
>>
>> I don't see what we gain with this.
>>
>
> I can't say that I do see it, but perhaps there is something yet
> to be gained by adding this later for lower layers lookup.
> Perhaps when looking on lower real layer, we can store the
> overlay dir cache version of 'connected' (connected in this case
> may be an indexed merge dir).
> After we take 'connected' dir mutex, we cannot check that
> real parent hasn't changes as an indication to no overlay rename
> because overlay rename happens on upper, but we can compare
> the dir cache version of 'connected' dir to the version we stored
> under rename_lock.
> Then we can tell if lower lookup has failed because of some
> permanent error (e.g. middle layer redirect) or because of an
> indexed rename, so we need to restart.
> Maybe that gains us something?

Sorry, couldn't follow that.

I don't see need for additional locking apart from the one in
ovl_lookup_by_real_one().  Because the only race remaining is eviction
of overlay inode(s) from the icache, and no locking is going to
prevent that.

To get a fully race-free version, we'd need to abandon returning a
connected dir from decode_fh().

Thanks,
Miklos



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux