Re: overlayfs: allowing for changes to lowerdir

Amir Goldstein <amir73il@xxxxxxxxx> · Thu, 9 Mar 2017 13:22:32 +0200

On Thu, Mar 9, 2017 at 12:37 PM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> On Tue, Feb 14, 2017 at 3:01 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>> On Mon, Feb 13, 2017 at 11:41 PM, Josh England <jjengla@xxxxxxxxx> wrote:
>>> So here's the use case:  lowerdir is an NFS mounted root filesystem
>>> (shared by a bunch of nodes).  upperdir is a tmpfs RAM disk to allow
>>> for writes to happen.  This works great with the caveat being I cannot
>>> make 'live' changes to the root filesystem, which poses the problem.
>>> Any access to a changed file causes a 'Stale file handle' error.
>>>
>>> With some experimenting, I've discovered that remounting the overlay
>>> filesystem (mount -o remount / /)  registers any changes that have
>>> been made to the lower NFS filesystem.  In addition, dumping cache
>>> (via /proc/sys/vm/drop_caches) also makes the stale file handle errors
>>> go away and reads pass through to the lower dir and correctly show
>>> changes.
>>>
>>> I'd like to make this use case feasible by allowing changes to the NFS
>>> lowerdir to work more or less transparently.  It seems like if the
>>> overlay did not do any caching at all, all reads would fall through to
>>> either the upperdir ram disk or the NFS lower, which is precisely what
>>> I want.
>>>
>>> So, let me pose this somewhat naive question:  Would it be possible to
>>> simply disable any cacheing performed by the overlay to force all
>>> reads to go to either the tmpfs upper or the (VFS-cached) NFS lower?
>>> Would this be enough to accomplish my goal of being able to change the
>>> lowerdir of an active overlayfs?
>>>
>>
>> There is no need to disable caching. There is already a mechanism
>> in place in VFS to revalidate inode cache entries.
>> NFS implements d_revalidate() and overlayfs implements d_revalidate()
>> by calling into the lower fs d_revalidate().
>>
>> However overlayfs intentionally errors when lower entry has been modified.
>> (see: 7c03b5d ovl: allow distributed fs as lower layer)
>>
>> You can try this (untested) patch to revert this behavior, just to see if it
>> works for your use case, but it won't change this fact
>> from Documentation/filesystems/overlayfs.txt:
>> " Changes to the underlying filesystems while part of a mounted overlay
>> filesystem are not allowed.  If the underlying filesystem is changed,
>> the behavior of the overlay is undefined, though it will not result in
>> a crash or deadlock."
>
> Best way to keep things simple is to only add functionality when
> someone actually needs it (and can test it).  This has been the design
> policy in overlayfs and it worked wonderfully.
>
> So we could probably fix the undefined behavior in the above case to
> some extent.
>
>>
>> Specifically, renaming directories and files in lower that were already
>> copied up is going to have a weird outcome.
>>
>> Also, the situation with changing files in lower remote fs could be worse
>> than changing files on lower local fs, simply because right now, this
>> use case is not tested (i.e. it results in ESTALE).
>>
>> I believe that fixing this use case, if at all possible, would require quite
>> a bit of work, a lot of documentation (about expected behavior) and
>> even more testing.
>
> Well, your patch seems to be safe:  if remote fs says something
> changed, throw away node and subtree on the overlay level.
>
> We could introduce the same thing for local fs.  Just need to verify
> in .d_revalidate() that underlying dentry's parent and name matches
> overlay dentry's parent and name.  It's an overhead, and makes no
> sense in the case when we know the lower layers won't change, so it
> may be best to keep this check optional.
>
> Note, that overlay would still return ESTALE if the change on the
> lower layer happens on a dentry already looked up (e.g. cwd, open
> file, race of lookup with rename on underlying layer).  Same as NFS.
>

Naturally.

Could you explain what was the reason for special casing (ret == 0)
in ovl_dentry_revalidate()?