Re: Overlayfs snapshots pre announce

Amir Goldstein <amir73il@xxxxxxxxx> · Tue, 6 Dec 2016 13:14:02 +0200

On Tue, Dec 6, 2016 at 12:41 PM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> On Mon, Dec 5, 2016 at 2:39 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
>> FYI, I implemented ./run --sn=N test
>> for recycling N snapshot as described above.
>>
>> 3 directory rename tests (rename-move-dir, rename-pop-dir,
>> rename-new-pop-dir) fail on snapshot consistency after 1 recycle
>> due to a problem I failed to anticipate - redirect by file handle skips
>> all middle layers and goes directly to master.
>
> Not sure I understand.   How does redirect/fh skip middle layers?  We
> need to merge contents of all layers, so skipping middle layers does
> not sound correct.
>

OK. It is easy enough to understand once you realize why snapshots
MUST use redirect_fh, and not merely as an optimization.

With standard overlay, when you rename a dir in upper you must set
redirect (path and/or fh) *before* the first rename and never have to
deal with it on subsequent renames.

With reverse overlay, the same trick won't work. It is not feasible
to update redirect (by path) on upper dir on every rename of lower dir.
And there is no way to do it atomically with the lower rename.
So redirect_fh comes to the rescue. The upper dir gets the lower fh
*before* the first lower dir rename. All subsequent lower dir renames
then leave the stored fh unchanged, so snapshot lookup can always
find the lower dir by fh, regardless of how many times it was renamed.
Removal of the lower dir, BTW, will result in ESTALE, which translates
to ENOENT in layers lookup, so the result is a pure upper dir for the
snapshot.
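
To make the fh behaviour concrete, here is a toy Python model. It is not
overlayfs code: ToyLowerFs, snapshot_lookup_lower and the use of a bare
inode number as the "file handle" are all made up for illustration. It
only shows that a stored handle survives renames and that a stale handle
is treated as "no lower dir".

```python
import errno


class ToyLowerFs:
    """Hypothetical stand-in for the lower (master) fs: inode -> current path."""

    def __init__(self):
        self._next_ino = 1
        self.paths = {}

    def mkdir(self, path):
        ino = self._next_ino
        self._next_ino += 1
        self.paths[ino] = path
        return ino  # assumption: the "file handle" is just the inode number

    def rename(self, ino, new_path):
        self.paths[ino] = new_path  # the handle (ino) does not change

    def rmdir(self, ino):
        del self.paths[ino]

    def open_by_handle(self, fh):
        if fh not in self.paths:
            raise OSError(errno.ESTALE, "stale file handle")
        return self.paths[fh]


def snapshot_lookup_lower(lower, stored_fh):
    """Resolve the fh stored on the upper dir; ESTALE is treated like ENOENT."""
    try:
        return lower.open_by_handle(stored_fh)
    except OSError as err:
        if err.errno == errno.ESTALE:
            return None  # no lower dir: the snapshot sees a pure upper dir
        raise


lower = ToyLowerFs()
fh = lower.mkdir("/a/dir")      # fh is stored on the upper dir before the first rename
lower.rename(fh, "/b/dir")
lower.rename(fh, "/c/d/dir")
found_after_renames = snapshot_lookup_lower(lower, fh)
lower.rmdir(fh)
found_after_removal = snapshot_lookup_lower(lower, fh)
```

In this model the lookup still finds the directory after two renames, and
returns None once the lower dir is removed.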

All this works quite well for a single snapshot, but breaks with the
second snapshot. upper/0 is rotated to the top level of the snapshots
stack and upper/1 becomes the 'master' snapshot.
Subsequent renames of the lower dir will trigger copy up to upper/1,
which stores the lower dir fh there. But the new directory created in
upper/1 is not referenced by fh from upper/0. The lower directory is.
So a lookup through upper/0 follows its fh straight to master and
never visits the directory in upper/1.
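
A toy Python model of the failure, under loud assumptions: the dict
layout, the names merged_by_fh and merged_all_layers, and the entry "c"
that upper/1 is assumed to have preserved are all hypothetical, and file
handles are plain inode numbers. It is only meant to show the middle
layer being skipped, not how overlayfs stores anything.

```python
# Master: inode 10 is the directory, currently holding entries a and b.
lower = {10: {"a", "b"}}

# upper/0 (older snapshot): dir copied up before its first rename, stores lower fh 10.
upper0 = {"dir": {"lower_fh": 10, "entries": set()}}

# upper/1 (current 'master' snapshot): a later rename copied the dir up again,
# also storing lower fh 10. Assumption: it preserved entry "c", since removed
# from master.
upper1 = {"dir_renamed": {"lower_fh": 10, "entries": {"c"}}}


def merged_by_fh(upper_dir, lower):
    """Follow the stored fh directly to master, as redirect by fh does."""
    return upper_dir["entries"] | lower.get(upper_dir["lower_fh"], set())


def merged_all_layers(upper_dir, middle_layers, lower):
    """Also visit middle-layer dirs that represent the same lower inode.

    The linear scan here is exactly the lookup that would need to be efficient.
    """
    names = set(upper_dir["entries"])
    for layer in middle_layers:
        for middle_dir in layer.values():
            if middle_dir["lower_fh"] == upper_dir["lower_fh"]:
                names |= middle_dir["entries"]
    return names | lower.get(upper_dir["lower_fh"], set())


skipping_view = merged_by_fh(upper0["dir"], lower)
consistent_view = merged_all_layers(upper0["dir"], [upper1], lower)
```

Here skipping_view lacks "c" while consistent_view contains it, which is
the kind of snapshot inconsistency the rename tests report.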

As for a solution: *if* we had an efficient way to look up
"which inode in this layer represents inode X in lower layer",
then we would have a solution for:
1. Reverse redirect dir (look up the inode in upper/1 which represents
    the lower dir inode that we have from fh stored in upper/0)
2. Hardlinks copy up (did we already copy up another alias of inode X
    to this layer?)
3. Stable file handles for NFS export (look up the uppermost
    representation of lower inode X)
4. Optimized readdir d_ino (requires the opposite map: "which inode
    in lower is the base for inode Y in this layer").
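
The four uses above can be sketched with a per-layer index. This is a
minimal in-memory Python sketch, not a proposal for the on-disk format:
LayerInodeMap, copy_up_alias and uppermost are hypothetical names, and
inode numbers stand in for real file handles.

```python
class LayerInodeMap:
    """Hypothetical per-layer index between lower inodes and this layer's inodes."""

    def __init__(self):
        self.by_lower = {}   # lower ino -> ino in this layer (uses 1-3)
        self.by_layer = {}   # ino in this layer -> lower ino (use 4, opposite map)

    def record_copy_up(self, lower_ino, layer_ino):
        self.by_lower[lower_ino] = layer_ino
        self.by_layer[layer_ino] = lower_ino

    def find(self, lower_ino):
        """Which inode in this layer represents lower inode X? None if not copied up."""
        return self.by_lower.get(lower_ino)

    def origin(self, layer_ino):
        """Which lower inode is the base for inode Y in this layer?"""
        return self.by_layer.get(layer_ino)


def copy_up_alias(index, lower_ino, new_layer_ino):
    """Hardlink case: reuse the existing copy if another alias was already copied up."""
    existing = index.find(lower_ino)
    if existing is not None:
        return existing
    index.record_copy_up(lower_ino, new_layer_ino)
    return new_layer_ino


def uppermost(indexes, lower_ino):
    """Return (position, ino) for the uppermost layer representing lower_ino.

    indexes is ordered from the top layer down; returns None if no layer has it.
    """
    for pos, index in enumerate(indexes):
        ino = index.find(lower_ino)
        if ino is not None:
            return pos, ino
    return None
```

For example, with indexes for upper/0 and upper/1 where only upper/1 has
recorded a copy up of lower inode 10, uppermost([idx0, idx1], 10) points
at the upper/1 inode, which is the reverse redirect lookup from item 1.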

So as far as I can tell, all the problems above are bound together
by the need for inode map indexing. I am not sure whether you think
that's a good idea, whether you can think of another approach, or what
the best way to implement such a map would be.

As usual, I hope I was able to clarify rather than complicate further.
Amir.