Re: [RFC][PATCH 00/13] overlayfs stable inodes

Amir Goldstein <amir73il@xxxxxxxxx> · Wed, 19 Apr 2017 17:46:16 +0300

On Wed, Apr 19, 2017 at 4:52 PM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> On Wed, Apr 19, 2017 at 12:37 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>> On Wed, Apr 19, 2017 at 12:16 PM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>>> On Tue, Apr 18, 2017 at 8:37 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>>>>
>>>> On Mon, Apr 17, 2017 at 2:59 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>>>> > Overlayfs inodes are considered unstable in several aspects,
>>>> > because on a copy up event:
>>>> > 1. st_ino can change
>>>> > 2. st_dev can change
>>>> > 3. hardlinks are broken
>>>> > 4. NFS handle would become stale
>>>> > 5. content of read-only file descriptor would become stale
>>>> >
>>>> > This patch set 'stabilizes' overlayfs inodes w.r.t. st_ino/st_dev
>>>> > and takes some big steps in the direction of stabilizing hardlinks
>>>> > and NFS handles.
>>>> >
>>>>
>>>> I realized I forgot to mention in the cover letter that stable inodes
>>>> are only available for the overlay configuration where all layers
>>>> are on the same underlying fs and that underlying fs supports
>>>> NFS export (I think all eligible upper fs support NFS export anyway).
>>>
>>> Hmm,  we could keep inode numbers stable across copy up even if layers
>>> are on different filesystems:  just need to use a separate st_dev for
>>> lower layers and keep st_dev and st_ino constant.  The only extra
>>> thing needed compared to the samefs case is the allocation of dummy
>>> device numbers for lower layers.  Of course "find -xdev" and the like
>>> still won't work properly, and we wouldn't be able to provide sane
>>> d_ino values in readdir.
>>
>> Not sure that is going to be worth the effort, but we'll see.
>> Anyway, not sure if you already read far enough into the series,
>> but the fact that overlay inodes are hashed by stable inode ino
>> helps solve a lot of the problems with minimal code changes,
>> so in the grand scheme of things, I think it would be easier to
>> say: same_fs can give you POSIX. non same_fs cannot.
>
> The effort should be small, and the reward is substantially less weird behavior.
>
> In fact I looked and SUSv4
> (http://pubs.opengroup.org/onlinepubs/9699919799/) only talks about
> "mount point" in the context of directories.  It does *not* require
> st_dev to be the same as the st_dev of the parent directory for
> non-directories.  The only requirement is that st_ino and st_dev
> together uniquely identify a file in the system, which is why we need
> to generate a dummy st_dev for lower files in this case.  It also
> explicitly only talks about directories in the context of "-xdev" and
> the like.
>

I did see that 'find' does list overlay files whose st_dev differs
from their parent's, but 'du -x' does not count those files, which
must be very annoying to users. I'm surprised I haven't heard about this.
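
To make the 'du -x' point concrete, here is a rough userspace sketch
(Python for brevity, not kernel code) of the st_dev comparison that a
one-filesystem walk relies on. The function name and its injectable
lstat/listdir parameters are made up for illustration; the real tools
do more than this single check.

```python
import os


def entries_on_other_dev(dirpath, lstat=os.lstat, listdir=os.listdir):
    """Return the names in dirpath whose st_dev differs from dirpath's own.

    These are the entries a one-filesystem walk would treat as living on
    another device. lstat and listdir are injectable only so the logic
    can be exercised without a real overlay mount.
    """
    parent_dev = lstat(dirpath).st_dev
    other = []
    for name in sorted(listdir(dirpath)):
        # Compare each entry's device number with the directory's.
        if lstat(os.path.join(dirpath, name)).st_dev != parent_dev:
            other.append(name)
    return other
```

Running this on a directory of a non-samefs overlay mount should list
the not-yet-copied-up lower files, assuming they still report the lower
fs st_dev as described above.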

> So even in the non-samefs case we could stamp it with "POSIX
> compliant" because strictly speaking it is.
>
> If that's not enough, I think we *can* do unified ino space even in
> most non-samefs cases.  And here's why: look at the inode numbers of
> any filesystem; they will always be "small" so we can just partition
> the 64 bit ino space between layers and map inode numbers into its own
> partition.  This does not work in the general case, and it is a hack.
> But it's a very simple hack and it probably works fine. A similar thing
> is assumed by the 32bit compat code, which just returns EOVERFLOW if
> the ino happens to be too large, which I guess doesn't happen too
> often for most filesystems...
>

Well, if you are lucky you can run into a filesystem that exports
a file handle of type FILEID_INO32_GEN, then you *know* you're
good to go. ext* will do that, and so will an xfs that has always been
mounted with -o inode32.
Even with xfs -o inode64, it will not use the MSB ino bits unless
the fs size is in the exabyte range.
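
For reference, a minimal sketch of the partitioning hack as I
understand it (Python for brevity, not kernel code). The name map_ino
and the choice of 2 layer bits are assumptions for illustration only;
nothing here is taken from the patch set.

```python
INO_BITS = 64
LAYER_BITS = 2  # hypothetical: top bits reserved for the layer index


def map_ino(layer, real_ino, layer_bits=LAYER_BITS):
    """Map a layer's real inode number into that layer's own partition
    of a shared 64-bit inode space.

    Raises OverflowError when real_ino already uses the reserved high
    bits, similar in spirit to the EOVERFLOW that the 32bit compat code
    returns for an ino that is too large.
    """
    ino_bits = INO_BITS - layer_bits
    if not 0 <= layer < (1 << layer_bits):
        raise ValueError("layer index does not fit in the reserved bits")
    if real_ino < 0 or real_ino >> ino_bits:
        raise OverflowError("real inode number overlaps the layer bits")
    # The high bits select the layer, the low bits keep the real ino.
    return (layer << ino_bits) | real_ino
```

With this layout, layer 0 keeps its inode numbers unchanged, and a
filesystem that is known to use only 32 bit inode numbers can never hit
the overflow case.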

Anyway, I will keep that in the back of my mind when working
on stable inodes, to keep the implementation open for such
improvements in the future.

Amir.