On Wed, Apr 19, 2017 at 12:37 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> On Wed, Apr 19, 2017 at 12:16 PM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>> On Tue, Apr 18, 2017 at 8:37 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>>>
>>> On Mon, Apr 17, 2017 at 2:59 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>>> > Overlayfs inodes are considered unstable in several aspects,
>>> > because on a copy up event:
>>> > 1. st_ino can change
>>> > 2. st_dev can change
>>> > 3. hardlinks are broken
>>> > 4. NFS handle would become stale
>>> > 5. content of read-only file descriptor would become stale
>>> >
>>> > This patch set 'stabilizes' overlayfs inodes w.r.t. st_ino/st_dev
>>> > and takes some big steps in the direction of stabilizing hardlinks
>>> > and NFS handles.
>>> >
>>> I realized I forgot to mention in the cover letter that stable inodes
>>> are only available for the overlay configuration where all layers
>>> are on the same underlying fs and that underlying fs support
>>> NFS export (I think all eligible upper fs support NFS export anyway).
>>
>> Hmm, we could keep inode numbers stable across copy up even if layers
>> are on different filesystems: just need to use a separate st_dev for
>> lower layers and keep st_dev and st_inode constant. The only extra
>> thing needed compared to the samefs case is the allocation of dummy
>> device numbers for lower layers. Of course "find -xdev" and the like
>> still won't work properly, and we wouldn't be able to provide sane
>> d_ino values in readdir.
>
> Not sure that is going to be worth the effort, but we'll see.
> Anyway, not sure if you already read far enough into the series,
> but the fact that overlay inodes are hashed by stable inode ino
> helps solving a lot of the problems with minimal code changes,
> so in the grand scheme of things, I think it would be easier to
> say: same_fs can give you POSIX. non same_fs cannot.

The effort should be small, and the reward is substantially less weird
behavior.
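To make the dummy st_dev idea concrete, here is a rough userspace sketch. It is not overlayfs code: every struct, field and function name below (ovl_mount, ovl_file, ovl_stat_id, lower_dummy_dev, ...) is made up for illustration, and it assumes each lower layer has already been given its own dummy device number. The point it shows is that a file originating in lower layer i always reports (dummy dev of layer i, lower ino), so the reported pair does not change when the file is copied up, while pure upper files keep reporting the upper identity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical (st_dev, st_ino) pair as it would be reported by stat(2). */
struct ovl_id {
    uint64_t dev;
    uint64_t ino;
};

/* Hypothetical per-mount state: the st_dev used for pure upper files and
 * one dummy device number allocated per lower layer. */
#define OVL_MAX_LOWER 8
struct ovl_mount {
    uint64_t upper_dev;
    uint64_t lower_dummy_dev[OVL_MAX_LOWER];
    unsigned int num_lower;
};

/* Hypothetical description of a single overlay file. */
struct ovl_file {
    bool has_lower;         /* file originated in a lower layer */
    unsigned int lower_idx; /* index of that lower layer */
    uint64_t lower_ino;     /* ino on that lower layer */
    bool copied_up;         /* an upper copy exists */
    uint64_t upper_ino;     /* ino on the upper layer, if any */
};

/* Return the identity to report for f.  Note that copied_up is never
 * consulted: that is what keeps st_dev/st_ino constant across copy up. */
struct ovl_id ovl_stat_id(const struct ovl_mount *m, const struct ovl_file *f)
{
    struct ovl_id id;

    if (f->has_lower) {
        assert(f->lower_idx < m->num_lower);
        /* Lower origin: dummy dev of that layer + the lower ino. */
        id.dev = m->lower_dummy_dev[f->lower_idx];
        id.ino = f->lower_ino;
    } else {
        /* Pure upper file: report the upper identity. */
        id.dev = m->upper_dev;
        id.ino = f->upper_ino;
    }
    return id;
}
```

Uniqueness follows from the per-layer dummy devices: two files can only report the same pair if they come from the same layer and share an ino there, and copied-up files never report the upper st_dev, so they cannot collide with pure upper files.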
In fact I looked, and SUSv4
(http://pubs.opengroup.org/onlinepubs/9699919799/) only talks about
"mount point" in the context of directories. It does *not* require
st_dev to be the same as the st_dev of the parent directory for
non-directories. The only requirement is that st_ino and st_dev
together uniquely identify a file in the system, which is why we need
to generate a dummy st_dev for lower files in this case. It also
explicitly only talks about directories in the context of "-xdev" and
the like.

So even in the non-samefs case we could stamp it with "POSIX
compliant", because strictly speaking it is.

If that's not enough, I think we *can* do a unified ino space even in
most non-samefs cases. Here's why: look at the inode numbers of any
filesystem; they are almost always "small", so we can just partition
the 64-bit ino space between layers and map each layer's inode numbers
into its own partition. This does not work in the general case, and it
is a hack. But it's a very simple hack and it probably works fine. A
similar assumption is made by the 32-bit compat code, which just
returns EOVERFLOW if the ino happens to be too large; I guess that
doesn't happen too often for most filesystems...

Thanks,
Miklos
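For illustration, the ino partitioning hack could look roughly like the sketch below. It is a userspace sketch under stated assumptions, not proposed overlayfs code: OVL_LAYER_BITS, OVL_INO_BITS, OVL_INO_MASK and ovl_map_ino() are hypothetical names, and reserving the top 2 bits (enough for 4 layers, e.g. layer 0 = upper plus 3 lowers) is an arbitrary choice. Each layer's real ino is placed in its own partition by putting the layer index in the high bits. When a real ino already uses those bits, the mapping fails with -EOVERFLOW, much like the 32-bit compat stat path does when an ino does not fit.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical choice: reserve the top 2 bits of st_ino for the layer
 * index, leaving 62 bits for the real inode number of each layer. */
#define OVL_LAYER_BITS 2u
#define OVL_INO_BITS   (64u - OVL_LAYER_BITS)
#define OVL_INO_MASK   ((UINT64_C(1) << OVL_INO_BITS) - 1)

static_assert(OVL_LAYER_BITS > 0 && OVL_LAYER_BITS < 64,
              "layer bits must leave room for real inode numbers");

/* Map real_ino from the given layer into that layer's partition of the
 * 64-bit ino space.  Returns 0 and stores the result in *out on success.
 * Returns -EOVERFLOW if the layer index does not fit in the reserved
 * bits, or if the underlying fs uses inode numbers large enough to
 * spill into them (the "general case" where the hack breaks down). */
int ovl_map_ino(unsigned int layer, uint64_t real_ino, uint64_t *out)
{
    if (layer >= (1u << OVL_LAYER_BITS))
        return -EOVERFLOW;
    if (real_ino & ~OVL_INO_MASK)
        return -EOVERFLOW;
    *out = ((uint64_t)layer << OVL_INO_BITS) | real_ino;
    return 0;
}
```

With this layout, ino 42 on layer 0 maps to 42 and ino 42 on layer 1 maps to (1 << 62) | 42, so equal real inos on different layers no longer collide. What to do on -EOVERFLOW, whether to fail stat(2) or fall back to the dummy-st_dev scheme, is left open here.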