On Thu, Apr 27, 2017 at 3:53 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote: > On Thu, Apr 27, 2017 at 4:11 PM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote: >> - hardlinks: need to set overlay.redirect when hardlink is created >> from a copied up file; similar to what we do on rename >> > > Partly agree. > 1. This is not atomic, because hardlink is not created in workdir. Doesn't matter; we can set overlay.redirect before we do the hardlink. If the hardlink fails, we are left with the redirect, but that's not a problem. > 2. Reverse mapping will take care of this anyway. Remember? > there is an extra hardlink in workdir/inodes with has overlay.redirect > set on first alias copy up Yeah, but I don't really see why we'd need to set overlay.redirect on copy-up. When reverse mapping (i.e. trying to reconstruct the overlay dentry from the lower fh) we'll just do the same thing as for normal lookup: use the path from the upper root to the dentry and look up each component in the lower layers (taking into account any overlay.redirect encountered). >> - for non-redirect case looking up by overlay.origin is almost surely >> a pessimization >> > > Not sure about that. > For directories by fh and by name are probably on par - > At most one lookup of ".." compared to one lookup of d.name. > > For non-dir there is a better way that is better than both (see below). Not that simple (see below). > So I managed to confuse myself about the technical facts of decoding, > because I was used to dealing only with decoding of dir handles > (for snapshots) and just now added decoding of non-dir. > > When decoding a directory handle, it is always being connected up to > root. It sounds harsh, but in fact I think it will always be faster or on par > with regular lookup by path, because: > When looking up backwards until the first connected ancestor, filesystem > always has to read the ".." entry. > When looking up forward by path, then filesystem need to read entries > from connected ancestor by name and that is most likely indexed only > worse then the ".." entry of the backwards lookup. Problem is there's more going on than just lookup of "..". In fact it *must* entail the lookup of "name" as well, because that's the way the dentry gets connected. There's an even bigger snag: where do we get the name? There's a ->get_name() export op, but most fs don't define it, and the default action is to iterate the parent dir and find the one matching our inum. There goes the performance... That's why I'm saying it's almost certainly will be slower. Exception might be the cached case, but even there lookup by inum might be slower than the super optimized cached path lookup of a few filenames. Since we are looking up the overlay dentry, which isn't cached at this point, so why would the lower ones be? > When decoding directories you also want to get a connected dentry and > verifying is_subdir() makes sense. > > HOWEVER, and this is big thing that I missed, when decoding a non-dir > we DON'T need to get a connected dentry. > It's perfectly fine to get a disconnected alias and getting a disconnected > alias is always O(1) for the filesystem. It's O(1) but so is a single component lookup (case without overlay.redirect). In the cold cache cache both will be slow, since most of the time will be spent on getting the inode from disk. In the hot cache case, odds are the name lookup will win, since it's the more optimized codepath... > The only thing we really need from this alias is to know its inode number > (and to know that it is still valid). > > So if we encode non-connectable fh for non-dir (like knfsd does by default) > then: > 1. decoding them will always be faster then any other lookup method > 2. we cannot verify is_subdir() - so what? > > What's the worse thing that can happen if the decoded entry is not under > the layer anymore? We only use its inode number, and the only thing we > need to know is that it is unique within the lower layers inode namespace > and we don't need is_subdir() for that. Okay. > > But I just realized something very very bad about non-samefs case. > We must use made up st_dev for lower layers, we can certainly no > longer use the real lower st_dev. > If we do, then we will have 2 files in 2 different overlay mounts, > who have the same lower inode but 2 different upper inodes with > different content and those 2 overlay files will have the same > st_dev/st_ino. Yeah, I told you about this issue a couple of mails back. > > I just found that in my debian based kvm-xfstests machine, diff > reports 2 broken hardlinks with different content as equal, because > they have the same st_dev/st_ino. > > So in conclusion: > 1. Encode non-connectable file handles to non-dir Fine by me. > 2. Always try to lookup non-dir by fh first - it's O(1) Well... for simplicity sake, okay. Probably not a big loss. > 3. Non-samefs needs fake st_dev before reporting constant inodes Yes. > 4. Broken hardlinks should NOT report same inode Yes. Thanks, Miklos -- To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html