Amir Goldstein <amir73il@xxxxxxxxx> writes:

>> >
>> > Based on Alexander's explanation about the differences between overlayfs
>> > lookup vs. composefs lookup of a regular "metacopy" file, I just need to
>> > point out that the same optimization (lazy lookup of the lower data
>> > file on open)
>> > can be done in overlayfs as well.
>> > (*) currently, overlayfs needs to lookup the lower file also for st_blocks.
>> >
>> > I am not saying that it should be done or that Miklos will agree to make
>> > this change in overlayfs, but that seems to be the major difference.
>> > getxattr may have some extra cost depending on in-inode xattr format
>> > of erofs, but specifically, the metacopy getxattr can be avoided if this
>> > is a special overlayfs RO mount that is marked as EVERYTHING IS
>> > METACOPY.
>> >
>> > I don't expect you guys to now try to hack overlayfs and explore
>> > this path to completion.
>> > My expectation is that this information will be clearly visible to anyone
>> > reviewing future submission, e.g.:
>> >
>> > - This is the comparison we ran...
>> > - This is the reason that composefs gives better results...
>> > - It MAY be possible to optimize erofs/overlayfs to get to similar results,
>> >   but we did not try to do that
>> >
>> > It is especially important IMO to get the ACK of both Gao and Miklos
>> > on your analysis, because remember than when this thread started,
>> > you did not know about the metacopy option and your main argument
>> > was saving the time it takes to create the overlayfs layer files in the
>> > filesystem, because you were missing some technical background on overlayfs.
>>
>> we knew about metacopy, which we already use in our tools to create
>> mapped image copies when idmapped mounts are not available, and also
>> knew about the other new features in overlayfs.  For example, the
>> "volatile" feature which was mentioned in your
>> Overlayfs-containers-lpc-2020 talk, was only submitted upstream after
>> begging Miklos and Vivek for months.  I had a PoC that I used and tested
>> locally and asked for their help to get it integrated at the file
>> system layer, using seccomp for the same purpose would have been more
>> complex and prone to errors when dealing with external bind mounts
>> containing persistent data.
>>
>> The only missing bit, at least from my side, was to consider an image
>> that contains only overlay metadata as something we could distribute.
>>
>
> I'm glad that I was able to point this out to you, because now the comparison
> between the overlayfs and composefs options is more fair.
>
>> I previously mentioned my wish of using it from a user namespace, the
>> goal seems more challenging with EROFS or any other block devices.  I
>> don't know about the difficulty of getting overlay metacopy working in a
>> user namespace, even though it would be helpful for other use cases as
>> well.
>>
>
> There is no restriction of metacopy in user namespace.
> overlayfs needs to be mounted with -o userxattr and the overlay
> xattrs needs to use user.overlay. prefix.

if I specify both userxattr and metacopy=on then the mount ends up in
the following check:

	if (config->userxattr) {
		[...]
		if (config->metacopy && metacopy_opt) {
			pr_err("conflicting options: userxattr,metacopy=on\n");
			return -EINVAL;
		}
	}

to me it looks like it was done on purpose to prevent metacopy from a
user namespace, but I don't know the reason for sure.

> w.r.t. the implied claim that composefs on-disk format is simple enough
> so it could be made robust enough to avoid exploits, I will remain
> silent and let others speak up, but I advise you to take cover,
> because this is an explosive topic ;)
>
> Thanks,
> Amir.