On Tue, Mar 20, 2018 at 9:04 AM, Ian Kent <raven@xxxxxxxxxx> wrote:
> Hi Amir, Miklos,
>
> On 20/03/18 14:29, Amir Goldstein wrote:
>>
>> And I do appreciate the time you've put into understanding the overlayfs
>> problem and explaining the problems with my current proposal.
>>
>
> For a while now I've been wondering why overlayfs is keen to avoid using
> a local, persistent, inode number mapping cache?

Think of overlayfs as a normal filesystem, except it's not backed by a
block device, but instead by one or more read-only directory trees and
optionally one writable directory tree.

There's a twist, however: when not mounted, you are allowed to change the
backing directories. This is a really important feature of overlayfs.

So where does the initial mapping come from (an overlay is never started
from scratch, like a newly formatted filesystem)? And what happens when
layers are modified and we encounter unmapped inode numbers?

In both cases we must either create/update the mapping before mount, or
update the mapping on lookup.

Creating/updating the mapping up front means a really high startup cost,
which can be amortized only if the layers are guaranteed not to change
outside of the overlay.

Updating a persistent mapping on lookup means having to do sync writes on
lookup, which can be very detrimental to performance.

If all layers are read-only, this scheme falls apart, since we've nowhere
to write the persistent mapping.

Or we can just say, screw the persistency, and store the mapping on e.g.
tmpfs. Performance-wise that's much better, but then we fail to provide
the guarantees about inode numbers (e.g. NFS export won't work properly).

In my opinion it's much less about simplicity of implementation than about
quality of implementation.

Ideas for fixing the above issues are welcome.
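To make the trade-off concrete, here is a minimal userspace sketch (not the
kernel code, and the names are made up for illustration) of the
non-persistent variant: a table mapping (layer, real inode number) pairs to
stable overlay inode numbers, allocated lazily on lookup. Numbers stay
stable for the lifetime of the table, but since nothing is written back to
disk they are lost across mounts, which is exactly why NFS export breaks.
A persistent variant would additionally have to sync each new allocation,
which is the lookup-time write cost described above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: map (layer, real_ino) -> stable overlay inode. */
struct map_entry {
	unsigned int layer;            /* which backing layer */
	uint64_t real_ino;             /* inode number on the backing fs */
	uint64_t ovl_ino;              /* stable number handed out by overlay */
	struct map_entry *next;
};

#define NBUCKETS 256

struct ino_map {
	struct map_entry *buckets[NBUCKETS];
	uint64_t next_ino;             /* simple monotonic allocator */
};

static unsigned int bucket(unsigned int layer, uint64_t real_ino)
{
	return (unsigned int)((real_ino * 2654435761u) ^ layer) % NBUCKETS;
}

/* Return the stable overlay inode number, allocating one on first use. */
uint64_t ino_map_lookup(struct ino_map *m, unsigned int layer,
			uint64_t real_ino)
{
	unsigned int h = bucket(layer, real_ino);
	struct map_entry *e;

	for (e = m->buckets[h]; e; e = e->next)
		if (e->layer == layer && e->real_ino == real_ino)
			return e->ovl_ino;

	e = malloc(sizeof(*e));
	e->layer = layer;
	e->real_ino = real_ino;
	/* A persistent variant would have to sync this allocation to
	 * stable storage before returning -- the costly part. */
	e->ovl_ino = m->next_ino++;
	e->next = m->buckets[h];
	m->buckets[h] = e;
	return e->ovl_ino;
}
```

Note that the same real inode number appearing in two different layers maps
to two distinct overlay numbers, while repeated lookups of the same
(layer, inode) pair always return the same number.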
Thanks,
Miklos