One of the projects I'm playing with for containers is lazy-loading of layers. We've found that less than 10% of the files on a layer actually get used, which is an unfortunate waste. It also means that in some cases we download hundreds of MB, or even a few GB, of files before starting a container workload.

It would be nice if there was a way to start a container workload right away, and have it block whenever it tries to access an unpopulated (not yet downloaded) part of the filesystem, until that part has been fetched. This is trivial to do if the "lowest" layer is FUSE, where one can just stall in userspace on loads. Unfortunately, AFAIK, there's not a good way to swap out the FUSE filesystem for the "real" filesystem once it's done fully populating, so you have to pay the full FUSE cost on each read / write.

I've tossed around:

1. Mutable lowerdirs, with something like this:

    layer0 --> Writable space
    layer1 --> Real XFS filesystem
    layer2 --> FUSE FS

   If there is a "miss" on layer 1, the lookup falls through to layer 2 while layer 1 is still being populated, and the FUSE FS can block there. This is neat, but it requires the FUSE FS to always be up, and it incurs a userspace bounce on every miss. It also means things like metadata-only copies don't work.

Does anyone have a suggestion of a mechanism to handle this? I've looked into swapping out layers on the fly, and at what it would take to add a mechanism like userfaultfd to overlayfs, but I was wondering if anything like this has already been built, or if someone has thought it through more than I have.
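To make the behaviour I'm after a bit more concrete, here is a toy userspace model of the lookup order described above: serve from the populated layer when the file is there, otherwise block the reader until a background populator has fetched it. This is only a sketch in plain Python threads. It does not touch overlayfs, FUSE, or userfaultfd, and every name in it (`LazyLayerStack`, `populate`, `read`, `misses`) is made up for illustration.

```python
import threading


class LazyLayerStack:
    """Toy model: populated layer first, blocking fallback on a miss."""

    def __init__(self, remote):
        # Stands in for layer contents that have not been downloaded yet.
        self._remote = dict(remote)
        # Stands in for the "real" filesystem layer that is being filled in.
        self._populated = {}
        # Readers that miss wait on this condition until their file arrives.
        self._arrived = threading.Condition(threading.Lock())
        # Number of reads that had to take the slow (blocking) path.
        self.misses = 0

    def populate(self, path):
        """Background populator: 'download' one file and wake blocked readers."""
        with self._arrived:
            self._populated[path] = self._remote[path]
            self._arrived.notify_all()

    def read(self, path, timeout=None):
        """Return file contents, blocking if the file is not populated yet."""
        with self._arrived:
            if path in self._populated:
                return self._populated[path]  # fast path, no stall
            if path not in self._remote:
                raise FileNotFoundError(path)
            self.misses += 1
            # Slow path: stall the reader until the populator delivers the file.
            if not self._arrived.wait_for(lambda: path in self._populated, timeout):
                raise TimeoutError(path)
            return self._populated[path]


if __name__ == "__main__":
    stack = LazyLayerStack({"/bin/sh": b"shell", "/etc/hosts": b"hosts"})
    # Simulate the download finishing shortly after the workload starts.
    threading.Timer(0.1, stack.populate, args=("/bin/sh",)).start()
    print(stack.read("/bin/sh", timeout=5))  # blocks until populated
    print(stack.read("/bin/sh"))             # served from the populated layer
    print("misses:", stack.misses)
```

The property I care about is that only the first access to a file pays the stall. Once the file is populated, later reads never go through the slow path, which is exactly what a permanent FUSE lower layer does not give you.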