On Tue, Mar 20, 2018 at 9:04 AM, Ian Kent <raven@xxxxxxxxxx> wrote:
> Hi Amir, Miklos,
>
> On 20/03/18 14:29, Amir Goldstein wrote:
>>
>> And I do appreciate the time you've put into understanding the overlayfs
>> problem and explaining the problems with my current proposal.
>>
>
> For a while now I've been wondering why overlayfs is keen to avoid using
> a local, persistent, inode number mapping cache?

Think of overlayfs as a normal filesystem, except it's not backed by a
block device, but instead by one or more read-only directory trees and
optionally one writable directory tree.

There's a twist, however: when not mounted, you are allowed to change the
backing directories. This is a really important feature of overlayfs.

So where does the initial mapping come from (an overlay is never started
from scratch, like a newly formatted filesystem)? And what happens when
layers are modified and we encounter unmapped inode numbers?

In both cases we must either create/update the mapping before mount, or
update the mapping on lookup.

Creating/updating the mapping up front means a really high startup cost,
which can be amortized only if the layers are guaranteed not to change
outside of the overlay.

Updating a persistent mapping on lookup means having to do sync writes on
lookup, which can be very detrimental to performance.

If all layers are read-only, this scheme falls apart, since we've nowhere
to write the persistent mapping.

Or we can just say, screw the persistency, and store the mapping on e.g.
tmpfs. Performance-wise that's much better, but then we fail to provide
the guarantees about inode numbers (e.g. NFS export won't work properly).

In my opinion it's much less about simplicity of implementation than about
quality of implementation.

Ideas for fixing the above issues are welcome.
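To make the trade-off concrete, here is a minimal userspace sketch (not the
kernel code, and the names are made up for illustration) of the
non-persistent variant: a table mapping (layer, real inode number) pairs to
stable overlay inode numbers, allocated lazily on lookup. Numbers stay
stable for the lifetime of the table, but since nothing is written back to
disk they are lost across mounts, which is exactly why NFS export breaks.
A persistent variant would additionally have to sync each new allocation,
which is the lookup-time write cost described above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: map (layer, real_ino) -> stable overlay inode. */
struct map_entry {
	unsigned int layer;            /* which backing layer */
	uint64_t real_ino;             /* inode number on the backing fs */
	uint64_t ovl_ino;              /* stable number handed out by overlay */
	struct map_entry *next;
};

#define NBUCKETS 256

struct ino_map {
	struct map_entry *buckets[NBUCKETS];
	uint64_t next_ino;             /* simple monotonic allocator */
};

static unsigned int bucket(unsigned int layer, uint64_t real_ino)
{
	return (unsigned int)((real_ino * 2654435761u) ^ layer) % NBUCKETS;
}

/* Return the stable overlay inode number, allocating one on first use. */
uint64_t ino_map_lookup(struct ino_map *m, unsigned int layer,
			uint64_t real_ino)
{
	unsigned int h = bucket(layer, real_ino);
	struct map_entry *e;

	for (e = m->buckets[h]; e; e = e->next)
		if (e->layer == layer && e->real_ino == real_ino)
			return e->ovl_ino;

	e = malloc(sizeof(*e));
	e->layer = layer;
	e->real_ino = real_ino;
	/* A persistent variant would have to sync this allocation to
	 * stable storage before returning -- the costly part. */
	e->ovl_ino = m->next_ino++;
	e->next = m->buckets[h];
	m->buckets[h] = e;
	return e->ovl_ino;
}
```

Note that the same real inode number appearing in two different layers maps
to two distinct overlay numbers, while repeated lookups of the same
(layer, inode) pair always return the same number.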
Thanks,
Miklos