Re: [PATCH 3/3] shmem: Add support for using full width of ino_t

Amir Goldstein <amir73il@xxxxxxxxx> · Sat, 28 Dec 2019 06:20:58 +0200

On Fri, Dec 27, 2019 at 6:35 PM Chris Down <chris@xxxxxxxxxxxxxx> wrote:
>
> Amir Goldstein writes:
> >On Fri, Dec 27, 2019 at 4:30 PM Chris Down <chris@xxxxxxxxxxxxxx> wrote:
> >>
> >> The new inode64 option now uses get_next_ino_full, which always uses the
> >> full width of ino_t (as opposed to get_next_ino, which always uses
> >> unsigned int).
> >>
> >> Using inode64 makes inode number wraparound significantly less likely,
> >> at the cost of making some features that rely on the underlying
> >> filesystem not setting any of the highest 32 bits (eg. overlayfs' xino)
> >> not usable.
> >
> >That's not an accurate statement. overlayfs xino just needs some high
> >bits available. Therefore I never had any objection to having tmpfs use
> >64bit ino values (from overlayfs perspective). My only objection is to
> >use the same pool "irresponsibly" instead of per-sb pool for the heavy
> >users.
>
> Per-sb get_next_ino is fine, but seems less important if inode64 is used. Or is
> your point about people who would still be using inode32?
>
> I think things have become quite unclear in previous discussions, so I want to
> make sure we're all on the same page here. Are you saying you would
> theoretically ack the following series?
>
> 1. Recycle volatile slabs in tmpfs/hugetlbfs
> 2. Make get_next_ino per-sb
> 3. Make get_next_ino_full (which is also per-sb)
> 4. Add inode{32,64} to tmpfs

Not what I meant. On the contrary:
1. Recycling ino from the slab is a nice idea, but it does not work
    together with a per-sb ino allocator, so you can't use it for tmpfs
2. Leave get_next_ino() alone - it is used by things like pipe(2)
    that you don't want to mess with
3. Don't create another global ino allocator
4. inode{32,64} option to tmpfs is the only thing you need

We've made quite a big mess of a problem that is not really that big.

In this thread on zhenbin's patch, you can find the simple solution to
your problem that Google are using:
https://patchwork.kernel.org/patch/11254001/#23014383

According to the tmpfs maintainer, the only thing keeping this solution
out of upstream is the concern about breaking legacy 32-bit apps.

If you make exposing the high ino bits opt-in, via a mount option
and/or a Kconfig option, then this concern would be mitigated and
Google's private tmpfs ino solution could go upstream.
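For illustration only, a minimal userspace sketch of such an opt-in. It assumes a hypothetical build-time macro SKETCH_CONFIG_INODE64_DEFAULT standing in for a Kconfig option and a hypothetical full_inums flag; it is not the real tmpfs option parser, just the "Kconfig default, mount option overrides" idea:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical build-time default standing in for a Kconfig option:
 * 0 keeps 32-bit inode numbers unless "inode64" is passed at mount time.
 */
#ifndef SKETCH_CONFIG_INODE64_DEFAULT
#define SKETCH_CONFIG_INODE64_DEFAULT 0
#endif

/*
 * Scan a comma-separated tmpfs-style option string and decide whether the
 * superblock may hand out inode numbers wider than 32 bits. The last of
 * "inode32"/"inode64" wins; all other options are ignored in this sketch.
 */
static void sketch_parse_ino_opts(const char *opts, bool *full_inums)
{
	/* Start from the (hypothetical) Kconfig default. */
	*full_inums = SKETCH_CONFIG_INODE64_DEFAULT != 0;

	while (opts && *opts) {
		const char *end = strchr(opts, ',');
		size_t len = end ? (size_t)(end - opts) : strlen(opts);

		if (len == 7 && strncmp(opts, "inode64", 7) == 0)
			*full_inums = true;
		else if (len == 7 && strncmp(opts, "inode32", 7) == 0)
			*full_inums = false;

		opts = end ? end + 1 : NULL;
	}
}
```

With the default left at 0, a plain mount keeps 32-bit numbers and only an explicit "inode64" exposes the high bits, which is the opt-in behaviour that addresses the legacy 32-bit app concern.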

Hugh did not specify if sbinfo->next_ino is incremented under
sbinfo->stat_lock or some other lock (maybe he can share a link to
the actual patch?), but shmem_reserve_inode() already takes that
lock anyway, so I don't see the need for any further micro-optimizations.
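To make the locking point concrete, a minimal userspace sketch of a per-sb counter bumped under the same lock that already guards the inode reservation. Everything here is hypothetical: sketch_sb_info is a cut-down stand-in for struct shmem_sb_info, next_ino and full_inums are assumed fields, and a C11 atomic_flag spinlock models stat_lock. It is not Hugh's or Google's actual patch:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct shmem_sb_info. */
struct sketch_sb_info {
	atomic_flag stat_lock;  /* models the sbinfo->stat_lock spinlock */
	uint64_t free_inodes;   /* remaining inode budget (nr_inodes) */
	uint64_t next_ino;      /* per-sb counter (hypothetical field) */
	bool full_inums;        /* true for "inode64", false for "inode32" */
};

static void sketch_lock(atomic_flag *lock)
{
	/* Spin until the flag was previously clear, i.e. we own the lock. */
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;
}

static void sketch_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

static void sketch_sb_init(struct sketch_sb_info *sbinfo, uint64_t nr_inodes,
			   bool inode64)
{
	atomic_flag_clear(&sbinfo->stat_lock);
	sbinfo->free_inodes = nr_inodes;
	sbinfo->next_ino = 1;   /* never hand out inode number 0 */
	sbinfo->full_inums = inode64;
}

/*
 * Reserve an inode and pick its number under the same lock, loosely
 * modelled on the accounting shmem_reserve_inode() does under stat_lock.
 * Returns 0 and stores the number in *inop, or -ENOSPC.
 */
static int sketch_reserve_inode(struct sketch_sb_info *sbinfo, uint64_t *inop)
{
	uint64_t ino;

	sketch_lock(&sbinfo->stat_lock);
	if (sbinfo->free_inodes == 0) {
		sketch_unlock(&sbinfo->stat_lock);
		return -ENOSPC;
	}
	sbinfo->free_inodes--;

	ino = sbinfo->next_ino++;
	if (!sbinfo->full_inums && ino > UINT32_MAX) {
		/* inode32: wrap so legacy 32-bit userspace never sees high bits */
		ino = 1;
		sbinfo->next_ino = 2;
	}
	sketch_unlock(&sbinfo->stat_lock);

	*inop = ino;
	return 0;
}
```

Because the counter update piggybacks on a lock that is taken anyway, the per-sb allocator adds no extra locking; with inode64 the 64-bit counter will not wrap in practice, so no wraparound handling is attempted for that case.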

Chris, I hope the solution I am proposing is clear now and I hope I am
not leading you by mistake into another trap...

To be clear, the solution should be dead simple and contained to tmpfs.
If you like, you could clone the exact same solution to hugetlbfs, but no
new vfs helpers please.

Thanks,
Amir.