Re: [PATCH v5 1/2] tmpfs: Add per-superblock i_ino support

Hugh Dickins <hughd@xxxxxxxxxx> · Wed, 8 Jan 2020 02:58:52 -0800 (PST)

On Tue, 7 Jan 2020, Amir Goldstein wrote:
> 
> I vote in favor or best of both patches to result in a simpler outcome:
> 1. use inop approach from Hugh's patch
> 2. use get_next_ino() instead of per-sb ino_batch for kern_mount
> 
> Hugh, do you have any evidence or suspect that shmem_file_setup()
> could be contributing to depleting the global get_next_ino pool?

Depends on what the system is used for: on one system, it will make
very little use of that pool, on another it will be one of the major
depleters of the pool.  Generally, it would be kinder to the other
users of the pool (those that might also care about ino duplication)
if shmem were to cede all use of it; but a bigger patch, yes.

> Do you have an objection to the combination above?

Objection would be too strong: I'm uncomfortable with it, and not
tempted to replace our internal implementation by one reverting to
use get_next_ino(); but not as uncomfortable as I am with holding
up progress on the issue.

Uncomfortable because of the "depletion" you mention.  Uncomfortable
because half the reason for ever offering the unlimited "nr_inodes=0"
mount option was to avoid stat_lock overhead (though, for all I know,
maybe nobody anywhere uses that option now - and I'll be surprised if
0-day uses it and reports any slowdown).

Also uncomfortable because your (excellent, and I'd never thought
about it that way) argument that the shm_mnt objects are internal
and unstatable (that may not resemble how you expressed it, I've not
gone back to search the mails to find where you made the point), is
not entirely correct nowadays - memfd_create()d objects come from
that same shm_mnt, and they can be fstat()ed.  What is the
likelihood that anything would care about duplicated inos of
memfd_create()d objects?  Fairly low, I think we'll agree; and
we can probably also say that their inos are an odd implementation
detail that no memfd_create() user should depend upon anyway.  But
I mention it in case it changes your own feeling about the shm_mnt.

> > Not-Yet-Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
> >
> > 1) Probably needs minor corrections for the 32-bit kernel: I wasn't
> >    worrying about that at the time, and expect some "unsigned long"s
> >    I didn't need to change for the 64-bit kernel actually need to be
> >    "u64"s or "ino_t"s now.
> > 2) The "return 1" from shmem_encode_fh() would nowadays be written as
> >    "return FILEID_INO32_GEN" (and overlayfs takes particular interest
> >    in that fileid); yet it already put the high 32 bits of the ino into
> >    the fh: probably needs a different fileid once high 32 bits are set.
> 
> Nice spotting, but this really isn't a problem for overlayfs.
> I agree that it would be nice to follow the same practice as xfs with
> XFS_FILEID_TYPE_64FLAG, but as one can see from the private
> flag, this is by no means a standard of any sort.
> 
> As the name_to_handle_at() man page says:
> "Other than the use of the handle_bytes field, the caller should treat the
>  file_handle structure as an opaque data type: the handle_type and f_handle
>  fields are needed only by a  subsequent call to open_by_handle_at()."
> 
> It is true that one of my initial versions was checking value returned from
> encode_fh, but Miklos has pointed out to me that this value is arbitrary
> and we shouldn't rely on it.
> 
> In current overlayfs, the value FILEID_INO32_GEN is only used internally
> to indicate the case where filesystem has no encode_fh op (see comment
> above ovl_can_decode_fh()).  IOW, only the special case of encoding
> by the default export_encode_fh() is considered a proof for 32bit inodes.
> So tmpfs has never been considered as a 32bit inodes filesystem by
> overlayfs.

Thanks, you know far more about encode_fh() than I ever shall, so for
now I'll take your word for it that we don't need to make any change
there for this 64-bit ino support.  One day I should look back, go
through the encode_fh() callsites, and decide what that "return 1"
really ought to say.

It's inconvenient, I'm sorry, but I shall have to go quiet again
for a few days - I'm here, but cannot afford to split my attention.

Hugh