Re: [RFC PATCH] getvalues(2) prototype

"Theodore Ts'o" <tytso@xxxxxxx> · Sat, 26 Mar 2022 00:19:10 -0400

On Fri, Mar 25, 2022 at 10:25:53AM +0100, Karel Zak wrote:
> 
> Right, the speed of ps(1) or lsof(1) is not important. IMHO the current
> discussion about getvalues() goes in wrong direction :-)
> 
> I guess the primary motivation is not to replace open+read+close, but
> provide to userspace something usable to get information from mount
> table, because the current /proc/#/mountinfo and notification by
> poll() is horrible.

I think that's because the getvalues(2) prototype *only* optimizes
away open+read+close, and doesn't do a *thing* with respect to
/proc/<pid>/mountinfo.

> Don't forget that the previous attempt was fsinfo() from David Howells
> (unfortunately, it was too complex and rejected by Linus).

fsinfo() tried to do a lot more than solving the /proc/<pid>/mountinfo
problem; perhaps that was the cause of the complexity.

Ignoring the notification problem (which I suspect we could solve with
an extension of fsnotify), if the goal is to find a cleaner way to
fetch information about a process's mount namespace and the mounts in
that namespace, why not try exporting that information via sysfs?
Information about devices is just as complex, after all.

We could make mount namespaces their own first-class objects, so
there would be an entry in /proc/<pid> which returns the mount
namespace id used by a particular process.  Similarly, let each
mounted file system be its own first-class object.  Information about
each mount namespace would be in /sys/mnt_ns, and information about
each mounted file system would be in /sys/superblock.  Then in
/sys/mnt_ns there would be a directory for each (superblock,
mountpoint) pair.
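
To make the layout above a bit more concrete, here is a rough userspace
sketch of how a tool might walk it.  None of this exists today: the
per-namespace subdirectory under /sys/mnt_ns, the per-mount directory
names, and the attribute file names ("mountpoint", "superblock") are
all assumptions made up for illustration, and the root is a parameter
so the code can be pointed at a mock tree.

```python
import os


def read_attr(path):
    """Read one small sysfs-style attribute file (open + read + close)."""
    with open(path, "r") as f:
        return f.read().strip()


def list_mounts(ns_id, root="/sys/mnt_ns"):
    """Return one dict of attributes per mount directory.

    HYPOTHETICAL layout, not an existing kernel interface:
        <root>/<ns_id>/<mount-dir>/<attribute-file>
    Each attribute file is assumed to hold a single short value.
    """
    ns_dir = os.path.join(root, str(ns_id))
    mounts = []
    for entry in sorted(os.listdir(ns_dir)):
        mnt_dir = os.path.join(ns_dir, entry)
        if not os.path.isdir(mnt_dir):
            continue  # skip anything that is not a per-mount directory
        attrs = {}
        for name in sorted(os.listdir(mnt_dir)):
            attr_path = os.path.join(mnt_dir, name)
            if os.path.isfile(attr_path):
                attrs[name] = read_attr(attr_path)
        mounts.append(attrs)
    return mounts
```

The point is only that a consumer needs nothing beyond directory
listing and plain reads; no mountinfo-style text parsing is involved.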

Given how quickly programs like lsof can open tens of thousands of
small files, and given that there typically aren't that many mounted
file systems in a particular mount namespace, performance really
shouldn't be a problem.
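
For anyone who wants a rough feel for the open+read+close cost on their
own machine, something like the following would do.  It is only a
sketch: it times ordinary small files in a temporary directory, not
sysfs attributes, so the numbers are at best a ballpark.  The function
name and the default file count are arbitrary.

```python
import os
import tempfile
import time


def time_small_reads(n_files=1000):
    """Return the seconds spent on open+read+close over n_files small files."""
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for i in range(n_files):
            p = os.path.join(d, f"attr{i}")
            with open(p, "w") as f:
                f.write("value\n")
            paths.append(p)

        # Only the read side is timed; file creation above is setup.
        start = time.perf_counter()
        for p in paths:
            fd = os.open(p, os.O_RDONLY)
            try:
                os.read(fd, 4096)
            finally:
                os.close(fd)
        return time.perf_counter() - start


if __name__ == "__main__":
    n = 10000
    elapsed = time_small_reads(n)
    print(f"{n} open+read+close cycles took {elapsed:.4f} s")
```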

If it works well enough for other kernel objects that are accessed via
sysfs, and fsinfo() is way too complex, why don't we try a pattern
which has worked and is "native" to Linux?

					- Ted