Re: Using the upcoming fsinfo()

Ian Kent <raven@xxxxxxxxxx> · Thu, 16 May 2019 08:13:32 +0800

On Wed, 2019-05-15 at 13:45 +0200, Karel Zak wrote:
> On Tue, May 14, 2019 at 08:23:02AM +0800, Ian Kent wrote:
> > On Mon, 2019-05-13 at 11:08 +0200, Karel Zak wrote:
> > > The nice place where is ugly overhead with the current mountinfo is
> > > context_umount.c code, see lookup_umount_fs() and
> > > mnt_context_find_umount_fs(). In this code we have mountpoint and we
> > > need more information about it (due to redirection to umount.<type>
> > > helpers, userspace mount options, etc.). It sounds like ideal to use
> > > mnt_fsinfo_fill_fs() if possible.
> > 
> > That sounds like an ideal opportunity for improvement by using
> > fsinfo(). I'll look there too.
> 
> Yes.
> 
> > > The most visible change will be to use mnt_fsinfo_fill_table() with in
> > > mnt_table_parse_file() if the file name is "/proc/self/mountinfo".
> > > This will be huge improvement as we use this function in systemd on
> > > each mount table change...
> > > 
> > > The question is how easily will be to replace mountinfo with fsinfo().
> 
> Now when I think about it I'm not sure if create complete mount table
> by fsinfo() is the right way. Maybe many fsinfo() calls will be more
> slow than generate mountinfo file in kernel and read() in userspace.
> Not sure.

I'm not sure about the comparison in overhead of this either.

But it's something that needs to be done to get familiar with how
to use fsinfo() and to work out what else needs to be done.

As you know this has already shown that getting file system specific
options isn't working yet for most file systems and I'll need to
implement the missing ->fsinfo() super block operation for (at least
some, probably many) file systems just to continue the work.

There is a slightly less obvious difference using fsinfo() over
the proc file system to get the whole mount table.

When you open a proc file system mount table the kernel takes locks
that will prevent (at least) mount, umount and remount actions until
the proc file is closed.

With fsinfo() the locks need to be taken but at a much finer
granularity so mount actions can continue in parallel.

There's a price for that locking improvement though, if your trying
to get a whole consistent mount table you need to check it hasn't
changed while you read it and if it has you need to start over.

So getting the whole table with fsinfo() will definitely need to be
evaluated against using the proc file system.

But there's quite a lot of processing that happens when the kernel
issues proc mount records, the path calculations are quite expensive
for example, so the difference isn't clear cut.

> 
> > I've been looking at libmount but I'm not sure I was focusing on
> > libmnt_fs so I'm not sure yet.
> > 
> > A large part of doing this early is to find out what's missing
> > and see if it's possible to update fsinfo().
> 
> Yes, it would be really nice if we can get all info from fsinfo(). It
> opens a new possibilities for us to make things like umount, remount,
> and systemd more effective.

I think we will be able to but probably not for a while, there's
quite a bit still to do for fsinfo() by the look of it.

Excessive resource usage of systemd is one of the main motivations
for me doing this so improving that is at the top of the priority
list for me.

> 
> > For example, the devanme in mountinfo which can be different to
> > the devname returned by fsinfo(), David has said it's not straight
> > forward to change but at least he's aware of it and thinking about
> > it.
> 
> Do you mean "source" field (9th column in mountinfo)? The device is
> defined by maj:min (3rd column) in the file (well, whatever the devno
> means for things like btrfs;-).

I do.

> 
> The "source" should be unmodified string as specified in userspace for
> mount(2) syscall, otherwise things like "mount -a" can not compare the
> kernel mount table with fstab.

This string isn't always a string value that comes from the mount
kernel structure, the the proc file system needs to call upon the
file system to get it in some cases.

For example when you see an NFS <server>:<path> in the proc file system
output.

To get this string the proc file system checks if the file system provides
a super block operation ->show_devname() and calls it to get the name
otherwise it copies the string from the mount structure.

As David says, to deal with this it isn't as simple as adding an fsinfo()
request because there are cases where it can have multiple values.

A similar thing is done for field 4 where, if the file system defines
a super block operation ->show_path() it will be called to get the
path, otherwise it's calculated using mount's root. Interestingly NFS
appears to always return "/" for this from its ->show_path() function.

And, as I mentioned above, there's the needed ->fsinfo() super operation
to cover the use of the existing ->show_options() operation (provided
by pretty much all file systems) to get the file system specific options.

So there's quite a bit of detail to be worked out for fsinfo() to be
able to correctly provide all mount information.

But, hey, that was the point of doing this now.

Ian