Re: [PATCH 00/10] exposing knfsd opens to userspace

On Fri, Apr 26 2019, Andreas Dilger wrote:

>> On Apr 26, 2019, at 1:20 AM, NeilBrown <neilb@xxxxxxxx> wrote:
>> 
>> On Thu, Apr 25 2019, Andreas Dilger wrote:
>> 
>>> On Apr 25, 2019, at 4:04 PM, J. Bruce Fields <bfields@xxxxxxxxxx> wrote:
>>>> 
>>>> From: "J. Bruce Fields" <bfields@xxxxxxxxxx>
>>>> 
>>>> The following patches expose information about NFSv4 opens held by knfsd
>>>> on behalf of NFSv4 clients.  Those are currently invisible to userspace,
>>>> unlike locks (/proc/locks) and local processes' opens (/proc/<pid>/).
>>>> 
>>>> The approach is to add a new directory /proc/fs/nfsd/clients/ with
>>>> subdirectories for each active NFSv4 client.  Each subdirectory has an
>>>> "info" file with some basic information to help identify the client and
>>>> an "opens" directory that lists the opens held by that client.
>>>> 
>>>> I got it working by cobbling together some poorly-understood code I
>>>> found in libfs, rpc_pipefs and elsewhere.  If anyone wants to wade in
>>>> and tell me what I've got wrong, they're more than welcome, but at this
>>>> stage I'm more curious for feedback on the interface.
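
For concreteness, the hierarchy being proposed above would presumably
look something like the sketch below (the client ID shown is invented
for illustration):

    /proc/fs/nfsd/clients/
        47/
            info      basic identifying information for the client
            opens/    one entry per NFSv4 open held by that client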
>>> 
>>> Is this in procfs, sysfs, or a separate NFSD-specific filesystem?
>>> My understanding is that "complex" files are verboten in procfs and sysfs?
>>> We've been going through a lengthy process to move files out of procfs
>>> into sysfs and debugfs as a result (while trying to maintain some kind of
>>> compatibility in the user tools), but if it is possible to use a separate
>>> filesystem to hold all of the stats/parameters I'd much rather do that
>>> than use debugfs (which has become root-access-only in newer kernels).
>> 
>> /proc/fs/nfsd is the (standard) mount point for a separate NFSD-specific
>> filesystem, originally created to replace the nfsd-specific systemcall.
>> So the nfsd developers have a fair degree of latitude as to what can go
>> in there.
>> 
>> But I *don't* think it is a good idea to follow this pattern.  Creating
>> a separate control filesystem for every different module that thinks it
>> has different needs doesn't scale well.  We could end up with dozens of
>> tiny filesystems that all need to be mounted at just the right place.  I
>> don't think that is healthy for Linux.
>> 
>> Nor do I think we should be stuffing stuff into debugfs that isn't
>> really for debugging.  That isn't healthy either.
>> 
>> If sysfs doesn't meet our needs, then we need to raise that in
>> appropriate fora and present a clear case and try to build consensus -
>> because if we see a problem, then it is likely that others do too.
>
> I definitely *do* see the restrictions of sysfs as a problem, and I'd
> guess the NFS developers thought the same, since the "one value per
> file" paradigm means that any kind of complex data needs to be split
> over hundreds or thousands of files, which is very inefficient for
> userspace to use.  If /proc/slabinfo had to follow the sysfs paradigm,
> it would (on my system) need about 225 directories (one per slab) and
> 3589 separate files in total (one per value), all of which would need
> to be read every second to implement "slabtop".  Running strace on
> "top" shows it taking 0.25s wall time to open and read the files for
> only 350 processes on my system, at 2 files per process ("stat" and
> "statm") holding 44 and 7 values respectively; splitting those into
> one file per value would make this far worse.
>
> I think it would make a lot more sense to have one file per item of
> interest, in a well-structured YAML-like format ("name: value", with
> indentation denoting a hierarchy/grouping of related items), so that
> the file is both human- and machine-readable and easily parsed by
> scripts using bash or awk, rather than having an explicit
> directory+file hierarchy.  Files like /proc/meminfo and
> /proc/<pid>/status are already YAML-formatted (or nearly so), so this
> needn't be ugly the way XML encoding is.
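
As an illustration of the format Andreas describes: below is a
hypothetical stats file in that style, followed by a minimal Python
sketch that parses the two-level "name: value" layout.  All file
contents, names and numbers here are invented.

    # Hypothetical structured stats file:
    #
    #   read:
    #       count: 1024
    #       bytes: 8388608
    #   write:
    #       count: 512
    #       bytes: 4194304

    import sys

    def parse(path):
        """Parse flat or two-level 'name: value' text into dicts."""
        stats = {}
        section = None
        with open(path) as f:
            for line in f:
                if not line.strip():
                    continue
                key, _, value = line.partition(":")
                if line[0] in " \t":
                    # Indented line: a field of the current section.
                    stats[section][key.strip()] = value.strip()
                elif value.strip():
                    # Flat "name: value", as in /proc/meminfo.
                    stats[key.strip()] = value.strip()
                else:
                    # Bare "name:" opens a new section.
                    section = key.strip()
                    stats[section] = {}
        return stats

    if __name__ == "__main__":
        print(parse(sys.argv[1]))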

So what are your pain points?  What data do you really want to present in
a structured file?
Look at /proc/self/mountstats on some machine which has an NFS mount.
There would be no problem adding similar information for lustre mounts.
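
For reference, a mountstats entry looks roughly like this (abbreviated,
with made-up numbers):

    device fs1:/export mounted on /mnt with fstype nfs4 statvers=1.1
            opts:   rw,vers=4.1,rsize=1048576,wsize=1048576,...
            age:    86400
            bytes:  1048576 524288 0 0 1052672 528384 256 128
            per-op statistics
                    OPEN: 42 42 0 5120 4160 10 120 130
                    ...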

What data do you want to export to user-space that wouldn't fit there
and doesn't fit the one-value-per-file model?  To make a case, we need
concrete data.

>
>> This is all presumably in the context of Lustre, and while Lustre is
>> out-of-tree we don't have a lot of leverage.  So I wouldn't consider
>> pursuing anything here until we get back upstream.
>
> Sure, except that is a catch-22.  We can't discuss what is needed until
> the code is in the kernel, but we can't get it into the kernel until the
> files it puts in /proc have been moved into /sys?

Or maybe just removed.  If lustre is usable without some of these files,
then we can land lustre without them, and then start the conversation
about how to export the data that we want exported.

NeilBrown
