Re: [PATCH 32/32] [RFC] fsinfo: Add a system call to allow querying of filesystem information [ver #8]

Arnd Bergmann <arnd@xxxxxxxx> · Mon, 4 Jun 2018 18:00:27 +0200

On Mon, Jun 4, 2018 at 5:01 PM, David Howells <dhowells@xxxxxxxxxx> wrote:
> Arnd Bergmann <arnd@xxxxxxxx> wrote:
>
>> ntfs has sb->s_time_gran=100, and vfat should really have
>> sb->s_time_gran=2000000000 but that doesn't seem to be set right
>> at the moment.
>
> (V)FAT actually has a different granularity on each timestamp.

Ah, right. I guess I missed the fact that the file system can override it,
so FAT doesn't actually have to do it this way.

>> > +struct fsinfo_capabilities {
>> > +       __u64   supported_stx_attributes;       /* What statx::stx_attributes are supported */
>> > +       __u32   supported_stx_mask;             /* What statx::stx_mask bits are supported */
>> > +       __u32   supported_ioc_flags;            /* What FS_IOC_* flags are supported */
>> > +       __u8    capabilities[(fsinfo_cap__nr + 7) / 8];
>> > +};
>>
>> This looks a bit odd: with the 44 capabilities, you end up having a six-byte
>> array followed by two bytes of implicit padding. If the number of
>> capabilities grows beyond 64, you have a nine byte array with more padding
>> to the next alignof(__u64). Is that intentional?
>
> I've split the capabilities out into their own thing.  I've attached the
> revised patch below.

I'm still not completely clear on how variable-length structures are supposed to
be handled by the fsinfo syscall. It seems like a possible source of
bugs to return
a structure from the kernel that has a different size in kernel and user space
depending on the fsinfo_cap__nr value at compile time. How does one
e.g. guarantee there is no out of bounds access when you run new user space on
an older kernel that has a smaller structure?

>> > +struct fsinfo_timestamp_info {
>> > +       __s64   minimum_timestamp;      /* Minimum timestamp value in seconds */
>> > +       __s64   maximum_timestamp;      /* Maximum timestamp value in seconds */
>> > +       __u16   atime_gran_mantissa;    /* Granularity(secs) = mant * 10^exp */
>> > +       __u16   btime_gran_mantissa;
>> > +       __u16   ctime_gran_mantissa;
>> > +       __u16   mtime_gran_mantissa;
>> > +       __s8    atime_gran_exponent;
>> > +       __s8    btime_gran_exponent;
>> > +       __s8    ctime_gran_exponent;
>> > +       __s8    mtime_gran_exponent;
>> > +};
>>
>> This structure has a slightly inconsistent amount of padding at the end:
>> on x86-32 it has no padding, everywhere else it has 32 bits of padding
>> to make it 64-bit aligned. Maybe add a __u32 reserved field?
>
> It occurs to me that the min and max may be different for each timestamp.
> Maybe I should have:
>
>         struct fsinfo_timestamp_info {
>                 char    name[7 + 1];
>                 __s64   minimum_timestamp;
>                 __s64   maximum_timestamp;
>                 __u16   granularity_mantissa;
>                 __s8    granularity_exponent;
>                 __u8    __reserved[5];
>         };
>

I don't particularly like having a string in there, that seems to add
unnecessary complexity compared to using an integer. Having four
min/max values would make it more generic but I don't think we have
a need for that at the moment with any of the file systems we support.

In any case, it would be nice to have a trivial way to query which of
the four timestamp types are supported at all, and returning
them separately would be one way of doing that.

> and then you iterate through them by setting Nth.  I could just put a:
>
>                 __u64   granularity;
>
> field, expressed in nS, rather than mantissa and exponent, but doing it this
> way allows me to express granularities less that 1nS very simply (something
> Dave Chinner was talking about).
>
> It might also be worth putting minimum_timestamp and maximum_timestamp in
> terms of granularity rather than nS.

Expressing the granularity in nanoseconds (or something smaller) would seem
more natural to me, but I don't really care much either way.

          Arnd