On 8/12/2020 2:18 PM, Linus Torvalds (torvalds@xxxxxxxxxxxxxxxxxxxx) wrote:
> What's wrong with fstatfs()? All the extra magic metadata seems to not
> really be anything people really care about.
>
> What people are actually asking for seems to be some unique mount ID,
> and we have 16 bytes of spare information in 'struct statfs64'.
>
> All the other fancy fsinfo stuff seems to be "just because", and like
> complete overdesign.

Hi Linus,

Is there any existing method by which userland applications can determine the properties of the filesystem in which a directory or file is stored in a filesystem-agnostic manner?

Over the past year I've observed the opendev/openstack community struggle with performance issues caused by rsync's inability to determine whether the source and destination objects' last-update times have the same resolution and valid time range. If the source filesystem supports 100-nanosecond granularity and the destination filesystem supports one-second granularity, any source file with a non-zero fractional-seconds timestamp will appear to have changed compared to the copy in the destination filesystem, which discarded the fractional seconds during the last sync.

Sure, the end user could use the --modify-window=1 option to tell rsync to add fuzz to the comparisons, but that introduces the possibility that a file updated a fraction of a second after an rsync execution would not be synchronized on the next run, even when both source and target have fine-grained timestamps.

If the userland sync processes had access to the source and destination filesystem time capabilities, they could make more intelligent decisions without explicit user input. At a minimum, the timestamp properties that are important to know include the range of valid timestamps and the resolution. Some filesystems support unsigned 32-bit time starting with the UNIX epoch. Others use signed 32-bit time with the UNIX epoch. Still others, such as FAT and NTFS, use alternative epochs, ranges, and resolutions.
Another case where the lack of filesystem properties is problematic is "df --local", which currently relies upon string comparisons of filesystem names to determine whether the underlying filesystem is local or remote. This requires that the gnulib maintainers know every filesystem implementation, its published name, and which category it belongs to. Patches have been accepted in the past year to add "smb3", "afs", and "gpfs" to the list of remote filesystems. There are many more remote filesystems that have yet to be added, including "cephfs", "lustre", "gluster", etc.

In many cases, the filesystem properties cannot be inferred from the filesystem name. For network filesystems, these properties might depend upon the remote server's capabilities or even the properties associated with a particular volume or share. Consider the case of a remote file server that supports 64-bit 100ns time but which, for backward compatibility, exports certain volumes or shares with more restrictive capabilities. Or the case of a network filesystem protocol that has evolved over time and gained new capabilities.

For the AFS community, fsinfo offers a method of exposing some server and volume properties that are obtained via "path ioctls" in OpenAFS and AuriStorFS. Some examples of properties that might be exposed include answers to questions such as:

* what is the volume cell id? perhaps a uuid.
* what is the volume id in the cell? unsigned 64-bit integer
* where is a mounted volume hosted? which fileservers, named by uuid
* what is the block size? 1K, 4K, ...
* how many blocks are in use or available?
* what is the quota (thin provisioning), if any?
* what is the reserved space (fat provisioning), if any?
* how many vnodes are present?
* what is the vnode count limit, if any?
* when was the volume created and last updated?
* what is the file size limit?
* are byte range locks supported?
* are mandatory locks supported?
* how many entries can be created within a directory?
* are cross-directory hard links supported?
* are directories just-send-8, case-sensitive, case-preserving, or case-insensitive?
* if not just-send-8, what character set is used?
* if Unicode, what normalization rules? etc.
* are per-object acls supported?
* what volume maximum acl is assigned, if any?
* what volume security policy (authn, integ, priv) is assigned, if any?
* what is the replication policy, if any?
* what is the volume encryption policy, if any?
* what is the volume compression policy, if any?
* are server-to-server copies supported?
* which of atime, ctime and mtime does the volume support?
* what is the permitted timestamp range and resolution?
* are xattrs supported?
* what is the xattr maximum name length?
* what is the xattr maximum object size?
* is the volume currently reachable?
* is the volume immutable?
* etc ...

It's true that there isn't widespread use of these filesystem properties by today's userland applications, but that might be due to the lack of standard interfaces necessary to acquire the information. For example, userland frameworks for parallel I/O HPC applications such as HDF5, PnetCDF and ROMIO require each supported filesystem to provide its own proprietary "driver", which does little more than expose the filesystem properties necessary to optimize the layout of file stream data structures. With something like "fsinfo", it would be much easier to develop these HPC frameworks in a filesystem-agnostic manner. This would permit applications built upon these frameworks to use the best Linux filesystem available for the workload and not simply the ones for which proprietary "drivers" have been published.

Although I am sympathetic to the voices in the community that would prefer to start over with a different architectural approach, David's fsinfo has been under development for more than two years.
It has not been developed in a vacuum but in parallel with other kernel components that have been merged during that time frame. From my reading of this thread and those that preceded it, fsinfo has also been developed with input from significant userland development communities that intend to leverage the syscall interface as soon as it becomes available. The March 2020 discussion of fsinfo received positive feedback not only from within Red Hat but from other parties as well.

Since no one has stepped up to provide an alternative approach in the last five months, how long should those who desire access to the functionality be expected to wait for it? What is the likelihood that an alternative robust solution will be available in the next merge window or two? Is the design so horrid that it is better to go without the functionality than to live with the imperfections?

I for one would like to see this functionality made available sooner rather than later. I know my end users would benefit from the availability of fsinfo.

Thank you for listening.

Stay healthy and safe, and please wear a mask.

Jeffrey Altman