file forks vs. xattr (was: xattr names for unprivileged stacking?)

Christian Schoenebeck <qemu_oss@xxxxxxxxxxxxx> · Mon, 17 Aug 2020 12:37:17 +0200

On Montag, 17. August 2020 00:56:20 CEST Dave Chinner wrote:
> > That's yet another question: should xattrs and forks share the same data-
> > and namespace, or rather be orthogonal to each other.
> 
> Completely orthogonal. Alternate data streams are not xattrs, and
> xattrs are not ADS....

Agreed. Their key features (atomic small data vs. non-atomic large data) and 
their typical use cases are probably too different to stitch them together 
into a shared space. Plus it would actually be beneficial if forks had their 
own xattrs.

On Montag, 17. August 2020 02:29:30 CEST Dave Chinner wrote:
> I'd stop calling these "forks" already, too. The user wants
> "alternate data streams", while a "resource fork" is an internal
> filesystem implementation detail used to provide ADS
> functionality...

The common terminology can certainly still be argued. I understand that from 
an fs implementation perspective "fork" is probably ambiguous. But from a 
public API (i.e. user space) perspective the term "fork" does make sense, and 
so far I have not seen a better general term for this. Plus the ambiguous 
aspects on the fs side are not exposed to the public side.

The term "alternate data stream" suggests that this is just about the raw data 
stream, but that's probably not what this feature will end up being limited 
to. E.g. I think they will have their own permissions in the long term (see 
below). Plus the term ADS is ATM closely tied to the Microsoft universe.

> IOWs, with a filesystem inode fork implementation like this for ADS,
> all we really need is for the VFS to pass a magic command to
> ->lookup() to tell us to use the ADS namespace attached to the inode
> rather than use the primary inode type/state to perform the
> operation.

Starting with a minimalistic approach, the way Solaris developers originally 
introduced forks, would IMO make sense for Linux as well:

- Adding a new option O_FORK to fcntl.h (Solaris uses O_XATTR, not a good
  idea for Linux though for reasons discussed).

- (Mis)using existing APIs for accessing forks (i.e. *at() functions):

	/* open fork 'foo' of file 'sheet.pdf' */

	int fdfile = open("sheet.pdf", O_PATH);
	int fdfork = openat(fdfile, "foo", O_FORK);
	/* continue with regular file I/O on fdfork now ... */

	and

	/* list all forks of file 'sheet.pdf' */

	int fdfile = open("sheet.pdf", O_PATH);
	int fdlist = openat(fdfile, ".", O_RDONLY|O_FORK);
	DIR* dir = fdopendir(fdlist);
	struct dirent* dent;
	while ((dent = readdir(dir))) {
		...
	}

- Permissions and ownership: Same as the file, for simplicity, as a starting 
  point for the first version (see below).

- No subforks as a starting point, and hence the path separator '/' inside 
  fork names would be prohibited initially to avoid future clashes.
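
To illustrate the last point, here is a minimal sketch of the kind of name 
check such a first version might apply. The helper name and the rejection of 
empty names, "." and ".." are my assumptions for illustration only (with "." 
being used for listing in the example above); only the '/' rule comes from the 
list itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not an existing kernel or libc function.
 * Returns true if 'name' would be acceptable as a fork name under the
 * minimal rules sketched above.
 */
bool fork_name_is_valid(const char *name)
{
	if (name == NULL || name[0] == '\0')
		return false;	/* assumption: empty names are rejected */
	if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
		return false;	/* assumption: "." and ".." stay reserved */
	if (strchr(name, '/') != NULL)
		return false;	/* no subforks initially */
	return true;
}
```

With this, "foo" would pass, while "a/b" would be refused until subforks are 
actually defined.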

> Hence all the ADS support infrastructure is essentially dentry cache
> infrastructure allowing a dentry to be both a file and directory,
> and providing the pathname resolution that recognises an ADS
> redirection. Name that however you want - we've got to do an on-disk
> format change to support ADS, so we can tell the VFS we support ADS
> or not. And we have no cares about existing names in the filesystem
> conflicting with the ADS pathname identifier because it's a mkfs
> time decision. Given that special flags are needed for the openat()
> call to resolve an ADS (e.g. O_ALT), we know if we should parse the
> ADS identifier as an ADS the moment it is seen...

So you think there should be a built-in fully qualified path name resolution 
to forks right from the start? E.g. like on Windows 
"C:\some\where\sheet.pdf:foo" -> fork "foo" of file "sheet.pdf"?
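
Just to make that question concrete, a rough user space sketch of splitting 
such a name, assuming ':' as the separator and '/' as the path separator. The 
function name and its return convention are made up for illustration; real 
pathname resolution in the VFS would of course look quite different.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper for illustration only. Splits "dir/file:fork" at the
 * last ':' of the final path component. Returns 0 on success, -1 if there
 * is no fork suffix, a part is empty, or a buffer is too small.
 */
int split_fork_path(const char *path, char *file, size_t filesz,
		    char *fork, size_t forksz)
{
	const char *base = strrchr(path, '/');
	const char *colon;
	size_t filelen, forklen;

	/* only look for the separator in the last path component */
	base = base ? base + 1 : path;
	colon = strrchr(base, ':');
	if (colon == NULL || colon == base || colon[1] == '\0')
		return -1;

	filelen = (size_t)(colon - path);
	forklen = strlen(colon + 1);
	if (filelen >= filesz || forklen >= forksz)
		return -1;

	memcpy(file, path, filelen);
	file[filelen] = '\0';
	memcpy(fork, colon + 1, forklen + 1);	/* includes the final '\0' */
	return 0;
}
```

Note that with such a syntax a plain file name containing ':' becomes 
ambiguous unless the caller signals explicitly that fork resolution is wanted.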

> > I don't understand why a fork would be permitted to have its own
> > permissions.  That makes no sense.  Silly Solaris.
> 
> I can't think of a reason why, either, but the above implementation
> for XFS would support it if the presentation layer allows it... :)

I would definitely not add this right from the start of course, but in the 
long term it actually does make sense for them to have their own permissions, 
simply because there are already applications for that:

E.g. on some systems forks are used to tag files with security-relevant 
information, for instance where the file originated from (a trusted vs. 
untrusted source). If it was an untrusted source, the system makes the user 
aware of this when attempting to open the file. In this use case the fork 
would probably have more restrictive permissions than the actual file.

OTOH forks are used to extend existing files in a non-obtrusive way. Say you 
have some sort of (e.g. huge) master file, and a team works on that file. Then 
the individual people would attach their changes solely as forks to the master 
file with their own ownership, probably even with complex ACLs, to prevent 
certain users from touching (or even reading) others' changes. In this use 
case the master file might be read-only for most people, while the individual 
forks could be anywhere between more permissive and more restrictive.

Best regards,
Christian Schoenebeck