David Howells <dhowells@xxxxxxxxxx> writes: > Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote: > >> First let me thank you for adding both FSCONFIG_CMD_CREATE and >> FSCONFIG_CMD_RECONFIGURE. Unfortunately the implementation is currently >> broken. So this patch gets my: >> >> This is broken in two specific ways. >> ... >> 2) FSCONFIG_CMD_CREATE will succeed even if the superblock already >> exists and it can not use all of the superblock parameters. >> >> This happens because vfs_get_super will only call fill_super >> if the super block is created. Which is reasonable on the face >> of it. But it in practice this introduces security problems. >> >> a) Either through reconfiguring a shared super block you did not >> realize was shared (as we saw with devpts). >> >> b) Mounting a super block and not honoring it's mount options >> because something has already mounted it. As we see today >> with proc. Leaving userspace to think the filesystem will behave >> one way when in fact it behaves another. >> >> I have already explained this several times, and apparently I have been >> ignored. This fundamental usability issue that leads to security >> problems. > > I've also explained why you're wrong or at least only partially right. I *do* > *not* want to implement sget() in userspace with the ability for userspace to > lock out other mount requests - which is what it appears that you've been > asking for. All I really care about is that when you ask for a set of paramaters that you get a filesystem with that set of parameters. Not the same filsystem mounted a different way. That has gone wrong twice badly. There is no common case I know of that requires returning the same filesystem twice. AKA the pain of the existing semantics seems much much worse than any benefit. So I am asking that we not propagate the existing semantics into the new API. You are cleaning up dealing with mount options and this is one of the places where they need cleaning up. > However, as I have said, I *am* willing to add one of more flags to help with > this, but I can't make any "legacy" fs honour them as this requires the > fs_context to be passed down to sget_fc() and the filesystem - which is why I > was considering leaving it for later. > > (1) An FSOPEN_EXCL flag to tell sget_fc() to fail if the superblock already > exists at all. > > (2) An FSOPEN_FAIL_ON_MISMATCH flag to explicitly say that we *don't* want a > superblock with different parameters. > > The implication of providing neither flag is that we are happy to accept a > superblock from the same source but with different parameters. > > But it doesn't seem to be an absolute imperative to roll this out immediately, > since what I have exactly mirrors what the kernel currently does - and forcing > a change in that behaviour risks breaking userspace. If it keeps you happy, > however, I can try and work one up. What I am asking is that the default behavior for the new API when using FSCONFIG_CMD_CREATE is to call sget_fc with either FSOPEN_EXCL or FSOPEN_FAIL_ON_MISMATCH. I know FSOPEN_EXCL is trivial to implement. I don't know if there are any hidden gotcha's with FSOPEN_FAIL_ON_MISMATCH. This change in default behavior for your patch needs to be implemented before this hits a released kernel. Returning a filesystem with different than the requested parameters has resulted in at least two major issues, that are very hard to clean up after the fact. A chroot system changing the parameters on /dev/pts resulting in some distributions keeping the suid pt_chown binary long past it's best buy date, and other distributions instead choosing to break userspace. Then there is the current issue where in practice proc does not any of it's mount paramaters which breaks the android security model. The fact that these things happen silently and you have to be on your toes to catch them is fundamentally a bug in the API. If the mount request had simply failed people would have noticed the issues much sooner and silently b0rkend configuration would not have propagated. As such I do not believe we should propagate this misfeature from the old API into the new API. Conceptually I like FSOPEN_FAIL_ON_MISMATH as it looks like it is sufficient to the needs, and with a little luck we could even change the old API to those semantics. Ultimately I want to close a giant mental model mismatch. User: I am creating the data structures to read filesystem X with parameters Y. Kernel: He wants filesystem X. If it is a slow day use parameters Y. Given that historically the reuse of a superblock did not exist, and that in practice it almost never happens. It is quite reqsonable for users to not expect the kernel to completely ignore the mount parameters they pass to the kernel. So please let's fix that now when it is easy. Eric