On Monday, April 08, 2013 14:31:20 J. Bruce Fields wrote:
> On Mon, Apr 08, 2013 at 11:23:14AM -0700, Jim Lieb wrote:
> > On Monday, April 08, 2013 10:42:02 J. Bruce Fields wrote:
> > > On Mon, Apr 08, 2013 at 01:36:46PM +0300, Boaz Harrosh wrote:
> > > > From: Jim Lieb <jlieb@xxxxxxxxxxx>
> > > >
> > > > In current NFS Server (Ganesha) lots of operation becomes 6 syscalls
> > > > (Or is it 7?)
> > > >
> > > > - setfsuid(), setfsgid(), thread_setgroups()
> > > > - The OP
> > > > - Revert setfsuid(), setfsgid() to root
> > > >
> > > > This is because if we do all these file operations as root then
> > > > FS will not account for the quota a user have on create files,
> > > > data space, and so on.
> > >
> > > To make sure I understand, you're saying that:
> > > - the behavior you get out of those 6 syscalls is correct,
> > > - you just want to be able to do exactly the same thing, but
> > >
> > > with 1 syscall. (For performance?)
> > >
> > > Or is there some other issue?
> >
> > I have attached the email I sent around on the nfs-ganesha list with a
> > model api so we know the details.
> >
> > Boaz replied "performance" but there are also race conditions to consider.
> > If we get signals or ??? somewhere in the sequence, what is our state?
> > Yes, the setfsuid call back to root can still be done but masquerading
> > has any signals etc. be in the context of that user/group and there is
> > one syscall to deal with, not a stream.
>
> Sorry, I don't understand what you're saying here. Could you give an
> example showing a sequence of events with the wrong result?

We are setting user, primary group, and alt groups in sequence before we do
the actual work (read/write/...). This is a potential TOCTOU race. Granted,
there is little/no real atomic guarantee but implied in the syscall model is
that creds don't change for the duration of a syscall. We go back to
userspace multiple times with creds in intermediate state(s).

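
To make the sequence concrete, here is a rough sketch of the six-call
pattern described above. It is an illustration only, not the actual
nfs-ganesha code: run_as_client(), fs_op_fn and example_op() are
hypothetical names, and the raw SYS_setgroups call assumes a Linux target
where that syscall number applies to the calling thread only (some 32-bit
ports would need SYS_setgroups32 instead). Note that every return to
userspace between the numbered steps is a point where the thread holds a
mix of the server's and the client's credentials.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/fsuid.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback type: the file operation done for the client. */
typedef int (*fs_op_fn)(void *arg);

/* Hypothetical stand-in for "The OP": counts how often it was invoked. */
int example_op(void *arg)
{
    ++*(int *)arg;
    return 0;
}

/*
 * Run op(arg) with the calling thread's filesystem credentials switched
 * to the client's. Returns op's result, or -1 with errno set if the
 * supplementary groups could not be installed.
 */
int run_as_client(uid_t uid, gid_t gid, size_t ngroups, const gid_t *groups,
                  fs_op_fn op, void *arg)
{
    /* Syscalls 1 and 2: both return the previous value, not an error. */
    uid_t old_uid = (uid_t)setfsuid(uid);
    gid_t old_gid = (gid_t)setfsgid(gid);
    int result = -1;
    int saved_errno;

    /* Syscall 3: raw syscall so that only this thread is affected;
     * the glibc setgroups() wrapper applies the change to all threads. */
    if (syscall(SYS_setgroups, ngroups, groups) == 0)
        result = op(arg);           /* Syscall 4: the operation itself. */
    saved_errno = errno;

    /* Syscalls 5 and 6: revert to the server's own identity. As in the
     * list above, the supplementary groups are not reverted here. */
    setfsuid(old_uid);
    setfsgid(old_gid);

    errno = saved_errno;
    return result;
}
```

A signal delivered between any two of these steps runs its handler with
whatever credentials happen to be installed at that moment, which is the
intermediate state I am worried about.
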
Signals can happen anytime but are only checked on the way back out of the
syscall or we can hold them off at critical times within a single syscall.
Which syscall is the one where the signal occurred? In our case, we
minimally use signals (do no i/o etc.) but they are still there. If it is
one syscall, we know.

We currently have an RFC implementation of a "creds wrapper" but it is still
in flux and the coding of all these calls to "get it right" is ugly. One
call, done right, would be much better.

We also have a problem with the setgroups. We escape in Linux because the
kernel doesn't do it process wide and glibc fakes it. I don't want to depend
on that. In FreeBSD, we can't do it at all since the creds are shared at the
proc level. Note that I am constrained to think about portability and it's
easier to sell a new syscall than to hack fundamental kernel structures,
which is why the "do to all" bit is in glibc...

>
> > There may be selinux/apparmor issues to deal with too. If we first
> > masquerade the thread and then apply all these access checks, as far
> > as the kernel is concerned, it is the masqueraded user.
>
> I don't understand here either.

There is the security context nfs-ganesha would live in but actions on
behalf of clients are (or will be in 4.2+) in the context of the client.
This is outside my expertise but I'd like to have a "masquerading" framework
in place where it could be added in a known way, or at least we are thinking
about it. Capabilities have also been thrown into the mix. I will be the
first to defer to the selinux/apparmor heavies but I'd like to have all that
capability constricted down to one syscall that can be controlled, i.e.
selinux says only real samba and real nfs-ganesha can do this call.

>
> --b.

--
Jim Lieb
Linux Systems Engineer
Panasas Inc.