Re: Re: Re: [5/8] syscall_cred() a system call that receives alternate CREDs

On Monday, April 08, 2013 14:31:20 J. Bruce Fields wrote:
> On Mon, Apr 08, 2013 at 11:23:14AM -0700, Jim Lieb wrote:
> > On Monday, April 08, 2013 10:42:02 J. Bruce Fields wrote:
> > > On Mon, Apr 08, 2013 at 01:36:46PM +0300, Boaz Harrosh wrote:
> > > > From: Jim Lieb <jlieb@xxxxxxxxxxx>
> > > > 
> > > > In current NFS Server (Ganesha) lots of operation becomes 6 syscalls
> > > > (Or is it 7?)
> > > > 
> > > > - setfsuid(), setfsgid(), thread_setgroups()
> > > > - The OP
> > > > - Revert setfsuid(), setfsgid() to root
> > > > 
> > > > This is because if we do all these file operations as root then
> > > > FS will not account for the quota a user have on create files,
> > > > data space, and so on.
> > > 
> > > To make sure I understand, you're saying that:
> > > 	- the behavior you get out of those 6 syscalls is correct,
> > > 	- you just want to be able to do exactly the same thing, but
> > > 	  with 1 syscall.  (For performance?)
> > > 
> > > Or is there some other issue?
> > 
> > I have attached the email I sent around on the nfs-ganesha list with a
> > model api so we know the details.
> > 
> > Boaz replied "performance" but there are also race conditions to consider.
> >  If we get signals or ??? somewhere in the sequence, what is our state? 
> > Yes, the setfsuid call back to root can still be done but masquerading
> > has any signals etc. be in the context of that user/group and there is
> > one syscall to deal with, not a stream.
> 
> Sorry, I don't understand what you're saying here.  Could you give an
> example showing a sequence of events with the wrong result?

We set the user, primary group, and alt groups in sequence before we do 
the actual work (read/write/...).  This is a potential TOCTOU race.  Granted, 
there is little or no real atomicity guarantee, but implied in the syscall 
model is that creds don't change for the duration of a syscall.  We go back 
to userspace multiple times with creds in intermediate state(s).  Signals can 
happen at any time but are only checked on the way back out of a syscall, or 
we can hold them off at critical times within a single syscall.  With a 
stream of calls, which syscall is the one where the signal occurred?  In our 
case, we make minimal use of signals (the handlers do no i/o etc.) but they 
are still there.  If it is one syscall, we know.
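
For concreteness, here is a minimal sketch of what that sequence looks like 
from userspace today.  It is not the Ganesha code; struct client_creds, 
create_as() and the kernel_entries counter are names made up for this 
example.  It assumes Linux with glibc, and it uses the raw setgroups syscall 
because the glibc wrapper would apply the change to every thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/fsuid.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical container for the identity a request should run under. */
struct client_creds {
    uid_t uid;
    gid_t gid;
    int ngroups;          /* number of supplementary groups */
    const gid_t *groups;  /* supplementary group list */
};

/* Counts kernel entries made by the last create_as() call (illustration). */
static int kernel_entries;

/*
 * Create `path` as `client`, then switch back to `server` (root, in the
 * NFS server case).  Returns an open fd, or -1 with errno set by open().
 * Failures of the credential calls are ignored here to keep the sketch
 * short; real code has to check each one, which is part of the ugliness.
 */
static int create_as(const struct client_creds *client,
                     const struct client_creds *server,
                     const char *path, mode_t mode)
{
    int fd, saved_errno;

    assert(client != NULL && server != NULL && path != NULL);
    kernel_entries = 0;

    /* 1-3: masquerade.  Assumes SYS_setgroups is the wide-gid variant,
     * as on x86_64; 32-bit ABIs may need SYS_setgroups32 instead. */
    syscall(SYS_setgroups, (long)client->ngroups, client->groups);
    kernel_entries++;
    setfsgid(client->gid);
    kernel_entries++;
    setfsuid(client->uid);
    kernel_entries++;

    /* 4: the actual operation.  A signal handler running between any two
     * of these calls sees the thread in a half-switched state. */
    fd = open(path, O_CREAT | O_WRONLY, mode);
    saved_errno = errno;
    kernel_entries++;

    /* 5-7: revert to the server's own identity. */
    setfsuid(server->uid);
    kernel_entries++;
    setfsgid(server->gid);
    kernel_entries++;
    syscall(SYS_setgroups, (long)server->ngroups, server->groups);
    kernel_entries++;

    errno = saved_errno;   /* report the result of open(), not the reverts */
    return fd;
}
```

That is seven trips into the kernel for one create.  Passing the caller's 
own identity for both client and server lets the sketch run unprivileged; 
the setgroups calls then fail with EPERM and nothing notices, which is the 
kind of intermediate-state ambiguity I am describing.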

We currently have an RFC implementation of a "creds wrapper" but it is still 
in flux and the coding of all these calls to "get it right" is ugly.  One 
call, done right, would be much better.

We also have a problem with setgroups.  We get away with it on Linux because 
the kernel doesn't apply it process wide and glibc fakes that part.  I don't 
want to depend on that.  On FreeBSD, we can't do it at all since the creds 
are shared at the proc level.  Note that I am constrained to think about 
portability, and it's easier to sell a new syscall than to hack fundamental 
kernel structures, which is why the "do to all" bit is in glibc...

> 
> > There may be selinux/apparmor issues to deal with too.  If we first
> > masquerade the thread and then apply all these access checks, as far
> > as the kernel is concerned, it is the masqueraded user.
> 
> I don't understand here either.

There is the security context nfs-ganesha would live in, but actions on behalf 
of clients are (or will be in 4.2+) in the context of the client.  This is 
outside my expertise but I'd like to have a "masquerading" framework in place 
where it could be added in a known way, or at least we are thinking about it.

Capabilities have also been thrown into the mix.  I will be the first to defer 
to the selinux/apparmor heavies but I'd like to have all that capability 
constricted down to one syscall that can be controlled, i.e. selinux says only 
real samba and real nfs-ganesha can do this call.
> 
> --b.
-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.