Re: Re: [5/8] syscall_cred() a system call that receives alternate CREDs

On Monday, April 08, 2013 10:42:02 J. Bruce Fields wrote:
> On Mon, Apr 08, 2013 at 01:36:46PM +0300, Boaz Harrosh wrote:
> > From: Jim Lieb <jlieb@xxxxxxxxxxx>
> > 
> > In the current NFS server (Ganesha), many operations become 6 syscalls
> > (Or is it 7?)
> > 
> > - setfsuid(), setfsgid(), thread_setgroups()
> > - The OP
> > - Revert setfsuid(), setfsgid() to root
> > 
> > This is because if we do all these file operations as root, then the
> > FS will not charge them against the quota a user has for created files,
> > data space, and so on.
> 
> To make sure I understand, you're saying that:
> 
> 	- the behavior you get out of those 6 syscalls is correct,
> 	- you just want to be able to do exactly the same thing, but
> 	  with 1 syscall.  (For performance?)
> 
> Or is there some other issue?

I have attached the email I sent around on the nfs-ganesha list with a model 
api so we know the details.

Boaz replied "performance" but there are also race conditions to consider.  If 
we get signals or ??? somewhere in the sequence, what is our state?  Yes, the 
setfsuid call back to root can still be done, but with a single masquerading 
syscall any signals etc. arrive in the context of that user/group, and there 
is one syscall to deal with, not a stream of them.
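
For reference, the sequence under discussion can be sketched in C roughly as 
below.  This is only an illustration under assumptions, not Ganesha's code: 
mkdir_as_user() is a made-up helper, mkdir() stands in for "the OP", and the 
body of thread_setgroups() is my guess at a raw-syscall wrapper.  Groups are 
not reverted here because the count above only reverts fsuid/fsgid.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/fsuid.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Raw per-thread setgroups(): bypasses glibc's all-threads broadcast.
 * (Hypothetical wrapper body; needs CAP_SETGID to succeed.) */
static int thread_setgroups(size_t n, const gid_t *groups)
{
#ifdef SYS_setgroups32
    return (int)syscall(SYS_setgroups32, n, groups);
#else
    return (int)syscall(SYS_setgroups, n, groups);
#endif
}

/* Create `path` as the given user.  Returns 0, or -1 with errno set. */
static int mkdir_as_user(const char *path, mode_t mode, uid_t uid, gid_t gid,
                         size_t ngroups, const gid_t *groups)
{
    int rc, saved_errno;

    /* 1-2: switch filesystem identity.  setfsuid()/setfsgid() return the
     * previous id and give no error indication of their own. */
    uid_t old_uid = (uid_t)setfsuid(uid);
    gid_t old_gid = (gid_t)setfsgid(gid);

    /* 3: per-thread supplementary groups. */
    rc = thread_setgroups(ngroups, groups);

    /* 4: the operation itself.  A signal delivered anywhere between
     * step 1 and step 6 finds the thread in a half-switched identity. */
    if (rc == 0)
        rc = mkdir(path, mode);
    saved_errno = errno;

    /* 5-6: revert to the server's own identity. */
    setfsgid(old_gid);
    setfsuid(old_uid);

    errno = saved_errno;
    return rc;
}
```

A caller would use it as mkdir_as_user(path, 0755, uid, gid, n, groups).  Every 
numbered step is a separate trip into the kernel, which is where both the 
overhead and the window for signals come from.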

There may be selinux/apparmor issues to deal with too.  If we first masquerade 
the thread and then apply all these access checks, as far as the kernel is 
concerned, it is the masqueraded user.

> 
> > (Note that permission checking is done by Ganesha core, because
> >  we may cache open fd(s) and whatnot; that is another topic)
> 
> Is there anything we could do to make it possible for you to depend on
> the kernel's permissions checking instead?
> 
I concur with Frank's assessment here.  There are more instances where 
nfs-ganesha is doing a syscall as the server than as the masqueraded user.  In the 
pNFS case, this hardly happens at all.  We looked at having the kernel do it 
but found that we also had to do it and mixing gets seriously messy.  For 
starters, we really do want to share fd's.

> --b.
> 
> > We could maybe with hard work save the last two calls for reverting
> > to root, but this will force us to audit lots of code that we are
> > not prepared to do right now. And will not save us much.
> > 
> > [thread_setgroups()]
> > thread_setgroups() is what we use at Ganesha and what the Samba guys use
> > for a per-thread setgroups() call. In the Linux Kernel, setgroups is
> > actually always per thread. It is only the POSIX (crap) pthread layer
> > at glibc that intercepts the setgroups() call (and others), iterates over
> > all threads that belong to a process, and calls the native Kernel
> > setgroups on them. So thread_setgroups() is just the raw syscall
> > bypassing glibc's processing. We will eventually push this API to glibc.
> > BTW: this is done exactly the same on FreeBSD, with same exact glibc
> > intervention.
> > 
> > [Proposed]
> > What Jim proposed is a syscall that receives a struct that has
> > the regular syscalls parameters plus the creds structure with fsuid/fsgid
> > and groups array. Kernel will set these in, call the original syscall,
> > and revert. This will be done on only an interested subset of the
> > syscalls that are one - are related to filesystems (setfsXid) and two -
> > are of interest to us Servers.
> > 
> > Jim care to scribble a structure definition?
> > 
> > Thanks
> > Boaz
-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.
--- Begin Message ---
In replying to the creds RFC branch, an idea came to me.  What we need is a 
syscall for server syscalls.  At first, I thought of doing something like what 
was done for the *at calls.  That got pretty silly with some calls only 
needing an extra flag and others needing extra args.  All of the glibc and abi 
pain was a mess I'd rather not repeat.

How about this idea:

/**
* @brief Syscall entry point for servers that need to masquerade as others
*
* This is a privileged syscall.
*
* @param syscall_number [IN] syscall number from syscall.h
* @param syscall_args [IN] the arguments for that syscall in a vector
*        mimicking the syscall prototype.
* @param creds [IN] credentials to use.  See definition in fsal_types.h
*/

int server_syscall(int syscall_number, void *syscall_args,
                   struct creds *creds);

This syscall would have its own matching vector of the kernel calls it 
supports.  Maybe this is a bit in the syscall vector.  The point is that not 
all calls would be supported, only a small set.

The syscall args would be packaged and managed like ioctl does it now.  This 
is an extra dereference in the syscall processing to validate the struct and 
copy the args in/out.  The same applies to creds only instead of applying them 
to the specific syscall's stack frame, they would go into the "effective" 
uid/gid for the thread.
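
To make the shape of this concrete, here is a rough user-space emulation in C.  
Everything in it is an assumption for illustration: the struct creds layout 
(the real one is in fsal_types.h), the struct server_syscall_args vector, the 
particular supported set, and the name server_syscall_emul().  It still makes 
the individual setfsuid/setfsgid round trips that the real syscall would 
avoid; it only shows the validate, apply creds, call, revert flow.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/fsuid.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical layout; the real definition is said to be in fsal_types.h. */
struct creds {
    uid_t fsuid;
    gid_t fsgid;
    size_t ngroups;
    const gid_t *groups; /* assumption: NULL means "leave groups alone" */
};

/* Hypothetical argument vector mimicking the wrapped syscall's prototype. */
struct server_syscall_args {
    long arg[6];
};

/* Raw per-thread setgroups(), bypassing glibc's all-threads broadcast. */
static int raw_setgroups(size_t n, const gid_t *groups)
{
#ifdef SYS_setgroups32
    return (int)syscall(SYS_setgroups32, n, groups);
#else
    return (int)syscall(SYS_setgroups, n, groups);
#endif
}

/* Only a small set of calls is supported; this particular set is made up. */
static int server_syscall_supported(long nr)
{
    return nr == SYS_openat || nr == SYS_mkdirat || nr == SYS_unlinkat;
}

/* User-space emulation: validate, apply creds, run one syscall, revert. */
static long server_syscall_emul(long nr, const struct server_syscall_args *a,
                                const struct creds *creds)
{
    gid_t *old_groups = NULL;
    int old_n = 0, saved_errno;
    long rc;

    if (a == NULL || creds == NULL) {
        errno = EFAULT;
        return -1;
    }
    if (!server_syscall_supported(nr)) {
        errno = ENOSYS;
        return -1;
    }
    if (creds->groups != NULL) {
        /* Save the current supplementary groups so they can be restored. */
        old_n = getgroups(0, NULL);
        if (old_n < 0)
            return -1;
        old_groups = calloc((size_t)old_n + 1, sizeof(*old_groups));
        if (old_groups == NULL)
            return -1;
        old_n = getgroups(old_n, old_groups);
        if (old_n < 0 || raw_setgroups(creds->ngroups, creds->groups) != 0) {
            saved_errno = errno;
            free(old_groups);
            errno = saved_errno;
            return -1;
        }
    }

    uid_t old_uid = (uid_t)setfsuid(creds->fsuid);
    gid_t old_gid = (gid_t)setfsgid(creds->fsgid);

    /* Surplus vector entries are ignored by calls taking fewer arguments. */
    rc = syscall(nr, a->arg[0], a->arg[1], a->arg[2],
                 a->arg[3], a->arg[4], a->arg[5]);
    saved_errno = errno;

    setfsgid(old_gid);
    setfsuid(old_uid);
    if (old_groups != NULL) {
        raw_setgroups((size_t)old_n, old_groups);
        free(old_groups);
    }
    errno = saved_errno;
    return rc;
}
```

A caller would fill the vector the way it would fill the wrapped call's 
argument list, e.g. { AT_FDCWD, (long)path, 0755 } for SYS_mkdirat, and pass 
the client's creds alongside.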

We save the back and forth across the syscall barrier at the cost of slightly 
more overhead per affected call, which is less than the multiple roundtrips 
for setfsuid/gid.  As a priv'd syscall, it falls outside the set of "posix" 
compliance so we can also bypass things like posix lock behavior.  It is also 
expandable without breaking the bank on syscalls or moving ABIs.

Further rationale for this is that the *at calls and handle calls do have more 
general use and therefore fit in the set of general syscalls.  This is an 
enabler for servers that can take over in user space tasks that once were 
mandated into the kernel because of these user masquerading issues.

Last point, No, I haven't researched what the Samba team has lobbied for but I 
suspect that if they are asking for variant syscalls like the *at case, this 
has lower impact.

Jim
-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.

--- End Message ---
