Christoph Hellwig wrote:
On Tue, Dec 05, 2006 at 03:44:31PM -0600, Rob Ross wrote:
The openg() really just does the lookup and permission checking. The
openfh() creates the file descriptor and starts that context if the
particular FS tracks that sort of thing.
...
Well you've caught me. I don't want to cache the values, because I
fundamentally believe that sharing state between clients and servers is
braindead (to use Christoph's phrase) in systems of this scale
(thousands to tens of thousands of clients). So I don't want locks, so I
can't keep the cache consistent, ... So someone else will have to run
the tests you propose :)...
Besides the whole ugliness you miss a few points about the fundamental
architecture of the unix filesystem permission model unfortunately.
Say you want to lookup a path /foo/bar/baz, then the access permission
is based on the following things:
- the credentials of the user. let's only take traditional uid/gid
for this example although credentials are much more complex these
days
- the kind of operation you want to perform
- the access permission of the actual object the path points to (inode)
- the lookup permission (x bit) for every object on the way to your object
In your proposal sutoc is a simple conversion operation, that means
openg needs to perform all these access checks and encode them in the
fh_t.
This is exactly right and is the intention of the call.
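The checks listed above can be sketched in user space as follows. This is only an illustration of the traditional uid/gid mode-bit model, not how any kernel or the proposed openg() implements it; the helper names (class_allows, may_access) and the start/relpath split are assumptions made for the example, and ACLs, root override and richer credentials are deliberately ignored.

```python
import os

# Map permission letters to their bit within one rwx triplet.
PERM_BITS = {"r": 4, "w": 2, "x": 1}


def class_allows(st, uid, gids, wanted):
    """Check `wanted` (e.g. "r" or "rw") against the one mode-bit class that
    applies to this uid/gid set. Hypothetical helper: no ACLs, no root override."""
    if st.st_uid == uid:
        shift = 6          # owner bits
    elif st.st_gid in gids:
        shift = 3          # group bits
    else:
        shift = 0          # other bits
    granted = (st.st_mode >> shift) & 7
    return all(granted & PERM_BITS[c] for c in wanted)


def may_access(start, relpath, uid, gids, wanted):
    """Walk relpath component by component beneath `start`.

    Every directory on the way must grant search (x) permission, and the
    final object must grant the requested operation."""
    current = start
    for name in relpath.split("/"):
        if not class_allows(os.stat(current), uid, gids, "x"):
            return False
        current = os.path.join(current, name)
    return class_allows(os.stat(current), uid, gids, wanted)
```

For the path in the example, may_access("/", "foo/bar/baz", uid, gids, "r") would check the x bit on /, /foo and /foo/bar, and then the r bit on baz itself. The result is only valid at the moment of the walk, which is why encoding it into a handle amounts to keeping state somewhere.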
That means an fh_t must fundamentally be an object that is kept
in the kernel aka a capability as defined by Henry Levy. This does imply
you _do_ need to keep state.
The fh_t is indeed a type of capability. fh_t, properly protected, could
be passed into user space and validated by the file system when
presented back to the file system.
There is state here, clearly. I feel ok about that because we allow
servers to forget that they handed out these fh_ts if they feel like it;
there is no guaranteed lifetime in the current proposal. This allows
servers to come and go without needing to persistently store these.
Likewise, clients can forget them with no real penalty.
This approach is ok because of the use case. Because we expect the fh_t
to be used relatively soon after its creation, servers will not need to
hold onto these long before the openfh() is performed and we're back
into a normal "everyone has a valid fd" use case.
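One way to picture a "properly protected" handle that the server may forget at will is a payload authenticated with a server-side secret. The sketch below is an assumption for illustration only: the proposal does not specify any handle format, and the class, method names and payload fields here are all hypothetical stand-ins for the openg()/openfh() roles.

```python
import hashlib
import hmac
import os
import struct

TAG_LEN = 32  # length of a SHA-256 HMAC tag in bytes


class HandleServer:
    """Hypothetical server that issues and validates opaque handles."""

    def __init__(self):
        # A fresh random key per instance: a restarted server no longer
        # accepts old handles, which models "forgetting" without storage.
        self._key = os.urandom(32)

    def issue_handle(self, inode, uid, mode):
        # Plays the openg() role: access checks are assumed to have
        # passed already; the result is sealed into the handle.
        payload = struct.pack("!QIB", inode, uid, mode)
        tag = hmac.new(self._key, payload, hashlib.sha256).digest()
        return payload + tag

    def redeem_handle(self, handle):
        # Plays the openfh() role: validate before trusting the contents.
        payload, tag = handle[:-TAG_LEN], handle[-TAG_LEN:]
        expected = hmac.new(self._key, payload, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return None  # forged, corrupted, or forgotten by the server
        return struct.unpack("!QIB", payload)
```

A client whose handle is rejected would simply fall back to a normal lookup, which matches the "no guaranteed lifetime" property described above. Note that this sketch does nothing about revocation when permissions change after the handle was issued.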
And because it needs kernel support your
fh_t is more or less equivalent to a file descriptor with sutoc equivalent
to a dup variant that really duplicates the backing object instead of just
the userspace index into it.
Well, a FD has some additional state associated with it (position,
etc.), but yes there are definitely similarities to dup().
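The difference between duplicating the userspace index and getting a separate backing object can be seen with plain POSIX calls. The sketch below is only an illustration on a local file (the function name is made up for the example): a dup()ed descriptor shares the file position with the original, while a second open() of the same path has its own.

```python
import os


def offsets_after_read(path, nbytes=4):
    """Read nbytes through one descriptor, then report the current offset
    seen through a dup() of it and through an independent open()."""
    fd = os.open(path, os.O_RDONLY)
    dup_fd = os.dup(fd)                      # same open file description
    other_fd = os.open(path, os.O_RDONLY)    # separate open file description
    try:
        os.read(fd, nbytes)
        shared = os.lseek(dup_fd, 0, os.SEEK_CUR)
        independent = os.lseek(other_fd, 0, os.SEEK_CUR)
        return shared, independent
    finally:
        for f in (fd, dup_fd, other_fd):
            os.close(f)
```

On a file of at least nbytes bytes, the dup()ed descriptor reports the advanced offset while the separately opened one is still at zero. That per-open position is part of the additional state mentioned above.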
Note that somewhat similar open-by-filehandle APIs, like open by inode
number as used by Lustre or the XFS *_by_handle APIs, are privileged
operations because of exactly this problem.
I'm not sure why a properly protected fh_t couldn't be passed back into
user space and handed around, but I'm not a security expert. What am I
missing?
What according to your mail is the most important bit in this proposal is
that you think the filehandles should be easily shared with other systems
in a cluster. That fact is not mentioned in the actual proposal at all,
and is in fact the hardest part because of the inherent statefulness of
the API.
The documentation of the calls is complicated by the way POSIX calls are
described. We need to have a second document describing use cases also
available, so that we can avoid misunderstandings as best we can and get
straight to the real issues. Sorry that document wasn't available.
I think I've addressed the statefulness of the API above?
What's the etiquette on changing subject lines here? It might be useful
to separate the openg() etc. discussion from the readdirplus() etc.
discussion.
Changing subject lines is fine.
Thanks.
Rob