Re: directory delegations

On Thu, Apr 4, 2019 at 11:37 AM bfields@xxxxxxxxxxxx
<bfields@xxxxxxxxxxxx> wrote:
>
> On Thu, Apr 04, 2019 at 11:09:47AM -0400, Jeff Layton wrote:
> > On Wed, Apr 3, 2019 at 9:06 PM bfields@xxxxxxxxxxxx <bfields@xxxxxxxxxxxx> wrote:
> > The serialized create with something like an untar is a
> > performance-killer though.
>
> Yes.  And Trond's proposal only allows hiding the server-to-disk round
> trip time, not the client-to-server round trip time.  On the other hand,
> it seems a lot easier than write delegations.
>
> > FWIW, I'm working on something similar right now for Ceph. If a ceph
> > client has adequate caps [1] for a directory and the dentry inode,
> > then we should (in principle) be able to buffer up directory morphing
> > operations and flush them out to the server asynchronously.
> >
> > I'm starting with unlink (mostly because it's simpler), and am mainly
> > just returning early when we do have the right caps -- after issuing
> > the call but before the reply comes in. We should be able to do the
> > same for link, rename and create too. Create will require the Ceph MDS
> > to delegate out a range of inode numbers (and that bit hasn't been
> > implemented yet).
>
> Is there some reason it's impossible for the client to return from
> create before it has an inode number?
>

Not necessarily, but you can't handle a stat() at that point until the
create returns. Also for cephfs, we can't issue data writes to the
OSDs until we know the inode number (the underlying objects are named
with the format "inode_number.chunk_index"). Cephfs works a little
like pNFS, in that we do reads and writes directly to/from the OSDs,
but the data is placed algorithmically so we know what the layout will
be if we know the inode number.
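
To make the naming dependency concrete, here is a minimal sketch of how an object name could be derived from an inode number and a file offset. The "inode_number.chunk_index" pattern is from the text above; the hexadecimal formatting, the fixed 4 MiB object size, the absence of striping, and the function name object_name are all assumptions made for illustration, not a description of the actual client code.

```python
# Assumed for illustration: fixed-size objects, no striping.
DEFAULT_OBJECT_SIZE = 4 * 1024 * 1024


def object_name(ino: int, offset: int, object_size: int = DEFAULT_OBJECT_SIZE) -> str:
    """Return a name of the form "inode_number.chunk_index" for a file offset.

    Hypothetical helper: the hex formatting and zero padding are assumptions.
    """
    if ino < 0 or offset < 0:
        raise ValueError("ino and offset must be non-negative")
    chunk_index = offset // object_size  # which object holds this byte
    return f"{ino:x}.{chunk_index:08x}"
```

The point of the sketch is that nothing in the computation needs a reply from the server except ino itself, which is why a client that was handed a range of inode numbers in advance could start writing data before the create reply arrives.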

> > My thinking with all of this is that the buffering of directory
> > morphing operations is not as helpful as something like a pagecache
> > write is, as we aren't that interested in merging operations that
> > change the same dentry. However, being able to do them asynchronously
> > should work really well. That should allow us to better parallelize
> > create/link/unlink/rename on different dentries even when they are
> > issued serially by a single task.
> >
> > RFC5661 doesn't currently provide for writeable directory delegations,
> > AFAICT, but they could eventually be implemented in a similar way.
>
> People also worried about delegating create in the face of differing
> rules about case insensitivity and about which characters are legal in
> filenames.  But I really think there should be some way to manage that.
>

Oh, good god. I hadn't even considered that.

I tend to think at that point, we could just return EINVAL on a
subsequent fsync of the dir or something, and let the program sort out
what went wrong.
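
A toy model of that idea, assuming a hypothetical server_op callable that stands in for the round trip to the server: directory operations return to the caller as soon as the call is issued, and any failure is only reported by a later fsync() on the directory. The class and method names are invented for this sketch, and a thread pool is just a convenient way to model replies arriving later.

```python
from concurrent.futures import ThreadPoolExecutor


class AsyncDir:
    """Toy model of a directory whose namespace ops complete asynchronously."""

    def __init__(self, server_op):
        # server_op(op, name) is a hypothetical stand-in for the server call;
        # it raises OSError if the server rejects the operation.
        self._server_op = server_op
        self._pool = ThreadPoolExecutor(max_workers=4)
        self._pending = []

    def submit(self, op, name):
        # Issue the call and return immediately, without waiting for the reply.
        self._pending.append(self._pool.submit(self._server_op, op, name))

    def fsync(self):
        # Wait for every outstanding reply, then report the first failure.
        pending, self._pending = self._pending, []
        first_error = None
        for fut in pending:
            exc = fut.exception()  # blocks until this operation has finished
            if exc is not None and first_error is None:
                first_error = exc
        if first_error is not None:
            raise first_error
```

Note the cost of this design: the caller learns that something failed, but not which of the buffered operations it was, which is the "let the program sort out what went wrong" part.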
-- 
Jeff Layton <jlayton@xxxxxxxxxxxxxxx>


