> On Apr 4, 2019, at 11:09 AM, Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
>
> On Wed, Apr 3, 2019 at 9:06 PM bfields@xxxxxxxxxxxx
> <bfields@xxxxxxxxxxxx> wrote:
>>
>> On Wed, Apr 03, 2019 at 12:56:24PM -0400, Bradley C. Kuszmaul wrote:
>>> This proposal does look like it would be helpful. How does this
>>> kind of proposal play out in terms of actually seeing the light of
>>> day in deployed systems?
>>
>> We need some people to commit to implementing it.
>>
>> We have 2-3 testing events a year, so ideally we'd agree to show up with
>> implementations at one of those to test and hash out any issues.
>>
>> We revise the draft based on any experience or feedback we get. If
>> nothing else, it looks like it needs some updates for v4.2.
>>
>> The on-the-wire protocol change seems small, and my feeling is that if
>> there's running code then documenting the protocol and getting it
>> through the IETF process shouldn't be a big deal.
>>
>> --b.
>>
>>> On 4/2/19 10:07 PM, bfields@xxxxxxxxxxxx wrote:
>>>> On Wed, Apr 03, 2019 at 02:02:54AM +0000, Trond Myklebust wrote:
>>>>> The create itself needs to be sync, but the attribute delegations mean
>>>>> that the client, not the server, is authoritative for the timestamps.
>>>>> So the client now owns the atime and mtime, and just sets them as part
>>>>> of the (asynchronous) delegreturn some time after you are done writing.
>>>>>
>>>>> Were you perhaps thinking about this earlier proposal?
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__tools.ietf.org_html_draft-2Dmyklebust-2Dnfsv4-2Dunstable-2Dfile-2Dcreation-2D01&d=DwIBAg&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=YIKOmJLMLfe5wQR3VJI7jGjCnepZlMwumApzvaKItrY&m=qlAJ6dZPGjbcTzNIpkTyk-RTii6lWw1CLIjF6jp3P2Y&s=aTTFNJlRH-dXrQmE4cSYEUd8Kv3ij5cqTJtvgIixMa8&e=
>>>> That's it, thanks!
>>>>
>>>> Bradley is concerned about performance of something like untar on a
>>>> backend filesystem with particularly high-latency metadata operations,
>>>> so something like your unstable file creation proposal (or actual write
>>>> delegations) seems like it should help.
>>>>
>>>> --b.
>
> The serialized create with something like an untar is a
> performance-killer though.
>
> FWIW, I'm working on something similar right now for Ceph. If a ceph
> client has adequate caps [1] for a directory and the dentry inode,
> then we should (in principle) be able to buffer up directory morphing
> operations and flush them out to the server asynchronously.
>
> I'm starting with unlink (mostly because it's simpler), and am mainly
> just returning early when we do have the right caps -- after issuing
> the call but before the reply comes in. We should be able to do the
> same for link, rename and create too. Create will require the Ceph MDS
> to delegate out a range of inode numbers (and that bit hasn't been
> implemented yet).
>
> My thinking with all of this is that the buffering of directory
> morphing operations is not as helpful as something like a pagecache
> write is, as we aren't that interested in merging operations that
> change the same dentry. However, being able to do them asynchronously
> should work really well. That should allow us to better parallelize
> create/link/unlink/rename on different dentries even when they are
> issued serially by a single task.

What happens if an asynchronous directory change fails (e.g. ENOSPC)?

> RFC5661 doesn't currently provide for writeable directory delegations,
> AFAICT, but they could eventually be implemented in a similar way.
>
> [1]: cephfs capabilities (aka caps) are like a delegation for a subset
> of inode metadata
> --
> Jeff Layton <jlayton@xxxxxxxxxxxxxxx>

--
Chuck Lever