On Mon, Jan 24, 2022 at 08:10:07PM +0000, Daire Byrne wrote:
> On Mon, 24 Jan 2022 at 19:38, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> >
> > On Sun, Jan 23, 2022 at 11:53:08PM +0000, Daire Byrne wrote:
> > > I've been experimenting a bit more with high-latency NFSv4.2 (200ms).
> > > I've noticed a difference between the file creation rates when you
> > > have parallel processes running against a single client mount creating
> > > files in multiple directories compared to in one shared directory.
> >
> > The Linux VFS requires an exclusive lock on the directory while you're
> > creating a file.
>
> Right. So when I mounted the same server/dir multiple times using
> namespaces, all I was really doing was making the VFS *think* I wanted
> locks on different directories even though the remote server directory
> was actually the same?

In that scenario the client-side locks are probably all different, but
they'd all have to wait for the same lock on the server side, yes.

> > So, if L is the time in seconds required to create a single file, you're
> > never going to be able to create more than 1/L files per second, because
> > there's no parallelism.
>
> And things like directory delegations can't help with this kind of
> workload? You can't batch directory locks or file creates, I guess.

Alas, directory delegations are specified in RFC 8881, but they are
read-only, and nobody's implemented them. Directory write delegations
could help a lot, if they existed.

> > So, it's not surprising you'd get a higher rate when creating in
> > multiple directories.
> >
> > Also, that lock's taken on both client and server. So it makes sense
> > that you might get a little more parallelism from multiple clients.
> >
> > So the usual advice is just to try to get that latency number as low as
> > possible, by using a low-latency network and storage that can commit
> > very quickly.
> > (An NFS server isn't permitted to reply to the RPC creating the new
> > file until the new file actually hits stable storage.)
> >
> > Are you really seeing 200ms in production?
>
> Yea, it's just a (crazy) test for now. This is the latency between two
> of our offices. Running batch jobs over this kind of latency with an
> NFS re-export server doing all the caching works surprisingly well.
>
> It's just these file creations that are the deal breaker. A batch job
> might create 100,000+ files in a single directory across many clients.
>
> Maybe many containerised re-export servers in round-robin with a
> common cache is the only way to get more directory locks and file
> creates in flight at the same time.

ssh into the original server and create the files there?

I've got no help, sorry. The client-side locking does seem redundant to
some degree, but I don't know what to do about it.

--b.
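The 1/L bound discussed above can be sketched with a toy simulation. This is purely illustrative (it is not code from the thread, and the 50ms latency and per-directory `threading.Lock` are stand-ins for the real WAN round trip and the VFS's per-directory mutex): however many workers hammer one directory, throughput is capped near 1/L creates per second, while spreading the same work over multiple directories scales with the directory count.

```python
import threading
import time

# Assumed stand-in latency; the thread's real WAN case was 200ms.
LATENCY = 0.05  # seconds per file-create round trip

def create_files(dir_lock, n):
    # Model the VFS behavior: each create holds the directory's
    # exclusive lock for the whole round trip, so creates within
    # one directory are fully serialized.
    for _ in range(n):
        with dir_lock:
            time.sleep(LATENCY)  # simulated server round trip

def run(num_dirs, workers_per_dir, files_each):
    """Return aggregate creates/sec for the given layout."""
    locks = [threading.Lock() for _ in range(num_dirs)]
    threads = [
        threading.Thread(target=create_files, args=(locks[d], files_each))
        for d in range(num_dirs)
        for _ in range(workers_per_dir)
    ]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    total = num_dirs * workers_per_dir * files_each
    return total / (time.monotonic() - start)

if __name__ == "__main__":
    # Same total work (20 creates), two layouts:
    one_dir = run(num_dirs=1, workers_per_dir=4, files_each=5)
    four_dirs = run(num_dirs=4, workers_per_dir=1, files_each=5)
    print(f"1 dir, 4 workers:  {one_dir:.1f} creates/s")
    print(f"4 dirs, 1 worker:  {four_dirs:.1f} creates/s")
```

With L = 0.05s the single-directory run stays near the 1/L = 20 creates/s ceiling regardless of worker count, while four directories run roughly four times faster, which matches the behavior Daire observed with multiple namespace mounts tricking the client-side VFS (and why the server-side lock still caps the real workload).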