> On Jan 25, 2022, at 4:50 PM, Patrick Goetz <pgoetz@xxxxxxxxxxxxxxx> wrote:
> 
> On 1/25/22 09:30, Chuck Lever III wrote:
>>> On Jan 25, 2022, at 8:59 AM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>>> 
>>> On Tue, Jan 25, 2022 at 12:52:46PM +0000, Daire Byrne wrote:
>>>> Yeah, it does seem like the server is the ultimate arbiter, and the
>>>> fact that multiple clients can achieve much higher rates of
>>>> parallelism does suggest that the VFS locking per client is somewhat
>>>> redundant and limiting (in this super niche case).
>>> 
>>> It doesn't seem *so* weird to have a server with fast storage a long
>>> round-trip time away, in which case the client-side operation could
>>> take several orders of magnitude longer than it does on the server.
>>> 
>>> Though even if the client locking weren't a factor, you might still
>>> have to do some work to take advantage of that. (E.g., if your
>>> workload is just a single "untar", it still waits for one create to
>>> finish before issuing the next.)
>> 
>> Note that this is also an issue for data center area filesystems,
>> where back-end replication of metadata updates makes creates and
>> deletes as slow as if they were being done on storage hundreds of
>> miles away. The solution of choice appears to be to replace tar/rsync
>> and similar tools with versions that are smarter about parallelizing
>> file creation and deletion.
> 
> Are these tools available to mere mortals? If so, what are they
> called? This is a problem I'm currently dealing with: trying to back
> up hundreds of terabytes of image data.

They are available to cloud customers (like Oracle and Amazon), I
believe, and possibly for Azure folks too. Try Google; I'm sorry I
don't have a link handy. parcp? Something like that.

--
Chuck Lever
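
For concreteness, here is a minimal sketch of the parallel-create idea
discussed above. This is illustrative only, not parcp or any of the
cloud vendors' tools; the thread count, file count, and naming scheme
are arbitrary. The point is simply to keep several creates in flight so
that long per-create round trips overlap instead of serializing the way
a single untar does.

/*
 * pcreate.c: create files from several threads in parallel.
 * Build: cc -O2 -pthread pcreate.c -o pcreate
 * Run it in a directory on the NFS mount being tested.
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 8      /* arbitrary degree of parallelism */
#define NFILES   1000   /* arbitrary number of files per thread */

static void *creator(void *arg)
{
	long id = (long)arg;
	char name[64];
	int i, fd;

	for (i = 0; i < NFILES; i++) {
		/* Each thread uses its own naming range to avoid collisions. */
		snprintf(name, sizeof(name), "f-%ld-%d", id, i);
		fd = open(name, O_CREAT | O_EXCL | O_WRONLY, 0644);
		if (fd < 0) {
			perror(name);
			continue;
		}
		close(fd);
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	long i;

	/* Fan the creates out, then wait for all threads to finish. */
	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, creator, (void *)i);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	return 0;
}

Whether this helps from a single client depends on the per-client VFS
locking discussed above; the multi-client results earlier in the thread
suggest the server itself can absorb considerably more parallelism.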