Re: parallel file create rates (+high latency)

On 1/25/22 15:59, Bruce Fields wrote:
On Tue, Jan 25, 2022 at 03:50:05PM -0600, Patrick Goetz wrote:
On 1/25/22 09:30, Chuck Lever III wrote:
On Jan 25, 2022, at 8:59 AM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
On Tue, Jan 25, 2022 at 12:52:46PM +0000, Daire Byrne wrote:
Yea, it does seem like the server is the ultimate arbiter and the
fact that multiple clients can achieve much higher rates of
parallelism does suggest that the VFS locking per client is somewhat
redundant and limiting (in this super niche case).

It doesn't seem *so* weird to have a server with fast storage a long
round-trip time away, in which case the client-side operation could take
several orders of magnitude longer than the server.

Though even if the client locking wasn't a factor, you might still have
to do some work to take advantage of that.  (E.g. if your workload is
just a single "untar"--it still waits for one create before doing the
next one).

Note that this is also an issue for data center area filesystems, where
back-end replication of metadata updates makes creates and deletes as
slow as if they were being done on storage hundreds of miles away.

The solution of choice appears to be to replace tar/rsync and such
tools with versions that are smarter about parallelizing file creation
and deletion.
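
As a rough illustration of the "parallelize the creates" idea (a minimal
sketch in Python, not any particular tool -- the paths and worker count
here are invented, and whether it helps in practice depends on the
client-side locking discussed above):

    # Overlap per-file create latency by issuing creates from a pool of
    # worker threads instead of one at a time.
    import os
    from concurrent.futures import ThreadPoolExecutor

    def create_one(path):
        # Each exclusive create is a synchronous round trip to the server;
        # running many of them concurrently keeps more creates in flight.
        with open(path, "x") as f:
            pass

    def create_many(paths, workers=32):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(create_one, paths))

    if __name__ == "__main__":
        # Hypothetical directory on an NFS mount.
        base = "/mnt/nfs/testdir"
        os.makedirs(base, exist_ok=True)
        create_many(f"{base}/file{i:06d}" for i in range(10000))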

Are these tools available to mere mortals?  If so, what are they
called?  This is a problem I'm currently dealing with: trying to
back up hundreds of terabytes of image data.

How many files, though?


IDK, 4,000 images per collection, with hundreds of collections on disk? Say at least 500,000 files, maybe a million, with most files about 1 GB in size.

I was trying to just rsync it all from the data server to a ZFS-based backup server in our data center, but the backup started failing constantly because the filesystem would change after rsync had already constructed its index. Even after an initial copy, a backup like that runs for over a week.

The strategy I'm about to try is to NFS-mount the data server's data partition on the backup server and then have a script walk the directory hierarchy, rsyncing collections one at a time. ZFS send/receive would probably be better, but the data server isn't configured with ZFS.
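
Roughly what that walker script could look like (a sketch in Python for
illustration only; the mount point, destination path, and rsync options
are placeholders):

    # Walk the top-level collection directories on the NFS-mounted data
    # partition and rsync each collection individually, a few at a time.
    import subprocess
    from pathlib import Path
    from concurrent.futures import ThreadPoolExecutor

    SRC = Path("/mnt/data")      # NFS mount of the data server (placeholder)
    DST = Path("/backup/data")   # dataset on the backup server (placeholder)

    def sync_collection(collection):
        # One rsync per collection keeps each file list small, so changes in
        # one collection don't force the whole run to start over.
        cmd = ["rsync", "-a", "--delete",
               f"{collection}/", f"{DST / collection.name}/"]
        return subprocess.run(cmd).returncode

    def main():
        collections = sorted(p for p in SRC.iterdir() if p.is_dir())
        # A little parallelism across collections; raise with care.
        with ThreadPoolExecutor(max_workers=4) as pool:
            results = list(pool.map(sync_collection, collections))
        failed = [c.name for c, rc in zip(collections, results) if rc != 0]
        if failed:
            print("collections to retry:", ", ".join(failed))

    if __name__ == "__main__":
        main()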



Writes of file data *should* be limited mainly just by your network and
disk bandwidth.

Creation of files is more complicated: it's limited by network and disk
latency, and it's where multiple processes are more likely to help.

--b.


