> the speed of data transfer is varying a lot over time (200KB/s
> – 120MB/s). [...] The FS in question, has a lot of small files
> in it and I suspect this is the cause of the variability – ie,
> the transfer of many small files will be more impacted by
> greater site-site latency.

200KB/s on small files across sites? That's pretty good. I have
seen rates of 3-5KB/s on some Ceph instances for reading local
small files, never mind remotely.

> If this suspicion is true, what options do I have to improve
> the overall throughput?

In practice not much. Perhaps switching to all-RAM storage (with
battery backup) for the OSDs might help :-).

In one case, by undoing some of the more egregious issues, I
managed to improve small-file transfer rates locally by 10 times,
that is to 40-60KB/s. In your case a 10-times improvement, if
achievable, might get you transfer rates of 2MB/s.

Often the question is not just the longer network latency, but
whether your underlying storage can sustain the IOPS needed for
"scan" type operations at the same time as the user workload.

Perhaps it would go a lot faster if you just used 'rsync', or even
just 'tar -f - -c ... | ssh ... tar -f - -x' (or 'rclone' if you
don't use CephFS). It would be worth doing a test of transferring
a directory (or bucket, if you don't use CephFS) of small files by
'rsync' and/or 'tar' to both a non-Ceph remote target and a Ceph
remote target, to see what you could achieve (a rough sketch of
such a test is at the end of this message).

No network/sharded filesystem (and very few local ones) handles
small files well. In some cases I have seen Ceph used to store the
image of a traditional filesystem of a type more suitable for
small files, mounted on a loop device (a sketch of that is also at
the end of this message).

https://www.sabi.co.uk/blog/anno05-4th.html?051016#051016
https://www.sabi.co.uk/blog/0909Sep.html?090919#090919
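
For a concrete transfer test, something like the lines below could
be timed against both a Ceph-backed remote target and a non-Ceph
one; the host and path names are made up and the target directories
are assumed to already exist:

  # Copy a tree of small files with 'tar' over 'ssh' (made-up names).
  time tar -C /cephfs/some/smalldir -c -f - . \
    | ssh remotehost 'tar -C /srv/ceph-target/smalldir -x -f -'

  # The same tree with 'rsync', to the same Ceph-backed target.
  time rsync -a /cephfs/some/smalldir/ remotehost:/srv/ceph-target/smalldir/

  # And again to a non-Ceph target (e.g. local SSD on the remote
  # host), to separate plain network latency from Ceph overhead.
  time rsync -a /cephfs/some/smalldir/ remotehost:/var/tmp/smalldir/

Comparing the three timings should show how much of the slowness is
site-to-site latency and how much is per-small-file overhead on the
Ceph side.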
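
As an illustration of the loop-device approach (the size, the paths
and the choice of 'ext4' are made up, and only one client can have
the image mounted read-write at a time):

  # Create a fixed-size image file on CephFS and format it with a
  # local filesystem, so per-file metadata operations are handled
  # inside the image rather than by CephFS.
  truncate -s 200G /cephfs/images/smallfiles.img
  mkfs.ext4 -F /cephfs/images/smallfiles.img
  mkdir -p /mnt/smallfiles
  mount -o loop /cephfs/images/smallfiles.img /mnt/smallfiles

The trade-off is losing shared access to the tree: the small files
live inside one big RADOS-backed file, which only one client should
mount read-write at a time.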