On Wed, 10 Jun 2020, Warren Kumari wrote:
Having read the papers that Craig referenced, that's my interpretation.
One of them is about a big physics application which sends multiple
terabytes of data over the net using what looks like a version of
FTP that transfers several files at once. They send the data as a lot
of 4 gig files. When they started verifying file checksums, they
found about 20% of the received files were corrupted in transit.
I'm assuming you are talking about "Cross-Geography Scientific Data
Transferring Trends and Behavior", which contains (Section 4.1
Checksum, encryption, and reliability, p.12):
No, it's "Transferring a Petabyte in a Day".
https://www.researchgate.net/publication/325405478_Transferring_a_Petabyte_in_a_Day
"As mentioned, we split each 1.2 TiB snapshot into 256 files of
approximately equal size. We determined that transferring 64 or 128 files
concurrently, with a total of 128 or 256 TCP streams, yielded the maximum
transfer rate. We achieved an average disk-to-disk transfer rate of 92.4
Gb/s (or 1 PiB in 24 hours and 3 minutes): 99.8% of our goal of 1 PiB in
24 hours, when the end-to-end verification of data integrity in Globus is
disabled. In contrast, when the end-to-end verification of data integrity
in Globus is enabled, we achieved an average transfer rate of only 72 Gb/s
(or 1 PiB in 30 hours and 52 minutes).
The Globus approach to checksum verification is motivated by the
observations that the 16-bit TCP checksum is inadequate for detecting data
corruption during communication [16, 17] and that corruption can occur
during file system operations [18]. Globus pipelines the transfer and
checksum computation; that is, the checksum computation of the ith file
happens in parallel with the transfer of the (i + 1)th file. Data are read
twice at the source storage system (once for transfer and once for
checksum) and written once (for transfer) and read once (for checksum) at
the destination storage system. Therefore, in order to achieve the desired
rate of 93 Gb/s for checksum-enabled transfers, in the absence of checksum
failures, 186 Gb/s of read bandwidth from the source storage system and 93
Gb/s write bandwidth and 93 Gb/s of read bandwidth concurrently from the
destination storage system are required. If checksum verification failures
occur (i.e., one or more files are corrupted during the transfer), even
more storage I/O bandwidth, CPU resources, and network bandwidth are
required in order to achieve the desired rate."
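
For what it's worth, the figures in that passage can be sanity-checked with
a few lines of Python. This is my own back-of-the-envelope sketch, not
anything taken from the paper or from Globus. The function names are made up
for illustration, and it assumes the petabyte is counted as 10**15 bytes,
since that is the value the quoted rates and durations seem to agree with.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumption (mine, not the paper's): the petabyte is treated as 10**15
# bytes, since that is what the quoted rates and durations line up with.

PETABYTE_BITS = 8 * 10**15


def transfer_seconds(rate_gbps: float, size_bits: int = PETABYTE_BITS) -> float:
    """Seconds needed to move size_bits at rate_gbps (10**9 bits per second)."""
    return size_bits / (rate_gbps * 1e9)


def as_hours_minutes(seconds: float) -> tuple[int, int]:
    """Round to whole minutes and split into (hours, minutes)."""
    total_minutes = round(seconds / 60)
    return divmod(total_minutes, 60)


def storage_bandwidth(target_gbps: float) -> dict[str, float]:
    """Storage I/O needed for a checksum-enabled transfer with no failures.

    Source: every byte is read twice (once to send, once to checksum).
    Destination: every byte is written once and read back once.
    """
    return {
        "source_read": 2 * target_gbps,
        "dest_write": target_gbps,
        "dest_read": target_gbps,
    }


if __name__ == "__main__":
    for label, rate in [("checksums off", 92.4), ("checksums on", 72.0)]:
        hours, minutes = as_hours_minutes(transfer_seconds(rate))
        print(f"{label}: {rate} Gb/s -> {hours} h {minutes} min")
    print(storage_bandwidth(93))
```

Under that assumption the two rates come out at 24 h 3 min and 30 h 52 min,
matching the quote, and a 93 Gb/s target needs 186 Gb/s of source reads plus
93 Gb/s each of destination writes and reads.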
Globus is a file transfer service from U of Chicago
https://www.globus.org/data-transfer
Regards,
John Levine, johnl@xxxxxxxxx, Taughannock Networks, Trumansburg NY
Please consider the environment before reading this e-mail. https://jl.ly