Re: The TCP and UDP checksum algorithm may soon need updating

On Wed, 10 Jun 2020, Warren Kumari wrote:
Having read the papers that Craig referenced, that's my interpretation.

One of them is about a big physics application which sends multiple
terabytes of data over the net using what looks like a version of
FTP that transfers several files at once.  They send the data as a lot
of 4 gig files. When they started verifying file checksums, they
found about 20% of the received files were corrupted in transit.

I'm assuming you are talking about "Cross-Geography Scientific Data
Transferring Trends and Behavior", which contains (Section 4.1
Checksum, encryption, and reliability, p.12):

No, it's "Transferring a Petabyte in a Day".

https://www.researchgate.net/publication/325405478_Transferring_a_Petabyte_in_a_Day

"As mentioned, we split each 1.2 TiB snapshot into 256 files of approximately equal size. We determined that transferring 64 or 128 files concurrently, with a total of 128 or 256 TCP streams, yielded the maximum transfer rate. We achieved an average disk-to-disk transfer rate of 92.4 Gb/s (or 1 PiB in 24 hours and 3 minutes): 99.8% of our goal of 1 PiB in 24 hours, when the end-to-end verification of data integrity in Globus is disabled. In contrast, when the end-to-end verification of data integrity in Globus is enabled, we achieved an average transfer rate of only 72 Gb/s (or 1 PiB in 30 hours and 52 minutes).

The Globus approach to checksum verification is motivated by the observations that the 16-bit TCP checksum is inadequate for detecting data corruption during communication [16, 17] and that corruption can occur during file system operations [18]. Globus pipelines the transfer and checksum computation; that is, the checksum computation of the ith file happens in parallel with the transfer of the (i + 1)th file. Data are read twice at the source storage system (once for transfer and once for checksum) and written once (for transfer) and read once (for checksum) at the destination storage system. Therefore, in order to achieve the desired rate of 93 Gb/s for checksum-enabled transfers, in the absence of checksum failures, 186 Gb/s of read bandwidth from the source storage system and 93 Gb/s write bandwidth and 93 Gb/s of read bandwidth concurrently from the destination storage system are required. If checksum verification failures occur (i.e., one or more files are corrupted during the transfer), even more storage I/O bandwidth, CPU resources, and network bandwidth are required in order to achieve the desired rate."
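To make the "16-bit TCP checksum is inadequate" point concrete, here is a minimal sketch of the ones'-complement checksum in the style of RFC 1071. The function name and the sample byte strings are mine, chosen for illustration; they are not from the paper or from Globus. It only shows two well-known blind spots (reordered 16-bit words, and offsetting changes in two words) and is not meant to model what actually went wrong in the transfers described above.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        # Treat each pair of bytes as one big-endian 16-bit word.
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries above bit 15 back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical payloads, for illustration only.
original = b"\x12\x34\x56\x78\x9a\xbc"
reordered = b"\x56\x78\x12\x34\x9a\xbc"    # first two 16-bit words swapped
compensated = b"\x12\x35\x56\x77\x9a\xbc"  # +1 in one word, -1 in another

for name, payload in [("original", original),
                      ("reordered", reordered),
                      ("compensated", compensated)]:
    print(f"{name:12s} {payload.hex()}  checksum=0x{internet_checksum(payload):04x}")
```

All three payloads differ, yet they produce the same checksum, because the sum does not depend on word order and cannot see changes that cancel out. A file-level hash computed end to end, as Globus does, would catch both cases.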

Globus is a file transfer service from the University of Chicago:

https://www.globus.org/data-transfer

Regards,
John Levine, johnl@xxxxxxxxx, Taughannock Networks, Trumansburg NY
Please consider the environment before reading this e-mail. https://jl.ly



