Re: file-copy corruption

Hi Terry,

I think you mentioned you have RHEL 4. If so, I recommend that you remove (rpm -e) the rsync that comes with RHEL 4; it's very old. You can download the FC5 src.rpm and rebuild it on RHEL 4. It fixes a few bugs and offers a few new features that I certainly needed when copying 10 TB of data.

Also, regarding Justin's gigabit speed: a number of factors affect the speed you see on the wire. One is running rsync over ssh, which is very slow because both ends must encrypt and decrypt the data (and compress and decompress it, if compression is on). But the biggest factor is disk speed, buffering and caching. You'll notice that you get a burst at the start of a copy and then a big drop: the burst lasts only until your local disk cache, which started out empty, fills up, and after that you are basically transferring at disk speed or below. The key to good speed is to keep the pipe full at all times.

I also increased the read-ahead size using "blockdev --setra 16384 /dev/sdX".
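blockdev --setra and --getra count read-ahead in 512-byte sectors, so 16384 works out to 8 MiB. A minimal Python sketch of the conversion (the helper names are hypothetical, for illustration only):

```python
# blockdev --setra / --getra express read-ahead in 512-byte sectors.
SECTOR_BYTES = 512


def readahead_bytes(sectors: int) -> int:
    """Convert a blockdev read-ahead value (sectors) to bytes."""
    return sectors * SECTOR_BYTES


def readahead_sectors(nbytes: int) -> int:
    """Convert a desired read-ahead size in bytes to a --setra value, rounding up."""
    return -(-nbytes // SECTOR_BYTES)


if __name__ == "__main__":
    for ra in (256, 16384):
        print(f"--setra {ra:>5} = {readahead_bytes(ra) // 1024} KiB")
```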

Using a 9K MTU (jumbo frames) may help if you have good Cisco switches, but that's another story.
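To put a rough number on the jumbo-frame gain, here is a back-of-the-envelope sketch of TCP payload efficiency on gigabit Ethernet. It assumes IPv4, no IP or TCP options (Linux normally adds 12 bytes of TCP timestamps), and the standard 38 bytes of per-frame wire overhead (preamble, MAC header, FCS, inter-frame gap); the function names are made up for illustration.

```python
# Per-frame wire overhead for Ethernet, in bytes:
# preamble + SFD (8), MAC header (14), FCS (4), inter-frame gap (12).
ETH_WIRE_OVERHEAD = 8 + 14 + 4 + 12
# IPv4 (20) + TCP (20) headers, assuming no IP or TCP options.
IPV4_TCP_HEADERS = 20 + 20


def tcp_payload_fraction(mtu: int) -> float:
    """Fraction of wire time carrying TCP payload for full-size frames."""
    return (mtu - IPV4_TCP_HEADERS) / (mtu + ETH_WIRE_OVERHEAD)


def max_payload_mb_s(mtu: int, link_bits_per_s: float = 1e9) -> float:
    """Upper bound on TCP payload throughput in MB/s (10**6 bytes)."""
    return link_bits_per_s / 8 * tcp_payload_fraction(mtu) / 1e6


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {tcp_payload_fraction(mtu):.2%} -> "
              f"{max_payload_mb_s(mtu):.1f} MB/s")
```

Under these assumptions the ceiling rises only from about 118.7 MB/s to about 123.9 MB/s; the larger practical effect of a 9K MTU is usually that roughly six times fewer packets (and interrupts) are needed for the same data.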

For best performance, set up an rsync "server" (daemon) and disable compression. You should also be aware that if you have trillions of files and billions of directories, this can really hog the server, which may pi$$ off your users, especially when you do your final rsync migration run.

Hope some of this is useful.

Albert.


Justin Piszcz wrote:
I've found NFS access on Linux over gigabit slow.

Under gigabit:
With NFS (NFSv3, various R/W sizes), rates vary between 10 and 30 MB/s.
With FTP, a consistent 40-60 MB/s.

Back in the day when I used 100 Mbps, NFS was great: it hit 11-12 MB/s and stayed there, sustained. With gigabit this is no longer the case, at least for me. I see bursts of 50 MB/s, then 0, then two seconds later 50 MB/s again; overall it is MUCH slower than FTP (and probably even rsync/ssh).
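For a sense of scale, a small sketch estimating the wall-clock time to move Terry's 3 TiB at the rates quoted in this thread. It takes MB as 10^6 bytes and assumes the rate is sustained, which Justin's bursts suggest it often isn't; the helper name is hypothetical.

```python
TIB = 2 ** 40  # bytes in a tebibyte


def hours_to_copy(total_bytes: float, rate_mb_s: float) -> float:
    """Wall-clock hours at a sustained rate, with MB = 10**6 bytes."""
    return total_bytes / (rate_mb_s * 1e6) / 3600


if __name__ == "__main__":
    # Rates taken from the thread: NFS over gigabit (10-30 MB/s)
    # and FTP over gigabit (40-60 MB/s).
    for label, rate in [("NFS, low", 10), ("NFS, high", 30),
                        ("FTP, low", 40), ("FTP, high", 60)]:
        print(f"{label:>9} @ {rate:>2} MB/s: {hours_to_copy(3 * TIB, rate):6.1f} h")
```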

Justin.


On Thu, 29 Jun 2006, T. Horsnell wrote:

Hi Terry,

I tend to disagree with the others who have replied so far. I've found
NFS to be 100% reliable for many years, with large clusters of clients
using many flavors of Unix. Whenever things have failed, I've always
been able to find the root cause.

I'd suggest that you look at your messages file for indications of the
problem. Another tool you can use is nfsstat (man nfsstat); it should
show any NFS-related bad calls.
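nfsstat reads its counters from /proc/net/rpc/nfsd (server) and /proc/net/rpc/nfs (client). If you want to poll the bad-call counters from a script, a minimal parser for the "rpc" line might look like this. The field order is an assumption based on recent Linux kernels (server: calls, badcalls, badfmt, badauth, badclnt; client: calls, retrans, authrefresh), so check it against nfsstat output on your own system. The sample text is made up.

```python
from pathlib import Path

# Assumed field order of the "rpc" line on recent Linux kernels.
SERVER_RPC_FIELDS = ("calls", "badcalls", "badfmt", "badauth", "badclnt")
CLIENT_RPC_FIELDS = ("calls", "retrans", "authrefresh")


def parse_rpc_line(text: str, fields: tuple) -> dict:
    """Return the counters on the first line starting with 'rpc' as a dict."""
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "rpc":
            return dict(zip(fields, (int(p) for p in parts[1:])))
    raise ValueError("no 'rpc' line found")


def server_rpc_stats(path: str = "/proc/net/rpc/nfsd") -> dict:
    return parse_rpc_line(Path(path).read_text(), SERVER_RPC_FIELDS)


def client_rpc_stats(path: str = "/proc/net/rpc/nfs") -> dict:
    return parse_rpc_line(Path(path).read_text(), CLIENT_RPC_FIELDS)


if __name__ == "__main__":
    # Made-up sample text; on a real server call server_rpc_stats() instead.
    sample = "net 0 0 0 0\nrpc 120034 3 3 0 0\n"
    stats = parse_rpc_line(sample, SERVER_RPC_FIELDS)
    if stats["badcalls"]:
        print(f"{stats['badcalls']} bad RPC calls out of {stats['calls']}")
```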

On any recent Linux, it would be very rare for there to be "no
indication", so your log files are your friend.

If you really cannot find any message or indication, it stands to reason that the files in question may have been opened/updated by another user or
process during the gtar process. Is that possible?

I would agree that rsync is a good choice for this task (you could run
"rsync --dry-run --stats" to show any differences that exist).

OK, I'm convinced about using rsync to build the filesystem copy.
gnu-tar over NFS seems to take about 50% longer.
However, as far as I can tell, rsync uses the file-modification time
to determine whether a source and destination file are possibly different
(or the file length if --size-only is selected), and only if these
indicate that the files may be different does it start to look at the
differences. Yes/no?
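In outline: by default rsync's "quick check" treats a file as unchanged when both its size and its modification time match; --size-only drops the time comparison, and -c/--checksum makes rsync compare full-file checksums instead. Only files that fail the quick check are examined further, and this applies to a --dry-run --stats comparison too, so a corrupted copy with the right size and timestamp passes unnoticed unless -c is used. A rough Python model of the decision (the function is hypothetical, not rsync's code):

```python
import os


def quick_check_unchanged(src: str, dst: str, size_only: bool = False) -> bool:
    """Rough model of rsync's default quick check (not rsync's actual code).

    A destination file is treated as up to date when its size matches the
    source and, unless size_only is set (like --size-only), its modification
    time matches too. File contents are never read here, which is why a
    same-size, same-mtime corrupted copy slips through unless rsync -c is used.
    """
    try:
        d = os.stat(dst)
    except FileNotFoundError:
        return False  # missing on the destination: must be transferred
    s = os.stat(src)
    if s.st_size != d.st_size:
        return False
    if size_only:
        return True
    # Compare whole seconds here for simplicity.
    return int(s.st_mtime) == int(d.st_mtime)
```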


Cheers,
Terry



Albert.

T. Horsnell wrote:
I'm in the process of moving stuff from our Alpha fileserver
onto a Linux replacement. I've been using gnu-tar to copy filesystems
from the Alpha to the Linux NFS-exported disks over a 1 Gbit LAN,
followed by diff -r to check that they have copied correctly (I wish
diff had an option to not follow symlinks...). I've so far transferred
about 3 TiB of data (spread over several weeks) and am concerned
that during this process, 3 files were mis-copied without any
apparent hardware errors being flagged. There was nothing unusual
about these files, and re-copying them (with cp) fixed the problem.
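As an alternative to diff -r that never follows symlinks, here is a sketch that walks both trees with os.walk (which does not descend into symlinked directories by default), records each symlink's target rather than its contents, and compares regular files by SHA-256. Device files, FIFOs and sockets are skipped, and permissions and ownership are ignored; the function names are hypothetical.

```python
import hashlib
import os


def sha256_file(path: str, bufsize: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root: str) -> dict:
    """Map relative path -> ('link', target) | ('dir', None) | ('file', sha256).

    Symlinks are recorded by target and never followed.
    """
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):  # followlinks=False
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            if os.path.islink(full):
                result[rel] = ("link", os.readlink(full))
            elif os.path.isdir(full):
                result[rel] = ("dir", None)
            elif os.path.isfile(full):
                result[rel] = ("file", sha256_file(full))
            # other file types (devices, FIFOs, sockets) are skipped
    return result


def compare_trees(a: str, b: str) -> list:
    """Return sorted relative paths that differ or exist on only one side."""
    sa, sb = snapshot(a), snapshot(b)
    return sorted(p for p in sa.keys() | sb.keys() if sa.get(p) != sb.get(p))
```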

Are occasional undetected errors like this to be expected?
I thought there were sufficient stages of checksumming/parity
(both boxes have ECC memory) etc. to make the probability
of this vanishingly small.
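On the checksumming point: IPv4 and TCP use the 16-bit ones' complement "Internet checksum" of RFC 1071, while Ethernet's CRC-32 only protects each individual link hop. The sketch below implements the RFC 1071 checksum and shows one class of error it cannot catch: reordering 16-bit words leaves the sum unchanged.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 16-bit ones' complement checksum (as used by IPv4, TCP, UDP)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


if __name__ == "__main__":
    original = bytes.fromhex("12345678abcd")
    reordered = bytes.fromhex("abcd56781234")  # first and last words swapped
    print(original != reordered,
          internet_checksum(original) == internet_checksum(reordered))
```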

On all 3 files, multiple retries of the diff still resulted
in a compare error, which was then fixed by a re-copy. This
suggests that the problem occurs during the 'gtar' phase, rather
than the 'diff -r' phase.

Does anyone know of a network-exercise utility I can use
to check the LAN component of the data-path?
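For exercising the network path, one option is a sender that streams random data and a receiver that hashes what arrives; if the byte counts and SHA-256 digests printed at the two ends differ, something between the two sockets altered the data. This is a hypothetical script (call it netcheck.py), not an existing utility. Run "python3 netcheck.py recv 5001" on one host and "python3 netcheck.py send <receiver-host> 5001 <byte-count>" on the other (port 5001 is arbitrary); with no arguments it runs a self-contained loopback check, which does not touch the NIC.

```python
import hashlib
import os
import socket
import sys
import threading

CHUNK = 1 << 20  # 1 MiB per send/recv call


def receive_and_hash(srv: socket.socket) -> tuple:
    """Accept one connection on a listening socket; return (bytes, sha256 hex)."""
    conn, _ = srv.accept()
    h = hashlib.sha256()
    received = 0
    with conn:
        while chunk := conn.recv(CHUNK):
            h.update(chunk)
            received += len(chunk)
    return received, h.hexdigest()


def send_random(host: str, port: int, total_bytes: int) -> tuple:
    """Stream total_bytes of random data; return (bytes, sha256 hex) of what was sent."""
    h = hashlib.sha256()
    sent = 0
    with socket.create_connection((host, port)) as s:
        while sent < total_bytes:
            block = os.urandom(min(CHUNK, total_bytes - sent))
            s.sendall(block)
            h.update(block)
            sent += len(block)
    return sent, h.hexdigest()


def loopback_check(total_bytes: int = 8 << 20) -> bool:
    """Run receiver and sender in one process over 127.0.0.1 (no NIC involved)."""
    results = []
    with socket.create_server(("127.0.0.1", 0)) as srv:
        port = srv.getsockname()[1]
        t = threading.Thread(target=lambda: results.append(receive_and_hash(srv)))
        t.start()
        sent = send_random("127.0.0.1", port, total_bytes)
        t.join()
    return results == [sent]


if __name__ == "__main__":
    if sys.argv[1:2] == ["recv"]:
        with socket.create_server(("", int(sys.argv[2]))) as srv:
            print(*receive_and_hash(srv))
    elif sys.argv[1:2] == ["send"]:
        print(*send_random(sys.argv[2], int(sys.argv[3]), int(sys.argv[4])))
    else:
        print("loopback ok:", loopback_check())
```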

Cheers,
Terry.



--
fedora-list mailing list
fedora-list@xxxxxxxxxx
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list

