On 09/11/2011 19:09, Magnus Näslund wrote:
On 11/09/2011 06:51 PM, Gordan Bobic wrote:
My main concern with such data volumes would be the error rates of
modern disks. If your FS doesn't have automatic checking and block level
checksums, you will suffer data corruption, silent or otherwise. Quality
of modern disks is pretty appalling these days. One of my experiences is
here:
http://www.altechnative.net/?p=120
but it is by no means the only one.
Interesting read, and I agree that RAID data corruption and hard disk
untrustworthiness are a huge problem. To combat this we're
thinking of using a crude health checking utility that would use
checksum files, on top of whatever we end up using (glusterfs or
otherwise). These scripts would be specific to our application, and file
based.
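As a rough sketch (purely illustrative, not our actual scripts): such a file-based checker might record a digest per file into a manifest, then re-verify later to catch silent corruption. Function names and manifest layout here are made up.

```python
# Crude file-based health check: record a checksum per file, then
# re-verify later. Nothing here is Gluster-specific; it just walks
# whatever directory tree you point it at.
import hashlib
import os

def file_sha256(path, bufsize=1 << 20):
    """Hash a file in chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(root, manifest_path):
    """Record a digest for every file under root (manifest kept elsewhere)."""
    with open(manifest_path, "w") as out:
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                rel = os.path.relpath(path, root)
                out.write("%s  %s\n" % (file_sha256(path), rel))

def verify_manifest(root, manifest_path):
    """Return the relative paths of files whose current digest differs."""
    bad = []
    with open(manifest_path) as f:
        for line in f:
            digest, rel = line.rstrip("\n").split("  ", 1)
            if file_sha256(os.path.join(root, rel)) != digest:
                bad.append(rel)
    return bad
```

The manifest would of course have to live somewhere it can't be corrupted along with the data it describes.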
In glusterfs I believe that it would be possible to do the checksum
checking locally on the nodes, since the underlying filesystem is
accessible?
In some cases. If you are using striping, then that gets potentially
tricky. If you are using straight mirroring, then yes, you could easily
just check the underlying files. However, that would mean manually
correcting things. ZFS will check this for you every time a file is
accessed or scrubbed and auto-correct the corrupted blocks, so no
corrupted data ever gets served.
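For the plain-mirroring case, a minimal sketch of such a cross-replica check might look like the following (the brick paths are hypothetical). Note it only detects a mismatch; deciding which copy is good and repairing it is exactly the manual step ZFS spares you.

```python
# Compare the same file as stored on two replica bricks. A mismatch
# means one copy is corrupt, but without a third source of truth
# (e.g. a known-good checksum) you can't tell which.
import hashlib

def sha256_of(path, bufsize=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

def replicas_agree(path_on_brick_a, path_on_brick_b):
    """True if both bricks hold byte-identical copies of the file."""
    return sha256_of(path_on_brick_a) == sha256_of(path_on_brick_b)
```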
Currently the only FS that meets all of my reliability criteria is ZFS
(and the linux port works quite well now), and it has saved me from data
corruption, silent and otherwise, a number of times by now, in cases
where normal RAID wouldn't have helped.
We're using OpenSolaris+ZFS today in production, if glusterfs works well
on OpenSolaris that might very well be what we end up with.
I have no idea whether glfs works well on Solaris. It's worth trying,
but given how much effort Emanuel has put into porting it to (Net)BSD,
it may well not "just work", but perhaps one of the developers will be
able to clarify whether glfs is tested/supported/expected to work on
Solaris.
We're a Linux shop, but we settled on OpenSolaris for ZFS alone.
Are you running glusterfs on Solaris and/or Linux in production?
On Linux.
You may be interested to have a look here:
http://groups.google.com/a/zfsonlinux.org/group/zfs-discuss/browse_thread/thread/4d88218d6c8f67f0/78a7b633dd66157a?hl=en&lnk=gst&q=glusterfs#78a7b633dd66157a
Anyway, to summarize:
1) With large volumes of data, you need something other than the disk's
sector checksums to keep your data correct, i.e. a checksum checking FS.
If you don't, expect to see silent data corruption sooner or later.
The silent corruption case can be mitigated in an application-specific way
for us, as described above. Having that done automatically by ZFS is
definitely interesting in several ways. Does glusterfs have (or plan to
have) a scrubbing-like functionality that checks the data?
I'd be interested to hear the developers' thoughts on this, but I think
this would be extremely expensive. Doing "ls -laR" to check/auto-heal
all files is already very time consuming on large file systems.
Calculating md5s of all files as well would be orders of magnitude more
expensive.
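To illustrate the gap (an illustrative sketch only, not how glusterfs itself is implemented): a lookup-style walk of the sort "ls -laR" triggers only touches metadata, while a full checksum pass has to read every byte back, so its cost scales with the total amount of data rather than the number of files.

```python
# Metadata-only walk vs. full-read checksum walk over the same tree.
import hashlib
import os

def metadata_pass(root):
    """Stat every file -- roughly the I/O an "ls -laR" style walk does."""
    n = 0
    for dirpath, _, names in os.walk(root):
        for name in names:
            os.stat(os.path.join(dirpath, name))
            n += 1
    return n

def checksum_pass(root):
    """Read and hash every byte of every file; returns total bytes read."""
    total = 0
    for dirpath, _, names in os.walk(root):
        for name in names:
            h = hashlib.md5()
            with open(os.path.join(dirpath, name), "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
                    total += len(chunk)
    return total
```

On a multi-terabyte volume the second pass is bounded by sequential read throughput of the entire data set, which is why doing it as a periodic scrub rather than on demand is about the only practical option.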
Gordan