Re: Starter Cluster / GFS

The volume will be composed of seven 1TB disks in RAID5, so 6TB.

Be careful with that arrangement. You are right up against the ragged edge
in terms of data safety.

1TB disks are consumer-grade SATA disks with non-recoverable error rates
of about 10^-14 per bit read. That is one non-recoverable error per ~11TB.

Now consider what happens when one of your disks fails. You have to read
6TB to reconstruct the failed disk. With an error rate of 1 in 11TB, the
chances of another failure occurring in 6TB of reads are about 53%. So the
chances are that during this operation, you are going to have another
failure, and the chances are that your RAID layer will kick the disk out
as faulty - at which point you will find yourself with 2 failed disks in a
RAID5 array and in need of a day or two of downtime to scrub your data to
a fresh array and hope for the best.
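
A rough back-of-the-envelope version of that calculation (a sketch only -
the exact figure depends on how you model the error distribution and on
whether you count TB or TiB):

    # expected number of errors while reading 6TB at one error per ~11TB
    echo "scale=2; 6/11" | bc -l           # ~0.55 expected errors
    # chance of hitting at least one (Poisson approximation)
    echo "scale=2; 1 - e(-6/11)" | bc -l   # ~0.42

Either way, it is close to a coin toss whether the rebuild completes
without hitting a second error.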

RAID5 is ill-suited to arrays over 5TB. Using enterprise-grade disks will
gain you an improved error rate (10^-15), which makes it good enough - if
you also have regular backups. But enterprise-grade disks are much smaller
and much more expensive.

Not to mention that your performance on small writes (smaller than the
stripe width) will be appalling with RAID5, due to the read-modify-write
cycle required to update the parity (read old data and parity, write new
data and parity), which will reduce your effective performance to that of
a single disk.

Wow...

The enclosure I will use (and already have) is an Active Storage ActiveRAID
in a 16 x 1TB configuration (http://www.getactivestorage.com/activeraid.php).
The drives are Hitachi model HDE721010SLA33. From what I could find, the
error rate is 1 in 10^15.

We will have good backups. One of the nodes will have a local copy of the
critical data (about 1TB) on internally-attached disks. All of the rest of
the data will be rsync-ed off-site to a secondary, identical setup.
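
Roughly along these lines for the off-site part (host and paths are just
placeholders):

    rsync -aH --delete /gfs/data/ backup-site:/gfs/data/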

It will host many, many small files, and some bigger files. But the files
that change most often will most likely be smaller than the block size.

That sounds like a scenario from hell for RAID5 (or RAID6).

What do you suggest to achieve sizes in the range of 6-7TB, maybe more?

The GFS volume will not be used for IO-intensive tasks; that's where the
standalone volumes come into play. It will be used to access many files,
often. Specifically, Apache will run from it, with the document root,
session store, etc. on the GFS.

Performance-wise, GFS should be OK for that if you are running with
noatime and the operations are all reads. If you end up with write
contention without partitioning the access to directory subtrees on a
per-server basis, the performance will fall off a cliff pretty quickly.
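
For the noatime part, something along these lines in /etc/fstab should do
(device name and mount point are just examples):

    /dev/clustervg/gfslv  /var/www  gfs  noatime  0 0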

Can you explain a little bit more? I'm not sure I fully understand the
partitioning into directories.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

