Hi list,
Our server park is about to gain three new boxes, pushing total storage to
70TB. I think it's time to get rid of the NFS /net automounts and move to some
kind of cluster.
Long story short:
Each typical server has local storage (1 to 8TB, up to 15TB soon): SATA disks
connected to a 3ware card, in hardware RAID 10, 5 or 6.
Each of these machines is dedicated to processing data from a given satellite.
There are also one pgsql server, one Apache server and one NIS/home (via NFS)
server, each with a 3ware card and its disks. BTW, the NIS/NFS server is soon
to be turned into a directory server.
Gigabit network, unmanaged switches, a single /24 network. Every server runs
CentOS 4.7 or 5.3 (only one 4.7 box remains, to be precise). Every box runs
x86_64 software.
Now, I'd like to transform that mess into:
1) One volume for sat1...N data, so that, if needed, you can process whatever
you want from whatever machine.
2) A failover machine that could automagically take over the load for pg,
apache and nfs/nis (the soon-to-be directory server) if the dedicated box
fails. That means efficient replication, so that data are identical on the
original pg/apache/etc. machines and on the failover one.
3) Some kind of load balancing on sat1...N that would put processes on the box
where the data to be processed are local, without the user having to decide
where to launch them. Output data would have to be written to the local
storage of that box, so that sat1 data and sat1 processed data stay on the
same physical volume. That way, if a box crashes really badly, we know which
data were lost (we can't afford to back up 70TB).
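To make point 3 a bit more concrete, here is a rough Python sketch of the kind
of placement I mean: look up which box holds a dataset locally, then build the
command to run there. The host names, the DATA_HOME table and the use of plain
ssh are all made up for illustration; this is not how Lustre or Cluster Suite
would do it.

```python
# Hypothetical mapping: which box stores which satellite's data locally.
DATA_HOME = {"sat1": "s1", "sat2": "s2", "sat3": "s3"}


def pick_host(dataset, data_home=DATA_HOME, alive=None):
    """Return the box that stores `dataset` on its local disks.

    Raises LookupError if the dataset is unknown or its box is down,
    instead of silently running the job far away from its data.
    `alive` is an optional collection of hosts currently reachable.
    """
    host = data_home.get(dataset)
    if host is None:
        raise LookupError("no box registered for %s" % dataset)
    if alive is not None and host not in alive:
        raise LookupError("%s lives on %s, which is down" % (dataset, host))
    return host


def build_command(dataset, job_argv, data_home=DATA_HOME, alive=None):
    """Build the ssh command line that runs `job_argv` on the right box."""
    host = pick_host(dataset, data_home, alive)
    return ["ssh", host] + list(job_argv)
```

The user would only name the dataset (say build_command("sat2", [...])) and
never the box. Output stays on the same box simply because the job runs there.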
Now, the questions (thanks for making it down here :) :
1) From what I've read and been told, GFS wouldn't do the trick, while Lustre
and Hadoop/HDFS could. From what I've read about Lustre so far, it could do
the job, but I found nothing about load balancing.
2) Failover should be possible, if I understood the docs correctly. Where I'm
a bit stuck is the replication part. WAL shipping should do the trick for pg.
Directory server has some kind of failover mechanism, afaik. About Apache,
I'm a bit in the dark. Could someone enlighten me?
I've been told that DRBD could solve the whole replication problem?
3) Is such a thing possible with Cluster Suite? At all? Would there be a
better way to handle the configuration of the boxes, so our DC can continue to
grow without becoming a nightmare for me and the users?
4) Right now, users' homes follow them to whatever box they log on to, thanks
to NFS. How do I make that work with Directory server? Use another Lustre
volume? What if the servers are hidden behind a load balancer?
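For the pg part of question 2: WAL shipping, as I understand it, boils down to
PostgreSQL running an archive_command for each completed WAL segment, with %p
replaced by the segment's path and %f by its file name. Below is a minimal
sketch of such a command in Python. It assumes the failover box exports a
directory that is mounted on the pg server. The /mnt/standby/wal path and the
archive_wal.py name are hypothetical.

```python
import os
import shutil
import sys


def archive_segment(src_path, dest_dir):
    """Copy one finished WAL segment into dest_dir.

    Returns 0 on success and 1 on failure, so the value can be used as
    the exit status. An existing file is never overwritten.
    """
    dest = os.path.join(dest_dir, os.path.basename(src_path))
    if os.path.exists(dest):
        return 1  # refuse to overwrite an already archived segment
    tmp = dest + ".tmp"
    try:
        shutil.copyfile(src_path, tmp)
        os.rename(tmp, dest)  # atomic as long as tmp and dest share a filesystem
    except (IOError, OSError):
        if os.path.exists(tmp):
            os.remove(tmp)  # do not leave a partial copy behind
        return 1
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    # Meant to be called as: archive_wal.py %p /mnt/standby/wal
    sys.exit(archive_segment(sys.argv[1], sys.argv[2]))
```

It would be hooked up in postgresql.conf with something like
archive_command = 'python /usr/local/bin/archive_wal.py %p /mnt/standby/wal'.
The standby side (replaying the segments on the failover box) is not covered
here.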
You'll find attached some kind of ASCII art trying to describe what I'd like
to get :) (open it with a fixed-width font)
Thanks a lot for helping.
Best Regards,
--
Laurent
        _____
|S1|----|L  |----|U1|--------|
        |U  |                |
|S2|----|S  |----|U2|--------|
        |T  |                |
|S3|----|R  |----|U3|--------|
        |E  |                |
|S4|----|   |----|U4|--------|
        |V  |                |
 .      |O  |     .          |
 .      |L  |     .          |
 .      |U  |     .          |
|Sn|----|M  |----|Un|--------|    |--home Lustre volume accessible by every box ?
        |E  |                |    |
        |   |---------------|Ds|--|
        |   |                     |
        |   |----|Pg|--|----------|
        |   |          |
        |   |----|Ap|--|
        |   |          |
        |   |----|Fo|--|
        |___|
|Sx|: boxes with dedicated storage for satellite images processing.
|Ux|: user boxes.
|Ds|: Directory server (serves /home to user machines)
|Pg|: PostgreSQL server
|Ap|: Apache server
|Fo|: Failover server (can take Pg, Ds, Ap load)
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster