Hi all,

This needs to be cross-posted between several mailing lists... but alas there seems to be no way to coordinate that.

Companies with huge pockets and huge needs, for example Visa, have exceedingly strong disaster recovery and failover operations in place and working. Their operations can quickly fail over to multiple geographically separated locations with small impact on their service. The precise details of how it is done are beyond me. However, we should be able to draw a complete picture of what _can_ be done with Linux using all the pieces that _are_ available (and at least show what is lacking).

I am hoping to start a discussion on the architecture of an all-Linux distributed, replicated, highly available, virtualized platform. It seems that many of the pieces are in place (multiple pieces, in many instances). However, I'm looking to spark discussion on the best architecture for such a beast. Specifically, I want to get as close as possible to providing:

1) A highly available, peer-replicated (i.e. active-active), geographically separated (n locations, n >= 2), virtualized, and backed-up SAN solution. That's a tall order. Companies such as FalconStor provide some of this, but at an exceptionally high price.

2) SCSI over IP or something similar, so that application servers can use the exported virtualized storage (either locally on the same LAN or remotely over a WAN). Security here is a big question mark.

3) A highly available virtualized application server pool that supports automatic failover to a remote location. A combination of clusters, LVS, user-mode-linux, and replicated SANs could provide a potential solution.

4) An eye towards the cost of servers, infrastructure, colocation, and bandwidth. I.e., few servers at expensive locations and more servers at less expensive and less well-equipped places. User-mode-linux for server consolidation through virtualization, etc.

5) An eye towards management of the solution.
Are all these pieces manageable without a large staff? Are there recommended commercial Linux solutions for management?

----- AS AN END RESULT ---------

I would like, as an end result, to publish some type of HOWTO on combining all the versions of software and hardware that are available into one specific architecture that provides for this. Since we plan on deploying at 2 and possibly 3 separate sites, the work should easily scale back to 1 local site (although the reverse is obviously not true).

---------------------------------

This list would be the place to ask regarding #1 and possibly #2. Here is what I have envisioned, though I'm not sure it is the best way to handle things:

1) At this time, I am using Red Hat 8.0 and kernel 2.4.19 (the full version, not the Red Hat version).

2) I opted to use LVM 1.0.6 rather than LVM2, since I'm looking at a production environment.

3) I have 2 identical Compaq TaskSmart n2400 boxes. These boxes have hardware RAID-5 and currently hold internal disks configured as a single RAID-5 volume. During the Red Hat install, I partitioned the single RAID-5 volume to leave a large chunk available as an LVM partition. I created a single VG on this, added the large partition as the PV, and can now create LVs as needed. This seems to work fine. Under ideal conditions, this provides a scalable and virtualized storage pool.

4) On each system, I created identical logical volumes. I installed DRBD version 0.6.1 on top of LVM to export a logical volume as a network block device. I was able to do a network mirror between the two LVs to provide a replicated logical volume. (I opted for DRBD over NBD or ENBD. I just recently heard about HyperSCSI but don't know much about it. There seem to be other packages as well. Is it blasphemy to mention EVMS on this list?)

5) I have not addressed the security of the replicated information, and I never completed work on fail-over scripts to automatically bring up and export the replicated information.
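For concreteness, steps 3) and 4) can be sketched roughly as below. This is a hedged sketch, not something tested on the hardware above: the device path /dev/cciss/c0d0p3, the VG/LV names, the addresses, and /dev/nb0 are all hypothetical examples, and drbdsetup option syntax varied between 0.6.x releases.

```shell
# LVM1-era commands (pvcreate/vgcreate/lvcreate carry over to LVM2):
pvcreate /dev/cciss/c0d0p3           # the large chunk left free on the RAID-5 volume
vgcreate vg00 /dev/cciss/c0d0p3      # one VG spanning that PV
lvcreate -L 4G -n lv_app1 vg00       # identical LVs created on both boxes

# DRBD 0.6.x then mirrors the LV over the network (run on each node,
# with the local/remote addresses swapped on the peer; exact flags
# vary by release -- check drbdsetup's usage output):
drbdsetup /dev/nb0 disk /dev/vg00/lv_app1
drbdsetup /dev/nb0 net 10.0.0.1 10.0.0.2 B
```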
Part of this is due to my questions on the best way to "export" the data (see below).

Here are the problems:

1) Each TaskSmart is a SPOF. It would be better to have some type of SAN solution where I can plug in multiple LVM manager machines that are clustered into a 2-node active-active or active-passive configuration. These nodes could then "export" the logical volumes to the people who need them. I have installed heartbeat on a couple of machines and have it working, so a small 2-node cluster is possible here. However, then we have the issue of how to "access" the remote storage (either fibre channel or dual-access SCSI). What are the other alternatives? iSCSI initiators in front of a Cisco storage router? How would one virtualize the storage pool if iSCSI were involved?

2) What is the best way to "export" the replicated information? I'm guessing it is best to do this at the block level (although I believe products such as Double-Take, and tools such as rsync, do it at the file-system level). Is DRBD the best solution for wide-area replication?

3) What is the best way to "export" the blocks or file system to servers that need to access the logical volume? NFS at the file-system level? iSCSI (if I can figure out how to make the logical volume a target) at the block level? I envision that the application servers will actually be user-mode-linux servers that are consolidated on an active-active Linux cluster pair.

4) BTW, I envision that each application server will have its own logical volume so that we can minimize the potential for concurrent access to a logical volume. If the app servers are clustered, then fencing the resources is a cluster problem and not a SAN problem. What are the concurrent access problems for access and replication?

5) What is the best way to secure the information? What about encryption of the replication tunnel? A hardware VPN or a software solution? How important, and how stable, would an encrypted file system be?
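One low-tech combination for the export question and the tunnel-encryption question is NFS for the file-system-level export plus stunnel wrapped around DRBD's TCP port (DRBD itself has no encryption). A hedged sketch, assuming stunnel 3.x syntax; the device, mount point, hostnames, and ports are hypothetical examples:

```shell
# On the current DRBD primary: put a file system on the replicated
# device and export it to the UML hosts over NFS.
mke2fs -j /dev/nb0                   # ext3 on the DRBD device (primary side only!)
mount /dev/nb0 /export/app1
echo '/export/app1 umlhost1(rw,no_root_squash)' >> /etc/exports
exportfs -ra

# Software encryption of the replication tunnel via stunnel:
# the secondary accepts TLS on 7789 and forwards to its local DRBD port...
stunnel -d 7789 -r 127.0.0.1:7788
# ...while the primary runs a client-mode stunnel and points DRBD's
# "remote" address at 127.0.0.1:7790 instead of the real peer.
stunnel -c -d 127.0.0.1:7790 -r secondary.example.com:7789
```

An IPsec or hardware VPN between the sites would achieve the same without per-service forwarding, at the cost of more setup.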
6) What is the best way to back up the SAN solution? We have Qualstar tape libraries available. Commercial backup apps? Amanda? Done at the file-system level or the block level? I probably need both the security of replication (for direct failover) and tape backup (for generational capability).

Here are the questions:

1) Has this architecture already been done somewhere else (complete with fixed versions of OS and apps that are "known to work")? I don't want to re-invent the wheel.

2) What would be the best way to coordinate people's feedback between multiple mailing lists (LVM, LVS, iSCSI, etc.)?

3) What would be the best way to present the architecture and apply people's comments and feedback?

With so many Linux projects underway, it is very difficult to get a grasp on all the pieces that would be needed to provide what I want. However, it does appear that all or most of the pieces are available. I have looked at combining:

A) Heartbeat + Keepalived + LVS to provide a highly available LVS director. This small cluster can be put into a very expensive and very highly available colocation facility with lots of redundant bandwidth. It "exports" the highly available IP addresses that are used to provide services. If this cluster, or the network to which it is attached, goes down, all services go down (I can find no automatic solutions that handle this type of failure with any immediacy... things like round-robin DNS all suffer problems). Through LVS TUN mode, LVS real-servers can exist at geographically remote sites that are not as cost-prohibitive.

B) Heartbeat to provide a high-availability Linux cluster that then runs multiple instances of user-mode-linux. This provides highly available virtualized servers that can act as the LVS real-servers.

C) LVM + DRBD to provide replicated virtualized storage. However, I have not figured out the best way to export this storage to the user-mode-linux virtual servers listed in step B.
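The two-node heartbeat pairs in (A) and (B) are driven by two small files. A minimal sketch, assuming heartbeat 1.x-style configuration; the node names, interface, service IP, and resource script are made-up examples:

```shell
# /etc/ha.d/ha.cf (same on both nodes):
#   keepalive 2          # heartbeat interval, seconds
#   deadtime 10          # declare the peer dead after 10s of silence
#   bcast eth1           # dedicated heartbeat link
#   node director1
#   node director2
#
# /etc/ha.d/haresources -- director1 normally owns the virtual IP and
# the init-style resource script that starts the LVS director service;
# heartbeat moves both to director2 on failure:
#   director1 192.168.1.100 ldirectord
```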
What I do not yet have: iSCSI (target support seems to be missing), secure replication, good clustered failover of user-mode-linux virtual servers to geographically separate locations (in case of network failure), backup, and scalable storage short of physically adding internal or direct-attached disks to the Compaq TaskSmart machines.

Thoughts on this rather long posting?

TIA,
- Steve