Re: Storage Cluster Newbie Questions - any help with answers greatly appreciated!

"Michael @ Professional Edge LLC" <m3@xxxxxxxxxxxxxxxxxxxxxxx> · Mon, 15 Mar 2010 15:04:57 -0700

Ok... I don't feel quite as silly... as I had roasted one of my test 
machines (unable to mount /) - I had to do a full rebuild from scratch.

Well... It seems I had 2 of 3.

For anyone who ever needs to disable their Qlogic fiber card from 
auto-starting in the future - do all of the following 3 changes.

1. Modify /etc/modprobe.conf - and comment out the qla2xxx alias line,
so that it reads "#alias scsi_hostadapter2 qla2xxx"
2. Rebuild the initrd - I use this -
"mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)"
3. Add "blacklist qla2xxx" - to /etc/modprobe.d/blacklist
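
For reference, the three changes above could be scripted roughly as
follows. This is only a sketch: the function name disable_qla2xxx and
its optional root-prefix argument are made up for illustration (so it
can be tried against a copy of /etc first), and it assumes GNU sed as
shipped with RHEL 5.

```shell
# disable_qla2xxx [ROOT]
# ROOT is a hypothetical prefix (e.g. a scratch copy of the filesystem).
# With no ROOT the live system is changed and the initrd is rebuilt.
disable_qla2xxx() {
    root="${1:-}"
    conf="$root/etc/modprobe.conf"
    blacklist="$root/etc/modprobe.d/blacklist"

    # Step 1: comment out the alias line that ties qla2xxx to a
    # scsi_hostadapter slot. Already-commented lines are left alone.
    sed -i 's/^\(alias[[:space:]]\{1,\}scsi_hostadapter[0-9]*[[:space:]]\{1,\}qla2xxx\)/#\1/' "$conf"

    # Step 2: rebuild the initrd for the running kernel.
    # Skipped when working on a scratch copy.
    if [ -z "$root" ]; then
        mkinitrd -f "/boot/initrd-$(uname -r).img" "$(uname -r)"
    fi

    # Step 3: blacklist the driver, appending the line only once.
    grep -qx 'blacklist qla2xxx' "$blacklist" 2>/dev/null ||
        echo 'blacklist qla2xxx' >> "$blacklist"
}
```

Running it as "disable_qla2xxx /tmp/rootcopy" first makes it possible to
check the edited files before touching a real node.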

-Michael

Michael @ Professional Edge LLC wrote, On 3/15/2010 11:05 AM:
DOH... ok I feel stupid... :-)

Well 'blacklist scsi_transport_fc' didn't work... but - 'blacklist 
qla2xxx' works fine.

Amazing how all the docs I found said to use - remove qla2xxx modprobe 
-r - kind of stuff... and all that I really had to do was add the 
silly thing to the blacklist... :-)

OK... at least I can proceed with my testing again.  Thank You!

-Michael

Kaloyan Kovachev wrote, On 3/15/2010 1:41 AM:
Hello,

On Sun, 14 Mar 2010 22:28:20 -0700, Michael @ Professional Edge LLC wrote
Kaloyan,

I agree - disabling the qla2xxx driver (Qlogic HBA) from starting at
boot would be the simple method of handling the issue.  Then I just put
all the commands to load the driver, multipath, mdadm, etc... inside
cluster scripts.
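
A rough sketch of what such a cluster script could look like is below.
The function names and the DRY_RUN switch are hypothetical; the
commands themselves (modprobe, multipath, mdadm) are the standard ones,
but the exact order, and any settle time needed for the LUNs to appear
after the driver loads, are assumptions that would need testing on real
hardware.

```shell
# run CMD...: execute a command, or just print it when DRY_RUN=1
# (DRY_RUN is a made-up switch for trying this sketch safely).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bring the shared storage up on the node that owns the service.
storage_start() {
    run modprobe qla2xxx &&          # load the HBA driver
    run multipath &&                 # build the multipath maps
    run mdadm --assemble --scan      # assemble arrays from mdadm.conf
}

# Release the storage in reverse order before another node takes over.
storage_stop() {
    run mdadm --stop --scan &&       # stop all running arrays
    run multipath -F &&              # flush unused multipath maps
    run modprobe -r qla2xxx          # unload the HBA driver
}
```

With DRY_RUN=1 set, storage_start and storage_stop only print the
commands, which is a cheap way to review the sequence first.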

Amusingly it seems I am missing something very basic - as I can't seem
to figure out how to not load the qla2xxx driver.

Do you happen to know the syntax to make the qla2xxx driver not load at
boot automatically?

I've been messing with /etc/modprobe.conf - and mkinitrd - but no
combination has resulted in the qla2xxx driver being properly disabled
during boot - I did accomplish making one of my nodes unable to mount
its root partition - but I don't consider that success. :-)

Is 'blacklist scsi_transport_fc' not enough? What other modules are
loaded? If you blacklist the one that most others depend on, they
should not load.

As for your 2nd idea; I have seen folks doing something similar in that
mode, when the disks are local to the node.  But in my case - all nodes
- can already see all LUNs - so I don't really have any need to do an
iSCSI export - appreciate the thought though.

The idea was actually not to export them, but to run mdadm
simultaneously on both nodes. But the problem is when just one of the
nodes loses its link to just one of the arrays.

-Michael

Kaloyan Kovachev wrote, On 3/4/2010 10:28 AM:
On Thu, 04 Mar 2010 09:26:35 -0800, Michael @ Professional Edge LLC wrote

Hello Kaloyan,

Thank you for the thoughts.

You are correct; when I said - "Active / Passive" - I simply meant that
I had no need for "Active / Active" - and floating IP on the NFS share
would be exactly what I had in mind.

The software raid - of any type, raid1,5,6 etc... is the issue.  From
what I have read - mdadm - is not cluster aware... and... since all
disks are seen by all RHEL nodes - as Leo mentioned - finding some
method to keep the kernel from detecting and attempting to assemble
all the available software raids is a major problem.  This is why I
was asking if perhaps - CLVM w/mirroring would be a better method.
Although since it was just introduced in RHEL 5.3 - I am a bit leery.

I am not familiar with FC, so I may be completely wrong here, but if
you do not start multipath and load your HBA drivers on boot, how will
the software raid based on the FC disks start at all?

Even if it is started, you may still issue 'mdadm --stop /dev/mdX' in
S00 as suggested by Leo and assemble it again as a cluster service
later.
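
The early-boot stop could be sketched like this. list_md_arrays and
stop_all_md are hypothetical helpers: the first reads /proc/mdstat-style
text and prints one device node per assembled array, the second stops
each of them. It assumes the root filesystem is not itself on an md
device (true for the setup in this thread, where / sits on the onboard
fake raid).

```shell
# list_md_arrays [FILE]: print /dev/mdX for every array listed in
# FILE (defaults to /proc/mdstat). Array lines look like
# "md0 : active raid1 sdc[1] sdb[0]".
list_md_arrays() {
    awk '/^md[0-9]+ :/ { print "/dev/" $1 }' "${1:-/proc/mdstat}"
}

# stop_all_md [FILE]: stop every array found by list_md_arrays.
# Intended to run very early at boot (e.g. from an S00 init script),
# before anything has mounted the arrays.
stop_all_md() {
    for md in $(list_md_arrays "$@"); do
        mdadm --stop "$md"
    done
}
```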

Sorry for being confusing - yes - the linux machines will have a
completely different filesystem share than the windows machines.  My
original thought was I would do "node#1 primary nfs share (floating
ip#1) to linux machines w/node#2 backup" - and then "node#2 primary nfs
or samba share (floating ip#2) to windows machines w/node#1 backup".

Any more thoughts you have would be appreciated... as my original plan
with MDADM w/HA-LVM - so far doesn't seem very possible.

Then there are two services, each with its own raid array and ip, but
basically the same.

Another idea ... I am not using it in production, but I had good
results (testing) with a (small) software raid5 array from 3 nodes ...
a local device on each node exported via iSCSI, and software RAID5 over
the imported ones, which is then used from LVM. Weird, but it worked,
and the only problem was that on every reboot of any node the raid is
rebuilt, which won't happen in your case as you will see all the disks
in sync (after the initial sync done on only one of them) ... you may
give it a try.

-Michael

Kaloyan Kovachev wrote, On 3/4/2010 8:52 AM:

Hi,

On Wed, 03 Mar 2010 11:16:07 -0800, Michael @ Professional Edge LLC wrote

Hail Linux Cluster gurus,

I have researched myself into a corner and am looking for advice.  I've
never been a "clustered storage guy", so I apologize for the
potentially naive set of questions.  (I am savvy on most other aspects
of networks, hardware, OS's etc... but not storage systems.)

I've been handed ( 2 ) x86-64 boxes w/2 local disks each; and ( 2 )
FC-AL disk shelves w/14 disks each; and told to make a mini NAS/SAN
(NFS required, GFS optional).  If I can get this working reliably then
there appear to be about another ( 10 ) FC-AL shelves and a couple of
Fiber Switches lying around that will be handed to me.

NFS filesystems will be mounted by several (less than 6) linux
machines, and a few (less than 4) windows machines [[ microsoft nfs
client ]] - all more or less doing web server type activities (so lots
of reads from a shared filesystem - log files not on NFS so no issue
with high IO writes).  I'm locked into NFS v3 for various reasons.
Optionally the linux machines can be clustered and GFS'd instead - but
I would still need to come up with a solution for the windows machines
- so a NAS solution is still required even if I do GFS to the linux
boxes.

Active / Passive on the NFS is fine.

Why not start NFS/Samba on both machines with only the IP floating
between them then?

* Each of the ( 2 ) x86-64 machines has a Qlogic dual HBA, with 1 fiber
directly connected to each shelf  (no fiber switches yet - but will
have them later if I can make this all work); I've loaded RHEL 5.4
x86-64.

* Each of the ( 2 ) RHEL 5.4 boxes - used the 2 local disks w/onboard
fake raid1 = /dev/sda - basic install so /boot and LVM for the rest -
nothing special here (didn't do mdadm basically for simplicity of
/dev/sda)

* Each of the ( 2 ) RHEL 5.4 boxes can see all the disks on both
shelves - and since I don't have Fiber Switches yet - at the moment
there is only 1 path to each disk; however as I assume I will figure
out a method to make this work - I have enabled multipath - and
therefore I have consistent names to 28 disks.

Here's my dilemma.  How do I best add Redundancy to the Disks, removing
as many single points of failure as possible, and preserving as much
diskspace as possible?

My initial thought was - to take "shelf1:disk1 and shelf2:disk1" and
put them into a software raid1 - mdadm; then put the resulting /dev/md0
into a LVM.  When I need more diskspace, I just then create
"shelf1:disk2 and shelf2:disk2" as another software raid1 then just add
the new "/dev/md1" into the LVM and expand the FS. This handles a
couple things in my mind:

1. Each shelf is really a FC-AL so it's possible that a single disk
going nuts could flood the FC-AL and all the disks in that shelf go
poof until the controller can figure itself out and/or the bad disk is
removed.

2. Efficient: I am retaining 50% storage capacity after redundancy - if
I can do the "shelf1:disk1 + shelf2:disk1" mirrors; plus all bandwidth
used is spread across the 2 HBA fibers and nothing goes over the TCP
network.  Conversely DRBD doesn't excite me much - as I then have to do
both raid in the shelf (probably still with MDADM) and then I add TCP
(ethernet) based RAID1 between the nodes - and when all is said and
done - I only then have 25% of storage capacity still available after
redundancy.

3. It is easy to add more diskspace - as each new mirror (software
raid1) can just be added to an existing LVM.
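
As a sketch of the layout described above, the helper below only prints
the commands for the Nth cross-shelf mirror rather than running them.
The multipath alias names (/dev/mapper/shelf1_diskN), the volume group
name vg_nas, and the function name mirror_cmds are all invented for
illustration; growing the logical volume and the filesystem afterwards
is left out.

```shell
# mirror_cmds N [VG]: print the commands that would build mirror N
# from disk N of each shelf and add it to volume group VG.
# Device and VG names are placeholders, not taken from a real system.
mirror_cmds() {
    n="$1"
    vg="${2:-vg_nas}"
    md="/dev/md$((n - 1))"     # mirror 1 -> /dev/md0, mirror 2 -> /dev/md1

    echo "mdadm --create $md --level=1 --raid-devices=2 /dev/mapper/shelf1_disk$n /dev/mapper/shelf2_disk$n"
    echo "pvcreate $md"
    if [ "$n" -eq 1 ]; then
        echo "vgcreate $vg $md"    # first mirror creates the volume group
    else
        echo "vgextend $vg $md"    # later mirrors grow it
    fi
}
```

Calling "mirror_cmds 1" and later "mirror_cmds 2" shows the initial
build and the "add another mirror when I need more space" step side by
side.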

You may create RAID1 (between the two shelves) over RAID6 (on the disks
from the same shelf), so you will lose only 2 more disks per shelf, or
about 40% storage space left, but more stable and faster. Or several
RAID6 arrays with 2+2 disks from each shelf - again 50% storage space,
but better performance with the same chance for data loss as with
several RAID1 ... the resulting mdX you may add to LVM and use the
logical volumes.
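
The capacity figures being compared in this thread can be checked with
a little integer arithmetic. The sketch assumes 2 shelves of 14
equal-sized disks, as described in the original post; pct is a
hypothetical helper and its results are rounded down to whole percent.

```shell
# pct USABLE TOTAL: usable disks as a whole-number percentage of TOTAL.
pct() { echo $(( $1 * 100 / $2 )); }

total=$(( 2 * 14 ))                      # 28 disks across both shelves

# Cross-shelf RAID1 pairs: one disk of each pair holds unique data.
raid1_pairs=$(pct 14 "$total")

# RAID1 across two 14-disk RAID6 sets: each shelf keeps 12 data disks,
# and the mirror keeps only one shelf's worth.
raid1_over_raid6=$(pct $(( 14 - 2 )) "$total")

# Seven 4-disk RAID6 arrays (2 disks from each shelf): 2 data disks each.
raid6_2plus2=$(pct $(( 7 * 2 )) "$total")

# In-shelf mirroring plus a DRBD mirror between nodes: halved twice.
drbd_over_raid1=$(pct $(( total / 4 )) "$total")

echo "RAID1 pairs:          ${raid1_pairs}%"
echo "RAID1 over RAID6:     ${raid1_over_raid6}%"
echo "RAID6 2+2 arrays:     ${raid6_2plus2}%"
echo "DRBD over shelf RAID: ${drbd_over_raid1}%"
```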

From what I can find messing with Luci (Conga) though... I don't see
any resource scripts listed for - "mdadm" (on RHEL 5.4) - so would my
idea even work  (I have found some posts asking for a mdadm resource
script but I've seen no response)?  I also see with RHEL 5.3 LVM has
mirrors that can be clustered now - is this the right answer?  I've
done a ton of reading but everything I've dug up so far assumes that
the fiber devices are being presented by a SAN that is doing the
redundancy before the RHEL box sees the disk... or... there are a ton
of examples where fiber is not in the picture and there are a bunch of
locally attached hosts presenting storage onto the TCP (ethernet) - but
I've found almost nothing on my situation...

So... here I am... :-)  I really just have 2 nodes - which can both see
a bunch of disks (JBOD) - and I want to present them to multiple hosts
via NFS (required) or GFS (to linux boxes only).

If the Windows and Linux data are different volumes, it is better to
leave the GFS partition(s) available only via iSCSI to the linux nodes
participating in the cluster, and not to mount it/them locally for the
NFS/Samba shares. But if the data should be the same, you may go even
Active/Active with GFS over iSCSI [over CLVM and/or] [over DRBD] over
RAID and use NFS/Samba over GFS as a service in the cluster. It all
depends on how the data will be used from the storage.

All ideas - are greatly appreciated!

-Michael

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
