Re: Corrupt inodes on shared disk...

"Stephen Samuel" <darkonc@xxxxxxxxx> · Tue, 3 Apr 2007 12:40:03 -0700

I don't know much about RHCS, but I think this is more likely
to be a Red Hat problem than an ext3 problem.

1) *IF* RHCS properly locks out the 'dead' system, so that it doesn't
manage (at some time after the backup system takes over) to flush its
caches to the shared drive,

2) and *IF* the failover software isn't too stupid to do things like
run the journal, and otherwise do sane FSCK things before mounting,
then you shouldn't have a problem.

My best guess is that a failure of 2) is relatively unlikely, which
leaves a failure of 1) as the probable cause.
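
To make the ordering in 1) and 2) concrete, here is a minimal Python
sketch of what a takeover ought to do. It is only an illustration of the
sequence, not how RHCS actually implements it: the function names, the
device path and the mount point are all made up for the example. The
e2fsck exit statuses used (0 = no errors, 1 = errors corrected) are the
documented ones; treating everything else as "don't mount" is my own
conservative assumption.

```python
# Illustrative sketch only: the ordering is the point, not the exact commands.
# All names here (takeover_commands, safe_to_mount) are hypothetical.

E2FSCK_OK = 0          # e2fsck exit status: no errors
E2FSCK_CORRECTED = 1   # e2fsck exit status: errors found and corrected


def takeover_commands(device, mountpoint, peer_is_fenced):
    """Return the commands a standby node would run, in order.

    Raises RuntimeError if the failed peer has not been confirmed fenced,
    because mounting while the peer can still write risks corruption.
    """
    if not peer_is_fenced:
        raise RuntimeError("peer not fenced; refusing to touch " + device)
    return [
        # Check first: e2fsck replays the journal and fixes safe problems.
        ["e2fsck", "-p", device],
        # Mount only after the check has finished.
        ["mount", "-t", "ext3", device, mountpoint],
    ]


def safe_to_mount(e2fsck_status):
    """Treat only 'clean' or 'corrected' as safe; anything else needs a human."""
    return e2fsck_status in (E2FSCK_OK, E2FSCK_CORRECTED)
```

The point of refusing outright when the peer isn't fenced is that no
amount of journal replay helps if the other node can still write
afterwards.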

If your primary system does *ANY* writes after the failover starts,
then you can probably expect problems like the ones you've seen here.
(Does RHCS _physically_ lock out the second system, or is it a software
lockout?)
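
As a toy illustration of why even one late write hurts (a deliberately
simplified model, not real ext3 behaviour; the block number and contents
are invented), picture the disk as a table of blocks and the old primary
as holding one dirty block in its cache:

```python
# Toy model: a "disk" is a dict of block number -> contents.
disk = {7: "inode table v1"}

# The old primary has a dirty block in its cache that it has not flushed yet.
stale_cache = {7: "inode table v1 + primary's pending change"}

# Failover: the standby mounts the disk and makes its own update.
standby_view = dict(disk)
standby_view[7] = "inode table v2 (standby)"
disk[7] = standby_view[7]

# The unfenced primary flushes its stale cache after the takeover.
disk.update(stale_cache)

# The standby's in-memory view no longer matches what is on disk.
corrupted = disk[7] != standby_view[7]
print(corrupted)
```

The standby keeps working from its own idea of block 7 while the disk
holds something else, which is roughly how you end up with inodes that
make no sense.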

The other question I have is: why is the system failing over?  Other
than for testing, a well-built HA system should almost *never* actually
need to fail over. (We're not talking Windows servers here :-} )  HA
should be like insurance: you pay up front for it and work to make
sure that you never actually have to use what you've paid for.

On 4/3/07, Paul Fitzmaurice <pfitzmaurice@xxxxxxxxxx> wrote:
I am having problems when using a Dell PowerVault MD3000 with multipath from
a Dell PowerEdge 1950.  I have 2 cables connected and mount the partition on
the DAS Array.  I am using RHEL 4.4 with RHCS and a two node cluster.  Only
one node is "Active" at a time; it mounts the partition, and if
there is an issue RHCS will fence the device and then the other node will
mount the partition.

I have now run into a problem twice where my ext3 filesystem (with
journaling) has corrupt inodes.  This has actually resulted in a filesystem
with #xxxxxxxxx files and directories.

_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users