Re: NFSv4 high availability setups

On Tue, 17 Apr 2012 16:34:48 +0200
Lukas Hejtmanek <xhejtman@xxxxxxxxxxx> wrote:

> Hi,
> 
> On Tue, Apr 10, 2012 at 09:13:21AM -0400, Jeff Layton wrote:
> > Nope. It'll all work just great...until it doesn't. I don't have any
> > specific failure scenarios, but most of the problems will be issues
> > with state recovery when a server node is restarted.
> > 
> > That may manifest in different ways -- problems reclaiming locks for
> > instance, or even silent data corruption depending on the application.
> 
> would it work if I relax active-active scenario to just active-passive in the
> following way:
> 
> Server A actively exports  /export/A
> Server B actively exports  /export/B
> 
> Server B is passive backup for Server A
> Server A is passive backup for Server B
> 
> would it work to migrate the failed Server B to Server A so that Server A will
> serve both /export/A and /export/B?
> 
> There will be a problem with v4recovery dir. Would it be possible just to
> merge v4recovery from Server B to Server A (nfs export would be stopped while
> merging v4recovery).
> 
> It seems that cp -r B/v4recovery/* A/v4recovery/ would do all the things. Am
> I right?
> 
> Do I need to copy recovery state if I delay migration of the failed Server B to
> Server A for 91 secs, i.e., longer than the lease expiry time? Or do I still need
> a record for the client in v4recovery dir in such a case?
> 

That'll still be dangerous. Suppose (for instance) that client1 lost
communication with server B for a period of time. Server B then expired
client1's lease and handed a lock that client1 had been holding to
client2. client2 modifies the file and drops the lock. Meanwhile,
client1 has had uninterrupted communication with server A and holds
state on it.

Eventually, you fail over server B and merge the directories. client1
attempts to renew its lease, but gets back an error and starts
reclaiming things. Server B would have denied reclaim of that lock,
because client1's lease had expired. In this case, though, the reclaim
is allowed, because you merged the directories and client1 held state
on server A. client1 reclaims the lock and thinks that it has held the
lock the entire time -- data corruption and other hilarity ensues...
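
To make the sequence concrete, here is a toy Python model of it. This is
only a sketch, not how nfsd actually tracks clients: the Server class,
may_reclaim(), run_scenario() and the file paths are all made up for
illustration, and the recovery directory is reduced to a set of client
names. It assumes a client's record is dropped when its lease expires.

```python
class Server:
    """Toy NFSv4 server: tracks only lock owners and recovery records."""

    def __init__(self):
        self.recovery = set()  # stand-in for the v4recovery directory
        self.locks = {}        # path -> client currently holding the lock

    def lock(self, client, path):
        holder = self.locks.get(path)
        if holder is not None and holder != client:
            return False
        self.locks[path] = client
        self.recovery.add(client)  # client now has state worth reclaiming
        return True

    def unlock(self, client, path):
        if self.locks.get(path) == client:
            del self.locks[path]

    def expire_lease(self, client):
        # Assumption: lease expiry drops the client's locks and its record.
        self.locks = {p: c for p, c in self.locks.items() if c != client}
        self.recovery.discard(client)


def may_reclaim(records, client):
    # During the grace period, only clients with a recovery record may reclaim.
    return client in records


def run_scenario():
    server_a, server_b = Server(), Server()
    server_a.lock("client1", "/export/A/file")
    server_b.lock("client1", "/export/B/file")

    # client1 loses contact with server B only; B expires its lease.
    server_b.expire_lease("client1")
    # client2 takes the lock, modifies the file, and drops the lock.
    assert server_b.lock("client2", "/export/B/file")
    server_b.unlock("client2", "/export/B/file")

    # Server B fails and server A takes over /export/B.
    b_alone = may_reclaim(server_b.recovery, "client1")
    merged = may_reclaim(server_a.recovery | server_b.recovery, "client1")
    return b_alone, merged


if __name__ == "__main__":
    b_alone, merged = run_scenario()
    print("reclaim allowed by B's own records:", b_alone)
    print("reclaim allowed by merged records: ", merged)
```

With server B's own records the reclaim is refused; with the merged set
it is granted, because client1's record came along from server A.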

-- 
Jeff Layton <jlayton@xxxxxxxxxx>