Unexpected behaviour during replication heal

----- Original Message -----
> It looks like the disconnection happened in the middle of a write
> transaction (after the lock phase, before the unlock phase). And the

The server was deliberately disconnected after the write had begun, in order to test what would happen in that situation and to document a recovery procedure for it.

> server's detection of client disconnection (via TCP_KEEPALIVE) seems to have
> not happened before the client reconnected.

I've not configured any special keepalive settings for the server or clients - the configuration was an out-of-the-box glusterd.vol file, and a "volume create" sequence with standard params (no special settings or options applied).

The disconnected server was also in that state for approx 10 minutes - not seconds.

I assume the "default" setup is not to hold on to a locked file for over 10 minutes when in a disconnected state?
Surely it shouldn't hold onto a lock *at all* once it's out of the cluster?

> The client, having witnessed the reconnection has assumed the locks have been relinquished by the
> server. The server, however, having noticed the same client reconnection before
> breakage of the original connection has not released the held locks.

But why is the server still holding the locks WAY past the time it should be?
We're not talking seconds here, we're talking minutes of disconnection.

And why, when it is reconnected, will it not sync that file back from the other servers that have a full copy of it?

> Tuning the server side tcp keepalive to a smaller value should fix
> this problem. Can you please verify?

Are you talking about the GlusterFS keepalive setting in the vol file, or changing the actual TCP keepalive settings for the *whole* server? Changing the server-wide TCP keepalive is not an option, since it has ramifications for other things - and it shouldn't be necessary to solve what is, really, a GlusterFS bug...
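
To make the distinction concrete, here is a minimal sketch of per-socket keepalive tuning, as opposed to changing the system-wide settings. It assumes Linux and uses Python's standard socket module. The function names and the tuned values (20s idle, 5s interval, 3 probes) are hypothetical and chosen only for illustration; this is not GlusterFS code, and I don't know how GlusterFS sets these internally. The 7200/75/9 figures are the usual Linux kernel defaults, and the detection time is only an approximation.

```python
import socket


def detection_seconds(idle, interval, probes):
    """Approximate time for keepalive to declare a silent peer dead.

    The connection must be idle for `idle` seconds before the first probe,
    then `probes` unanswered probes are sent `interval` seconds apart.
    """
    return idle + interval * probes


def enable_keepalive(sock, idle=20, interval=5, probes=3):
    """Turn on keepalive for one socket only (hypothetical helper).

    The values apply to this socket alone; the system-wide defaults
    used by every other application are left untouched.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These three options are Linux-specific, so skip any that are missing.
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", probes),
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


# Usual Linux defaults: 7200s idle, 75s interval, 9 probes.
DEFAULT_DETECTION = detection_seconds(7200, 75, 9)
# Illustrative tuned values, not a recommendation.
TUNED_DETECTION = detection_seconds(20, 5, 3)
```

With the usual defaults the approximate detection time is 7875 seconds (a little over two hours), so a 10 minute disconnection would go unnoticed by keepalive alone; with the illustrative tuned values it is about 35 seconds.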

Cheers,
Darren.

