Re: gnfs split brain when 1 server in 3x1 down (high load) - help request

On 29/03/20 9:40 am, Erik Jacobson wrote:
> Hello all,
>
> I am getting split-brain errors in the gnfs nfs.log when 1 gluster
> server is down in a 3-brick/3-node gluster volume. It only happens under
> intense load.
>
> In the lab, I have a test case that can reproduce the problem on a
> single-subvolume cluster.
>
> If all leaders are up, we see no errors.
>
> Here are example nfs.log errors:
>
> [2020-03-29 03:42:52.295532] E [MSGID: 108008] [afr-read-txn.c:312:afr_read_txn_refresh_done] 0-cm_shared-replicate-0: Failing ACCESS on gfid 8eed77d3-b4fa-4beb-a0e7-e46c2b71ffe1: split-brain observed. [Input/output error]

Since you say that the errors go away when all 3 bricks of the replica are up (I guess that is what you mean by 'leaders'), it is possible that the brick you brought down had the only good copy. In that case, even though the other 2 bricks of the replica are up, both hold bad copies that are waiting to be healed, so all operations on those files will fail with EIO.

Since you say this occurs only under high load, I suspect this is what is happening: heal hasn't had time to catch up with the nodes going up and down.

If you see the split-brain errors despite all 3 replica bricks being online and the gnfs server being able to connect to all of them, then it could be a genuine split-brain problem. But I don't think that is the case here.
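To make the distinction between the two cases concrete, here is a deliberately simplified toy model. It is not AFR's actual code: the `Brick` class, the `blames` sets (a loose stand-in for AFR's pending xattrs), and the brick names A/B/C are all hypothetical. A read is served only from a brick that is up and not blamed by any brick that is up; if no such brick exists, the read fails with EIO. The first scenario assumes the two surviving bricks each hold pending changes against the other, which is one way a pending heal can be reported as "split-brain observed" while the only good copy is offline.

```python
from __future__ import annotations

import errno
from dataclasses import dataclass, field


@dataclass
class Brick:
    """Hypothetical toy stand-in for one brick of a replica-3 volume."""
    name: str
    up: bool = True
    # Bricks this brick "blames", i.e. holds pending, un-healed changes
    # against. Loosely modelled on AFR's pending xattrs (an assumption).
    blames: set[str] = field(default_factory=set)


def readable_bricks(bricks: list[Brick]) -> set[str]:
    """Bricks that are up and not blamed by any brick that is currently up.

    Only up bricks can be consulted, so a down brick's blame (and the fact
    that the down brick may itself be the good copy) is invisible.
    """
    up = [b for b in bricks if b.up]
    blamed: set[str] = set()
    for b in up:
        blamed |= b.blames
    return {b.name for b in up if b.name not in blamed}


def read(bricks: list[Brick]) -> str:
    """Pick a brick to serve a read, or fail with EIO if none is readable."""
    good = readable_bricks(bricks)
    if not good:
        raise OSError(errno.EIO, "split-brain observed")
    return min(good)  # deterministic choice, just for the example


def pending_heal_scenario(a_up: bool) -> list[Brick]:
    # Hypothetical state: A has the only good copy and blames B and C.
    # B and C were each down while the other took writes, so they blame
    # each other.
    return [
        Brick("A", up=a_up, blames={"B", "C"}),
        Brick("B", blames={"C"}),
        Brick("C", blames={"B"}),
    ]


def genuine_split_brain() -> list[Brick]:
    # Hypothetical state: every brick is up, yet each is blamed by another,
    # so no brick qualifies as a good copy.
    return [
        Brick("A", blames={"B"}),
        Brick("B", blames={"C"}),
        Brick("C", blames={"A"}),
    ]


if __name__ == "__main__":
    cases = [
        ("good copy (A) down", pending_heal_scenario(a_up=False)),
        ("good copy (A) up", pending_heal_scenario(a_up=True)),
        ("all up, genuine split-brain", genuine_split_brain()),
    ]
    for label, bricks in cases:
        try:
            print(f"{label}: read served by {read(bricks)}")
        except OSError as e:
            print(f"{label}: EIO ({e.strerror})")
```

In this model the first case fails with EIO, the second is served by A once it returns, and the third fails even though every brick is up. On a real volume, `gluster volume heal <VOLNAME> info` and `gluster volume heal <VOLNAME> info split-brain` are the places to check which of the two situations you are in.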

Regards,
Ravi


Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users


