Thank you for replying!

> Okay so 0-cm_shared-replicate-1 means these 3 bricks:
>
> Brick4: 172.23.0.6:/data/brick_cm_shared
> Brick5: 172.23.0.7:/data/brick_cm_shared
> Brick6: 172.23.0.8:/data/brick_cm_shared

The above is correct.

> Were there any pending self-heals for this volume? Is it possible that the
> server (one of Brick 4, 5 or 6) that is down had the only good copy and the
> other 2 online bricks had a bad copy (needing heal)? Clients can get EIO in
> that case.

I did check for heals and saw nothing. The storage was effectively read-only
at the time: the NFS clients mount it read-only, and there was no write
activity going to the shared storage anyway, so it was not surprising that no
heals were listed. I also inspected both remaining bricks for several of the
example problem files and found their md5sums matched.

The strange thing, as I mentioned, is that this only happened under the
job-launch workload. The NFS boot workload, which is also very stressful, ran
clean with one brick down.

> When you say accessing the file from the compute nodes afterwards works
> fine, it is still with that one server (brick) down?

I can no longer check this system personally, but as I recall, once we fixed
the ethernet problem all seemed well. I don't have a better answer than that.
I am starting a document of things to try when we next have a large system in
the factory to run on, and I'll put this in there.

> There was a case of AFR reporting spurious split-brain errors, but that was
> fixed long back (http://review.gluster.org/16362) and the fix seems to be
> present in glusterfs-4.1.6.

That is why I brought this up. In my case, we know the files really were
missing on the NFS client side because we saw errors on the clients. That is
to say, the above bug apparently caused split-brain to be reported in error
with no other impact, whereas in my case the errors resulted in actual
problems accessing the files on the NFS clients.

> Side note: Why are you using replica 9 for the ctdb volume? All
> development/tests are usually done on (distributed) replica 3 setups.

I am happy to change this. Whatever guide I used to set this up suggested
replica 9; it was so long ago that I don't even know which resource was
incorrect. I have no other reason. I'm filing an incident now to change our
setup tools to use replica 3 for CTDB on new setups.

Again, I appreciate that you followed up with me.

Thank you,
Erik
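
P.S. For anyone searching the archives later, a minimal sketch of the checks
described above, assuming the volume is named cm_shared (the brick paths
suggest it) and using a hypothetical placeholder file path:

    # list entries pending self-heal, and any reported in split-brain
    gluster volume heal cm_shared info
    gluster volume heal cm_shared info split-brain

    # on each of the two surviving brick servers, compare the copies directly
    md5sum /data/brick_cm_shared/path/to/problem/file
    # non-zero trusted.afr.* pending counters would indicate a needed heal
    getfattr -d -m . -e hex /data/brick_cm_shared/path/to/problem/file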
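
P.P.S. For the setup-tool change: on a new setup, a replica-3 CTDB lock
volume would be created with something like the following (the volume name,
hostnames, and brick paths are hypothetical):

    gluster volume create ctdb replica 3 \
        node1:/data/brick_ctdb node2:/data/brick_ctdb node3:/data/brick_ctdb
    gluster volume start ctdb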