Re: gnfs split brain when 1 server in 3x1 down (high load) - help request

On March 30, 2020 7:54:59 PM GMT+03:00, Erik Jacobson <erik.jacobson@xxxxxxx> wrote:
>> Hi Erik,
>> Sadly I didn't have the time to take a look at your logs, but I would
>> like to ask whether you have statistics on the network bandwidth
>> usage.
>> Could it be that the gNFS server is starved for bandwidth and fails
>> to reach all bricks, leading to 'split-brain' errors?
>>
>
>I understand. I doubt there is a bandwidth issue but I'll add this to my
>checks. We have 288 nodes per server normally and they run fine with all
>servers up. The 76 number is just what we happened to have access to on
>an internal system.
>
>Question: What you mentioned above, and a feeling I personally have too,
>is this: is the split-brain error actually a generic catch-all error for
>not being able to get access to a file? So when it says "split-brain",
>could it really mean any type of access error? Could it also be given
>when there is an I/O timeout or something?
>
>I'm starting to break open the source code to look around but I think my
>head will explode before I understand it enough. I will still give it a
>shot.
>
>I have access to this system until later tonight. Then it goes away. We
>have duplicated it on another system that stays, but the machine
>internally is so contended for that I wouldn't get a time slot until
>later in the week anyway. Trying to make as much use of this "gift"
>machine as I can :) :)
>
>Thanks again for the replies so far.
>
>Erik

Hey Erik,

Sadly I am not a developer, so I can't answer your questions.
Still, bandwidth starvation looks like a possible reason (at least to me), although in that case I would expect error messages and timeouts to fill the logs.
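
To rule bandwidth in or out, one option is to sample the interface counters on the gNFS server while the load is running. The following is only a rough sketch, assuming a Linux host that exposes /proc/net/dev; the function names are made up for illustration and are not part of Gluster.

```python
import time


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # unexpected layout, skip rather than guess
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput_mbit(before, after, interval_s):
    """Return {interface: (rx_mbit_per_s, tx_mbit_per_s)} between two samples."""
    rates = {}
    for iface, (rx1, tx1) in after.items():
        if iface not in before:
            continue
        rx0, tx0 = before[iface]
        rates[iface] = (
            (rx1 - rx0) * 8 / interval_s / 1e6,
            (tx1 - tx0) * 8 / interval_s / 1e6,
        )
    return rates


def sample(interval_s=5.0, path="/proc/net/dev"):
    """Read the counters twice, interval_s apart, and return the rates."""
    with open(path) as f:
        before = parse_net_dev(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_net_dev(f.read())
    return throughput_mbit(before, after, interval_s)
```

Calling sample() on the server during the test and comparing the result with the link speed should show whether the interface is anywhere near saturation.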

I recommend increasing the log level for both the bricks and the volume to the maximum and then trying to reproduce the issue.
Keep in mind that the logs can grow very fast.
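
As a rough sketch of what I mean: the log level is controlled per volume through the diagnostics.brick-log-level and diagnostics.client-log-level options, and TRACE is the most verbose value. The helper below only builds the "gluster volume set" commands and prints them unless dry_run is turned off. The volume name "myvol" is a placeholder, and I am assuming the gNFS server follows the client-side option.

```python
import subprocess

# Volume options that control log verbosity for bricks and for clients.
LOG_OPTIONS = ("diagnostics.brick-log-level", "diagnostics.client-log-level")
ALLOWED_LEVELS = {"TRACE", "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL", "NONE"}


def log_level_commands(volume, level="TRACE"):
    """Build the 'gluster volume set' commands for the given log level."""
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"unknown log level: {level}")
    return [["gluster", "volume", "set", volume, opt, level] for opt in LOG_OPTIONS]


def apply(commands, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# "myvol" is a hypothetical volume name; replace it with the real one.
apply(log_level_commands("myvol"))
```

Once the issue has been reproduced, running the same helper with level "INFO" sets both options back to a quieter level so the logs stop growing.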

Best Regards,
Strahil Nikolov