Rafi,

OK. Thanks for your guidance. I found the debug log and pasted the
lines around that point.

http://pastebin.com/vhHR6PQN

Regards

2016-12-19 14:58 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>
>
> On 12/16/2016 09:10 PM, yonex wrote:
>> Rafi,
>>
>> Thanks, the .meta feature (which I didn't know about) is very nice.
>> I have finally captured debug logs from a client and the bricks.
>>
>> A mount log:
>> - http://pastebin.com/Tjy7wGGj
>>
>> FYI, rickdom126 is my client's hostname.
>>
>> Brick logs around that time:
>> - Brick1: http://pastebin.com/qzbVRSF3
>> - Brick2: http://pastebin.com/j3yMNhP3
>> - Brick3: http://pastebin.com/m81mVj6L
>> - Brick4: http://pastebin.com/JDAbChf6
>> - Brick5: http://pastebin.com/7saP6rsm
>>
>> However, I could not find any message like "EOF on socket". I hope
>> there is some helpful information in the logs above.
>
> Indeed. I understand that the connections are in a disconnected
> state, but what I'm particularly looking for is the cause of the
> disconnect. Can you paste the debug logs from when the disconnects
> start, and around that time? You may see a debug log that says
> "disconnecting now".
>
>
> Regards
> Rafi KC
>
>
>>
>> Regards.
>>
>>
>> 2016-12-14 15:20 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>>>
>>> On 12/13/2016 09:56 PM, yonex wrote:
>>>> Hi Rafi,
>>>>
>>>> Thanks for your response. OK, I think it is possible to capture
>>>> debug logs, since the error seems to be reproduced a few times per
>>>> day. I will try that. However, since I want to avoid redundant
>>>> debug output if possible, is there a way to enable debug logging
>>>> only on specific client nodes?
>>> If you are using a FUSE mount, there is a proc-like feature called
>>> .meta. You can set the log level through it for a particular
>>> client [1]. But I also want the logs from the bricks, because I
>>> suspect the brick processes are initiating the disconnects.
>>>
>>>
>>> [1] e.g.: echo 8 > /mnt/glusterfs/.meta/logging/loglevel
>>>
>>>> Regards
>>>>
>>>> Yonex
>>>>
>>>> 2016-12-13 23:33 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>>>>> Hi Yonex,
>>>>>
>>>>> Is this consistently reproducible? If so, can you enable debug
>>>>> logging [1] and check for any message similar to [2]? Basically,
>>>>> you can even search for "EOF on socket".
>>>>>
>>>>> You can set your log level back to the default (INFO) after
>>>>> capturing for some time.
>>>>>
>>>>>
>>>>> [1]: gluster volume set <volname> diagnostics.brick-log-level DEBUG
>>>>> and gluster volume set <volname> diagnostics.client-log-level DEBUG
>>>>>
>>>>> [2]: http://pastebin.com/xn8QHXWa
>>>>>
>>>>>
>>>>> Regards
>>>>>
>>>>> Rafi KC
>>>>>
>>>>> On 12/12/2016 09:35 PM, yonex wrote:
>>>>>> Hi,
>>>>>>
>>>>>> When my application moves a file from its local disk to a
>>>>>> FUSE-mounted GlusterFS volume, the client outputs many warnings
>>>>>> and errors, not always but occasionally. The volume is a simple
>>>>>> distributed volume.
>>>>>>
>>>>>> A sample of the logs is pasted here: http://pastebin.com/axkTCRJX
>>>>>>
>>>>>> At a glance it seems to come from something like a network
>>>>>> disconnection ("Transport endpoint is not connected"), but other
>>>>>> networking applications on the same machine don't observe such a
>>>>>> thing. So I guess there may be a problem somewhere in the
>>>>>> GlusterFS stack.
>>>>>>
>>>>>> It ended up failing to rename the file, logging PHP warnings
>>>>>> like the ones below:
>>>>>>
>>>>>> PHP Warning: rename(/glusterfs01/db1/stack/f0/13a9a2f0): failed
>>>>>> to open stream: Input/output error in [snipped].php on line 278
>>>>>> PHP Warning:
>>>>>> rename(/var/stack/13a9a2f0,/glusterfs01/db1/stack/f0/13a9a2f0):
>>>>>> Input/output error in [snipped].php on line 278
>>>>>>
>>>>>> Conditions:
>>>>>>
>>>>>> - GlusterFS 3.8.5 installed via yum from CentOS-Gluster-3.8.repo
>>>>>> - Volume info and status pasted: http://pastebin.com/JPt2KeD8
>>>>>> - Client machines' OS: Scientific Linux 6 or CentOS 6.
>>>>>> - Server machines' OS: CentOS 6.
>>>>>> - Kernel version is 2.6.32-642.6.2.el6.x86_64 on all machines.
>>>>>> - The number of connected FUSE clients is 260.
>>>>>> - No firewall between the connected machines.
>>>>>> - Neither remounting the volume nor rebooting the client
>>>>>>   machines helps.
>>>>>> - It is triggered not only by rename() but also by copy() and
>>>>>>   filesize() operations.
>>>>>> - No output in the brick logs when it happens.
>>>>>>
>>>>>> Any ideas? I'd appreciate any help.
>>>>>>
>>>>>> Regards.
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users@xxxxxxxxxxx
>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
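
For reference, a minimal shell sketch consolidating the debug-logging
steps discussed in this thread. The volume name (myvol), the mount
point (/mnt/glusterfs) and the default /var/log/glusterfs log locations
are assumptions; substitute your own values.

    # Turn on DEBUG logging for all bricks and clients of the volume,
    # as suggested earlier in the thread.
    gluster volume set myvol diagnostics.brick-log-level DEBUG
    gluster volume set myvol diagnostics.client-log-level DEBUG

    # Alternatively, raise the log level on a single FUSE client only,
    # via the .meta virtual directory of that mount (8 maps to DEBUG).
    echo 8 > /mnt/glusterfs/.meta/logging/loglevel

    # After reproducing the error, search the client and brick logs
    # for the messages mentioned above.
    grep -iE 'disconnecting now|EOF on socket' \
        /var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log

    # Set the log levels back to the default (INFO) after capturing.
    gluster volume set myvol diagnostics.brick-log-level INFO
    gluster volume set myvol diagnostics.client-log-level INFO

Note that the brick logs under /var/log/glusterfs/bricks/ live on the
server nodes, so the grep has to run there as well as on the client
that holds the FUSE mount.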