Re: File operation failure on simple distributed volume

On 12/19/2016 05:32 PM, Mohammed Rafi K C wrote:
> Client 0-glusterfs01-client-2 has disconnected from the bricks around
> 2016-12-15 11:21:17.854249. Can you look at and/or paste the brick logs
> from around that time?
You can find the brick name and hostname for 0-glusterfs01-client-2 in
the client graph.
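
As a rough illustration of that lookup, here is a minimal Python sketch that
pulls remote-host and remote-subvolume out of a client graph dump. It assumes
the graph appears in the mount log as volfile-style "volume ... end-volume"
blocks, optionally prefixed with "N:" line numbers, and that the leading "0-"
in the log identifier is a graph id rather than part of the translator name.
The host names and brick paths in SAMPLE_GRAPH are made up, and find_brick is
a hypothetical helper, not part of GlusterFS.

```python
import re

# Hypothetical excerpt of a client graph as it might appear in a mount log.
# Host names and brick paths are invented for illustration only.
SAMPLE_GRAPH = """\
  1: volume glusterfs01-client-0
  2:     type protocol/client
  3:     option remote-host server01.example.com
  4:     option remote-subvolume /data/brick1
  5: end-volume
  6:
  7: volume glusterfs01-client-2
  8:     type protocol/client
  9:     option remote-host server03.example.com
 10:     option remote-subvolume /data/brick3
 11: end-volume
"""

# Optional "N:" line-number prefix in front of each graph line (assumed format).
_PREFIX = re.compile(r"^\s*\d+:\s?")


def find_brick(graph_text, log_id):
    """Return (remote_host, remote_subvolume) for a client translator, or None."""
    # Assumption: "0-glusterfs01-client-2" is "<graph id>-<translator name>".
    xlator_name = re.sub(r"^\d+-", "", log_id)
    current = None  # name of the volume block currently being read
    options = {}
    for raw in graph_text.splitlines():
        line = _PREFIX.sub("", raw).strip()
        if line.startswith("volume "):
            current = line.split(None, 1)[1]
            options = {}
        elif line == "end-volume":
            if current == xlator_name:
                return options.get("remote-host"), options.get("remote-subvolume")
            current = None
        elif line.startswith("option ") and current is not None:
            parts = line.split(None, 2)
            if len(parts) == 3:
                options[parts[1]] = parts[2]
    return None


if __name__ == "__main__":
    print(find_brick(SAMPLE_GRAPH, "0-glusterfs01-client-2"))
```

To use it on a real log, paste the graph section of the mount log into a
string (or read the file) and pass it in place of SAMPLE_GRAPH.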

Rafi

>
> Are you on any of the Gluster IRC channels? If so, do you have a
> nickname that I can search for?
>
> Regards
> Rafi KC
>
> On 12/19/2016 04:28 PM, yonex wrote:
>> Rafi,
>>
>> OK. Thanks for your guidance. I found the debug log and pasted the lines around that time.
>> http://pastebin.com/vhHR6PQN
>>
>> Regards
>>
>>
>> 2016-12-19 14:58 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>>> On 12/16/2016 09:10 PM, yonex wrote:
>>>> Rafi,
>>>>
>>>> Thanks, the .meta feature, which I didn't know about, is very nice. I have
>>>> finally captured debug logs from a client and the bricks.
>>>>
>>>> A mount log:
>>>> - http://pastebin.com/Tjy7wGGj
>>>>
>>>> FYI rickdom126 is my client's hostname.
>>>>
>>>> Brick logs around that time:
>>>> - Brick1: http://pastebin.com/qzbVRSF3
>>>> - Brick2: http://pastebin.com/j3yMNhP3
>>>> - Brick3: http://pastebin.com/m81mVj6L
>>>> - Brick4: http://pastebin.com/JDAbChf6
>>>> - Brick5: http://pastebin.com/7saP6rsm
>>>>
>>>> However, I could not find any message like "EOF on socket". I hope
>>>> there is some helpful information in the logs above.
>>> Indeed. I understand that the connections are in a disconnected state. But
>>> what I'm particularly looking for is the cause of the disconnect. Can
>>> you paste the debug logs from when the disconnects start, and around that
>>> time? You may see a debug log that says "disconnecting now".
>>>
>>>
>>> Regards
>>> Rafi KC
>>>
>>>
>>>> Regards.
>>>>
>>>>
>>>> 2016-12-14 15:20 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>>>>> On 12/13/2016 09:56 PM, yonex wrote:
>>>>>> Hi Rafi,
>>>>>>
>>>>>> Thanks for your response. OK, I think it is possible to capture debug
>>>>>> logs, since the error seems to be reproduced a few times per day. I
>>>>>> will try that. However, since I want to avoid redundant debug output if
>>>>>> possible, is there a way to enable debug logging only on specific client
>>>>>> nodes?
>>>>> If you are using a FUSE mount, there is a proc-like feature called .meta.
>>>>> You can set the log level through it for a particular client [1]. But I
>>>>> also want logs from the bricks, because I suspect the brick processes of
>>>>> initiating the disconnects.
>>>>>
>>>>>
>>>>> [1] e.g.: echo 8 > /mnt/glusterfs/.meta/logging/loglevel
>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Yonex
>>>>>>
>>>>>> 2016-12-13 23:33 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>>>>>>> Hi Yonex,
>>>>>>>
>>>>>>> Is this consistently reproducible? If so, can you enable debug logging [1]
>>>>>>> and check for any message similar to [2]? You can simply search
>>>>>>> for "EOF on socket".
>>>>>>>
>>>>>>> You can set your log level back to default (INFO) after capturing for
>>>>>>> some time.
>>>>>>>
>>>>>>>
>>>>>>> [1] : gluster volume set <volname> diagnostics.brick-log-level DEBUG
>>>>>>>       and
>>>>>>>       gluster volume set <volname> diagnostics.client-log-level DEBUG
>>>>>>>
>>>>>>> [2] : http://pastebin.com/xn8QHXWa
>>>>>>>
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Rafi KC
>>>>>>>
>>>>>>> On 12/12/2016 09:35 PM, yonex wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> When my application moves a file from its local disk to a FUSE-mounted
>>>>>>>> GlusterFS volume, the client occasionally (not every time) outputs many
>>>>>>>> warnings and errors. The volume is a simple distributed volume.
>>>>>>>>
>>>>>>>> A sample of logs pasted: http://pastebin.com/axkTCRJX
>>>>>>>>
>>>>>>>> At a glance, it looks like a network disconnection
>>>>>>>> ("Transport endpoint is not connected"), but other
>>>>>>>> networking applications on the same machine don't observe anything
>>>>>>>> like that. So I guess there may be a problem somewhere in the GlusterFS stack.
>>>>>>>>
>>>>>>>> It ends up failing to rename a file, logging PHP warnings like the ones below:
>>>>>>>>
>>>>>>>>     PHP Warning:  rename(/glusterfs01/db1/stack/f0/13a9a2f0): failed to open stream: Input/output error in [snipped].php on line 278
>>>>>>>>     PHP Warning:  rename(/var/stack/13a9a2f0,/glusterfs01/db1/stack/f0/13a9a2f0): Input/output error in [snipped].php on line 278
>>>>>>>>
>>>>>>>> Conditions:
>>>>>>>>
>>>>>>>> - GlusterFS 3.8.5 installed via yum CentOS-Gluster-3.8.repo
>>>>>>>> - Volume info and status pasted: http://pastebin.com/JPt2KeD8
>>>>>>>> - Client machines' OS: Scientific Linux 6 or CentOS 6.
>>>>>>>> - Server machines' OS: CentOS 6.
>>>>>>>> - Kernel version is 2.6.32-642.6.2.el6.x86_64 on all machines.
>>>>>>>> - The number of connected FUSE clients is 260.
>>>>>>>> - No firewall between connected machines.
>>>>>>>> - Neither remounting volumes nor rebooting client machines has any effect.
>>>>>>>> - It is triggered not only by rename() but also by copy() and filesize() operations.
>>>>>>>> - No output in the brick logs when it happens.
>>>>>>>>
>>>>>>>> Any ideas? I'd appreciate any help.
>>>>>>>>
>>>>>>>> Regards.
>>>>>>>> _______________________________________________
>>>>>>>> Gluster-users mailing list
>>>>>>>> Gluster-users@xxxxxxxxxxx
>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users



