Rafi,

OK. Thanks for your guidance. I found the debug log and pasted the
lines around that point.

http://pastebin.com/vhHR6PQN

Regards

2016-12-19 14:58 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>
>
> On 12/16/2016 09:10 PM, yonex wrote:
>> Rafi,
>>
>> Thanks, the .meta feature (which I didn't know about) is very nice.
>> I have finally captured debug logs from a client and the bricks.
>>
>> A mount log:
>> - http://pastebin.com/Tjy7wGGj
>>
>> FYI, rickdom126 is my client's hostname.
>>
>> Brick logs around that time:
>> - Brick1: http://pastebin.com/qzbVRSF3
>> - Brick2: http://pastebin.com/j3yMNhP3
>> - Brick3: http://pastebin.com/m81mVj6L
>> - Brick4: http://pastebin.com/JDAbChf6
>> - Brick5: http://pastebin.com/7saP6rsm
>>
>> However, I could not find any message like "EOF on socket". I hope
>> there is some helpful information in the logs above.
>
> Indeed. I understand that the connections are in a disconnected
> state, but what I'm particularly looking for is the cause of the
> disconnect. Can you paste the debug logs from when the disconnects
> start, and around that time? You may see a debug log that says
> "disconnecting now".
>
>
> Regards
> Rafi KC
>
>
>>
>> Regards.
>>
>>
>> 2016-12-14 15:20 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>>>
>>> On 12/13/2016 09:56 PM, yonex wrote:
>>>> Hi Rafi,
>>>>
>>>> Thanks for your response. OK, I think it is possible to capture
>>>> debug logs, since the error seems to be reproduced a few times per
>>>> day. I will try that. However, since I want to avoid redundant
>>>> debug output if possible, is there a way to enable debug logging
>>>> only on specific client nodes?
>>> If you are using a FUSE mount, there is a proc-like feature called
>>> .meta. You can set the log level through it for a particular
>>> client [1]. But I also want the logs from the bricks, because I
>>> suspect the brick processes are initiating the disconnects.
>>>
>>>
>>> [1] e.g.: echo 8 > /mnt/glusterfs/.meta/logging/loglevel
>>>
>>>> Regards
>>>>
>>>> Yonex
>>>>
>>>> 2016-12-13 23:33 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>>>>> Hi Yonex,
>>>>>
>>>>> Is this consistently reproducible? If so, can you enable debug
>>>>> logging [1] and check for any message similar to [2]? Basically,
>>>>> you can even search for "EOF on socket".
>>>>>
>>>>> You can set your log level back to the default (INFO) after
>>>>> capturing for some time.
>>>>>
>>>>>
>>>>> [1]: gluster volume set <volname> diagnostics.brick-log-level DEBUG
>>>>> and gluster volume set <volname> diagnostics.client-log-level DEBUG
>>>>>
>>>>> [2]: http://pastebin.com/xn8QHXWa
>>>>>
>>>>>
>>>>> Regards
>>>>>
>>>>> Rafi KC
>>>>>
>>>>> On 12/12/2016 09:35 PM, yonex wrote:
>>>>>> Hi,
>>>>>>
>>>>>> When my application moves a file from its local disk to a
>>>>>> FUSE-mounted GlusterFS volume, the client outputs many warnings
>>>>>> and errors, not always but occasionally. The volume is a simple
>>>>>> distributed volume.
>>>>>>
>>>>>> A sample of the logs is pasted here: http://pastebin.com/axkTCRJX
>>>>>>
>>>>>> At a glance it seems to come from something like a network
>>>>>> disconnection ("Transport endpoint is not connected"), but other
>>>>>> networking applications on the same machine don't observe such a
>>>>>> thing. So I guess there may be a problem somewhere in the
>>>>>> GlusterFS stack.
>>>>>>
>>>>>> It ended up failing to rename the file, logging PHP warnings
>>>>>> like the ones below:
>>>>>>
>>>>>> PHP Warning: rename(/glusterfs01/db1/stack/f0/13a9a2f0): failed
>>>>>> to open stream: Input/output error in [snipped].php on line 278
>>>>>> PHP Warning:
>>>>>> rename(/var/stack/13a9a2f0,/glusterfs01/db1/stack/f0/13a9a2f0):
>>>>>> Input/output error in [snipped].php on line 278
>>>>>>
>>>>>> Conditions:
>>>>>>
>>>>>> - GlusterFS 3.8.5 installed via yum from CentOS-Gluster-3.8.repo
>>>>>> - Volume info and status pasted: http://pastebin.com/JPt2KeD8
>>>>>> - Client machines' OS: Scientific Linux 6 or CentOS 6.
>>>>>> - Server machines' OS: CentOS 6.
>>>>>> - Kernel version is 2.6.32-642.6.2.el6.x86_64 on all machines.
>>>>>> - The number of connected FUSE clients is 260.
>>>>>> - No firewall between the connected machines.
>>>>>> - Neither remounting the volume nor rebooting the client
>>>>>>   machines helps.
>>>>>> - It is triggered not only by rename() but also by copy() and
>>>>>>   filesize() operations.
>>>>>> - No output in the brick logs when it happens.
>>>>>>
>>>>>> Any ideas? I'd appreciate any help.
>>>>>>
>>>>>> Regards.
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users@xxxxxxxxxxx
>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
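
For reference, a minimal shell sketch consolidating the debug-logging
steps discussed in this thread. The volume name (myvol), the mount
point (/mnt/glusterfs) and the default /var/log/glusterfs log locations
are assumptions; substitute your own values.

    # Turn on DEBUG logging for all bricks and clients of the volume,
    # as suggested earlier in the thread.
    gluster volume set myvol diagnostics.brick-log-level DEBUG
    gluster volume set myvol diagnostics.client-log-level DEBUG

    # Alternatively, raise the log level on a single FUSE client only,
    # via the .meta virtual directory of that mount (8 maps to DEBUG).
    echo 8 > /mnt/glusterfs/.meta/logging/loglevel

    # After reproducing the error, search the client and brick logs
    # for the messages mentioned above.
    grep -iE 'disconnecting now|EOF on socket' \
        /var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log

    # Set the log levels back to the default (INFO) after capturing.
    gluster volume set myvol diagnostics.brick-log-level INFO
    gluster volume set myvol diagnostics.client-log-level INFO

Note that the brick logs under /var/log/glusterfs/bricks/ live on the
server nodes, so the grep has to run there as well as on the client
that holds the FUSE mount.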