Is there any update on this?

Regards
Rafi KC

On 12/24/2016 03:53 PM, yonex wrote:
> Rafi,
>
> Thanks again. I will try that and get back to you.
>
> Regards.
>
>
> 2016-12-23 18:03 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>> Hi Yonex,
>>
>> As we discussed on IRC in #gluster-devel, I have attached the gdb script
>> along with this mail.
>>
>> Procedure to run the gdb script:
>>
>> 1) Install gdb.
>>
>> 2) Download and install the gluster debuginfo packages for your machine.
>> Package location ---> https://cbs.centos.org/koji/buildinfo?buildID=12757
>>
>> 3) Find the process id and attach gdb to the process using the command
>> gdb attach <pid> -x <path_to_script>
>>
>> 4) Continue running the script until you hit the problem.
>>
>> 5) Stop gdb.
>>
>> 6) You will see a file called mylog.txt in the location where you ran
>> gdb.
>>
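>> Just to illustrate, once gdb and the debuginfo packages from the link
>> above are installed, the session might look roughly like this (<pid> and
>> <path_to_script> are placeholders for your process id and the attached
>> script):
>>
>>     gdb attach <pid> -x <path_to_script>
>>     (gdb) continue        # 4) let it run until the problem shows up
>>     (gdb) quit            # 5) stop gdb (Ctrl-C first if it is still running)
>>     cat mylog.txt         # 6) written to the directory gdb was started from
>>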
>> Please keep an eye on the attached process. If you have any doubt, please
>> feel free to get back to me.
>>
>> Regards
>>
>> Rafi KC
>>
>>
>> On 12/19/2016 05:33 PM, Mohammed Rafi K C wrote:
>>> On 12/19/2016 05:32 PM, Mohammed Rafi K C wrote:
>>>> Client 0-glusterfs01-client-2 has disconnected from the bricks around
>>>> 2016-12-15 11:21:17.854249. Can you look at and/or paste the brick logs
>>>> from around that time?
>>> You can find the brick name and hostname for 0-glusterfs01-client-2 from
>>> the client graph.
>>>
>>> Rafi
>>>
>>>> Are you on any of the Gluster IRC channels? If so, is there a nickname
>>>> I can search for?
>>>>
>>>> Regards
>>>> Rafi KC
>>>>
>>>> On 12/19/2016 04:28 PM, yonex wrote:
>>>>> Rafi,
>>>>>
>>>>> OK. Thanks for your guidance. I found the debug log and pasted the
>>>>> lines around that: http://pastebin.com/vhHR6PQN
>>>>>
>>>>> Regards
>>>>>
>>>>>
>>>>> 2016-12-19 14:58 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>>>>>> On 12/16/2016 09:10 PM, yonex wrote:
>>>>>>> Rafi,
>>>>>>>
>>>>>>> Thanks, the .meta feature, which I didn't know about, is very nice.
>>>>>>> I have finally captured debug logs from a client and the bricks.
>>>>>>>
>>>>>>> A mount log:
>>>>>>> - http://pastebin.com/Tjy7wGGj
>>>>>>>
>>>>>>> FYI, rickdom126 is my client's hostname.
>>>>>>>
>>>>>>> Brick logs around that time:
>>>>>>> - Brick1: http://pastebin.com/qzbVRSF3
>>>>>>> - Brick2: http://pastebin.com/j3yMNhP3
>>>>>>> - Brick3: http://pastebin.com/m81mVj6L
>>>>>>> - Brick4: http://pastebin.com/JDAbChf6
>>>>>>> - Brick5: http://pastebin.com/7saP6rsm
>>>>>>>
>>>>>>> However, I could not find any message like "EOF on socket". I hope
>>>>>>> there is some helpful information in the logs above.
>>>>>> Indeed. I understand that the connections are in a disconnected state,
>>>>>> but what I'm particularly looking for is the cause of the disconnect.
>>>>>> Can you paste the debug logs from when it starts disconnecting, and
>>>>>> around that? You may see a debug log that says "disconnecting now".
>>>>>>
>>>>>>
>>>>>> Regards
>>>>>> Rafi KC
>>>>>>
>>>>>>
>>>>>>> Regards.
>>>>>>>
>>>>>>>
>>>>>>> 2016-12-14 15:20 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>>>>>>>> On 12/13/2016 09:56 PM, yonex wrote:
>>>>>>>>> Hi Rafi,
>>>>>>>>>
>>>>>>>>> Thanks for your response. OK, I think it is possible to capture debug
>>>>>>>>> logs, since the error seems to be reproduced a few times per day. I
>>>>>>>>> will try that. However, since I want to avoid redundant debug output
>>>>>>>>> if possible, is there a way to enable debug logging only on specific
>>>>>>>>> client nodes?
>>>>>>>> If you are using a fuse mount, there is a proc-like feature called .meta.
>>>>>>>> You can set the log level through that for a particular client [1]. But I
>>>>>>>> also want logs from the bricks, because I suspect the brick processes of
>>>>>>>> initiating the disconnects.
>>>>>>>>
>>>>>>>>
>>>>>>>> [1] e.g.: echo 8 > /mnt/glusterfs/.meta/logging/loglevel
>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>> Yonex
>>>>>>>>>
>>>>>>>>> 2016-12-13 23:33 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>>>>>>>>>> Hi Yonex,
>>>>>>>>>>
>>>>>>>>>> Is this consistently reproducible? If so, can you enable the debug
>>>>>>>>>> log [1] and check for any message similar to [2]? Basically, you can
>>>>>>>>>> even search for "EOF on socket".
>>>>>>>>>>
>>>>>>>>>> You can set your log level back to the default (INFO) after capturing
>>>>>>>>>> for some time.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1] : gluster volume set <volname> diagnostics.brick-log-level DEBUG and
>>>>>>>>>> gluster volume set <volname> diagnostics.client-log-level DEBUG
>>>>>>>>>>
>>>>>>>>>> [2] : http://pastebin.com/xn8QHXWa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>>
>>>>>>>>>> Rafi KC
>>>>>>>>>>
>>>>>>>>>> On 12/12/2016 09:35 PM, yonex wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> When my application moves a file from its local disk to a FUSE-mounted
>>>>>>>>>>> GlusterFS volume, the client outputs many warnings and errors, not
>>>>>>>>>>> always but occasionally. The volume is a simple distributed volume.
>>>>>>>>>>>
>>>>>>>>>>> A sample of the logs is pasted here: http://pastebin.com/axkTCRJX
>>>>>>>>>>>
>>>>>>>>>>> At a glance it looks like a network disconnection ("Transport endpoint
>>>>>>>>>>> is not connected"), but other networking applications on the same
>>>>>>>>>>> machine don't observe any such thing, so I guess there may be a
>>>>>>>>>>> problem somewhere in the GlusterFS stack.
>>>>>>>>>>>
>>>>>>>>>>> It ends in failing to rename a file, logging PHP warnings like those below:
>>>>>>>>>>>
>>>>>>>>>>> PHP Warning: rename(/glusterfs01/db1/stack/f0/13a9a2f0): failed
>>>>>>>>>>> to open stream: Input/output error in [snipped].php on line 278
>>>>>>>>>>> PHP Warning:
>>>>>>>>>>> rename(/var/stack/13a9a2f0,/glusterfs01/db1/stack/f0/13a9a2f0):
>>>>>>>>>>> Input/output error in [snipped].php on line 278
>>>>>>>>>>>
>>>>>>>>>>> Conditions:
>>>>>>>>>>>
>>>>>>>>>>> - GlusterFS 3.8.5 installed via yum from CentOS-Gluster-3.8.repo
>>>>>>>>>>> - Volume info and status pasted here: http://pastebin.com/JPt2KeD8
>>>>>>>>>>> - Client machines' OS: Scientific Linux 6 or CentOS 6.
>>>>>>>>>>> - Server machines' OS: CentOS 6.
>>>>>>>>>>> - Kernel version is 2.6.32-642.6.2.el6.x86_64 on all machines.
>>>>>>>>>>> - The number of connected FUSE clients is 260.
>>>>>>>>>>> - No firewall between the connected machines.
>>>>>>>>>>> - Neither remounting volumes nor rebooting client machines has any effect.
>>>>>>>>>>> - It is triggered not only by rename() but also by copy() and filesize() operations.
>>>>>>>>>>> - No output in the brick logs when it happens.
>>>>>>>>>>>
>>>>>>>>>>> Any ideas? I'd appreciate any help.
>>>>>>>>>>>
>>>>>>>>>>> Regards.
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>> Gluster-users@xxxxxxxxxxx
>>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
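>>>>>>>>>>
>>>>>>>>>> For example, the whole capture could look roughly like this
>>>>>>>>>> (<volname> is a placeholder for your volume, and the grep assumes
>>>>>>>>>> the default log directory; run it on the servers for brick logs and
>>>>>>>>>> on the clients for mount logs):
>>>>>>>>>>
>>>>>>>>>>     gluster volume set <volname> diagnostics.brick-log-level DEBUG
>>>>>>>>>>     gluster volume set <volname> diagnostics.client-log-level DEBUG
>>>>>>>>>>     # ... wait until the error reproduces, then search the logs:
>>>>>>>>>>     grep -R "EOF on socket" /var/log/glusterfs/
>>>>>>>>>>     # and set the levels back to the default afterwards:
>>>>>>>>>>     gluster volume set <volname> diagnostics.brick-log-level INFO
>>>>>>>>>>     gluster volume set <volname> diagnostics.client-log-level INFO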