Hi Yonex,

Poornima recently fixed a corruption issue with upcall, which seems unlikely to be the cause of this issue, given that you are running FUSE clients. Even so, I would like to give you a debug build that includes the fix [1] and adds additional logs. Will you be able to run the debug build?

[1] : https://review.gluster.org/#/c/16613/

Regards
Rafi KC

On 02/16/2017 09:13 PM, yonex wrote:
> Hi Rafi,
>
> I'm still on this issue, but reproduction has not yet been achieved
> outside of production. In the production environment, I have made the
> applications stop writing data to the GlusterFS volume; only read
> operations are going on.
>
> P.S. It seems that I have corrupted the email thread. ;-(
> http://lists.gluster.org/pipermail/gluster-users/2017-January/029679.html
>
> 2017-02-14 17:19 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>> Hi Yonex,
>>
>> Are you still hitting this issue?
>>
>> Regards
>>
>> Rafi KC
>>
>> On 01/16/2017 10:36 AM, yonex wrote:
>>
>> Hi,
>>
>> I noticed that there is a severe throughput degradation while the gdb
>> script is attached to a glusterfs client process: write speed drops to
>> 2% or less. It cannot be left running in production.
>>
>> Could you provide the custom build that you mentioned before? I am
>> going to keep trying to reproduce the problem outside of the
>> production environment.
>>
>> Regards
>>
>> 2017-01-08 21:54, Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>>
>> Is there any update on this?
>>
>> Regards
>>
>> Rafi KC
>>
>> On 12/24/2016 03:53 PM, yonex wrote:
>>
>> Rafi,
>>
>> Thanks again. I will try that and get back to you.
>>
>> Regards.
>>
>> 2016-12-23 18:03 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>>
>> Hi Yonex,
>>
>> As we discussed on IRC in #gluster-devel, I have attached the gdb
>> script to this mail.
>>
>> Procedure to run the gdb script:
>>
>> 1) Install gdb.
>>
>> 2) Download and install the gluster debuginfo packages for your
>>    machine. Package location: https://cbs.centos.org/koji/buildinfo?buildID=12757
>>
>> 3) Find the process ID and attach gdb to the process using the command
>>    gdb attach <pid> -x <path_to_script>
>>
>> 4) Keep the script running until you hit the problem.
>>
>> 5) Stop gdb.
>>
>> 6) You will see a file called mylog.txt in the location where you
>>    ran gdb.
>>
>> Please keep an eye on the attached process. If you have any doubts,
>> please feel free to get back to me.
>>
>> Regards
>>
>> Rafi KC
>>
>> On 12/19/2016 05:33 PM, Mohammed Rafi K C wrote:
>>
>> On 12/19/2016 05:32 PM, Mohammed Rafi K C wrote:
>>
>> Client 0-glusterfs01-client-2 has disconnected from its bricks around
>> 2016-12-15 11:21:17.854249. Can you look at and/or paste the brick
>> logs from around that time? You can find the brick name and hostname
>> for 0-glusterfs01-client-2 in the client graph.
>>
>> Rafi
>>
>> Are you on any of the gluster IRC channels? If so, is there a
>> nickname I can search for?
>>
>> Regards
>>
>> Rafi KC
>>
>> On 12/19/2016 04:28 PM, yonex wrote:
>>
>> Rafi,
>>
>> OK. Thanks for your guidance. I found the debug log and pasted the
>> lines around that:
>> http://pastebin.com/vhHR6PQN
>>
>> Regards
>>
>> 2016-12-19 14:58 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>>
>> On 12/16/2016 09:10 PM, yonex wrote:
>>
>> Rafi,
>>
>> Thanks, the .meta feature I didn't know about is very nice. I have
>> finally captured debug logs from a client and the bricks.
>>
>> A mount log:
>> - http://pastebin.com/Tjy7wGGj
>>
>> FYI, rickdom126 is my client's hostname.
>>
>> Brick logs around that time:
>> - Brick1: http://pastebin.com/qzbVRSF3
>> - Brick2: http://pastebin.com/j3yMNhP3
>> - Brick3: http://pastebin.com/m81mVj6L
>> - Brick4: http://pastebin.com/JDAbChf6
>> - Brick5: http://pastebin.com/7saP6rsm
>>
>> However, I could not find any message like "EOF on socket".
>> I hope there is some helpful information in the logs above.
>>
>> Indeed. I understand that the connections are in a disconnected
>> state. But what I'm particularly looking for is the cause of the
>> disconnect. Can you paste the debug logs from when the disconnects
>> start, and around that time? You may see a debug log that says
>> "disconnecting now".
>>
>> Regards
>>
>> Rafi KC
>>
>> Regards.
>>
>> 2016-12-14 15:20 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>>
>> On 12/13/2016 09:56 PM, yonex wrote:
>>
>> Hi Rafi,
>>
>> Thanks for your response. OK, I think it is possible to capture debug
>> logs, since the error seems to be reproduced a few times per day. I
>> will try that. However, since I want to avoid redundant debug output
>> if possible, is there a way to enable debug logging only on specific
>> client nodes?
>>
>> If you are using a FUSE mount, there is a proc-like feature called
>> .meta. You can set the log level through that for a particular
>> client [1]. But I also want logs from the bricks, because I suspect
>> the brick processes of initiating the disconnects.
>>
>> [1] e.g.: echo 8 > /mnt/glusterfs/.meta/logging/loglevel
>>
>> Regards
>>
>> Yonex
>>
>> 2016-12-13 23:33 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>>
>> Hi Yonex,
>>
>> Is this consistently reproducible? If so, can you enable debug
>> logging [1] and check for any message similar to [2]? Basically, you
>> can even search for "EOF on socket".
>>
>> You can set your log level back to the default (INFO) after
>> capturing for some time.
>> [1] : gluster volume set <volname> diagnostics.brick-log-level DEBUG and
>> gluster volume set <volname> diagnostics.client-log-level DEBUG
>>
>> [2] : http://pastebin.com/xn8QHXWa
>>
>> Regards
>>
>> Rafi KC
>>
>> On 12/12/2016 09:35 PM, yonex wrote:
>>
>> Hi,
>>
>> When my application moves a file from its local disk to a
>> FUSE-mounted GlusterFS volume, the client outputs many warnings and
>> errors, not always but occasionally. The volume is a simple
>> distributed volume.
>>
>> A sample of the logs pasted: http://pastebin.com/axkTCRJX
>>
>> At a glance it seems to come from something like a network
>> disconnection ("Transport endpoint is not connected"), but other
>> networking applications on the same machines don't observe any such
>> thing, so I guess there may be a problem somewhere in the GlusterFS
>> stack.
>>
>> It ends in failing to rename a file, logging PHP warnings like the
>> ones below:
>>
>> PHP Warning: rename(/glusterfs01/db1/stack/f0/13a9a2f0): failed
>> to open stream: Input/output error in [snipped].php on line 278
>> PHP Warning:
>> rename(/var/stack/13a9a2f0,/glusterfs01/db1/stack/f0/13a9a2f0):
>> Input/output error in [snipped].php on line 278
>>
>> Conditions:
>>
>> - GlusterFS 3.8.5 installed via yum from CentOS-Gluster-3.8.repo.
>> - Volume info and status pasted: http://pastebin.com/JPt2KeD8
>> - Client machines' OS: Scientific Linux 6 or CentOS 6.
>> - Server machines' OS: CentOS 6.
>> - Kernel version is 2.6.32-642.6.2.el6.x86_64 on all machines.
>> - The number of connected FUSE clients is 260.
>> - No firewall between the connected machines.
>> - Neither remounting the volume nor rebooting the client machines has any effect.
>> - It is triggered not only by rename() but also by copy() and
>>   filesize() operations.
>> - No output in the brick logs when it happens.
>>
>> Any ideas? I'd appreciate any help.
>>
>> Regards.
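[Editor's note: the debug-logging commands quoted in [1] above can be collected into a small shell sketch. The volume name `myvol` is a placeholder, not taken from this thread; the restore-to-INFO step follows Rafi's advice to set the log level back to the default after capturing.]

```shell
# Placeholder volume name; substitute your own (see `gluster volume list`).
VOLNAME=myvol

# Enable DEBUG logging on both the brick and client sides, as in [1].
gluster volume set "$VOLNAME" diagnostics.brick-log-level DEBUG
gluster volume set "$VOLNAME" diagnostics.client-log-level DEBUG

# ... reproduce the problem and capture the logs, then restore defaults:
gluster volume set "$VOLNAME" diagnostics.brick-log-level INFO
gluster volume set "$VOLNAME" diagnostics.client-log-level INFO
```

These options are per volume, so they affect every brick and client of that volume; the .meta trick quoted earlier is the way to raise verbosity on a single FUSE client only.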
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users@xxxxxxxxxxx
>> http://www.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users