Hi Rafi,

Sorry for the late reply. Though I eventually could not reproduce the
problem outside of the production environment, I will be able to run
the debug build as part of production if it does not cause a
performance issue. I would like you to give me some guidance on the
debug build.

By the way, before that, since it would be helpful to update glusterfs
from 3.8.5 to 3.8.9, I am going to do that first.

Regards
Yonex

2017-02-17 15:03 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
> Hi Yonex
>
> Recently Poornima has fixed one corruption issue with upcall, which
> seems unlikely to be the cause of this issue, given that you are
> running fuse clients. Even so, I would like to give you a debug build
> that includes the fix [1] and adds additional logging.
>
> Will you be able to run the debug build?
>
> [1] : https://review.gluster.org/#/c/16613/
>
> Regards
>
> Rafi KC
>
> On 02/16/2017 09:13 PM, yonex wrote:
>> Hi Rafi,
>>
>> I'm still on this issue, but I have not yet been able to reproduce it
>> outside of production. In the production environment, I have made the
>> applications stop writing data to the glusterfs volume; only read
>> operations are going on.
>>
>> P.S. It seems that I have corrupted the email thread.. ;-(
>> http://lists.gluster.org/pipermail/gluster-users/2017-January/029679.html
>>
>> 2017-02-14 17:19 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>>> Hi Yonex,
>>>
>>> Are you still hitting this issue?
>>>
>>> Regards
>>>
>>> Rafi KC
>>>
>>> On 01/16/2017 10:36 AM, yonex wrote:
>>>
>>> Hi
>>>
>>> I noticed that there is a heavy throughput degradation while the gdb
>>> script is attached to a glusterfs client process: write speed drops
>>> to 2% of normal or less. It cannot be kept running in production.
>>>
>>> Could you provide the custom build that you mentioned before? I am
>>> going to keep trying to reproduce the problem outside of the
>>> production environment.
>>>
>>> Regards
>>>
>>> On 2017-01-08 at 21:54, Mohammed Rafi K C <rkavunga@xxxxxxxxxx> wrote:
>>>
>>> Is there any update on this?
>>>
>>> Regards
>>>
>>> Rafi KC
>>>
>>> On 12/24/2016 03:53 PM, yonex wrote:
>>>
>>> Rafi,
>>>
>>> Thanks again. I will try that and get back to you.
>>>
>>> Regards.
>>>
>>> 2016-12-23 18:03 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>>>
>>> Hi Yonex,
>>>
>>> As we discussed on irc in #gluster-devel, I have attached the gdb
>>> script to this mail.
>>>
>>> Procedure to run the gdb script:
>>>
>>> 1) Install gdb.
>>>
>>> 2) Download and install the gluster debuginfo packages for your
>>> machine. Packages location:
>>> https://cbs.centos.org/koji/buildinfo?buildID=12757
>>>
>>> 3) Find the process id and attach gdb to the process using the
>>> command: gdb attach <pid> -x <path_to_script>
>>>
>>> 4) Continue running the script till you hit the problem.
>>>
>>> 5) Stop the gdb.
>>>
>>> 6) You will see a file called mylog.txt in the location where you
>>> ran the gdb.
>>>
>>> Please keep an eye on the attached process. If you have any doubts,
>>> please feel free to get back to me.
>>>
>>> Regards
>>>
>>> Rafi KC
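
For reference, steps 3 to 6 of the procedure above condense to roughly
the following shell session. The pgrep pattern and the script path are
illustrative placeholders, not anything from the original mail:

    # Find the pid of the glusterfs fuse client process (the pgrep
    # pattern is a guess; any way of finding the pid will do).
    pid=$(pgrep -f glusterfs | head -n 1)

    # Step 3: attach gdb to the client and load the attached script.
    gdb attach "$pid" -x /path/to/gdb_script

    # Step 4: keep it running until the problem reproduces, then stop
    # gdb (step 5: Ctrl-C, then "quit"). Step 6: the script writes its
    # output to mylog.txt in the directory gdb was started from.
    less mylog.txt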
>>> On 12/19/2016 05:33 PM, Mohammed Rafi K C wrote:
>>>
>>> On 12/19/2016 05:32 PM, Mohammed Rafi K C wrote:
>>>
>>> Client 0-glusterfs01-client-2 has disconnected from its brick around
>>> 2016-12-15 11:21:17.854249. Can you look at and/or paste the brick
>>> logs around that time? You can find the brick name and hostname for
>>> 0-glusterfs01-client-2 in the client graph.
>>>
>>> Rafi
>>>
>>> Are you in any of the gluster irc channels? If so, have you got a
>>> nickname that I can search for?
>>>
>>> Regards
>>>
>>> Rafi KC
>>>
>>> On 12/19/2016 04:28 PM, yonex wrote:
>>>
>>> Rafi,
>>>
>>> OK. Thanks for your guide. I found the debug log and pasted the
>>> lines around it:
>>>
>>> http://pastebin.com/vhHR6PQN
>>>
>>> Regards
>>>
>>> 2016-12-19 14:58 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>>>
>>> On 12/16/2016 09:10 PM, yonex wrote:
>>>
>>> Rafi,
>>>
>>> Thanks, the .meta feature (which I didn't know about) is very nice.
>>> I have finally captured debug logs from a client and the bricks.
>>>
>>> A mount log:
>>>
>>> - http://pastebin.com/Tjy7wGGj
>>>
>>> FYI, rickdom126 is my client's hostname.
>>>
>>> Brick logs around that time:
>>>
>>> - Brick1: http://pastebin.com/qzbVRSF3
>>> - Brick2: http://pastebin.com/j3yMNhP3
>>> - Brick3: http://pastebin.com/m81mVj6L
>>> - Brick4: http://pastebin.com/JDAbChf6
>>> - Brick5: http://pastebin.com/7saP6rsm
>>>
>>> However, I could not find any message like "EOF on socket". I hope
>>> there is some helpful information in the logs above.
>>>
>>> Indeed. I understand that the connections are in a disconnected
>>> state, but what I'm particularly looking for is the cause of the
>>> disconnects. Can you paste the debug logs from when the disconnects
>>> start, and around that time? You should see a debug log that says
>>> "disconnecting now".
>>>
>>> Regards
>>>
>>> Rafi KC
>>>
>>> Regards.
>>>
>>> 2016-12-14 15:20 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>>>
>>> On 12/13/2016 09:56 PM, yonex wrote:
>>>
>>> Hi Rafi,
>>>
>>> Thanks for your response. OK, I think it is possible to capture
>>> debug logs, since the error seems to be reproduced a few times per
>>> day. I will try that. However, since I want to avoid redundant debug
>>> output if possible, is there a way to enable debug logging only on
>>> specific client nodes?
>>>
>>> If you are using a fuse mount, there is a proc-like feature called
>>> .meta. You can set the log level for a particular client through it
>>> [1]. But I also want logs from the bricks, because I suspect the
>>> brick processes of initiating the disconnects.
>>>
>>> [1] eg : echo 8 > /mnt/glusterfs/.meta/logging/loglevel
>>>
>>> Regards
>>>
>>> Yonex
>>>
>>> 2016-12-13 23:33 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
>>>
>>> Hi Yonex,
>>>
>>> Is this consistently reproducible? If so, can you enable debug
>>> logging [1] and check for any message similar to [2]? Basically you
>>> can even search for "EOF on socket".
>>>
>>> You can set your log level back to the default (INFO) after
>>> capturing for some time.
>>>
>>> [1] : gluster volume set <volname> diagnostics.brick-log-level DEBUG
>>> and gluster volume set <volname> diagnostics.client-log-level DEBUG
>>>
>>> [2] : http://pastebin.com/xn8QHXWa
>>>
>>> Regards
>>>
>>> Rafi KC
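
Pulling together the logging commands from the mails above, a capture
session would look roughly like this; <volname> and the mount path are
placeholders for your own values:

    # Volume-wide debug logging for bricks and clients [1].
    gluster volume set <volname> diagnostics.brick-log-level DEBUG
    gluster volume set <volname> diagnostics.client-log-level DEBUG

    # Or, for a single fuse client only, via the .meta interface
    # (8 is the level used in the example above).
    echo 8 > /mnt/glusterfs/.meta/logging/loglevel

    # ...reproduce the problem, then grep the logs, e.g. for
    # "EOF on socket" or "disconnecting now"...

    # Afterwards, set the levels back to the default (INFO).
    gluster volume set <volname> diagnostics.brick-log-level INFO
    gluster volume set <volname> diagnostics.client-log-level INFO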
>>> On 12/12/2016 09:35 PM, yonex wrote:
>>>
>>> Hi,
>>>
>>> When my application moves a file from its local disk to a
>>> FUSE-mounted GlusterFS volume, the client occasionally (not always)
>>> outputs many warnings and errors. The volume is a simple distributed
>>> volume.
>>>
>>> A sample of the logs is pasted here: http://pastebin.com/axkTCRJX
>>>
>>> At a glance it seems to come from something like a network
>>> disconnection ("Transport endpoint is not connected"), but other
>>> networking applications on the same machine don't observe any such
>>> thing, so I guess there may be a problem somewhere in the GlusterFS
>>> stack.
>>>
>>> It ends in a failure to rename the file, logging PHP warnings like
>>> the ones below:
>>>
>>> PHP Warning: rename(/glusterfs01/db1/stack/f0/13a9a2f0): failed
>>> to open stream: Input/output error in [snipped].php on line 278
>>> PHP Warning:
>>> rename(/var/stack/13a9a2f0,/glusterfs01/db1/stack/f0/13a9a2f0):
>>> Input/output error in [snipped].php on line 278
>>>
>>> Conditions:
>>>
>>> - GlusterFS 3.8.5 installed via yum, CentOS-Gluster-3.8.repo
>>> - Volume info and status pasted: http://pastebin.com/JPt2KeD8
>>> - Client machines' OS: Scientific Linux 6 or CentOS 6.
>>> - Server machines' OS: CentOS 6.
>>> - Kernel version is 2.6.32-642.6.2.el6.x86_64 on all machines.
>>> - The number of connected FUSE clients is 260.
>>> - No firewall between connected machines.
>>> - Neither remounting the volumes nor rebooting the client machines
>>>   helps.
>>> - It is triggered not only by rename() but also by copy() and
>>>   filesize() operations.
>>> - There is no output in the brick logs when it happens.
>>>
>>> Any ideas? I'd appreciate any help.
>>>
>>> Regards.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users