Hi Yonex,
Are you still hitting this issue?
Regards
Rafi KC
On 01/16/2017 10:36 AM, yonex wrote:
Hi,
I noticed a severe throughput degradation while the gdb script is attached to a glusterfs client process: write speed drops to 2% or less, so it cannot be left running in production.
Could you provide the custom build that you mentioned before? I am going to keep trying to reproduce the problem outside of the production environment.
Regards
Is there any update on this?
Regards
Rafi KC
On 12/24/2016 03:53 PM, yonex wrote:
Rafi,
Thanks again. I will try that and get back to you.
Regards.
2016-12-23 18:03 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
Hi Yonex,
As we discussed on IRC in #gluster-devel, I have attached the gdb script along with this mail.
Procedure to run the gdb script (a consolidated example is sketched below the list):
1) Install gdb.
2) Download and install the gluster debuginfo packages for your machine. Package location: https://cbs.centos.org/koji/buildinfo?buildID=12757
3) Find the process id and attach gdb to the process using the command: gdb attach <pid> -x <path_to_script>
4) Continue running the script until you hit the problem.
5) Stop gdb.
6) You will see a file called mylog.txt in the location where you ran gdb.
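For reference, here is a minimal shell sketch of the whole sequence on one client. The pgrep pattern and the script path /root/gdb_script.txt are illustrative assumptions; only the gdb invocation itself is taken from the steps above.

    # install gdb (the debuginfo RPMs come from the koji link in step 2)
    yum install -y gdb

    # find the pid of the glusterfs client (FUSE mount) process
    pgrep -f glusterfs

    # attach gdb with the script, exactly as in step 3
    gdb attach <pid> -x /root/gdb_script.txt

    # reproduce the problem, then quit gdb; mylog.txt is written to the
    # directory gdb was started from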
Please keep an eye on the attached process. If you have any doubts, please feel free to get back to me.
Regards
Rafi KC
On 12/19/2016 05:33 PM, Mohammed Rafi K C wrote:
On 12/19/2016 05:32 PM, Mohammed Rafi K C wrote:
Client 0-glusterfs01-client-2 has disconnected from its brick around 2016-12-15 11:21:17.854249. Can you look at and/or paste the brick logs from around that time?
You can find the brick name and hostname for 0-glusterfs01-client-2 in the client graph (see the sketch below).
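As a hedged illustration of how to look that up: the client graph is dumped into the FUSE mount log, so grepping for the client xlator name shows its remote-host and remote-subvolume (brick path). The log file name below is an assumption based on the usual mount-log location, not taken from your setup.

    # show the client-2 subvolume definition from the client graph in the mount log
    grep -A 8 'volume glusterfs01-client-2' /var/log/glusterfs/<mount-log>.log
    # the 'option remote-host' and 'option remote-subvolume' lines identify the brick;
    # equivalently, client-2 should correspond to the third brick listed by
    # 'gluster volume info glusterfs01' (assuming the usual ordering)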
Rafi
Are you on any of the Gluster IRC channels? If so, do you have a nickname that I can search for?
Regards
Rafi KC
On 12/19/2016 04:28 PM, yonex wrote:
Rafi,
OK. Thanks for your guidance. I found the debug log and pasted the lines around that point: http://pastebin.com/vhHR6PQN
Regards
2016-12-19 14:58 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
On 12/16/2016 09:10 PM, yonex wrote:
Rafi,
Thanks. The .meta feature, which I didn't know about, is very nice. I have finally captured debug logs from a client and the bricks.

A mount log:
- http://pastebin.com/Tjy7wGGj

FYI, rickdom126 is my client's hostname.

Brick logs around that time:
- Brick1: http://pastebin.com/qzbVRSF3
- Brick2: http://pastebin.com/j3yMNhP3
- Brick3: http://pastebin.com/m81mVj6L
- Brick4: http://pastebin.com/JDAbChf6
- Brick5: http://pastebin.com/7saP6rsm

However, I could not find any message like "EOF on socket". I hope there is some helpful information in the logs above.
Indeed. I understand that the connections are in a disconnected state, but what I'm particularly looking for is the cause of the disconnect. Can you paste the debug logs from when it starts disconnecting, and from around that time? You may see a debug log entry that says "disconnecting now" (a short grep sketch follows below).
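As a minimal sketch of how to pull that out of the client log (the log path is an assumption based on the usual FUSE mount-log location; adjust it for your mount point):

    # search the FUSE mount log for the disconnect and keep surrounding context
    grep -n -B 5 -A 20 'disconnecting now' /var/log/glusterfs/<mount-log>.log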
Regards
Rafi KC
Regards.
2016-12-14 15:20 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
On 12/13/2016 09:56 PM, yonex wrote:
Hi Rafi,
Thanks for your response. OK, I think it is possible to capture debug logs, since the error seems to be reproduced a few times per day. I will try that. However, since I want to avoid redundant debug output if possible, is there a way to enable debug logging only on specific client nodes?
If you are using a FUSE mount, there is a proc-like feature called .meta. You can set the log level for a particular client through it [1] (a short sketch follows below). But I also want logs from the bricks, because I suspect the brick processes are initiating the disconnects.

[1] e.g.: echo 8 > /mnt/glusterfs/.meta/logging/loglevel
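A minimal sketch of using that interface on one client, assuming the volume is mounted at /mnt/glusterfs; only the echo line in [1] is from this mail, the rest is illustrative:

    # check the current client log level through the .meta interface
    cat /mnt/glusterfs/.meta/logging/loglevel

    # raise the log level for just this client (value from [1] above)
    echo 8 > /mnt/glusterfs/.meta/logging/loglevel

    # after capturing, write the previously-read value back to restore it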
Regards
Yonex
2016-12-13 23:33 GMT+09:00 Mohammed Rafi K C <rkavunga@xxxxxxxxxx>:
Hi Yonex,
Is this consistently reproducible? If so, can you enable debug logging [1] and check for any message similar to [2]? Basically, you can even just search for "EOF on socket". You can set your log level back to the default (INFO) after capturing for some time. An end-to-end sketch of these commands follows below.

[1]: gluster volume set <volname> diagnostics.brick-log-level DEBUG
     and
     gluster volume set <volname> diagnostics.client-log-level DEBUG
[2]: http://pastebin.com/xn8QHXWa
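For reference, a minimal sketch of the whole capture cycle, using a hypothetical volume name glusterfs01 and the usual brick-log location (both assumptions; only the two volume-set commands in [1] are from this mail):

    # enable DEBUG logging on the bricks and on the clients
    gluster volume set glusterfs01 diagnostics.brick-log-level DEBUG
    gluster volume set glusterfs01 diagnostics.client-log-level DEBUG

    # ...after the problem reproduces, search the brick logs for the pattern in [2]
    grep 'EOF on socket' /var/log/glusterfs/bricks/*.log

    # set the log levels back to the default (INFO)
    gluster volume set glusterfs01 diagnostics.brick-log-level INFO
    gluster volume set glusterfs01 diagnostics.client-log-level INFO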
Regards
Rafi KC
On 12/12/2016 09:35 PM, yonex wrote:
Hi,
When my application moves a file from its local disk to a FUSE-mounted GlusterFS volume, the client occasionally (not always) outputs many warnings and errors. The volume is a simple distributed volume.

A sample of the logs is pasted here: http://pastebin.com/axkTCRJX

At a glance it looks like a network disconnection ("Transport endpoint is not connected"), but other networking applications on the same machine don't observe anything like that, so I guess there may be a problem somewhere in the GlusterFS stack.
It ended up failing to rename a file, logging PHP warnings like the ones below:

PHP Warning: rename(/glusterfs01/db1/stack/f0/13a9a2f0): failed to open stream: Input/output error in [snipped].php on line 278
PHP Warning: rename(/var/stack/13a9a2f0,/glusterfs01/db1/stack/f0/13a9a2f0): Input/output error in [snipped].php on line 278
Conditions:
- GlusterFS 3.8.5 installed via yum from CentOS-Gluster-3.8.repo.
- Volume info and status pasted: http://pastebin.com/JPt2KeD8
- Client machines' OS: Scientific Linux 6 or CentOS 6.
- Server machines' OS: CentOS 6.
- Kernel version is 2.6.32-642.6.2.el6.x86_64 on all machines.
- The number of connected FUSE clients is 260.
- No firewall between connected machines.
- Neither remounting the volume nor rebooting client machines has any effect.
- It is triggered not only by rename() but also by copy() and filesize() operations.
- There is no output in the brick logs when it happens.
Any ideas? I'd appreciate any help.
Regards.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users