Blocked inode locks on some files

Hello guys,

I would like to get some advice on a few problems we have with our 3-host Gluster setup.

Here is the setup used:

  1. GlusterFS 3.8.0-1 (we did an upgrade from 3.7.11 last week)
  2. Type: Disperse
  3. Number of Bricks: 1 x (2 + 1) = 3
  4. Transport-type: tcp
  5. Options Reconfigured: transport.address-family: inet

Please note that we also have the ACL option enabled on the volume mount.
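
For completeness, the client mount is done roughly like this (hostname and mount point are placeholders; the volume name "exp" matches the prefixes in the logs below):

  # FUSE mount with POSIX ACL support enabled on the client
  mount -t glusterfs -o acl gluster-host1:/exp /mnt/exp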

Use case:

A user submits jobs/tasks to a Spark cluster, which has the GlusterFS volume mounted on each host.

13 tasks completed successfully in ~30 min each (converting some logs to JSON format and writing the output to the GlusterFS volume), but one had been blocked for more than 12 hours by the time we checked what was going wrong.

We found some log entries related to inode locking in the brick log on one host:

[2016-06-19 03:15:08.563397] E [inodelk.c:304:__inode_unlock_lock] 0-exp-locks:  Matching lock not found for unlock 0-9223372036854775807, by 10613ebc6c6a0000 on 0x6cee5c0f4730
[2016-06-19 03:15:08.563684] E [MSGID: 115053] [server-rpc-fops.c:273:server_inodelk_cbk] 0-exp-server: 5375861: INODELK /spark/user/20160328/_temporary/0/_temporary (015bde3a-09d6-41a2-8e9f-7e7c5295d596) ==> (Invalid argument) [Invalid argument]

Errors in the data log:

[2016-06-19 03:13:29.198676] I [MSGID: 109036] [dht-common.c:8824:dht_log_new_layout_for_dir_selfheal] 0-exp-dht: Setting layout of /spark/user/20160328/_temporary/0/_temporary/attempt_201606190511_0004_m_000004_26 with [Subvol_name: exp-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
[2016-06-19 03:14:59.349357] I [MSGID: 109066] [dht-rename.c:1562:dht_rename] 0-exp-dht: renaming /spark/user/20160328/_temporary/0/_temporary/attempt_201606190511_0004_m_000001_23 (hash=exp-disperse-0/cache=exp-disperse-0) => /spark/user/20160328/_temporary/0/task_201606190511_0004_m_000001 (hash=exp-disperse-0/cache=<nul>)

These entries are also spamming the data log whenever an action is performed on the fs:

[2016-06-19 13:58:22.817308] I [dict.c:462:dict_get] (-->/usr/lib64/glusterfs/3.8.0/xlator/debug/io-stats.so(+0x13628) [0x6f0655cd1628] -->/usr/lib64/glusterfs/3.8.0/xlator/system/posix-acl.so(+0x9ccb) [0x6f0655ab5ccb] -->/lib64/libglusterfs.so.0(dict_get+0xec) [0x6f066528df7c] ) 0-dict: !this || key=system.posix_acl_access [Invalid argument]
[2016-06-19 13:58:22.817364] I [dict.c:462:dict_get] (-->/usr/lib64/glusterfs/3.8.0/xlator/debug/io-stats.so(+0x13628) [0x6f0655cd1628] -->/usr/lib64/glusterfs/3.8.0/xlator/system/posix-acl.so(+0x9d21) [0x6f0655ab5d21] -->/lib64/libglusterfs.so.0(dict_get+0xec) [0x6f066528df7c] ) 0-dict: !this || key=system.posix_acl_default [Invalid argument]

We did a statedump and got confirmation that some processes were in a blocked state.
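
The statedump was taken with the CLI, something along these lines (the dump files land under /var/run/gluster by default, and blocked inodelk entries are marked BLOCKED in them):

  # dump the state of the brick processes for the volume
  gluster volume statedump exp
  # look for blocked lock entries in the resulting dump files
  grep -B2 -A2 BLOCKED /var/run/gluster/*.dump.*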

We did a clear-locks on the blocked inode and the Spark job finally finished (with errors).
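
The clear was done with the clear-locks CLI, something like the following (the path is the blocked _temporary directory from the brick log above; the exact kind/range used may have differed):

  # release the inode locks held on the blocked directory
  gluster volume clear-locks exp /spark/user/20160328/_temporary/0/_temporary kind all inode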

What could be the root cause of this locking?

Thanks for your help!

Florian


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
