"remote operation failed" errors on GlusterFS 3.7.15

Hi Gluster team & users, 

We are seeing multiple instances of the following error on our gluster clients: "remote operation failed [No such file or directory]". It affects cases where files hosted on the volume are opened or memory-mapped.

We have been seeing this error since we added another brick to a replica 2 gluster volume a couple of days ago, making it a volume backed by three replicated bricks. Any information on this error would be useful. If needed, we can supply any of the client or brick logs.

12447146-[2016-10-21 14:50:07.806214] I [dict.c:473:dict_get] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.15/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x148) [0x7f68a0cc5f68] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.15/xlator/system/posix-acl.so(posix_acl_lookup_cbk+0x284) [0x7f68a0aada94] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0xac) [0x7f68a7f30dbc] ) 6-dict: !this || key=system.posix_acl_default [Invalid argument]
12447579-[2016-10-21 14:50:07.837879] I [dict.c:473:dict_get] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.15/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x148) [0x7f68a0cc5f68] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.15/xlator/system/posix-acl.so(posix_acl_lookup_cbk+0x230) [0x7f68a0aada40] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0xac) [0x7f68a7f30dbc] ) 6-dict: !this || key=system.posix_acl_access [Invalid argument]
12448011-[2016-10-21 14:50:07.837928] I [dict.c:473:dict_get] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.15/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x148) [0x7f68a0cc5f68] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.15/xlator/system/posix-acl.so(posix_acl_lookup_cbk+0x284) [0x7f68a0aada94] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0xac) [0x7f68a7f30dbc] ) 6-dict: !this || key=system.posix_acl_default [Invalid argument]
12448444:[2016-10-21 14:50:10.784317] W [MSGID: 114031] [client-rpc-fops.c:3057:client3_3_readv_cbk] 6-volume1-client-1: remote operation failed [No such file or directory]
12448608:[2016-10-21 14:50:10.784757] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-0: remote operation failed [No such file or directory]
12448772:[2016-10-21 14:50:10.784763] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-1: remote operation failed [No such file or directory]
12448936:[2016-10-21 14:50:10.785575] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-2: remote operation failed [No such file or directory]
12449100-[2016-10-21 14:50:10.786208] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 6-volume1-replicate-0: Unreadable subvolume -1 found with event generation 3 for gfid 10495074-82d0-4961-8212-5a4f32895f37. (Possible split-brain)
12449328:[2016-10-21 14:50:10.787439] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-2: remote operation failed [No such file or directory]
12449492-[2016-10-21 14:50:10.788730] E [MSGID: 109040] [dht-helper.c:1190:dht_migration_complete_check_task] 6-volume1-dht: (null): failed to lookup the file on volume1-dht [Stale file handle]
12449677-[2016-10-21 14:50:10.788778] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 622070230: READ => -1 gfid=10495074-82d0-4961-8212-5a4f32895f37 fd=0x7f68951a75bc (Stale file handle)
12449864:The message "W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-1: remote operation failed [No such file or directory]" repeated 2 times between [2016-10-21 14:50:10.784763] and [2016-10-21 14:50:10.789213]
12450100:[2016-10-21 14:50:10.790080] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-2: remote operation failed [No such file or directory]
12450264:The message "W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-0: remote operation failed [No such file or directory]" repeated 3 times between [2016-10-21 14:50:10.784757] and [2016-10-21 14:50:10.791118]
12450500:[2016-10-21 14:50:10.791176] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-1: remote operation failed [No such file or directory]
12450664-[2016-10-21 14:50:10.793395] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 622070238: READ => -1 gfid=10495074-82d0-4961-8212-5a4f32895f37 fd=0x7f68951a75bc (Stale file handle)
12450851-[2016-10-21 14:50:11.036804] I [dict.c:473:dict_get] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.15/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x148) [0x7f68a0cc5f68] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.15/xlator/system/posix-acl.so(posix_acl_lookup_cbk+0x230) [0x7f68a0aada40] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0xac) [0x7f68a7f30dbc] ) 6-dict: !this || key=system.posix_acl_access [Invalid argument]
12451283-[2016-10-21 14:50:11.036889] I [dict.c:473:dict_get] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.15/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x148) [0x7f68a0cc5f68] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.15/xlator/system/posix-acl.so(posix_acl_lookup_cbk+0x284) [0x7f68a0aada94] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0xac) [0x7f68a7f30dbc] ) 6-dict: !this || key=system.posix_acl_default [Invalid argument]
12451716-The message "W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 6-volume1-replicate-0: Unreadable subvolume -1 found with event generation 3 for gfid 10495074-82d0-4961-8212-5a4f32895f37. (Possible split-brain)" repeated 3 times between [2016-10-21 14:50:10.786208] and [2016-10-21 14:50:11.223498]
12452016:[2016-10-21 14:50:11.223949] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-0: remote operation failed [No such file or directory]
12452180:The message "W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-2: remote operation failed [No such file or directory]" repeated 2 times between [2016-10-21 14:50:10.790080] and [2016-10-21 14:50:11.224945]
12452416-[2016-10-21 14:50:11.225264] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 6-volume1-replicate-0: Unreadable subvolume -1 found with event generation 3 for gfid 10495074-82d0-4961-8212-5a4f32895f37. (Possible split-brain)
12452644:The message "W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-1: remote operation failed [No such file or directory]" repeated 2 times between [2016-10-21 14:50:10.791176] and [2016-10-21 14:50:11.225783]
12452880:[2016-10-21 14:50:11.226648] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-2: remote operation failed [No such file or directory]
12453044-[2016-10-21 14:50:11.228115] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 622070413: READ => -1 gfid=10495074-82d0-4961-8212-5a4f32895f37 fd=0x7f68951a75bc (Stale file handle)
12453231:The message "W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-0: remote operation failed [No such file or directory]" repeated 2 times between [2016-10-21 14:50:11.223949] and [2016-10-21 14:50:11.239505]
12453467:[2016-10-21 14:50:11.239646] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-1: remote operation failed [No such file or directory]
12453631-The message "W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 6-volume1-replicate-0: Unreadable subvolume -1 found with event generation 3 for gfid 10495074-82d0-4961-8212-5a4f32895f37. (Possible split-brain)" repeated 2 times between [2016-10-21 14:50:11.225264] and [2016-10-21 14:50:11.241102]
12453931:[2016-10-21 14:50:11.241441] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-0: remote operation failed [No such file or directory]
12454095-[2016-10-21 14:50:11.243704] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 622070416: READ => -1 gfid=10495074-82d0-4961-8212-5a4f32895f37 fd=0x7f68951a75bc (Stale file handle)
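
In case it helps anyone triaging a similar log, the excerpt above can be summarized per client subvolume and file operation. This is only a minimal Python sketch, not part of Gluster: `tally_remote_failures` and the two regular expressions are hypothetical helpers written against the line format shown above, and it assumes each "repeated N times" summary stands for N occurrences that are not otherwise logged.

```python
import re
from collections import Counter

# Hypothetical pattern for client-side warnings of the form:
#   [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-1: remote operation failed [No such file or directory]
WARNING_RE = re.compile(
    r"\[client-rpc-fops\.c:\d+:client3_3_(?P<fop>\w+)_cbk\]\s+"
    r"\d+-(?P<subvol>[\w-]+?):\s+remote operation failed \[(?P<err>[^\]]+)\]"
)

# Summary lines look like: The message "..." repeated N times between [...] and [...]
REPEAT_RE = re.compile(r"repeated (?P<n>\d+) times")


def tally_remote_failures(lines):
    """Count 'remote operation failed' warnings keyed by (subvolume, fop, error)."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match is None:
            continue
        repeat = REPEAT_RE.search(line)
        # Assumption: a summary line adds N occurrences; a plain line adds one.
        weight = int(repeat.group("n")) if repeat else 1
        key = (match.group("subvol"), match.group("fop"), match.group("err"))
        counts[key] += weight
    return counts
```

Feeding it the lines of the client log (for example via `open(path)`) gives a quick view of whether all three client subvolumes report the failure or only some of them.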

Below is the current volume status/configuration:
$ sudo gluster volume status
Status of volume: volume1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ip-172-25-2-91.us-west-1.compute.inte
rnal:/data/glusterfs/volume1/brick1/brick   49152     0          Y       26520
Brick ip-172-25-2-206.us-west-1.compute.int
ernal:/data/glusterfs/volume1/brick1/brick  49152     0          Y       17782
Brick ip-172-25-33-75.us-west-1.compute.int
ernal:/data/glusterfs/volume1/brick1/brick  49152     0          Y       7225
NFS Server on localhost                     2049      0          Y       7245
Self-heal Daemon on localhost               N/A       N/A        Y       7253
NFS Server on ip-172-25-2-206.us-west-1.com
pute.internal                               2049      0          Y       17436
Self-heal Daemon on ip-172-25-2-206.us-west
-1.compute.internal                         N/A       N/A        Y       17456
NFS Server on ip-172-25-2-91.us-west-1.comp
ute.internal                                2049      0          Y       10576
Self-heal Daemon on ip-172-25-2-91.us-west-
1.compute.internal                          N/A       N/A        Y       10610

Task Status of Volume volume1
------------------------------------------------------------------------------
There are no active volume tasks

$ sudo gluster volume info
Volume Name: volume1
Type: Replicate
Volume ID: 3bcca83e-2be5-410c-9a23-b159f570ee7e
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: ip-172-25-2-91.us-west-1.compute.internal:/data/glusterfs/volume1/brick1/brick
Brick2: ip-172-25-2-206.us-west-1.compute.internal:/data/glusterfs/volume1/brick1/brick
Brick3: ip-172-25-33-75.us-west-1.compute.internal:/data/glusterfs/volume1/brick1/brick  <-- brick added a couple of days back
Options Reconfigured:
cluster.quorum-type: fixed
cluster.quorum-count: 2
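
To check whether the file behind the gfid in the client log exists on each of the bricks listed above, its backend path can be derived from the gfid. The sketch below is a hypothetical helper (the name `gfid_backend_path` is ours) and relies on the usual brick layout, where each gfid has an entry under `.glusterfs/<first two hex digits>/<next two hex digits>/<gfid>` in the brick root.

```python
import uuid
from pathlib import PurePosixPath


def gfid_backend_path(brick_root, gfid):
    """Return the .glusterfs entry expected for a gfid under a brick root."""
    # uuid.UUID validates the string; str() yields the lowercase hyphenated form.
    canonical = str(uuid.UUID(gfid))
    return (
        PurePosixPath(brick_root)
        / ".glusterfs"
        / canonical[:2]
        / canonical[2:4]
        / canonical
    )


# Brick path and gfid taken from the volume info and client log in this post.
print(gfid_backend_path(
    "/data/glusterfs/volume1/brick1/brick",
    "10495074-82d0-4961-8212-5a4f32895f37",
))
```

Running `ls -l` on the printed path on each brick host would show whether the entry is present on all three bricks or missing from some of them.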

From the client log (mnt-repos-volume1.log.1):
  1: volume volume1-client-0
  2:     type protocol/client
  3:     option clnt-lk-version 1
  4:     option volfile-checksum 0
  5:     option volfile-key /volume1
  6:     option client-version 3.7.15
  7:     option process-uuid production-collab-8-18739-2016/10/04-20:46:19:350684-volume1-client-0-6-0
  8:     option fops-version 1298437
  9:     option ping-timeout 42
 10:     option remote-host ip-172-25-2-91.us-west-1.compute.internal
 11:     option remote-subvolume /data/glusterfs/volume1/brick1/brick
 12:     option transport-type socket
 13:     option send-gids true
 14: end-volume
 15:
 16: volume volume1-client-1
 17:     type protocol/client
 18:     option ping-timeout 42
 19:     option remote-host ip-172-25-2-206.us-west-1.compute.internal
 20:     option remote-subvolume /data/glusterfs/volume1/brick1/brick
 21:     option transport-type socket
 22:     option send-gids true
 23: end-volume
 24:
 25: volume volume1-client-2
 26:     type protocol/client
 27:     option ping-timeout 42
 28:     option remote-host ip-172-25-33-75.us-west-1.compute.internal
 29:     option remote-subvolume /data/glusterfs/volume1/brick1/brick
 30:     option transport-type socket
 31:     option send-gids true
 32: end-volume
 33:
 34: volume volume1-replicate-0
 35:     type cluster/replicate
 36:     option quorum-type fixed
 37:     option quorum-count 2
 38:     subvolumes volume1-client-0 volume1-client-1 volume1-client-2
 39: end-volume
 40:
 41: volume volume1-dht
 42:     type cluster/distribute
 43:     subvolumes volume1-replicate-0
 44: end-volume
 45:
 46: volume volume1-write-behind
 47:     type performance/write-behind
 48:     subvolumes volume1-dht
 49: end-volume
 50:
 51: volume volume1-read-ahead
 52:     type performance/read-ahead
 53:     subvolumes volume1-write-behind
 54: end-volume
 55:
 56: volume volume1-io-cache
 57:     type performance/io-cache
 58:     subvolumes volume1-read-ahead
 59: end-volume
 60:
 61: volume volume1-quick-read
 62:     type performance/quick-read
 63:     subvolumes volume1-io-cache
 64: end-volume
 65:
 66: volume volume1-open-behind
 67:     type performance/open-behind
 68:     subvolumes volume1-quick-read
 69: end-volume
 70:
 71: volume volume1-md-cache
 72:     type performance/md-cache
 73:     option cache-posix-acl true
 74:     subvolumes volume1-open-behind
 75: end-volume
 76:
 77: volume volume1
 78:     type debug/io-stats
 79:     option log-level INFO
 80:     option latency-measurement off
 81:     option count-fop-hits off
 82:     subvolumes volume1-md-cache
 83: end-volume
 84:
 85: volume posix-acl-autoload
 86:     type system/posix-acl
 87:     subvolumes volume1
 88: end-volume
 89:
 90: volume meta-autoload
 91:     type meta
 92:     subvolumes posix-acl-autoload
 93: end-volume
 94:
+------------------------------------------------------------------------------+
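
For readers who want to inspect the translator graph dumped above, the following is a minimal Python sketch that parses such a volfile dump into a dictionary. `parse_volfile` is a hypothetical helper, not a Gluster tool; it assumes the "volume / type / subvolumes / end-volume" layout shown in the log and tolerates the `NN:` line-number prefixes the client adds when logging the volfile.

```python
import re

# Strips the "  12: " style prefix that the client log puts before each volfile line.
LINE_PREFIX_RE = re.compile(r"^\s*\d+:\s?")


def parse_volfile(text):
    """Map each translator name to its type and list of subvolumes."""
    graph = {}
    current = None
    for raw in text.splitlines():
        line = LINE_PREFIX_RE.sub("", raw).strip()
        if line.startswith("volume "):
            current = line.split(None, 1)[1]
            graph[current] = {"type": None, "subvolumes": []}
        elif current and line.startswith("type "):
            graph[current]["type"] = line.split(None, 1)[1]
        elif current and line.startswith("subvolumes "):
            graph[current]["subvolumes"] = line.split()[1:]
        elif line == "end-volume":
            current = None
    return graph
```

On the dump above, the entry for `volume1-replicate-0` would list all three `volume1-client-*` subvolumes, which is a quick way to confirm that the client has picked up the newly added brick.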

Thanks
Rama


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
