Glusto failures with dispersed volumes + Samba

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi gluster-devel experts,

I've stumbled upon the very same issue/observations like the one mentioned in http://lists.gluster.org/pipermail/gluster-devel/2017-July/053234.html.
Nigel Babu and Pranith Kumar Karampuri told me that this selinux related issue had been also been fixed in >= v3.13.0.

The test setup is as follows:
OS: OpenSUSE Leap 42.3 (selinux features installed/enabled)
SW: gluster 3.13.1
- 3 gluster nodes (gnode[1,2,3] with own vlan for gluster communication)
- 1 virtualization server (snode) upon which the test samba server instance is running under KVM/QEMU. The image is provided by the gluster  (vmvol).
The gluster volume vol1 is for the samba share.

snode:~ # ssh gnode1 gluster vol info all
 
Volume Name: vmvol
Type: Replicate
Volume ID: a03b8fc1-4fcb-4268-bf09-0f554ba5e7a5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gs-gnode1:/data/glusterfs/vmvol/brick1/brick
Brick2: gs-gnode2:/data/glusterfs/vmvol/brick1/brick
Brick3: gs-gnode3:/data/glusterfs/vmvol/brick1/brick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
server.allow-insecure: on                                                                                                                                                                          
storage.owner-uid: 500                                                                                                                                                                             
storage.owner-gid: 500                                                                                                                                                                             
                                                                                                                                                                                                   
Volume Name: vol1                                                                                                                                                                                  
Type: Disperse                                                                                                                                                                                     
Volume ID: fb081b58-bffc-4ddd-bf62-a87a13abec9b                                                                                                                                                    
Status: Started                                                                                                                                                                                    
Snapshot Count: 0                                                                                                                                                                                  
Number of Bricks: 1 x (4 + 2) = 6                                                                                                                                                                  
Transport-type: tcp                                                                                                                                                                                
Bricks:                                                                                                                                                                                            
Brick1: gs-gnode1:/data/glusterfs/vol1/brick1/brick
Brick2: gs-gnode2:/data/glusterfs/vol1/brick1/brick
Brick3: gs-gnode3:/data/glusterfs/vol1/brick1/brick
Brick4: gs-gnode1:/data/glusterfs/vol1/brick2/brick
Brick5: gs-gnode2:/data/glusterfs/vol1/brick2/brick
Brick6: gs-gnode3:/data/glusterfs/vol1/brick2/brick
Options Reconfigured:
performance.readdir-ahead: on
storage.batch-fsync-delay-usec: 0
performance.stat-prefetch: off
nfs.disable: on
transport.address-family: inet
server.allow-insecure: on

The samba server config is:
samba:~ # cat /etc/samba/smb.conf
[global]
  workgroup = TESTGROUP
  server string = Samba Server Version %v
  log file = /var/log/samba/log.%m
  log level = 2
  map to guest = Bad User
  unix charset = UTF-8
  idmap config * : backend = autorid
  idmap config * : range = 1000000-1999999

[gluster]
  kernel share modes = no                                                                                                                                                                          
  vfs objects = acl_xattr glusterfs                                                                                                                                                                
  comment = glusterfs based volume                                                                                                                                                                 
  browseable = Yes                                                                                                                                                                                 
  read _only_ = No                                                                                                                                                                                   
  writeable = Yes                                                                                                                                                                                  
  public = Yes                                                                                                                                                                                     
  guest ok = Yes                                                                                                                                                                                   
  inherit acls = Yes                                                                                                                                                                               
  path = /shares/data/                                                                                                                                                                             
  glusterfs:volume = vol1                                                                                                                                                                          
  glusterfs:volfile_server = gs-gnode1.origenis.de gs-gnode2.origenis.de gs-gnode3.origenis.de                                                                                                     
#  glusterfs:volfile_server = 172.17.20.1 172.17.20.2 172.17.20.3
  glusterfs:loglevel = 9
  glusterfs:logfile = /var/log/samba/glusterfs-vol1.%M.log


The corresponding logs on the samba server are:
* The vfs_gluster log shows lots of:
[2018-01-15 15:27:49.349995] D [logging.c:1817:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
[2018-01-15 15:27:49.351598] D [rpc-clnt.c:1047:rpc_clnt_connection_init] 0-gfapi: defaulting frame-timeout to 30mins
[2018-01-15 15:27:49.351625] D [rpc-clnt.c:1061:rpc_clnt_connection_init] 0-gfapi: disable ping-timeout
[2018-01-15 15:27:49.351644] D [rpc-transport.c:279:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.13.1/rpc-transport/socket.so
[2018-01-15 15:27:49.352372] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-gfapi: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
[2018-01-15 15:27:49.352402] T [MSGID: 0] [options.c:86:xlator_option_validate_int] 0-gfapi: no range check required for 'option remote-port 24007'
[2018-01-15 15:27:49.354865] D [socket.c:4236:socket_init] 0-gfapi: Configued transport.tcp-user-timeout=0
[2018-01-15 15:27:49.354885] D [socket.c:4254:socket_init] 0-gfapi: Reconfigued transport.keepalivecnt=9
[2018-01-15 15:27:49.354901] D [socket.c:4339:socket_init] 0-gfapi: SSL support on the I/O path is NOT enabled
[2018-01-15 15:27:49.354914] D [socket.c:4342:socket_init] 0-gfapi: SSL support for glusterd is NOT enabled
[2018-01-15 15:27:49.354926] D [socket.c:4359:socket_init] 0-gfapi: using system polling thread
[2018-01-15 15:27:49.354943] D [rpc-clnt.c:1567:rpcclnt_cbk_program_register] 0-gfapi: New program registered: GlusterFS Callback, Num: 52743234, Ver: 1
[2018-01-15 15:27:49.354958] T [rpc-clnt.c:406:rpc_clnt_reconnect] 0-gfapi: attempting reconnect
[2018-01-15 15:27:49.354971] T [socket.c:3146:socket_connect] 0-gfapi: connecting 0x55e1aca9f9b0, state=0 gen=0 sock=-1
[2018-01-15 15:27:49.354991] T [MSGID: 0] [common-utils.c:306:gf_resolve_ip6] 0-resolver: DNS cache not present, freshly probing hostname: gs-gnode1.origenis.de
[2018-01-15 15:27:49.360012] D [MSGID: 0] [common-utils.c:346:gf_resolve_ip6] 0-resolver: returning ip-172.17.20.1 (port-24007) for hostname: gs-gnode1.origenis.de and port: 24007
[2018-01-15 15:27:49.360039] D [socket.c:3063:socket_fix_ssl_opts] 0-gfapi: disabling SSL for portmapper connection
[2018-01-15 15:27:49.360068] W [socket.c:3216:socket_connect] 0-gfapi: Error disabling sockopt IPV6_V6ONLY: "Protocol not available"
[2018-01-15 15:27:49.360086] T [socket.c:834:__socket_nodelay] 0-gfapi: NODELAY enabled for socket 40
[2018-01-15 15:27:49.360109] T [socket.c:922:__socket_keepalive] 0-gfapi: Keep-alive enabled for socket: 40, (idle: 20, interval: 2, max-probes: 9, timeout: 0)
[2018-01-15 15:27:49.360776] T [rpc-clnt.c:1496:rpc_clnt_record] 0-gfapi: Auth Info: pid: 0, uid: 0, gid: 0, owner:
[2018-01-15 15:27:49.360811] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 140, payload: 76, rpc hdr: 64
[2018-01-15 15:27:49.360856] T [rpc-clnt.c:1699:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x2 Program: GlusterFS Handshake, ProgVers: 2, Proc: 2) to rpc-transport (gfapi)
[2018-01-15 15:27:49.360875] D [rpc-clnt-ping.c:317:rpc_clnt_start_ping] 0-gfapi: ping timeout is 0, returning
[2018-01-15 15:27:49.362364] T [rpc-clnt.c:675:rpc_clnt_reply_init] 0-gfapi: received rpc message (RPC XID: 0x2 Program: GlusterFS Handshake, ProgVers: 2, Proc: 2) from rpc-transport (gfapi)
[2018-01-15 15:27:49.362932] T [MSGID: 0] [graph.y:187:new_volume] 0-parser: New node for 'vol1-client-0'
....
....
[2018-01-15 15:27:49.386102] T [socket.c:922:__socket_keepalive] 0-vol1-client-5: Keep-alive enabled for socket: 47, (idle: 20, interval: 2, max-probes: 9, timeout: 0)
Final graph:
+------------------------------------------------------------------------------+
  1: volume vol1-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host gs-gnode1
  5:     option remote-subvolume /data/glusterfs/vol1/brick1/brick
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option transport.tcp-user-timeout 0
  9:     option transport.socket.keepalive-time 20
 10:     option transport.socket.keepalive-interval 2
 11:     option transport.socket.keepalive-count 9
 12:     option send-gids true
 13: end-volume
 14: 
 15: volume vol1-client-1
 16:     type protocol/client
 17:     option ping-timeout 42
 18:     option remote-host gs-gnode2
 19:     option remote-subvolume /data/glusterfs/vol1/brick1/brick
 20:     option transport-type socket
 21:     option transport.address-family inet
 22:     option transport.tcp-user-timeout 0
 23:     option transport.socket.keepalive-time 20
 24:     option transport.socket.keepalive-interval 2
 25:     option transport.socket.keepalive-count 9
 26:     option send-gids true
 27: end-volume
 28: 
 29: volume vol1-client-2
 30:     type protocol/client
 31:     option ping-timeout 42
 32:     option remote-host gs-gnode3
 33:     option remote-subvolume /data/glusterfs/vol1/brick1/brick
 34:     option transport-type socket
 35:     option transport.address-family inet
 36:     option transport.tcp-user-timeout 0
 37:     option transport.socket.keepalive-time 20
 38:     option transport.socket.keepalive-interval 2
 39:     option transport.socket.keepalive-count 9
 40:     option send-gids true
 41: end-volume
 42: 
 43: volume vol1-client-3
 44:     type protocol/client
 45:     option ping-timeout 42
 46:     option remote-host gs-gnode1
 47:     option remote-subvolume /data/glusterfs/vol1/brick2/brick
 48:     option transport-type socket
 49:     option transport.address-family inet
 50:     option transport.tcp-user-timeout 0
 51:     option transport.socket.keepalive-time 20
 52:     option transport.socket.keepalive-interval 2
 53:     option transport.socket.keepalive-count 9
 54:     option send-gids true
 55: end-volume
 56: 
 57: volume vol1-client-4
 58:     type protocol/client
 59:     option ping-timeout 42
 60:     option remote-host gs-gnode2
 61:     option remote-subvolume /data/glusterfs/vol1/brick2/brick
 62:     option transport-type socket
 63:     option transport.address-family inet
 64:     option transport.tcp-user-timeout 0
 65:     option transport.socket.keepalive-time 20
 66:     option transport.socket.keepalive-interval 2
 67:     option transport.socket.keepalive-count 9
 68:     option send-gids true
 69: end-volume
 70: 
 71: volume vol1-client-5
 72:     type protocol/client
 73:     option ping-timeout 42
 74:     option remote-host gs-gnode3
 75:     option remote-subvolume /data/glusterfs/vol1/brick2/brick
 76:     option transport-type socket
 77:     option transport.address-family inet
 78:     option transport.tcp-user-timeout 0
 79:     option transport.socket.keepalive-time 20
 80:     option transport.socket.keepalive-interval 2
 81:     option transport.socket.keepalive-count 9
 82:     option send-gids true
 83: end-volume
 84: 
 85: volume vol1-disperse-0
 86:     type cluster/disperse
 87:     option redundancy 2
 88:     subvolumes vol1-client-0 vol1-client-1 vol1-client-2 vol1-client-3 vol1-client-4 vol1-client-5
 89: end-volume
 90: 
 91: volume vol1-dht
 92:     type cluster/distribute
 93:     option lock-migration off
 94:     subvolumes vol1-disperse-0
 95: end-volume
 96: 
 97: volume vol1-write-behind
 98:     type performance/write-behind
 99:     subvolumes vol1-dht
100: end-volume
101: 
102: volume vol1-read-ahead
103:     type performance/read-ahead
104:     subvolumes vol1-write-behind
105: end-volume
106: 
107: volume vol1-readdir-ahead
108:     type performance/readdir-ahead
109:     option parallel-readdir off
[2018-01-15 15:27:49.387104] T [rpc-clnt.c:675:rpc_clnt_reply_init] 0-vol1-client-0: received rpc message (RPC XID: 0x2 Program: GF-DUMP, ProgVers: 1, Proc: 1) from rpc-transport (vol1-client-0)
....
....
[2018-01-15 15:27:49.390946] T [rpc-clnt.c:1699:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x3 Program: PORTMAP, ProgVers: 1, Proc: 1) to rpc-transport (vol1-client-1)
110:     option rda-request-size 131072
111:     option rda-cache-limit 10MB
112:     subvolumes vol1-read-ahead
113: end-volume
114: 
115: volume vol1-io-cache
116:     type performance/io-cache
117:     subvolumes vol1-readdir-ahead
118: end-volume
119: 
120: volume vol1-quick-read
121:     type performance/quick-read
122:     subvolumes vol1-io-cache
123: end-volume
124: 
125: volume vol1-open-behind
126:     type performance/open-behind
127:     subvolumes vol1-quick-read
128: end-volume
129: 
130: volume vol1-io-threads
131:     type performance/io-threads
132:     subvolumes vol1-open-behind
133: end-volume
134: 
135: volume vol1
136:     type debug/io-stats
137:     option log-level INFO
138:     option latency-measurement off
139:     option count-fop-hits off
140:     subvolumes vol1-io-threads
141: end-volume
142: 
143: volume meta-autoload
144:     type meta
145:     subvolumes vol1
146: end-volume
147: 
+------------------------------------------------------------------------------+
[2018-01-15 15:27:49.391515] T [rpc-clnt.c:675:rpc_clnt_reply_init] 0-vol1-client-2: received rpc message (RPC XID: 0x2 Program: GF-DUMP, ProgVers: 1, Proc: 1) from rpc-transport (vol1-client-2)
....
....
[2018-01-15 15:27:49.446699] W [MSGID: 122019] [ec-helpers.c:412:ec_loc_gfid_check] 0-vol1-disperse-0: Mismatching GFID's in loc
[2018-01-15 15:27:49.446719] T [MSGID: 0] [ec-helpers.c:89:ec_trace] 0-ec: MANAGER(LOOKUP) 0x7f92bc00db48((nil)) [refs=3, winds=0, jobs=0] frame=0x7f92bc0013c8/0x7f92bc003878, min/exp=4/6, err=0 state=5 {111111:000000:111111} error=22
[2018-01-15 15:27:49.446739] D [MSGID: 0] [defaults.c:1266:default_lookup_cbk] 0-stack-trace: stack-address: 0x55e1acad3c38, vol1-disperse-0 returned -1 error: Invalid argument [Invalid argument]
[2018-01-15 15:27:49.446759] D [MSGID: 0] [dht-common.c:1501:dht_revalidate_cbk] 0-vol1-dht: revalidate lookup of /shares/data returned with op_ret -1 [Invalid argument]
[2018-01-15 15:27:49.446778] I [MSGID: 109094] [dht-common.c:1514:dht_revalidate_cbk] 0-vol1-dht: Revalidate: subvolume vol1-disperse-0 for /shares/data (gfid = 2f5a8895-bf08-41df-a13f-3205c62125ab) returned -1 [Invalid argument]
[2018-01-15 15:27:49.446796] E [MSGID: 101046] [dht-common.c:1810:dht_revalidate_cbk] 0-vol1-dht: dict is null
[2018-01-15 15:27:49.446814] D [MSGID: 0] [dht-common.c:1814:dht_revalidate_cbk] 0-stack-trace: stack-address: 0x55e1acad3c38, vol1-dht returned -1 error: Invalid argument [Invalid argument]
[2018-01-15 15:27:49.446832] D [MSGID: 0] [write-behind.c:2394:wb_lookup_cbk] 0-stack-trace: stack-address: 0x55e1acad3c38, vol1-write-behind returned -1 error: Invalid argument [Invalid argument]
[2018-01-15 15:27:49.446852] D [MSGID: 0] [io-cache.c:268:ioc_lookup_cbk] 0-stack-trace: stack-address: 0x55e1acad3c38, vol1-io-cache returned -1 error: Invalid argument [Invalid argument]
[2018-01-15 15:27:49.446871] D [MSGID: 0] [quick-read.c:450:qr_lookup_cbk] 0-stack-trace: stack-address: 0x55e1acad3c38, vol1-quick-read returned -1 error: Invalid argument [Invalid argument]
[2018-01-15 15:27:49.446890] D [MSGID: 0] [defaults.c:1266:default_lookup_cbk] 0-stack-trace: stack-address: 0x55e1acad3c38, vol1-io-threads returned -1 error: Invalid argument [Invalid argument]
[2018-01-15 15:27:49.446908] D [MSGID: 0] [io-stats.c:2216:io_stats_lookup_cbk] 0-stack-trace: stack-address: 0x55e1acad3c38, vol1 returned -1 error: Invalid argument [Invalid argument]
[2018-01-15 15:27:49.446988] T [MSGID: 0] [syncop.c:1239:syncop_lookup] 0-stack-trace: stack-address: 0x55e1acad3c38, winding from gfapi to meta-autoload
[2018-01-15 15:27:49.447011] T [MSGID: 0] [defaults.c:2574:default_lookup] 0-stack-trace: stack-address: 0x55e1acad3c38, winding from meta-autoload to vol1
[2018-01-15 15:27:49.447028] T [MSGID: 0] [io-stats.c:2728:io_stats_lookup] 0-stack-trace: stack-address: 0x55e1acad3c38, winding from vol1 to vol1-io-threads
[2018-01-15 15:27:49.447047] D [MSGID: 0] [io-threads.c:358:iot_schedule] 0-vol1-io-threads: LOOKUP scheduled as fast fop
[2018-01-15 15:27:49.447113] T [MSGID: 0] [ec-helpers.c:89:ec_trace] 0-ec: MANAGER(LOOKUP) 0x7f92bc00db48((nil)) [refs=3, winds=0, jobs=0] frame=0x7f92bc0013c8/0x7f92bc003878, min/exp=4/6, err=0 state=0 {111111:000000:111111} error=22
[2018-01-15 15:27:49.447140] T [MSGID: 0] [ec-helpers.c:89:ec_trace] 0-ec: RELEASE(LOOKUP) 0x7f92bc00db48((nil)) [refs=3, winds=0, jobs=0] frame=0x7f92bc0013c8/0x7f92bc003878, min/exp=4/6, err=22 state=0 {111111:000000:111111}
[2018-01-15 15:27:49.447159] T [MSGID: 0] [ec-helpers.c:89:ec_trace] 0-ec: RELEASE(LOOKUP) 0x7f92bc00db48((nil)) [refs=2, winds=0, jobs=0] frame=0x7f92bc0013c8/0x7f92bc003878, min/exp=4/6, err=22 state=0 {111111:000000:111111}
[2018-01-15 15:27:49.447177] T [MSGID: 0] [ec-helpers.c:89:ec_trace] 0-ec: RELEASE(LOOKUP) 0x7f92bc00db48((nil)) [refs=1, winds=0, jobs=0] frame=0x7f92bc0013c8/0x7f92bc003878, min/exp=4/6, err=22 state=0 {111111:000000:111111}
...
...


* The systems log-journal lists:
Jan 15 15:27:49 samba smbd[3129]:   glfs_stat(.) failed: Invalid argument
Jan 15 15:27:49 samba smbd[3129]: [2018/01/15 15:27:49.495269,  0] ../source3/modules/vfs_glusterfs.c:996(vfs_gluster_stat)
Jan 15 15:27:49 samba smbd[3129]:   glfs_stat(.) failed: Invalid argument
Jan 15 15:27:49 samba smbd[3129]: [2018/01/15 15:27:49.489101,  0] ../source3/modules/vfs_glusterfs.c:996(vfs_gluster_stat)
Jan 15 15:27:49 samba smbd[3129]:   glfs_stat(.) failed: Invalid argument
Jan 15 15:27:49 samba smbd[3129]: [2018/01/15 15:27:49.483684,  0] ../source3/modules/vfs_glusterfs.c:996(vfs_gluster_stat)
Jan 15 15:27:49 samba smbd[3129]:   glfs_stat(.) failed: Invalid argument
Jan 15 15:27:49 samba smbd[3129]: [2018/01/15 15:27:49.478285,  0] ../source3/modules/vfs_glusterfs.c:996(vfs_gluster_stat)
Jan 15 15:27:49 samba smbd[3129]:   glfs_stat(.) failed: Invalid argument
Jan 15 15:27:49 samba smbd[3129]: [2018/01/15 15:27:49.465356,  0] ../source3/modules/vfs_glusterfs.c:996(vfs_gluster_stat)
Jan 15 15:27:49 samba smbd[3129]:   glfs_stat(.) failed: Invalid argument
Jan 15 15:27:49 samba smbd[3129]: [2018/01/15 15:27:49.459377,  0] ../source3/modules/vfs_glusterfs.c:996(vfs_gluster_stat)
Jan 15 15:27:49 samba smbd[3129]:   glfs_stat(.) failed: Invalid argument
Jan 15 15:27:49 samba smbd[3129]: [2018/01/15 15:27:49.454542,  0] ../source3/modules/vfs_glusterfs.c:996(vfs_gluster_stat)
Jan 15 15:27:49 samba smbd[3129]:   glfs_stat(.) failed: Invalid argument
Jan 15 15:27:49 samba smbd[3129]: [2018/01/15 15:27:49.449593,  0] ../source3/modules/vfs_glusterfs.c:996(vfs_gluster_stat)
Jan 15 15:27:49 samba smbd[3129]:   vfs_gluster_connect: vol1: Initialized volume from servers gs-gnode1.origenis.de gs-gnode2.origenis.de gs-gnode3.origenis.de
Jan 15 15:27:49 samba smbd[3129]: [2018/01/15 15:27:49.420543,  0] ../source3/modules/vfs_glusterfs.c:371(vfs_gluster_connect)
Jan 15 15:27:08 samba smbd[3117]:   STATUS=daemon 'smbd' finished starting up and ready to serve connections
Jan 15 15:27:08 samba smbd[3117]: [2018/01/15 15:27:08.920529,  0] ../lib/util/become_daemon.c:124(daemon_ready)
Jan 15 15:27:08 samba systemd[1]: Started Samba SMB Daemon.
Jan 15 15:27:08 samba systemd[1]: Starting Samba SMB Daemon...

Kind regards for your help,
Henrik


_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel

[Index of Archives]     [Gluster Users]     [Ceph Users]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux