Hi all,
We currently have a production cluster with 2 nodes in a distributed-replicated setup running glusterfs 3.6.4, which was upgraded from glusterfs 3.5.x. I have just expanded the cluster with 2 extra nodes, which also have glusterfs 3.6.4 installed,
but when I run the rebalance command some errors show up:
gluster volume rebalance uploadtmp start
volume rebalance: uploadtmp: success: Initiated rebalance on volume uploadtmp.
Execute "gluster volume rebalance <volume-name> status" to check status.
ID: 49264fc3-6cb2-424b-8522-425989b125d9
gluster volume rebalance uploadtmp status
Node                  Rebalanced-files  size    scanned  failures  skipped  status     run time in secs
--------------------  ----------------  ------  -------  --------  -------  ---------  ----------------
localhost             0                 0Bytes  5        2         0        completed  2.00
gfs01b-dcg.intnet.be  0                 0Bytes  5        2         0        completed  1.00
gfs02a-dcg.intnet.be  0                 0Bytes  5        2         0        completed  2.00
gfs02b-dcg.intnet.be  0                 0Bytes  5        2         0        completed  2.00
volume rebalance: uploadtmp: success:
tail -200 uploadtmp-rebalance.log
[2015-09-14 11:41:24.881814] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.6.4 (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/uploadtmp --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.readdir-optimize=on --xlator-option *dht.rebalance-cmd=1 --xlator-option *dht.node-uuid=052e85f3-2693-43d1-a586-3ebe11e86055 --socket-file /var/run/gluster/gluster-rebalance-69232195-1b1c-47c4-8ef9-a1145cb7fc7a.sock --pid-file /var/lib/glusterd/vols/uploadtmp/rebalance/052e85f3-2693-43d1-a586-3ebe11e86055.pid -l /var/log/glusterfs/uploadtmp-rebalance.log)
[2015-09-14 11:41:29.901106] I [graph.c:269:gf_add_cmdline_options] 0-uploadtmp-dht: adding option 'node-uuid' for volume 'uploadtmp-dht' with value '052e85f3-2693-43d1-a586-3ebe11e86055'
[2015-09-14 11:41:29.901142] I [graph.c:269:gf_add_cmdline_options] 0-uploadtmp-dht: adding option 'rebalance-cmd' for volume 'uploadtmp-dht' with value '1'
[2015-09-14 11:41:29.901152] I [graph.c:269:gf_add_cmdline_options] 0-uploadtmp-dht: adding option 'readdir-optimize' for volume 'uploadtmp-dht' with value 'on'
[2015-09-14 11:41:29.901159] I [graph.c:269:gf_add_cmdline_options] 0-uploadtmp-dht: adding option 'assert-no-child-down' for volume 'uploadtmp-dht' with value 'yes'
[2015-09-14 11:41:29.901166] I [graph.c:269:gf_add_cmdline_options] 0-uploadtmp-dht: adding option 'lookup-unhashed' for volume 'uploadtmp-dht' with value 'yes'
[2015-09-14 11:41:29.901173] I [graph.c:269:gf_add_cmdline_options] 0-uploadtmp-dht: adding option 'use-readdirp' for volume 'uploadtmp-dht' with value 'yes'
[2015-09-14 11:41:29.901180] I [graph.c:269:gf_add_cmdline_options] 0-uploadtmp-replicate-1: adding option 'readdir-failover' for volume 'uploadtmp-replicate-1' with value 'off'
[2015-09-14 11:41:29.901188] I [graph.c:269:gf_add_cmdline_options] 0-uploadtmp-replicate-1: adding option 'entry-self-heal' for volume 'uploadtmp-replicate-1' with value 'off'
[2015-09-14 11:41:29.901195] I [graph.c:269:gf_add_cmdline_options] 0-uploadtmp-replicate-1: adding option 'metadata-self-heal' for volume 'uploadtmp-replicate-1' with value 'off'
[2015-09-14 11:41:29.901208] I [graph.c:269:gf_add_cmdline_options] 0-uploadtmp-replicate-1: adding option 'data-self-heal' for volume 'uploadtmp-replicate-1' with value 'off'
[2015-09-14 11:41:29.901215] I [graph.c:269:gf_add_cmdline_options] 0-uploadtmp-replicate-0: adding option 'readdir-failover' for volume 'uploadtmp-replicate-0' with value 'off'
[2015-09-14 11:41:29.901222] I [graph.c:269:gf_add_cmdline_options] 0-uploadtmp-replicate-0: adding option 'entry-self-heal' for volume 'uploadtmp-replicate-0' with value 'off'
[2015-09-14 11:41:29.901228] I [graph.c:269:gf_add_cmdline_options] 0-uploadtmp-replicate-0: adding option 'metadata-self-heal' for volume 'uploadtmp-replicate-0' with value 'off'
[2015-09-14 11:41:29.901234] I [graph.c:269:gf_add_cmdline_options] 0-uploadtmp-replicate-0: adding option 'data-self-heal' for volume 'uploadtmp-replicate-0' with value 'off'
[2015-09-14 11:41:29.901726] I [dht-shared.c:337:dht_init_regex] 0-uploadtmp-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2015-09-14 11:41:29.909507] W [graph.c:344:_log_if_unknown_option] 0-uploadtmp-replicate-1: option 'readdir-failover' is not recognized
[2015-09-14 11:41:29.909611] W [graph.c:344:_log_if_unknown_option] 0-uploadtmp-replicate-0: option 'readdir-failover' is not recognized
[2015-09-14 11:41:29.909644] I [client.c:2280:notify] 0-uploadtmp-client-0: parent translators are ready, attempting connect on transport
[2015-09-14 11:41:29.910192] I [client.c:2280:notify] 0-uploadtmp-client-1: parent translators are ready, attempting connect on transport
[2015-09-14 11:41:29.911618] I [client.c:2280:notify] 0-uploadtmp-client-2: parent translators are ready, attempting connect on transport
[2015-09-14 11:41:29.912884] I [client.c:2280:notify] 0-uploadtmp-client-3: parent translators are ready, attempting connect on transport
Final graph:
+------------------------------------------------------------------------------+
1: volume uploadtmp-client-0
2: type protocol/client
3: option ping-timeout 42
4: option remote-host gfs01a-dcg.intnet.be
5: option remote-subvolume /mnt/uploadtmp/brick1
6: option transport-type socket
7: option send-gids true
8: end-volume
9:
10: volume uploadtmp-client-1
11: type protocol/client
12: option ping-timeout 42
13: option remote-host gfs01b-dcg.intnet.be
14: option remote-subvolume /mnt/uploadtmp/brick1
15: option transport-type socket
16: option send-gids true
17: end-volume
18:
19: volume uploadtmp-replicate-0
20: type cluster/replicate
21: option data-self-heal off
22: option metadata-self-heal off
23: option entry-self-heal off
24: option readdir-failover off
25: subvolumes uploadtmp-client-0 uploadtmp-client-1
26: end-volume
27:
28: volume uploadtmp-client-2
29: type protocol/client
30: option ping-timeout 42
31: option remote-host gfs02a-dcg.intnet.be
32: option remote-subvolume /mnt/uploadtmp/brick1
33: option transport-type socket
34: option send-gids true
35: end-volume
36:
37: volume uploadtmp-client-3
38: type protocol/client
39: option ping-timeout 42
40: option remote-host gfs02b-dcg.intnet.be
41: option remote-subvolume /mnt/uploadtmp/brick1
42: option transport-type socket
43: option send-gids true
44: end-volume
45:
46: volume uploadtmp-replicate-1
47: type cluster/replicate
48: option data-self-heal off
49: option metadata-self-heal off
50: option entry-self-heal off
51: option readdir-failover off
52: subvolumes uploadtmp-client-2 uploadtmp-client-3
53: end-volume
54:
55: volume uploadtmp-dht
56: type cluster/distribute
57: option use-readdirp yes
58: option lookup-unhashed yes
59: option assert-no-child-down yes
60: option readdir-optimize on
61: option rebalance-cmd 1
62: option node-uuid 052e85f3-2693-43d1-a586-3ebe11e86055
63: subvolumes uploadtmp-replicate-0 uploadtmp-replicate-1
64: end-volume
65:
66: volume uploadtmp-write-behind
67: type performance/write-behind
68: subvolumes uploadtmp-dht
69: end-volume
70:
71: volume uploadtmp-read-ahead
72: type performance/read-ahead
73: subvolumes uploadtmp-write-behind
74: end-volume
75:
76: volume uploadtmp-io-cache
77: type performance/io-cache
78: subvolumes uploadtmp-read-ahead
79: end-volume
80:
81: volume uploadtmp-quick-read
82: type performance/quick-read
83: subvolumes uploadtmp-io-cache
84: end-volume
85:
86: volume uploadtmp-open-behind
87: type performance/open-behind
88: option read-after-open yes
89: subvolumes uploadtmp-quick-read
90: end-volume
91:
92: volume uploadtmp-md-cache
93: type performance/md-cache
94: subvolumes uploadtmp-open-behind
95: end-volume
96:
97: volume uploadtmp
98: type debug/io-stats
99: option latency-measurement off
100: option count-fop-hits off
101: subvolumes uploadtmp-md-cache
102: end-volume
103:
+------------------------------------------------------------------------------+
[2015-09-14 11:41:29.915765] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-uploadtmp-client-0: changing port to 49152 (from 0)
[2015-09-14 11:41:29.916607] I [client-handshake.c:1413:select_server_supported_programs] 0-uploadtmp-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-09-14 11:41:29.916922] I [client-handshake.c:1200:client_setvolume_cbk] 0-uploadtmp-client-0: Connected to uploadtmp-client-0, attached to remote volume '/mnt/uploadtmp/brick1'.
[2015-09-14 11:41:29.916937] I [client-handshake.c:1210:client_setvolume_cbk] 0-uploadtmp-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2015-09-14 11:41:29.916994] I [MSGID: 108005] [afr-common.c:3672:afr_notify] 0-uploadtmp-replicate-0: Subvolume 'uploadtmp-client-0' came back up; going online.
[2015-09-14 11:41:29.917033] I [client-handshake.c:188:client_set_lk_version_cbk] 0-uploadtmp-client-0: Server lk version = 1
[2015-09-14 11:41:34.884359] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-uploadtmp-client-1: changing port to 49152 (from 0)
[2015-09-14 11:41:34.892294] I [client-handshake.c:1413:select_server_supported_programs] 0-uploadtmp-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-09-14 11:41:34.893999] I [client-handshake.c:1200:client_setvolume_cbk] 0-uploadtmp-client-1: Connected to uploadtmp-client-1, attached to remote volume '/mnt/uploadtmp/brick1'.
[2015-09-14 11:41:34.894020] I [client-handshake.c:1210:client_setvolume_cbk] 0-uploadtmp-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2015-09-14 11:41:34.894446] I [client-handshake.c:188:client_set_lk_version_cbk] 0-uploadtmp-client-1: Server lk version = 1
[2015-09-14 11:41:34.894864] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-uploadtmp-client-3: changing port to 49153 (from 0)
[2015-09-14 11:41:34.895947] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-uploadtmp-client-2: changing port to 49153 (from 0)
[2015-09-14 11:41:34.898011] I [client-handshake.c:1413:select_server_supported_programs] 0-uploadtmp-client-3: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-09-14 11:41:34.898861] I [client-handshake.c:1413:select_server_supported_programs] 0-uploadtmp-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-09-14 11:41:34.899259] I [client-handshake.c:1200:client_setvolume_cbk] 0-uploadtmp-client-3: Connected to uploadtmp-client-3, attached to remote volume '/mnt/uploadtmp/brick1'.
[2015-09-14 11:41:34.899279] I [client-handshake.c:1210:client_setvolume_cbk] 0-uploadtmp-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2015-09-14 11:41:34.899325] I [MSGID: 108005] [afr-common.c:3672:afr_notify] 0-uploadtmp-replicate-1: Subvolume 'uploadtmp-client-3' came back up; going online.
[2015-09-14 11:41:34.899923] I [client-handshake.c:1200:client_setvolume_cbk] 0-uploadtmp-client-2: Connected to uploadtmp-client-2, attached to remote volume '/mnt/uploadtmp/brick1'.
[2015-09-14 11:41:34.899951] I [client-handshake.c:1210:client_setvolume_cbk] 0-uploadtmp-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2015-09-14 11:41:34.900191] I [client-handshake.c:188:client_set_lk_version_cbk] 0-uploadtmp-client-3: Server lk version = 1
[2015-09-14 11:41:34.902578] I [client-handshake.c:188:client_set_lk_version_cbk] 0-uploadtmp-client-2: Server lk version = 1
[2015-09-14 11:41:34.910808] I [afr-common.c:1477:afr_local_discovery_cbk] 0-uploadtmp-replicate-0: selecting local read_child uploadtmp-client-0
[2015-09-14 11:41:34.917249] I [dht-common.c:3309:dht_setxattr] 0-uploadtmp-dht: fixing the layout of /
[2015-09-14 11:41:34.917299] I [dht-selfheal.c:960:dht_fix_layout_of_directory] 0-uploadtmp-dht: subvolume 0 (uploadtmp-replicate-0): 251980 chunks
[2015-09-14 11:41:34.917314] I [dht-selfheal.c:960:dht_fix_layout_of_directory] 0-uploadtmp-dht: subvolume 1 (uploadtmp-replicate-1): 251980 chunks
[2015-09-14 11:41:34.917327] I [dht-selfheal.c:1065:dht_selfheal_layout_new_directory] 0-uploadtmp-dht: chunk size = 0xffffffff / 503960 = 0x214a
[2015-09-14 11:41:34.917361] I [dht-selfheal.c:1103:dht_selfheal_layout_new_directory] 0-uploadtmp-dht: assigning range size 0x7ffe51f8 to uploadtmp-replicate-0
[2015-09-14 11:41:34.917376] I [dht-selfheal.c:1103:dht_selfheal_layout_new_directory] 0-uploadtmp-dht: assigning range size 0x7ffe51f8 to uploadtmp-replicate-1
[2015-09-14 11:41:34.917410] I [MSGID: 109036] [dht-common.c:6296:dht_log_new_layout_for_dir_selfheal] 0-uploadtmp-dht: Setting layout of / with [Subvol_name: uploadtmp-replicate-0, Err: -1 , Start: 2147373560 , Stop: 4294967295 ], [Subvol_name: uploadtmp-replicate-1, Err: -1 , Start: 0 , Stop: 2147373559 ],
[2015-09-14 11:41:34.922000] W [client-rpc-fops.c:1023:client3_3_setxattr_cbk] 0-uploadtmp-client-3: remote operation failed: Permission denied
[2015-09-14 11:41:34.922071] W [client-rpc-fops.c:1023:client3_3_setxattr_cbk] 0-uploadtmp-client-2: remote operation failed: Permission denied
[2015-09-14 11:41:34.926973] I [dht-rebalance.c:1405:gf_defrag_migrate_data] 0-uploadtmp-dht: migrate data called on /
[2015-09-14 11:41:34.929176] W [client-rpc-fops.c:2145:client3_3_setattr_cbk] 0-uploadtmp-client-3: remote operation failed: Operation not permitted
[2015-09-14 11:41:34.929231] W [client-rpc-fops.c:2145:client3_3_setattr_cbk] 0-uploadtmp-client-2: remote operation failed: Operation not permitted
[2015-09-14 11:41:34.929660] E [MSGID: 109004] [dht-selfheal.c:1382:dht_dir_attr_heal] 0-dht: Directory attr heal failed. Failed to set uid/gid on path / on subvol uploadtmp-replicate-1, gfid = 00000000-0000-0000-0000-000000000001 [Operation not permitted]
[2015-09-14 11:41:35.453870] I [dht-rebalance.c:1649:gf_defrag_migrate_data] 0-uploadtmp-dht: Migration operation on dir / took 0.53 secs
[2015-09-14 11:41:35.794673] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-uploadtmp-client-2: remote operation failed: Permission denied. Path: /atheneumwillebroek-rvl
[2015-09-14 11:41:35.794874] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-uploadtmp-client-3: remote operation failed: Permission denied. Path: /atheneumwillebroek-rvl
[2015-09-14 11:41:35.795868] W [MSGID: 109005] [dht-selfheal.c:582:dht_selfheal_dir_mkdir_cbk] 0-uploadtmp-dht: Directory selfheal failed: path = /atheneumwillebroek-rvl, gfid = 120476ff-55bb-43ab-9155-740c7db69fef [Permission denied]
[2015-09-14 11:41:35.795953] I [MSGID: 109036] [dht-common.c:6296:dht_log_new_layout_for_dir_selfheal] 0-uploadtmp-dht: Setting layout of /atheneumwillebroek-rvl with [Subvol_name: uploadtmp-replicate-0, Err: 0 , Start: 0 , Stop: 4294967295 ], [Subvol_name: uploadtmp-replicate-1, Err: 2 , Start: 0 , Stop: 0 ],
[2015-09-14 11:41:35.798301] I [afr-lk-common.c:1078:afr_lock_blocking] 0-uploadtmp-replicate-1: unable to lock on even one child
[2015-09-14 11:41:35.798669] I [afr-transaction.c:1096:afr_post_blocking_inodelk_cbk] 0-uploadtmp-replicate-1: Blocking inodelks failed.
[2015-09-14 11:41:35.798829] I [dht-common.c:3309:dht_setxattr] 0-uploadtmp-dht: fixing the layout of /atheneumwillebroek-rvl
[2015-09-14 11:41:35.798853] I [dht-selfheal.c:960:dht_fix_layout_of_directory] 0-uploadtmp-dht: subvolume 0 (uploadtmp-replicate-0): 251980 chunks
[2015-09-14 11:41:35.798863] I [dht-selfheal.c:960:dht_fix_layout_of_directory] 0-uploadtmp-dht: subvolume 1 (uploadtmp-replicate-1): 251980 chunks
[2015-09-14 11:41:35.798873] I [dht-selfheal.c:1065:dht_selfheal_layout_new_directory] 0-uploadtmp-dht: chunk size = 0xffffffff / 251980 = 0x4294
[2015-09-14 11:41:35.798936] I [dht-selfheal.c:1103:dht_selfheal_layout_new_directory] 0-uploadtmp-dht: assigning range size 0xfffca3f0 to uploadtmp-replicate-0
[2015-09-14 11:41:35.798964] I [MSGID: 109036] [dht-common.c:6296:dht_log_new_layout_for_dir_selfheal] 0-uploadtmp-dht: Setting layout of /atheneumwillebroek-rvl with [Subvol_name: uploadtmp-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 ], [Subvol_name: uploadtmp-replicate-1, Err: 116 , Start: 0 , Stop: 0 ],
[2015-09-14 11:41:35.801072] I [afr-lk-common.c:1078:afr_lock_blocking] 0-uploadtmp-replicate-1: unable to lock on even one child
[2015-09-14 11:41:35.801148] I [afr-transaction.c:1096:afr_post_blocking_inodelk_cbk] 0-uploadtmp-replicate-1: Blocking inodelks failed.
[2015-09-14 11:41:35.999884] I [dht-rebalance.c:1405:gf_defrag_migrate_data] 0-uploadtmp-dht: migrate data called on /atheneumwillebroek-rvl
[2015-09-14 11:41:36.541735] I [dht-rebalance.c:1649:gf_defrag_migrate_data] 0-uploadtmp-dht: Migration operation on dir /atheneumwillebroek-rvl took 0.54 secs
[2015-09-14 11:41:36.557300] I [dht-common.c:3309:dht_setxattr] 0-uploadtmp-dht: fixing the layout of /atheneumwillebroek-rvl/G387KnvoHHBMrKjrPHg6NFHhk144221575722193683
[2015-09-14 11:41:36.557347] I [dht-selfheal.c:960:dht_fix_layout_of_directory] 0-uploadtmp-dht: subvolume 0 (uploadtmp-replicate-0): 251980 chunks
[2015-09-14 11:41:36.557361] I [dht-selfheal.c:960:dht_fix_layout_of_directory] 0-uploadtmp-dht: subvolume 1 (uploadtmp-replicate-1): 251980 chunks
[2015-09-14 11:41:36.557370] I [dht-selfheal.c:1065:dht_selfheal_layout_new_directory] 0-uploadtmp-dht: chunk size = 0xffffffff / 251980 = 0x4294
[2015-09-14 11:41:36.557389] I [dht-selfheal.c:1103:dht_selfheal_layout_new_directory] 0-uploadtmp-dht: assigning range size 0xfffca3f0 to uploadtmp-replicate-0
[2015-09-14 11:41:36.557407] I [MSGID: 109036] [dht-common.c:6296:dht_log_new_layout_for_dir_selfheal] 0-uploadtmp-dht: Setting layout of /atheneumwillebroek-rvl/G387KnvoHHBMrKjrPHg6NFHhk144221575722193683 with [Subvol_name: uploadtmp-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 ], [Subvol_name: uploadtmp-replicate-1, Err: 116 , Start: 0 , Stop: 0 ],
[2015-09-14 11:41:36.559315] I [afr-lk-common.c:1078:afr_lock_blocking] 0-uploadtmp-replicate-1: unable to lock on even one child
[2015-09-14 11:41:36.559339] I [afr-transaction.c:1096:afr_post_blocking_inodelk_cbk] 0-uploadtmp-replicate-1: Blocking inodelks failed.
[2015-09-14 11:41:36.562725] E [dht-rebalance.c:1685:gf_defrag_fix_layout] 0-uploadtmp-dht: Lookup failed on /atheneumwillebroek-rvl/G387KnvoHHBMrKjrPHg6NFHhk144221575722193683
[2015-09-14 11:41:36.562744] E [MSGID: 109016] [dht-rebalance.c:1815:gf_defrag_fix_layout] 0-uploadtmp-dht: Fix layout failed for /atheneumwillebroek-rvl/G387KnvoHHBMrKjrPHg6NFHhk144221575722193683
[2015-09-14 11:41:36.562826] E [MSGID: 109016] [dht-rebalance.c:1815:gf_defrag_fix_layout] 0-uploadtmp-dht: Fix layout failed for /atheneumwillebroek-rvl
[2015-09-14 11:41:36.563249] I [MSGID: 109028] [dht-rebalance.c:2111:gf_defrag_status_get] 0-glusterfs: Rebalance is completed. Time taken is 2.00 secs
[2015-09-14 11:41:36.563273] I [MSGID: 109028] [dht-rebalance.c:2115:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 5, failures: 2, skipped: 0
[2015-09-14 11:41:36.563589] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (15), shutting down
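As a sanity check on the dht-selfheal layout lines in the log, here is a minimal Python sketch that reproduces the chunk size and range size values. It uses only the numbers printed in the log; the function name `layout_ranges` is hypothetical and this is an illustration of the arithmetic, not the actual GlusterFS code.

```python
# Illustration only: reproduces the numbers printed by dht-selfheal in the log.
# layout_ranges is a hypothetical helper, not a GlusterFS function.
HASH_MAX = 0xFFFFFFFF  # upper end of the 32-bit hash space shown in the log


def layout_ranges(chunks_per_subvol):
    """Return (chunk_size, [range_size per subvolume]) using integer division."""
    total_chunks = sum(chunks_per_subvol)
    chunk_size = HASH_MAX // total_chunks
    return chunk_size, [chunk_size * chunks for chunks in chunks_per_subvol]


# "/": both replica sets take part, 251980 chunks each (503960 in total).
chunk, sizes = layout_ranges([251980, 251980])
print(hex(chunk), [hex(size) for size in sizes])

# "/atheneumwillebroek-rvl": only uploadtmp-replicate-0 is assigned a range.
chunk, sizes = layout_ranges([251980])
print(hex(chunk), [hex(size) for size in sizes])
```

The first call gives 0x214a and 0x7ffe51f8 per subvolume, the second 0x4294 and 0xfffca3f0, which matches the log: for the subdirectory the whole hash range goes to uploadtmp-replicate-0 and uploadtmp-replicate-1 gets nothing.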
As far as I can see, all gluster processes run as the root user. Does anybody have any ideas on how to fix this?
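In case it helps to narrow this down, here is a small Python sketch that filters the warning (W) and error (E) lines from the rebalance log and counts them per translator. The regular expression is my own guess at the line format based on the log shown here, and the names `LOG_LINE`, `summarize` and `SAMPLE` are hypothetical. The sample lines are copied from the log; for the full picture one would feed it the lines of /var/log/glusterfs/uploadtmp-rebalance.log instead.

```python
import re
from collections import Counter

# Assumed line format, derived only from the log lines in this post.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "  # timestamp and severity letter
    r"(?:\[MSGID: \d+\] )?"                    # optional message id
    r"\[(?P<src>[^\]]+)\] "                    # file:line:function
    r"(?P<xlator>\S+): (?P<msg>.*)$"           # translator name and message
)


def summarize(lines, levels=("W", "E")):
    """Count log lines per (severity, translator) for the given severities."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in levels:
            counts[(match.group("level"), match.group("xlator"))] += 1
    return counts


SAMPLE = [
    "[2015-09-14 11:41:29.916922] I [client-handshake.c:1200:client_setvolume_cbk] 0-uploadtmp-client-0: Connected to uploadtmp-client-0, attached to remote volume '/mnt/uploadtmp/brick1'.",
    "[2015-09-14 11:41:34.922000] W [client-rpc-fops.c:1023:client3_3_setxattr_cbk] 0-uploadtmp-client-3: remote operation failed: Permission denied",
    "[2015-09-14 11:41:34.922071] W [client-rpc-fops.c:1023:client3_3_setxattr_cbk] 0-uploadtmp-client-2: remote operation failed: Permission denied",
    "[2015-09-14 11:41:34.929176] W [client-rpc-fops.c:2145:client3_3_setattr_cbk] 0-uploadtmp-client-3: remote operation failed: Operation not permitted",
    "[2015-09-14 11:41:34.929231] W [client-rpc-fops.c:2145:client3_3_setattr_cbk] 0-uploadtmp-client-2: remote operation failed: Operation not permitted",
    "[2015-09-14 11:41:35.794673] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-uploadtmp-client-2: remote operation failed: Permission denied. Path: /atheneumwillebroek-rvl",
    "[2015-09-14 11:41:35.794874] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-uploadtmp-client-3: remote operation failed: Permission denied. Path: /atheneumwillebroek-rvl",
    "[2015-09-14 11:41:36.562725] E [dht-rebalance.c:1685:gf_defrag_fix_layout] 0-uploadtmp-dht: Lookup failed on /atheneumwillebroek-rvl/G387KnvoHHBMrKjrPHg6NFHhk144221575722193683",
]

for (level, xlator), count in sorted(summarize(SAMPLE).items()):
    print(level, xlator, count)
```

On these sample lines it reports three warnings each for uploadtmp-client-2 and uploadtmp-client-3 and one error for uploadtmp-dht. According to the graph in the log, client-2 and client-3 are the bricks on gfs02a-dcg.intnet.be and gfs02b-dcg.intnet.be, so the permission failures in the sample all come from those two bricks.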
Thanks in advance.
Kind regards.
Davy