Hi,
Sorry for the delay; it took a long time to reproduce, but we are currently seeing the same issue again. It happened after resetting all nodes. Quorum is enabled. Logs and details are below.
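For reference, a minimal way to confirm the quorum options in effect and the peer state after the reset (standard gluster CLI; a sketch, not output we captured here):

# show the volume configuration, including the reconfigured quorum options
gluster volume info vm_storage_volume | grep -i quorum

# confirm all peers reconnected after the reset
gluster peer status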
gluster> volume heal vm_storage_volume info split-brain
Gathering list of split brain entries on volume vm_storage_volume has been successful
Brick svm1:/srv/vol
Number of entries: 0
Brick svm2:/srv/vol
Number of entries: 0
Brick svm3:/srv/vol
Number of entries: 0
Brick svm4:/srv/vol
Number of entries: 0
Brick svm5:/srv/vol
Number of entries: 0
Brick svm6:/srv/vol
Number of entries: 2
at path on brick
-----------------------------------
2015-01-22 09:08:47 /vm_images_and_config/vm9.img
2015-01-22 09:11:52 /vm_images_and_config/vm20.img
Brick svm7:/srv/vol
Number of entries: 0
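In case it is useful, the AFR changelog xattrs of the two entries reported on svm6 can be inspected directly on each brick server (a sketch, using the brick path from the output above):

getfattr -d -m . -e hex /srv/vol/vm_images_and_config/vm9.img
getfattr -d -m . -e hex /srv/vol/vm_images_and_config/vm20.img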
gluster> volume heal vm_storage_volume statistics
Gathering crawl statistics on volume vm_storage_volume has been successful
------------------------------------------------
Crawl statistics for brick no 0
Hostname of brick svm1
Starting time of crawl: Thu Jan 22 12:53:28 2015
Crawl is in progress
Type of crawl: INDEX
No. of entries healed: 1
No. of entries in split-brain: 0
No. of heal failed entries: 2
------------------------------------------------
Crawl statistics for brick no 1
Hostname of brick svm2
Starting time of crawl: Thu Jan 22 13:02:44 2015
Crawl is in progress
Type of crawl: INDEX
No. of entries healed: 1
No. of entries in split-brain: 0
No. of heal failed entries: 3
------------------------------------------------
Crawl statistics for brick no 2
Hostname of brick svm3
Starting time of crawl: Thu Jan 22 13:11:17 2015
Crawl is in progress
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0
------------------------------------------------
Crawl statistics for brick no 3
Hostname of brick svm4
Starting time of crawl: Thu Jan 22 13:11:48 2015
Crawl is in progress
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 1
------------------------------------------------
Crawl statistics for brick no 4
Hostname of brick svm5
Starting time of crawl: Thu Jan 22 12:55:52 2015
Crawl is in progress
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 3
------------------------------------------------
Crawl statistics for brick no 5
Hostname of brick svm6
Starting time of crawl: Thu Jan 22 12:53:23 2015
Crawl is in progress
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 2
No. of heal failed entries: 2
------------------------------------------------
Crawl statistics for brick no 6
Hostname of brick svm7
Starting time of crawl: Thu Jan 22 13:24:08 2015
Crawl is in progress
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 1
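The heal-failed counts above can be cross-checked in the self-heal daemon log on each node (assuming the default log location):

# recent split-brain / heal-failure messages from the self-heal daemon
grep -iE "split-brain|heal failed" /var/log/glusterfs/glustershd.log | tail -n 50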
[2015-01-22 09:11:51.316542] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-vm_storage_volume-client-3: changing port to 49216 (from 0)
[2015-01-22 09:11:51.317179] I [client-handshake.c:1677:select_server_supported_programs] 0-vm_storage_volume-client-3: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-01-22 09:11:51.317459] I [client-handshake.c:1462:client_setvolume_cbk] 0-vm_storage_volume-client-3: Connected to 11.2.1.204:49216, attached to remote volume '/srv/vol'.
[2015-01-22 09:11:51.317493] I [client-handshake.c:1474:client_setvolume_cbk] 0-vm_storage_volume-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2015-01-22 09:11:51.317528] I [client-handshake.c:1314:client_post_handshake] 0-vm_storage_volume-client-3: 1 fds open - Delaying child_up until they are re-opened
[2015-01-22 09:11:51.352698] I [client-handshake.c:936:client_child_up_reopen_done] 0-vm_storage_volume-client-3: last fd open'd/lock-self-heal'd - notifying CHILD-UP
[2015-01-22 09:11:51.355238] I [client-handshake.c:450:client_set_lk_version_cbk] 0-vm_storage_volume-client-3: Server lk version = 1
[2015-01-22 09:11:51.357918] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-vm_storage_volume-replicate-0: Another crawl is in progress for vm_storage_volume-client-5
[2015-01-22 09:11:52.299413] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-vm_storage_volume-replicate-0: Unable to self-heal contents of '<gfid:f7b77d22-9606-4141-943c-b738aa2a21fc>' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 2451 650 0 2 1452 5405 ] [ 12 1 64 1 3 1453 3551 ] [ 0 0 0 0 0 0 0 ] [ 0 0 0 0 0 0 0 ] [ 11 2441 650 0 0 1452 5403 ] [ 12 2442 651 1 3 1 3953 ] [ 0 0 0 0 0 0 0 ] ]
[2015-01-22 09:08:47.105262] E [client-rpc-fops.c:1533:client3_3_inodelk_cbk] 0-vm_storage_volume-client-2: remote operation failed: Transport endpoint is not connected
[2015-01-22 09:08:47.105309] E [client-rpc-fops.c:1533:client3_3_inodelk_cbk] 0-vm_storage_volume-client-3: remote operation failed: Transport endpoint is not connected
[2015-01-22 09:08:47.105654] W [client-rpc-fops.c:4243:client3_3_flush] 0-vm_storage_volume-client-2: (00000000-0000-0000-0000-000000000000) remote_fd is -1. EBADFD
[2015-01-22 09:08:47.105686] E [afr-self-heal-data.c:97:afr_sh_data_flush_cbk] 0-vm_storage_volume-replicate-0: flush failed on <gfid:84b645df-774d-49ab-b7c6-4fd44318fd34> on subvolume vm_storage_volume-client-2: File descriptor in bad state
[2015-01-22 09:08:47.105713] W [client-rpc-fops.c:4243:client3_3_flush] 0-vm_storage_volume-client-3: (00000000-0000-0000-0000-000000000000) remote_fd is -1. EBADFD
[2015-01-22 09:08:47.105727] E [afr-self-heal-data.c:97:afr_sh_data_flush_cbk] 0-vm_storage_volume-replicate-0: flush failed on <gfid:84b645df-774d-49ab-b7c6-4fd44318fd34> on subvolume vm_storage_volume-client-3: File descriptor in bad state
[2015-01-22 09:08:47.110715] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-vm_storage_volume-replicate-0: Unable to self-heal contents of '<gfid:87673d99-7651-47b1-8239-afefe8e4f320>' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 2183 678 0 23 11 0 ] [ 9 0 189 0 0 0 0 ] [ 0 0 0 0 0 0 0 ] [ 0 0 0 0 0 0 0 ] [ 9 478 667 0 0 0 0 ] [ 9 478 667 0 0 0 0 ] [ 0 0 0 0 0 0 0 ] ]
[2015-01-22 09:09:00.042738] E [socket.c:2244:socket_connect_finish] 0-vm_storage_volume-client-3: connection to 172.16.0.204:24007 failed (No route to host)
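The log above asks for the split-brain files to be deleted from all but the preferred subvolume, which on 3.5.x is a manual step. A sketch of the usual procedure, assuming (hypothetically) that the copy to discard has already been identified by comparing the trusted.afr xattrs and file contents across bricks, and using the gfid from the 09:08:47 entry, which appears to correspond to vm9.img listed earlier:

# run on the brick whose copy is to be discarded (brick path from the output above)
BRICK=/srv/vol
FILE=vm_images_and_config/vm9.img
GFID=87673d99-7651-47b1-8239-afefe8e4f320    # gfid reported in the 09:08:47 log entry

# remove the file and its gfid hardlink under .glusterfs/<first 2>/<next 2>/<gfid>
rm "$BRICK/$FILE"
rm "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"

# let self-heal repopulate the brick from the surviving replicas
gluster volume heal vm_storage_volume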
gluster> volume heal vm_storage_volume info
*** glibc detected *** /usr/sbin/glfsheal: malloc(): memory corruption (fast): 0x00007f3964971f30 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x78db6)[0x7f395fa86db6]
/lib/x86_64-linux-gnu/libc.so.6(+0x7af81)[0x7f395fa88f81]
/lib/x86_64-linux-gnu/libc.so.6(__libc_calloc+0xd0)[0x7f395fa8ba10]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(__gf_calloc+0xbe)[0x7f396162002e]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(iobref_new+0x15)[0x7f39616226f5]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/protocol/client.so(client_submit_request+0x329)[0x7f395b712de9]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/protocol/client.so(client3_3_inodelk+0x387)[0x7f395b722297]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/protocol/client.so(client_inodelk+0x9b)[0x7f395b70e64b]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/cluster/replicate.so(+0x4967f)[0x7f395b4d167f]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/cluster/replicate.so(afr_unlock+0x81)[0x7f395b4d1831]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/cluster/replicate.so(afr_sh_data_unlock+0x6b)[0x7f395b4bb2ab]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/cluster/replicate.so(afr_sh_data_finish+0x9d)[0x7f395b4bb3fd]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/cluster/replicate.so(afr_sh_data_fxattrop_fstat_done+0x281)[0x7f395b4bcf21]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0x183)[0x7f395b4bd1e3]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/protocol/client.so(client3_3_fstat_cbk+0x426)[0x7f395b72bcd6]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_handle_reply+0xa4)[0x7f3960b2cae4]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xcd)[0x7f3960b2ce6d]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f3960b29173]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(+0x85a4)[0x7f395c1b75a4]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(+0xad9c)[0x7f395c1b9d9c]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x6935a)[0x7f396164935a]
/usr/lib/x86_64-linux-gnu/libgfapi.so.0(+0x75b4)[0x7f39609095b4]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7db4)[0x7f3960185db4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f395fae64fd]
======= Memory map: ========
7f3950000000-7f3950021000 rw-p 00000000 00:00 0
7f3950021000-7f3954000000 ---p 00000000 00:00 0
7f3957bd4000-7f395a3ca000 rw-p 00000000 00:00 0
7f395a3ca000-7f395a3e3000 r-xp 00000000 fc:00 5395225 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/debug/io-stats.so
7f395a3e3000-7f395a5e2000 ---p 00019000 fc:00 5395225 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/debug/io-stats.so
7f395a5e2000-7f395a5e3000 r--p 00018000 fc:00 5395225 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/debug/io-stats.so
7f395a5e3000-7f395a5e5000 rw-p 00019000 fc:00 5395225 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/debug/io-stats.so
7f395a5e5000-7f395a5f2000 r-xp 00000000 fc:00 5395204 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/performance/md-cache.so
7f395a5f2000-7f395a7f2000 ---p 0000d000 fc:00 5395204 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/performance/md-cache.so
7f395a7f2000-7f395a7f3000 r--p 0000d000 fc:00 5395204 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/performance/md-cache.so
7f395a7f3000-7f395a7f4000 rw-p 0000e000 fc:00 5395204 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/performance/md-cache.so
7f395a7f4000-7f395a7fb000 r-xp 00000000 fc:00 5395206 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/performance/open-behind.so
7f395a7fb000-7f395a9fb000 ---p 00007000 fc:00 5395206 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/performance/open-behind.so
7f395a9fb000-7f395a9fc000 r--p 00007000 fc:00 5395206 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/performance/open-behind.so
7f395a9fc000-7f395a9fd000 rw-p 00008000 fc:00 5395206 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/performance/open-behind.so
7f395a9fd000-7f395aa04000 r-xp 00000000 fc:00 5395210 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/performance/quick-read.so
7f395aa04000-7f395ac03000 ---p 00007000 fc:00 5395210 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/performance/quick-read.so
7f395ac03000-7f395ac04000 r--p 00006000 fc:00 5395210 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/performance/quick-read.so
7f395ac04000-7f395ac05000 rw-p 00007000 fc:00 5395210 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/performance/quick-read.so
7f395ac05000-7f395ac15000 r-xp 00000000 fc:00 5395213 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/performance/io-cache.so
7f395ac15000-7f395ae15000 ---p 00010000 fc:00 5395213 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/performance/io-cache.so
7f395ae15000-7f395ae16000 r--p 00010000 fc:00 5395213 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/performance/io-cache.so
7f395ae16000-7f395ae18000 rw-p 00011000 fc:00 5395213 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/performance/io-cache.so
7f395ae18000-7f395ae23000 r-xp 00000000 fc:00 5395209 /usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/xlator/performance/read-ahead.so

Brick svm1:/srv/vol/
/vm_images_and_config/vm17.img - Possibly undergoing heal
/vm_images_and_config/vm12.img - Possibly undergoing heal
/vm_images_and_config/vm1.img - Possibly undergoing heal
/vm_images_and_config/vm9.img - Possibly undergoing heal
/vm_images_and_config/vm20.img - Possibly undergoing heal
/vm_images_and_config/vm15.img - Possibly undergoing heal
/vm_images_and_config/vm5.img - Possibly undergoing heal
/vm_images_and_config/vm7.img - Possibly undergoing heal
/vm_images_and_config/vm13.img - Possibly undergoing heal
/users/admvs/.mozilla/firefox/admvs.default
/vm_images_and_config/vm2.img - Possibly undergoing heal
/vm_images_and_config/vm19.img - Possibly undergoing heal
/vm_images_and_config/vm-clone.img - Possibly undergoing heal
/vm_images_and_config/vm8.img - Possibly undergoing heal
/vm_images_and_config/vm18.img - Possibly undergoing heal
/vm_images_and_config/vm16.img - Possibly undergoing heal
/vm_images_and_config/vm6.img - Possibly undergoing heal
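In case it helps with the glfsheal crash above, a sketch of how we could capture a full backtrace the next time it happens (assuming core dumps land in the current directory on this setup):

ulimit -c unlimited
gluster volume heal vm_storage_volume info        # reproduce the crash
gdb -batch -ex "thread apply all bt full" /usr/sbin/glfsheal ./core > glfsheal-bt.txt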
Best regards,
Alexey
2014-12-28 5:51 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
Could you please describe what kind of split-brain happened?
On 12/25/2014 08:05 PM, Alexey wrote:
Hi all,
We are using glusterfs setup with a quorum turned on and the configuration as the follows:
Nodes: 3
Type: Replicate
Number of Bricks: 1 x 3 = 3
cluster.quorum-type: fixed
cluster.quorum-count: 2
cluster.data-self-heal-algorithm: diff
cluster.server-quorum-ratio: 51%
glusterfs version: 3.5.3
Even though quorum is turned on, we still sometimes encounter split-brain after shutting down one node or all nodes together.
Is this normal behavior? What conditions could lead to it, and how can we prevent split-brain from occurring?
Pranith
BR,
Alexey
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users