Hi,

I ran 3.2.3 under Ubuntu 10.04 LTS with some pretty serious I/O tests and my install was rock solid. That doesn't help much, but it may indicate that the cause lies outside of Gluster.

Gerald

----- Original Message -----
From: "anthony garnier" <sokar6012 at hotmail.com>
To: gluster-users at gluster.org
Sent: Wednesday, November 30, 2011 9:42:38 AM
Subject: NFS server crash under heavy load

Hi,

I've got some issues with Gluster 3.2.3. Servers are on SLES 11; the client is on Solaris. On my client, when I try to do rm -rf on a folder with big files inside, the NFS server crashes.

Here is my volume configuration:

Volume Name: poolsave
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: ylal3550:/users3/poolsave
Brick2: ylal3570:/users3/poolsave
Brick3: ylal3560:/users3/poolsave
Brick4: ylal3580:/users3/poolsave
Options Reconfigured:
performance.io-thread-count: 64
nfs.port: 2049
performance.cache-refresh-timeout: 2
performance.cache-max-file-size: 4GB
performance.cache-min-file-size: 1KB
network.ping-timeout: 10
performance.cache-size: 6GB

nfs.log:

[2011-11-30 16:14:19.3887] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 644)
[2011-11-30 16:14:19.3947] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 646)
[2011-11-30 16:14:19.3967] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 647)
[2011-11-30 16:14:19.4008] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 648)
[2011-11-30 16:14:19.4109] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 652)
[2011-11-30 16:14:19.4134] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 653)
[2011-11-30 16:14:19.4162] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 654)
[2011-11-30 16:14:19.4181] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 655)
[2011-11-30 16:14:19.4201] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 656)
[2011-11-30 16:14:19.4243] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 658)
[2011-11-30 16:14:19.4341] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 659)
[2011-11-30 16:14:19.4386] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 660)
[2011-11-30 16:14:19.4435] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 661)
[2011-11-30 16:14:19.4493] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 662)
[2011-11-30 16:14:19.4581] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 664)
[2011-11-30 16:14:19.4618] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 667)
[2011-11-30 16:14:19.4657] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 669)
[2011-11-30 16:14:19.4702] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 670)
[2011-11-30 16:14:19.4727] I 
[client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 672) [2011-11-30 16:14:19.4751] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 674) [2011-11-30 16:14:19.4878] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 676) [2011-11-30 16:14:19.5018] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 680) [2011-11-30 16:14:19.5050] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 681) [2011-11-30 16:14:19.5088] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 685) [2011-11-30 16:14:19.5128] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 689) [2011-11-30 16:14:19.5154] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 690) [2011-11-30 16:14:19.5357] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 695) [2011-11-30 16:14:19.5431] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 698) [2011-11-30 16:14:19.5470] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 699) [2011-11-30 16:14:19.5556] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 701) [2011-11-30 16:14:19.5636] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 702) [2011-11-30 16:14:19.5829] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 705) [2011-11-30 16:14:19.5946] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 706) [2011-11-30 16:14:19.6034] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 707) [2011-11-30 16:14:19.6135] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 710) [2011-11-30 16:14:19.6187] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 712) [2011-11-30 16:14:19.6208] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 713) [2011-11-30 16:14:19.6241] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 715) [2011-11-30 16:14:19.6283] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 717) [2011-11-30 16:14:19.6357] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 718) [2011-11-30 16:14:19.6453] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 721) [2011-11-30 16:14:19.6486] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 723) [2011-11-30 16:14:19.6584] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 725) [2011-11-30 16:14:19.6685] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 727) [2011-11-30 16:14:19.6726] I 
[client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 729) [2011-11-30 16:14:19.6780] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 730) [2011-11-30 16:14:19.6800] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 731) [2011-11-30 16:14:19.6859] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 732) [2011-11-30 16:14:19.6951] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 733) [2011-11-30 16:14:19.7053] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 734) [2011-11-30 16:14:19.7102] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 736) [2011-11-30 16:14:19.7132] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 737) [2011-11-30 16:14:19.7204] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 738) [2011-11-30 16:14:19.7271] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 739) [2011-11-30 16:14:19.7365] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 740) [2011-11-30 16:14:19.7410] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 741) [2011-11-30 16:14:19.7434] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 742) [2011-11-30 16:14:19.7482] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 744) [2011-11-30 16:14:19.7624] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 747) [2011-11-30 16:14:19.7684] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 750) [2011-11-30 16:14:19.7712] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 752) [2011-11-30 16:14:19.7734] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 753) [2011-11-30 16:14:19.7760] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 754) [2011-11-30 16:14:19.7849] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 757) [2011-11-30 16:14:19.7941] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 759) [2011-11-30 16:14:19.8030] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 761) [2011-11-30 16:14:19.8134] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 763) [2011-11-30 16:14:19.8165] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 765) [2011-11-30 16:14:19.8270] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 768) [2011-11-30 16:14:19.8336] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 769) [2011-11-30 16:14:19.8507] I 
[client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 773) [2011-11-30 16:14:19.8559] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 775) [2011-11-30 16:14:19.8769] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 780) [2011-11-30 16:14:19.8919] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 785) [2011-11-30 16:14:19.8944] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 786) [2011-11-30 16:14:19.9007] I [client-handshake.c:536:client3_1_reopendir_cbk] 0-poolsave-client-2: reopendir on / succeeded (fd = 788) [2011-11-30 16:14:19.9101] I [client-lk.c:617:decrement_reopen_fd_count] 0-poolsave-client-2: last fd open'd/lock-self-heal'd - notifying CHILD-UP [2011-11-30 16:14:19.9396] W [afr-open.c:624:afr_openfd_flush] 0-poolsave-replicate-1: fd not open on any subvolume 0x7f2241c8f948 (/yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065) [2011-11-30 16:14:19.9704] W [afr-open.c:624:afr_openfd_flush] 0-poolsave-replicate-1: fd not open on any subvolume 0x7f2241c8f948 (/yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065) [2011-11-30 16:14:19.10052] W [afr-open.c:624:afr_openfd_flush] 0-poolsave-replicate-1: fd not open on any subvolume 0x7f2241c8f948 (/yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065) [2011-11-30 16:14:19.10545] I [afr-open.c:435:afr_openfd_sh] 0-poolsave-replicate-1: data self-heal triggered. path: /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065, reason: Replicate up down flush, data lock is held [2011-11-30 16:14:19.11189] I [afr-self-heal-common.c:1233:sh_missing_entries_create] 0-poolsave-replicate-1: no missing files - /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065. 
proceeding to metadata check [2011-11-30 16:14:19.11755] W [afr-open.c:624:afr_openfd_flush] 0-poolsave-replicate-1: fd not open on any subvolume 0x7f2241c8f948 (/yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065) [2011-11-30 16:14:19.12171] W [dict.c:418:dict_unref] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2) [0x7f2247375672] (-->/usr/local/lib//glusterfs/3.2.3/xlator/protocol/client.so(client3_1_fstat_cbk+0x2c9) [0x7f2245424189] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0x17d) [0x7f22452cc6ad]))) 0-dict: dict is NULL [2011-11-30 16:14:19.12641] W [afr-open.c:624:afr_openfd_flush] 0-poolsave-replicate-1: fd not open on any subvolume 0x7f2241c8f948 (/yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065) [2011-11-30 16:14:19.12933] W [afr-open.c:624:afr_openfd_flush] 0-poolsave-replicate-1: fd not open on any subvolume 0x7f2241c8f948 (/yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065) [2011-11-30 16:14:19.13202] W [afr-open.c:624:afr_openfd_flush] 0-poolsave-replicate-1: fd not open on any subvolume 0x7f2241c8f948 (/yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065) [2011-11-30 16:14:19.17414] W [afr-open.c:624:afr_openfd_flush] 0-poolsave-replicate-1: fd not open on any subvolume 0x7f2241c8f948 (/yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065) [2011-11-30 16:14:19.21832] W [afr-open.c:624:afr_openfd_flush] 0-poolsave-replicate-1: fd not open on any subvolume 0x7f2241c8f948 (/yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065) [2011-11-30 16:14:19.24762] W [afr-open.c:624:afr_openfd_flush] 0-poolsave-replicate-1: fd not open on any subvolume 0x7f2241c8f948 (/yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065) [2011-11-30 16:14:26.374702] I [afr-self-heal-algorithm.c:520:sh_diff_loop_driver_done] 0-poolsave-replicate-1: diff self-heal on /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065: completed. (669 blocks of 29162 were different (2.29%)) [2011-11-30 16:14:26.375814] W [afr-common.c:122:afr_set_split_brain] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_flush_cbk+0x72) [0x7f22452cc8e2] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_done+0x42) [0x7f22452cacf2] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_self_heal_completion_cbk+0x21b) [0x7f22452d0ccb]))) 0-poolsave-replicate-1: invalid argument: inode [2011-11-30 16:14:26.375870] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-poolsave-replicate-1: background data data self-heal completed on /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065 [2011-11-30 16:14:26.375886] W [afr-open.c:326:afr_openfd_sh_unwind] 0-poolsave-replicate-1: fd not open on any subvolume 0x7f2241c8f948 (/yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065) [2011-11-30 16:14:26.376152] I [afr-open.c:435:afr_openfd_sh] 0-poolsave-replicate-1: data self-heal triggered. 
path: /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065, reason: Replicate up down flush, data lock is held [2011-11-30 16:14:26.376757] I [afr-self-heal-common.c:1233:sh_missing_entries_create] 0-poolsave-replicate-1: no missing files - /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065. proceeding to metadata check [2011-11-30 16:14:26.378231] W [afr-common.c:122:afr_set_split_brain] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_flush_cbk+0x72) [0x7f22452cc8e2] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_done+0x42) [0x7f22452cacf2] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_self_heal_completion_cbk+0x21b) [0x7f22452d0ccb]))) 0-poolsave-replicate-1: invalid argument: inode [2011-11-30 16:14:26.378274] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-poolsave-replicate-1: background data data self-heal completed on /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065 [2011-11-30 16:14:26.378289] W [afr-open.c:326:afr_openfd_sh_unwind] 0-poolsave-replicate-1: fd not open on any subvolume 0x7f2241c8f948 (/yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065) [2011-11-30 16:14:26.378532] I [afr-open.c:435:afr_openfd_sh] 0-poolsave-replicate-1: data self-heal triggered. path: /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065, reason: Replicate up down flush, data lock is held [2011-11-30 16:14:26.379196] I [afr-self-heal-common.c:1233:sh_missing_entries_create] 0-poolsave-replicate-1: no missing files - /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065. proceeding to metadata check [2011-11-30 16:14:26.380324] W [dict.c:418:dict_unref] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2) [0x7f2247375672] (-->/usr/local/lib//glusterfs/3.2.3/xlator/protocol/client.so(client3_1_fstat_cbk+0x2c9) [0x7f2245424189] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0x17d) [0x7f22452cc6ad]))) 0-dict: dict is NULL [2011-11-30 16:14:33.110476] I [afr-self-heal-algorithm.c:520:sh_diff_loop_driver_done] 0-poolsave-replicate-1: diff self-heal on /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065: completed. (0 blocks of 29162 were different (0.00%)) [2011-11-30 16:14:33.111841] W [afr-common.c:122:afr_set_split_brain] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_flush_cbk+0x72) [0x7f22452cc8e2] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_done+0x42) [0x7f22452cacf2] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_self_heal_completion_cbk+0x21b) [0x7f22452d0ccb]))) 0-poolsave-replicate-1: invalid argument: inode [2011-11-30 16:14:33.111956] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-poolsave-replicate-1: background data data self-heal completed on /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065 [2011-11-30 16:14:33.111990] W [afr-open.c:326:afr_openfd_sh_unwind] 0-poolsave-replicate-1: fd not open on any subvolume 0x7f2241c8f948 (/yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065) [2011-11-30 16:14:33.112295] I [afr-open.c:435:afr_openfd_sh] 0-poolsave-replicate-1: data self-heal triggered. 
path: /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065, reason: Replicate up down flush, data lock is held [2011-11-30 16:14:33.113059] I [afr-self-heal-common.c:1233:sh_missing_entries_create] 0-poolsave-replicate-1: no missing files - /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065. proceeding to metadata check [2011-11-30 16:14:33.114314] W [dict.c:418:dict_unref] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2) [0x7f2247375672] (-->/usr/local/lib//glusterfs/3.2.3/xlator/protocol/client.so(client3_1_fstat_cbk+0x2c9) [0x7f2245424189] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0x17d) [0x7f22452cc6ad]))) 0-dict: dict is NULL [2011-11-30 16:14:39.819854] I [afr-self-heal-algorithm.c:520:sh_diff_loop_driver_done] 0-poolsave-replicate-1: diff self-heal on /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065: completed. (0 blocks of 29163 were different (0.00%)) [2011-11-30 16:14:39.821191] W [afr-common.c:122:afr_set_split_brain] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_flush_cbk+0x72) [0x7f22452cc8e2] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_done+0x42) [0x7f22452cacf2] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_self_heal_completion_cbk+0x21b) [0x7f22452d0ccb]))) 0-poolsave-replicate-1: invalid argument: inode [2011-11-30 16:14:39.821251] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-poolsave-replicate-1: background data data self-heal completed on /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065 [2011-11-30 16:14:39.821277] W [afr-open.c:326:afr_openfd_sh_unwind] 0-poolsave-replicate-1: fd not open on any subvolume 0x7f2241c8f948 (/yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065) [2011-11-30 16:14:39.821565] I [afr-open.c:435:afr_openfd_sh] 0-poolsave-replicate-1: data self-heal triggered. path: /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065, reason: Replicate up down flush, data lock is held [2011-11-30 16:14:39.822291] I [afr-self-heal-common.c:1233:sh_missing_entries_create] 0-poolsave-replicate-1: no missing files - /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065. proceeding to metadata check [2011-11-30 16:14:39.823922] W [afr-common.c:122:afr_set_split_brain] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_flush_cbk+0x72) [0x7f22452cc8e2] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_done+0x42) [0x7f22452cacf2] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_self_heal_completion_cbk+0x21b) [0x7f22452d0ccb]))) 0-poolsave-replicate-1: invalid argument: inode [2011-11-30 16:14:39.823979] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-poolsave-replicate-1: background data data self-heal completed on /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065 [2011-11-30 16:14:39.824006] W [afr-open.c:326:afr_openfd_sh_unwind] 0-poolsave-replicate-1: fd not open on any subvolume 0x7f2241c8f948 (/yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065) [2011-11-30 16:14:39.824434] I [afr-open.c:435:afr_openfd_sh] 0-poolsave-replicate-1: data self-heal triggered. 
path: /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065, reason: Replicate up down flush, data lock is held [2011-11-30 16:14:39.825269] I [afr-self-heal-common.c:1233:sh_missing_entries_create] 0-poolsave-replicate-1: no missing files - /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065. proceeding to metadata check [2011-11-30 16:14:39.826867] W [afr-common.c:122:afr_set_split_brain] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_flush_cbk+0x72) [0x7f22452cc8e2] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_done+0x42) [0x7f22452cacf2] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_self_heal_completion_cbk+0x21b) [0x7f22452d0ccb]))) 0-poolsave-replicate-1: invalid argument: inode [2011-11-30 16:14:39.826925] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-poolsave-replicate-1: background data data self-heal completed on /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065 [2011-11-30 16:14:39.826960] W [afr-open.c:326:afr_openfd_sh_unwind] 0-poolsave-replicate-1: fd not open on any subvolume 0x7f2241c8f948 (/yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065) [2011-11-30 16:14:39.827437] I [afr-open.c:435:afr_openfd_sh] 0-poolsave-replicate-1: data self-heal triggered. path: /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065, reason: Replicate up down flush, data lock is held [2011-11-30 16:14:39.828080] I [afr-self-heal-common.c:1233:sh_missing_entries_create] 0-poolsave-replicate-1: no missing files - /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065. proceeding to metadata check [2011-11-30 16:14:39.829501] W [dict.c:418:dict_unref] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2) [0x7f2247375672] (-->/usr/local/lib//glusterfs/3.2.3/xlator/protocol/client.so(client3_1_fstat_cbk+0x2c9) [0x7f2245424189] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0x17d) [0x7f22452cc6ad]))) 0-dict: dict is NULL [2011-11-30 16:14:46.521672] I [afr-self-heal-algorithm.c:520:sh_diff_loop_driver_done] 0-poolsave-replicate-1: diff self-heal on /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065: completed. (0 blocks of 29163 were different (0.00%)) [2011-11-30 16:14:46.523091] W [afr-common.c:122:afr_set_split_brain] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_flush_cbk+0x72) [0x7f22452cc8e2] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_done+0x42) [0x7f22452cacf2] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_self_heal_completion_cbk+0x21b) [0x7f22452d0ccb]))) 0-poolsave-replicate-1: invalid argument: inode [2011-11-30 16:14:46.523134] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-poolsave-replicate-1: background data data self-heal completed on /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065 [2011-11-30 16:14:46.523173] W [afr-open.c:326:afr_openfd_sh_unwind] 0-poolsave-replicate-1: fd not open on any subvolume 0x7f2241c8f948 (/yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065) [2011-11-30 16:14:46.523475] I [afr-open.c:435:afr_openfd_sh] 0-poolsave-replicate-1: data self-heal triggered. 
path: /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065, reason: Replicate up down flush, data lock is held [2011-11-30 16:14:46.524282] I [afr-self-heal-common.c:1233:sh_missing_entries_create] 0-poolsave-replicate-1: no missing files - /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065. proceeding to metadata check [2011-11-30 16:14:46.525721] W [dict.c:418:dict_unref] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2) [0x7f2247375672] (-->/usr/local/lib//glusterfs/3.2.3/xlator/protocol/client.so(client3_1_fstat_cbk+0x2c9) [0x7f2245424189] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0x17d) [0x7f22452cc6ad]))) 0-dict: dict is NULL [2011-11-30 16:14:53.214149] I [afr-self-heal-algorithm.c:520:sh_diff_loop_driver_done] 0-poolsave-replicate-1: diff self-heal on /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065: completed. (0 blocks of 29164 were different (0.00%)) [2011-11-30 16:14:53.215561] W [afr-common.c:122:afr_set_split_brain] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_flush_cbk+0x72) [0x7f22452cc8e2] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_done+0x42) [0x7f22452cacf2] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_self_heal_completion_cbk+0x21b) [0x7f22452d0ccb]))) 0-poolsave-replicate-1: invalid argument: inode [2011-11-30 16:14:53.215607] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-poolsave-replicate-1: background data data self-heal completed on /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065 [2011-11-30 16:14:53.215648] W [afr-open.c:326:afr_openfd_sh_unwind] 0-poolsave-replicate-1: fd not open on any subvolume 0x7f2241c8f948 (/yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065) [2011-11-30 16:14:53.215951] I [afr-open.c:435:afr_openfd_sh] 0-poolsave-replicate-1: data self-heal triggered. path: /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065, reason: Replicate up down flush, data lock is held [2011-11-30 16:14:53.216646] I [afr-self-heal-common.c:1233:sh_missing_entries_create] 0-poolsave-replicate-1: no missing files - /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065. proceeding to metadata check [2011-11-30 16:14:53.218239] W [afr-common.c:122:afr_set_split_brain] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_flush_cbk+0x72) [0x7f22452cc8e2] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_done+0x42) [0x7f22452cacf2] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_self_heal_completion_cbk+0x21b) [0x7f22452d0ccb]))) 0-poolsave-replicate-1: invalid argument: inode [2011-11-30 16:14:53.218292] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-poolsave-replicate-1: background data data self-heal completed on /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065 [2011-11-30 16:14:53.218320] W [afr-open.c:326:afr_openfd_sh_unwind] 0-poolsave-replicate-1: fd not open on any subvolume 0x7f2241c8f948 (/yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065) [2011-11-30 16:14:53.218630] I [afr-open.c:435:afr_openfd_sh] 0-poolsave-replicate-1: data self-heal triggered. 
path: /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065, reason: Replicate up down flush, data lock is held
[2011-11-30 16:14:53.219392] I [afr-self-heal-common.c:1233:sh_missing_entries_create] 0-poolsave-replicate-1: no missing files - /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065. proceeding to metadata check
[2011-11-30 16:14:53.221056] W [afr-common.c:122:afr_set_split_brain] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_flush_cbk+0x72) [0x7f22452cc8e2] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_data_done+0x42) [0x7f22452cacf2] (-->/usr/local/lib//glusterfs/3.2.3/xlator/cluster/replicate.so(afr_self_heal_completion_cbk+0x21b) [0x7f22452d0ccb]))) 0-poolsave-replicate-1: invalid argument: inode
[2011-11-30 16:14:53.221102] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-poolsave-replicate-1: background data data self-heal completed on /yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065
[2011-11-30 16:14:53.221215] W [afr-open.c:326:afr_openfd_sh_unwind] 0-poolsave-replicate-1: fd not open on any subvolume 0x7f2241c8f948 (/yvask300/des01/save/r/p/des01/11-11-22/10h03m52s/inc0+arc/data_channel-1/134_1_1_767873065)

pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2011-11-30 16:21:05
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.2.3
/lib64/libc.so.6(+0x329e0)[0x7f2246b069e0]
/usr/local/lib//glusterfs/3.2.3/xlator/nfs/server.so(nfs_fop_lookup_cbk+0x60)[0x7f2244adc1c0]
/usr/local/lib//glusterfs/3.2.3/xlator/debug/io-stats.so(io_stats_lookup_cbk+0xe4)[0x7f2244c281c4]
/usr/local/lib//glusterfs/3.2.3/xlator/performance/quick-read.so(qr_lookup_cbk+0x1cd)[0x7f2244d3e39d]
/usr/local/lib//glusterfs/3.2.3/xlator/performance/io-cache.so(ioc_lookup_cbk+0x32e)[0x7f2244e50bde]
/usr/local/lib/libglusterfs.so.0(default_lookup_cbk+0xaa)[0x7f22475616aa]
---------

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
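For anyone trying to reproduce the setup reported above, the volume in Anthony's "volume info" output corresponds roughly to the commands below. This is only a sketch: the volume name, hostnames, brick paths and option values are taken from the message; the Solaris mount point, the test directory name and the glusterfs binary path are assumptions.

# On one of the SLES 11 servers: recreate the 2x2 distributed-replicated volume
gluster volume create poolsave replica 2 transport tcp \
    ylal3550:/users3/poolsave ylal3570:/users3/poolsave \
    ylal3560:/users3/poolsave ylal3580:/users3/poolsave
gluster volume start poolsave

# Apply the options listed under "Options Reconfigured"
gluster volume set poolsave performance.io-thread-count 64
gluster volume set poolsave nfs.port 2049
gluster volume set poolsave performance.cache-refresh-timeout 2
gluster volume set poolsave performance.cache-max-file-size 4GB
gluster volume set poolsave performance.cache-min-file-size 1KB
gluster volume set poolsave network.ping-timeout 10
gluster volume set poolsave performance.cache-size 6GB

# On the Solaris client: Gluster's built-in NFS server speaks NFSv3 over TCP only
mount -F nfs -o vers=3,proto=tcp ylal3550:/poolsave /mnt/poolsave

# Fill a directory with large files, then remove it to mimic the failing workload
rm -rf /mnt/poolsave/bigfiles    # 'bigfiles' is a placeholder directory name

If the segfault in nfs_fop_lookup_cbk leaves a core file on the server, a symbolized backtrace would probably narrow this down faster than the nfs.log excerpt; something along these lines, assuming a /usr/local install as suggested by the library paths in the log:

gdb -batch -ex bt /usr/local/sbin/glusterfs /path/to/core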