Vijay Bellur <vbellur@xxxxxxxxxx> wrote:

> I have not been able to re-create the problem in my setup. I think it
> would be a good idea to track this bug and address it. For now, can we
> not use the volume set mechanism to disable eager-locking?

Our exchanges went off-list after this message. I am reposting here the
last 100k lines of the log, captured in debug mode:
http://ftp.espci.fr/shadow/manu/log

Relevant part:

[2013-07-22 15:36:22.923866] D [afr-lk-common.c:447:transaction_lk_op] 0-gfs34-replicate-0: lk op is for a transaction
[2013-07-22 15:36:22.924484] D [client-rpc-fops.c:2789:client_fdctx_destroy] 0-gfs34-client-0: sending release on fd
[2013-07-22 15:36:22.924560] D [client-rpc-fops.c:2789:client_fdctx_destroy] 0-gfs34-client-1: sending release on fd
[2013-07-22 15:36:22.943156] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: pending_matrix: [ 0 0 ]
[2013-07-22 15:36:22.943202] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: pending_matrix: [ 0 0 ]
[2013-07-22 15:36:22.943236] D [afr-self-heal-common.c:887:afr_mark_sources] 0-gfs34-replicate-1: Number of sources: -1
[2013-07-22 15:36:22.943271] D [afr-self-heal-data.c:794:afr_lookup_select_read_child_by_txn_type] 0-gfs34-replicate-1: /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po: Possible split-brain
[2013-07-22 15:36:22.943305] D [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type] 0-gfs34-replicate-1: returning read_child: 1
[2013-07-22 15:36:22.943336] D [afr-common.c:1380:afr_lookup_select_read_child] 0-gfs34-replicate-1: Source selected as 1 for /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:22.943374] D [afr-common.c:1117:afr_lookup_build_response_params] 0-gfs34-replicate-1: Building lookup response from 1
[2013-07-22 15:36:22.943409] D [afr-common.c:1265:afr_detect_self_heal_by_iatt] 0-gfs34-replicate-1: size differs for /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:22.943444] D [afr-common.c:1291:afr_detect_self_heal_by_split_brain_status] 0-gfs34-replicate-1: split brain detected during lookup of /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po.
[2013-07-22 15:36:22.943478] D [afr-common.c:1426:afr_launch_self_heal] 0-gfs34-replicate-1: background data self-heal triggered. path: /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po, reason: lookup detected pending operations
[2013-07-22 15:36:23.272807] D [afr-self-heal-metadata.c:486:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-gfs34-replicate-1: Non Blocking metadata inodelks done for /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po. Proceeding to FOP
[2013-07-22 15:36:23.272868] D [mem-pool.c:422:mem_get] 0-mem-pool: Mem pool is full. Callocing mem
[2013-07-22 15:36:23.272900] D [afr-self-heal-common.c:1930:afr_sh_common_lookup] 0-gfs34-replicate-1: looking up /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po on subvolume gfs34-client-2
[2013-07-22 15:36:23.272986] D [afr-self-heal-common.c:1930:afr_sh_common_lookup] 0-gfs34-replicate-1: looking up /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po on subvolume gfs34-client-3
[2013-07-22 15:36:23.273596] D [mem-pool.c:422:mem_get] 0-mem-pool: Mem pool is full. Callocing mem
[2013-07-22 15:36:23.273752] D [mem-pool.c:422:mem_get] 0-mem-pool: Mem pool is full. Callocing mem
[2013-07-22 15:36:23.273792] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.273829] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.273862] D [afr-self-heal-common.c:887:afr_mark_sources] 0-gfs34-replicate-1: Number of sources: 2
[2013-07-22 15:36:23.273895] D [afr-lk-common.c:452:transaction_lk_op] 0-gfs34-replicate-1: lk op is for a self heal
[2013-07-22 15:36:23.276705] D [afr-self-heal-metadata.c:61:afr_sh_metadata_done] 0-gfs34-replicate-1: proceeding to data check on /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:23.278390] D [afr-self-heal-data.c:1158:afr_sh_data_post_nonblocking_inodelk_cbk] 0-gfs34-replicate-1: Non Blocking data inodelks done for /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po by 5c3e47ba. Proceeding to self-heal
[2013-07-22 15:36:23.278520] D [mem-pool.c:422:mem_get] 0-mem-pool: Mem pool is full. Callocing mem
[2013-07-22 15:36:23.278540] D [mem-pool.c:422:mem_get] 0-mem-pool: Mem pool is full. Callocing mem
[2013-07-22 15:36:23.280422] D [mem-pool.c:422:mem_get] 0-mem-pool: Mem pool is full. Callocing mem
[2013-07-22 15:36:23.281824] D [mem-pool.c:422:mem_get] 0-mem-pool: Mem pool is full. Callocing mem
[2013-07-22 15:36:23.282746] D [afr-self-heal-data.c:686:afr_sh_data_fxattrop_fstat_done] 0-gfs34-replicate-1: Pending matrix for: 5c3e47ba
[2013-07-22 15:36:23.282798] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.282831] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.282862] D [afr-self-heal-common.c:887:afr_mark_sources] 0-gfs34-replicate-1: Number of sources: -1
[2013-07-22 15:36:23.282897] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-gfs34-replicate-1: Unable to self-heal contents of '/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 0 ] [ 0 0 ] ]
[2013-07-22 15:36:23.282931] D [afr-self-heal-data.c:336:afr_sh_data_fail] 0-gfs34-replicate-1: finishing failed data selfheal of /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:23.282962] D [afr-lk-common.c:452:transaction_lk_op] 0-gfs34-replicate-1: lk op is for a self heal
[2013-07-22 15:36:23.283575] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-gfs34-replicate-1: background data self-heal failed on /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:23.283636] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.283669] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.283700] D [afr-self-heal-common.c:887:afr_mark_sources] 0-gfs34-replicate-1: Number of sources: -1
[2013-07-22 15:36:23.283730] D [afr-self-heal-data.c:794:afr_lookup_select_read_child_by_txn_type] 0-gfs34-replicate-1: /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po: Possible split-brain
[2013-07-22 15:36:23.283763] D [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type] 0-gfs34-replicate-1: returning read_child: 1
[2013-07-22 15:36:23.283794] D [afr-common.c:1380:afr_lookup_select_read_child] 0-gfs34-replicate-1: Source selected as 1 for /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:23.283828] D [afr-common.c:1117:afr_lookup_build_response_params] 0-gfs34-replicate-1: Building lookup response from 1
[2013-07-22 15:36:23.284755] W [afr-open.c:213:afr_open] 0-gfs34-replicate-1: failed to open as split brain seen, returning EIO

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@xxxxxxxxxx