Hi,

I am currently testing the reliability of a distributed replica setup (3 replicas) when one brick is down. I took the disk offline in two ways: a software unplug, by issuing

    echo offline > /sys/block/sdx/device/state

and a physical unplug of the HDD. The two methods gave different outcomes. With the software unplug, the FIO workload continues to run. With the physical unplug, the FIO workload cannot continue and fails with the following error:

    [2016-08-12 10:33:41.854283] E [MSGID: 108008] [afr-transaction.c:1989:afr_transaction] 0-ad17hwssd7-replicate-0: Failing WRITE on gfid 665a43df-1ece-4c9a-a6ee-fcfa960d95bf: split-brain observed. [Input/output error]

On the server where I unplugged the disk, I can see the following:

    [2016-08-12 10:33:41.916456] D [MSGID: 0] [io-threads.c:351:iot_schedule] 0-ad17hwssd7-io-threads: LOOKUP scheduled as fast fop
    [2016-08-12 10:33:41.916666] D [MSGID: 115050] [server-rpc-fops.c:179:server_lookup_cbk] 0-ad17hwssd7-server: 8127: LOOKUP /.shard/150e99ee-ce3b-4b57-8c40-99b4ecdf3822.90 (be318638-e8a0-4c6d-977d-7a937aa84806/150e99ee-ce3b-4b57-8c40-99b4ecdf3822.90) ==> (No such file or directory) [No such file or directory]
    [2016-08-12 10:33:41.916804] D [MSGID: 101171] [client_t.c:417:gf_client_unref] 0-client_t: hp.dctopenstack.org-25780-2016/08/12-10:33:07:589960-ad17hwssd7-client-0-0-0: ref-count 1
    [2016-08-12 10:33:41.917098] D [MSGID: 101171] [client_t.c:333:gf_client_ref] 0-client_t: hp.dctopenstack.org-25780-2016/08/12-10:33:07:589960-ad17hwssd7-client-0-0-0: ref-count 2
    [2016-08-12 10:33:41.917145] W [MSGID: 115009] [server-resolve.c:571:server_resolve] 0-ad17hwssd7-server: no resolution type for (null) (LOOKUP)
    [2016-08-12 10:33:41.917182] E [MSGID: 115050] [server-rpc-fops.c:179:server_lookup_cbk] 0-ad17hwssd7-server: 8128: LOOKUP (null) (00000000-0000-0000-0000-000000000000/150e99ee-ce3b-4b57-8c40-99b4ecdf3822.90) ==> (Invalid argument) [Invalid argument]

I am using Gluster 3.7.10, and the configuration is as follows:

    diagnostics.brick-log-level: DEBUG
    diagnostics.client-log-level: DEBUG
    performance.io-thread-count: 16
    client.event-threads: 2
    server.event-threads: 2
    features.shard-block-size: 16MB
    features.shard: on
    server.allow-insecure: on
    storage.owner-uid: 165
    storage.owner-gid: 165
    nfs.disable: true
    performance.quick-read: off
    performance.io-cache: off
    performance.read-ahead: off
    performance.stat-prefetch: off
    cluster.lookup-optimize: on
    cluster.quorum-type: auto
    cluster.server-quorum-type: server
    transport.address-family: inet
    performance.readdir-ahead: on

This error only occurs with the sharding configuration. Have you performed this type of test before? Or do you think physically unplugging the HDD is a valid test case?

Thanks.
Cw

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel