Physical HDD unplug test

qingwei wei <tchengwee@xxxxxxxxx> · Tue, 16 Aug 2016 13:34:36 +0800

Hi,

I am currently testing the reliability of a distributed replicated
volume (3 replicas) when 1 brick is down. I tried both a software
unplug, by issuing echo offline > /sys/block/sdx/device/state, and
physically unplugging the HDD, and I got 2 different outcomes.
With the software unplug, the FIO workload continues to run, but
after physically unplugging the HDD the FIO workload cannot continue
and fails with the following error:

[2016-08-12 10:33:41.854283] E [MSGID: 108008]
[afr-transaction.c:1989:afr_transaction] 0-ad17hwssd7-replicate-0:
Failing WRITE on gfid 665a43df-1ece-4c9a-a6ee-fcfa960d95bf:
split-brain observed. [Input/output error]
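
For reference, the software unplug mentioned above can be sketched
roughly as below. This is only a minimal sketch: set_disk_state and
get_disk_state are hypothetical helper names, sdx is a placeholder
device, and SYSFS_ROOT is an assumption added so the sketch can be
tried against a scratch directory instead of the real /sys. Writing
to the real state file needs root.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any existing tool.
# SYSFS_ROOT defaults to /sys; override it to test on a scratch tree.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# set_disk_state DEV STATE
#   DEV   : block device name, e.g. sdx (placeholder)
#   STATE : "offline" to simulate the unplug, "running" to bring it back
set_disk_state() {
    dev="$1"
    state="$2"
    state_file="$SYSFS_ROOT/block/$dev/device/state"
    if [ ! -w "$state_file" ]; then
        echo "cannot write $state_file" >&2
        return 1
    fi
    echo "$state" > "$state_file"
}

# get_disk_state DEV: print the current state of the device
get_disk_state() {
    cat "$SYSFS_ROOT/block/$1/device/state"
}
```

Usage would be "set_disk_state sdx offline" before the FIO run and
"set_disk_state sdx running" afterwards.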

On the server where I unplugged the disk, I can see the following:

[2016-08-12 10:33:41.916456] D [MSGID: 0]
[io-threads.c:351:iot_schedule] 0-ad17hwssd7-io-threads: LOOKUP
scheduled as fast fop
[2016-08-12 10:33:41.916666] D [MSGID: 115050]
[server-rpc-fops.c:179:server_lookup_cbk] 0-ad17hwssd7-server: 8127:
LOOKUP /.shard/150e99ee-ce3b-4b57-8c40-99b4ecdf3822.90
(be318638-e8a0-4c6d-977d-7a937aa84806/150e99ee-ce3b-4b57-8c40-99b4ecdf3822.90)
==> (No such file or directory) [No such file or directory]
[2016-08-12 10:33:41.916804] D [MSGID: 101171]
[client_t.c:417:gf_client_unref] 0-client_t:
hp.dctopenstack.org-25780-2016/08/12-10:33:07:589960-ad17hwssd7-client-0-0-0:
ref-count 1
[2016-08-12 10:33:41.917098] D [MSGID: 101171]
[client_t.c:333:gf_client_ref] 0-client_t:
hp.dctopenstack.org-25780-2016/08/12-10:33:07:589960-ad17hwssd7-client-0-0-0:
ref-count 2
[2016-08-12 10:33:41.917145] W [MSGID: 115009]
[server-resolve.c:571:server_resolve] 0-ad17hwssd7-server: no
resolution type for (null) (LOOKUP)
[2016-08-12 10:33:41.917182] E [MSGID: 115050]
[server-rpc-fops.c:179:server_lookup_cbk] 0-ad17hwssd7-server: 8128:
LOOKUP (null) (00000000-0000-0000-0000-000000000000/150e99ee-ce3b-4b57-8c40-99b4ecdf3822.90)
==> (Invalid argument) [Invalid argument]
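
Since the log level is DEBUG, most entries are noise. A small filter
like the one below can keep only the W and E entries. It is a sketch
that assumes each entry sits on a single line in the log file (the
entries are wrapped in this mail) and that the severity letter comes
right after the timestamp, as in the excerpt above.
filter_log_level is a hypothetical helper name.

```shell
#!/bin/sh
# filter_log_level LEVELS
#   LEVELS : one or more severity letters, e.g. "E" or "EW"
# Reads log entries on stdin and prints only those whose severity
# letter (the field right after the "[date time]" stamp) matches.
filter_log_level() {
    grep -E "^\[[0-9-]+ [0-9:.]+\] [$1] "
}
```

Usage would be "filter_log_level EW < LOGFILE", where LOGFILE is
whichever brick or client log is being inspected.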

I am using Gluster 3.7.10 and the configuration is as follows:

diagnostics.brick-log-level: DEBUG
diagnostics.client-log-level: DEBUG
performance.io-thread-count: 16
client.event-threads: 2
server.event-threads: 2
features.shard-block-size: 16MB
features.shard: on
server.allow-insecure: on
storage.owner-uid: 165
storage.owner-gid: 165
nfs.disable: true
performance.quick-read: off
performance.io-cache: off
performance.read-ahead: off
performance.stat-prefetch: off
cluster.lookup-optimize: on
cluster.quorum-type: auto
cluster.server-quorum-type: server
transport.address-family: inet
performance.readdir-ahead: on
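
To reproduce this setup, the "key: value" lines above can be turned
into gluster volume set commands. The sketch below only prints the
commands so they can be reviewed before being run.
print_volume_set_cmds is a hypothetical helper, and the volume name
is passed in as an argument; ad17hwssd7 is an assumption taken from
the log prefix.

```shell
#!/bin/sh
# print_volume_set_cmds VOLNAME
# Reads "key: value" lines on stdin (the format of the listing above)
# and prints one "gluster volume set" command per option.
# Nothing is executed; pipe the output to sh once it looks right.
print_volume_set_cmds() {
    volname="$1"
    while read -r key value; do
        [ -n "$key" ] || continue          # skip blank lines
        printf 'gluster volume set %s %s %s\n' \
            "$volname" "${key%:}" "$value" # strip the trailing colon
    done
}
```

Usage, with the option list saved to a file named options.txt (a
placeholder name):

    print_volume_set_cmds ad17hwssd7 < options.txt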

This error only occurs with sharding enabled. Have you performed
this type of test before? And do you think physically unplugging the
HDD is a valid test case?

Thanks.

Cw
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel