Re: Physical HDD unplug test

On Tue, Aug 16, 2016 at 01:34:36PM +0800, qingwei wei wrote:
> Hi,
> 
> I am currently testing the reliability of a distributed-replicate
> volume (3 replicas) when 1 brick is down. I tried both a software
> unplug, by issuing echo offline > /sys/block/sdx/device/state, and a
> physical unplug of the HDD, and I encountered 2 different outcomes.
> After the software unplug the FIO workload continues to run, but after
> physically unplugging the HDD the FIO workload cannot continue and
> fails with the following error:
> 
> [2016-08-12 10:33:41.854283] E [MSGID: 108008]
> [afr-transaction.c:1989:afr_transaction] 0-ad17hwssd7-replicate-0:
> Failing WRITE on gfid 665a43df-1ece-4c9a-a6ee-fcfa960d95bf:
> split-brain observed. [Input/output error]
> 
> On the server where I unplugged the disk, I can see the following:
> 
> [2016-08-12 10:33:41.916456] D [MSGID: 0]
> [io-threads.c:351:iot_schedule] 0-ad17hwssd7-io-threads: LOOKUP
> scheduled as fast fop
> [2016-08-12 10:33:41.916666] D [MSGID: 115050]
> [server-rpc-fops.c:179:server_lookup_cbk] 0-ad17hwssd7-server: 8127:
> LOOKUP /.shard/150e99ee-ce3b-4b57-8c40-99b4ecdf3822.90
> (be318638-e8a0-4c6d-977d-7a937aa84806/150e99ee-ce3b-4b57-8c40-99b4ecdf3822.90)
> ==> (No such file or directory) [No such file or directory]
> [2016-08-12 10:33:41.916804] D [MSGID: 101171]
> [client_t.c:417:gf_client_unref] 0-client_t:
> hp.dctopenstack.org-25780-2016/08/12-10:33:07:589960-ad17hwssd7-client-0-0-0:
> ref-count 1
> [2016-08-12 10:33:41.917098] D [MSGID: 101171]
> [client_t.c:333:gf_client_ref] 0-client_t:
> hp.dctopenstack.org-25780-2016/08/12-10:33:07:589960-ad17hwssd7-client-0-0-0:
> ref-count 2
> [2016-08-12 10:33:41.917145] W [MSGID: 115009]
> [server-resolve.c:571:server_resolve] 0-ad17hwssd7-server: no
> resolution type for (null) (LOOKUP)
> [2016-08-12 10:33:41.917182] E [MSGID: 115050]
> [server-rpc-fops.c:179:server_lookup_cbk] 0-ad17hwssd7-server: 8128:
> LOOKUP (null) (00000000-0000-0000-0000-000000000000/150e99ee-ce3b-4b57-8c40-99b4ecdf3822.90)
> ==> (Invalid argument) [Invalid argument]
> 
> I am using gluster 3.7.10 and the configuration is as follows:
> 
> diagnostics.brick-log-level: DEBUG
> diagnostics.client-log-level: DEBUG
> performance.io-thread-count: 16
> client.event-threads: 2
> server.event-threads: 2
> features.shard-block-size: 16MB
> features.shard: on
> server.allow-insecure: on
> storage.owner-uid: 165
> storage.owner-gid: 165
> nfs.disable: true
> performance.quick-read: off
> performance.io-cache: off
> performance.read-ahead: off
> performance.stat-prefetch: off
> cluster.lookup-optimize: on
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> transport.address-family: inet
> performance.readdir-ahead: on
> 
> This error only occurs with the sharding configuration. Have you
> performed this type of test before? And do you think physically
> unplugging the HDD is a valid test case?

If you use replica-3, things should settle down again. The kernel and
the brick process need a little time to find out that the filesystem on
the disk that you pulled out is not responding anymore. The output of
"gluster volume status" should then show that the brick process is
offline. As long as you have quorum, I/O should continue after a small
delay while the brick is being marked offline.
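
As a rough illustration of what "having quorum" means with
cluster.quorum-type: auto (as set in the quoted configuration), here is
a minimal Python sketch of the rule as I understand it: writes are
allowed when more than half of the bricks in a replica set are up, and
when exactly half are up only if the first brick is among them. The
function name and the list-of-booleans input are made up for this
example; this is not Gluster code.

```python
def writes_allowed(brick_up):
    """Sketch of the 'auto' client-quorum rule for one replica set.

    brick_up is a list of booleans, one per brick in the replica set,
    in volume order (brick_up[0] is the first brick). Hypothetical
    helper for illustration only.
    """
    total = len(brick_up)
    up = sum(brick_up)
    if 2 * up > total:
        # More than half of the bricks are reachable.
        return True
    if 2 * up == total:
        # Exactly half: the first brick acts as the tie-breaker.
        return brick_up[0]
    return False


# Replica 3 with one brick pulled: quorum is still met.
print(writes_allowed([True, True, False]))
# Replica 3 with two bricks gone: quorum is lost.
print(writes_allowed([True, False, False]))
```

With replica 3, losing a single brick leaves 2 of 3 up, so the rule
above still permits writes once the failed brick has been detected.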

People really should test this scenario: power to disks can fail, and
so can RAID controllers or the connections to them. Hot-unplugging is
definitely a scenario that can emulate real-world problems.

Niels


_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
