crash test result: Input/output error

This is my crash test scenario:

1. hosts
server1 - member of gluster volume
server2 - member of gluster volume
client1 - gluster storage activity - reads: ~10/s, writes: ~10/s
client2 - gluster storage activity - reads: ~10/s, writes: ~10/s

2. AFR gluster storage (tested 3.2.5 and 3.3beta2)
# gluster volume info

Volume Name: data
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: server1:/fs/data
Brick2: server2:/fs/data

3. storage /fs/data:
~ 300 000 files (size < 10KB)
~ 3 GB

4. crash test scenario
- server1 goes down
- clients got a few "Input/output error" (for read and write) and continue working - fine
- server1 recovers (after ~3 minutes)
- clients got a few "Input/output error" (for read and write) - fine
- access to gluster storage from clients blocked (self-healing process - a few minutes with my hardware configuration)
- during this self-healing process server2 goes down
- self-healing process interrupted and clients gain access to gluster data
- server2 recovers and the real problems start
- clients: data inaccessible: permanent "Input/output error" for files and directories

client1:
# ls -la a
ls: cannot access a: Input/output error

# ls -la
??????????  ? ?    ?         ?            ? a

server1:
# getfattr -d -m . a
# file: a
trusted.afr.data2-client-0=0sAAAAAAAAAAAAAAAA
trusted.afr.data2-client-1=0sAAAAAAAAAAAAAAAq
trusted.gfid=0sfdlzd6TeRxelnMeCG9ut/w==

server2:
# getfattr -d -m . a
# file: a
trusted.afr.data2-client-0=0sAAAAAAAAAAAAAAA1
trusted.afr.data2-client-1=0sAAAAAAAAAAAAAAAA
trusted.gfid=0sfdlzd6TeRxelnMeCG9ut/w==
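
The trusted.afr values above can be decoded to see why neither brick is accepted as a source. This is a minimal sketch, not Gluster code. It assumes the usual AFR changelog layout (12 bytes holding three big-endian 32-bit pending counters: data, metadata, entry) and that client-0 refers to Brick1 (server1) and client-1 to Brick2 (server2). The function names are made up for illustration.

```python
import base64
import struct


def decode_afr_changelog(value):
    """Decode a getfattr '0s<base64>' value into (data, metadata, entry).

    Assumption: the xattr is 12 bytes, three big-endian uint32 counters.
    """
    if not value.startswith("0s"):
        raise ValueError("expected getfattr base64 output (0s prefix)")
    raw = base64.b64decode(value[2:])
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    return struct.unpack(">III", raw)


def blames(changelog):
    """A non-zero counter means operations are pending for that brick."""
    return any(changelog)


# Values copied from the getfattr output on each server.
server1_about_server2 = decode_afr_changelog("0sAAAAAAAAAAAAAAAq")  # client-1 on server1
server2_about_server1 = decode_afr_changelog("0sAAAAAAAAAAAAAAA1")  # client-0 on server2

# Each brick records pending operations against the other one.
mutual_blame = blames(server1_about_server2) and blames(server2_about_server1)

print("server1 -> server2:", server1_about_server2)
print("server2 -> server1:", server2_about_server1)
print("mutual blame (split-brain):", mutual_blame)
```

Under these assumptions only the entry counter is non-zero on both sides, which fits "a" being a directory, and each server marks the other as needing heal, so no source can be picked.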

clients /var/log/glusterfs/data.log:
[2012-02-08 13:24:16.837976] I [afr-self-heal-common.c:705:afr_mark_sources] 0-data2-replicate-0: split-brain possible, no source detected
[2012-02-08 13:24:16.838079] W [fuse-bridge.c:184:fuse_entry_cbk] 0-glusterfs-fuse: 565416: LOOKUP() /a => -1 (Input/output error)
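
To get a list of all affected paths from a client log, the failed LOOKUP entries can be collected with a small script. This is a rough sketch that assumes every failure is logged in the same format as the entry above. The function name and regular expression are my own, not part of any Gluster tool.

```python
import re

# Matches entries like: LOOKUP() /a => -1 (Input/output error)
LOOKUP_EIO = re.compile(r"LOOKUP\(\) (.+?) => -1 \(Input/output error\)")


def failed_lookups(log_text):
    """Return the sorted, de-duplicated paths whose LOOKUP failed with EIO."""
    return sorted(set(LOOKUP_EIO.findall(log_text)))


# Sample text copied from the client log shown above.
sample_log = (
    "[2012-02-08 13:24:16.837976] I [afr-self-heal-common.c:705:afr_mark_sources] "
    "0-data2-replicate-0: split-brain possible, no source detected\n"
    "[2012-02-08 13:24:16.838079] W [fuse-bridge.c:184:fuse_entry_cbk] "
    "0-glusterfs-fuse: 565416: LOOKUP() /a => -1 (Input/output error)\n"
)

affected = failed_lookups(sample_log)
print(affected)
```

For a real log, read the file contents (for example /var/log/glusterfs/data.log) and pass them to failed_lookups() instead of the sample string.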


Issues of this kind make gluster unusable in a production system.

--
Robert


