crash test result: Input/output error

tapczan <tapczan@xxxxxx> · Wed, 08 Feb 2012 13:33:45 +0100

This is my crash test scenario:

1. hosts
server1 - member of gluster volume
server2 - member of gluster volume
client1 - gluster storage activity - reads: ~10/s, writes: ~10/s
client2 - gluster storage activity - reads: ~10/s, writes: ~10/s
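To reproduce the client side of this test, a small Python sketch like the one below could generate roughly the load described above (about 10 reads/s and 10 writes/s of small files). It is a hypothetical toy, not the tool used in this test: `run_load`, the `load_NNNNN.dat` file names, the 4 KB file size and the mount directory you pass in are all assumptions.

```python
import errno
import os
import random
import time


def _count_error(stats, exc):
    # Count EIO ("Input/output error") separately from other OS errors.
    stats["eio" if exc.errno == errno.EIO else "other_errors"] += 1


def run_load(mount_dir, duration_s=10.0, ops_per_s=10, n_files=100, file_size=4096):
    """Toy client workload: about ops_per_s writes plus ops_per_s reads per second.

    Pacing is a plain sleep per tick, so the real rate is slightly below target.
    Errors are counted instead of raised, so the loop keeps running through an outage.
    """
    stats = {"reads": 0, "writes": 0, "eio": 0, "other_errors": 0}
    payload = os.urandom(file_size)  # small files, like the <10KB files on /fs/data
    names = [os.path.join(mount_dir, f"load_{i:05d}.dat") for i in range(n_files)]
    written = set()  # only read back files that were written successfully
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        path = random.choice(names)
        try:
            with open(path, "wb") as f:
                f.write(payload)
            written.add(path)
            stats["writes"] += 1
        except OSError as e:
            _count_error(stats, e)
        if written:
            try:
                with open(random.choice(tuple(written)), "rb") as f:
                    f.read()
                stats["reads"] += 1
            except OSError as e:
                _count_error(stats, e)
        time.sleep(1.0 / ops_per_s)
    return stats
```

For example, `run_load("/mnt/data", duration_s=600)` on each client, where `/mnt/data` stands for wherever the `data` volume is mounted (a hypothetical path). The `eio` counter then shows how many operations hit "Input/output error" during each phase of the test.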

2. AFR (replicated) gluster storage (tested with 3.2.5 and 3.3beta2)
# gluster volume info

Volume Name: data
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: server1:/fs/data
Brick2: server2:/fs/data

3. storage /fs/data:
~ 300 000 files (size < 10KB)
~ 3 GB

4. crash test scenario
- server1 goes down
- clients get a few "Input/output error" failures (on reads and writes) and continue working - fine
- server1 recovers (after ~3 minutes)
- clients get a few "Input/output error" failures (on reads and writes) - fine
- client access to gluster storage is blocked while self-healing runs (a few minutes with my hardware configuration)
- during self-healing, server2 goes down
- self-healing is interrupted and clients regain access to gluster data
- server2 recovers and the real problems start
- clients: data inaccessible, with permanent "Input/output error" for files and directories
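A toy model may help explain why the second failure is so much worse than the first. It assumes a heavy simplification of AFR (ignoring locks, the separate data/metadata/entry counters and the real heal algorithm): each brick keeps a count of operations it believes its peer missed, and a brick can serve as the heal source only if no other brick has a nonzero count against it. The names `Brick`, `client_write`, `heal_sources` and `crash_test` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Brick:
    name: str
    up: bool = True
    # pending[peer]: number of operations this brick believes `peer` has missed
    pending: dict = field(default_factory=dict)


def client_write(bricks):
    """Every live brick records one pending operation against every down brick."""
    down = [b for b in bricks if not b.up]
    for b in bricks:
        if b.up:
            for d in down:
                b.pending[d.name] = b.pending.get(d.name, 0) + 1


def heal_sources(bricks):
    """A brick can be a heal source only if no other brick accuses it."""
    return [b.name for b in bricks
            if all(o.pending.get(b.name, 0) == 0 for o in bricks if o is not b)]


def crash_test(writes_per_outage=30):
    s1, s2 = Brick("server1"), Brick("server2")
    bricks = [s1, s2]
    s1.up = False                       # server1 goes down
    for _ in range(writes_per_outage):
        client_write(bricks)            # server2 accumulates ops against server1
    s1.up = True                        # server1 recovers
    after_first_outage = heal_sources(bricks)
    # Self-heal starts but is interrupted before it clears server2's counters.
    s2.up = False                       # server2 goes down mid-heal
    for _ in range(writes_per_outage):
        client_write(bricks)            # server1 now accumulates ops against server2
    s2.up = True                        # server2 recovers
    return after_first_outage, heal_sources(bricks)


print(crash_test())  # (['server2'], [])
```

In this model server2 is an unambiguous source after the first outage. Once server2 fails before the heal clears its counters and clients keep writing to server1, each brick accuses the other and no source remains, which is consistent with the client log's "split-brain possible, no source detected" message.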

client1:
# ls -la a
ls: cannot access a: Input/output error

# ls -la
??????????  ? ?    ?         ?            ? a

server1:
# getfattr -d -m . a
# file: a
trusted.afr.data2-client-0=0sAAAAAAAAAAAAAAAA
trusted.afr.data2-client-1=0sAAAAAAAAAAAAAAAq
trusted.gfid=0sfdlzd6TeRxelnMeCG9ut/w==

server2:
# getfattr -d -m . a
# file: a
trusted.afr.data2-client-0=0sAAAAAAAAAAAAAAA1
trusted.afr.data2-client-1=0sAAAAAAAAAAAAAAAA
trusted.gfid=0sfdlzd6TeRxelnMeCG9ut/w==
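The trusted.afr values can be decoded to see what each server records against the other. This Python sketch assumes the AFR changelog layout of three 32-bit big-endian counters (data, metadata, entry) and that `data2-client-0`/`data2-client-1` correspond to Brick1 (server1) and Brick2 (server2); `decode_afr` and `mutual_accusations` are hypothetical helpers.

```python
import base64
import struct


def decode_afr(value):
    """Decode a trusted.afr.* value as printed by getfattr.

    getfattr prints "0s..." for base64 and "0x..." for hex. Assumes the AFR
    changelog is three 32-bit big-endian counters: data, metadata, entry.
    """
    if value.startswith("0s"):
        raw = base64.b64decode(value[2:])
    elif value.startswith("0x"):
        raw = bytes.fromhex(value[2:])
    else:
        raise ValueError(f"unrecognised getfattr encoding: {value!r}")
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Values copied from the getfattr output for `a` above.
# Assumption: client-0 is Brick1 (server1) and client-1 is Brick2 (server2).
SERVER1 = {"client-0": "0sAAAAAAAAAAAAAAAA", "client-1": "0sAAAAAAAAAAAAAAAq"}
SERVER2 = {"client-0": "0sAAAAAAAAAAAAAAA1", "client-1": "0sAAAAAAAAAAAAAAAA"}


def mutual_accusations(s1, s2):
    """Counter types where each brick records pending operations against the other."""
    s1_blames_s2 = decode_afr(s1["client-1"])
    s2_blames_s1 = decode_afr(s2["client-0"])
    return [k for k in ("data", "metadata", "entry")
            if s1_blames_s2[k] and s2_blames_s1[k]]


print(decode_afr(SERVER1["client-1"]))       # {'data': 0, 'metadata': 0, 'entry': 42}
print(decode_afr(SERVER2["client-0"]))       # {'data': 0, 'metadata': 0, 'entry': 53}
print(mutual_accusations(SERVER1, SERVER2))  # ['entry']
```

Under those assumptions, server1 holds 42 pending entry operations against server2 and server2 holds 53 against server1, so each copy of `a` accuses the other and neither can be picked as the heal source.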

clients /var/log/glusterfs/data.log:
[2012-02-08 13:24:16.837976] I [afr-self-heal-common.c:705:afr_mark_sources] 0-data2-replicate-0: split-brain possible, no source detected
[2012-02-08 13:24:16.838079] W [fuse-bridge.c:184:fuse_entry_cbk] 0-glusterfs-fuse: 565416: LOOKUP() /a => -1 (Input/output error)
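With many affected files it may be easier to pull the failing paths out of the client log than to run `ls` on each one. This Python sketch relies only on the two message formats quoted above (one entry per line, as in the actual log file); `eio_paths` and `split_brain_count` are hypothetical names.

```python
import re

# Matches the fuse-bridge message for a failed lookup, e.g.
# "... LOOKUP() /a => -1 (Input/output error)"
LOOKUP_EIO = re.compile(r"LOOKUP\(\) (.+?) => -1 \(Input/output error\)")


def eio_paths(lines):
    """Return the paths whose LOOKUP failed with EIO, in first-seen order."""
    seen = {}  # dict keeps insertion order and drops duplicates
    for line in lines:
        m = LOOKUP_EIO.search(line)
        if m:
            seen.setdefault(m.group(1), None)
    return list(seen)


def split_brain_count(lines):
    """Count AFR 'split-brain possible' messages."""
    return sum("split-brain possible" in line for line in lines)
```

For example: `lines = open("/var/log/glusterfs/data.log").readlines()`, then `eio_paths(lines)` and `split_brain_count(lines)`.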

Issues like this make gluster unusable in a production system.

--
Robert