Corrupted brick/volume after server crash

bjoern.baer at bbpcs.de (Björn Bär) · Fri, 9 Aug 2013 11:27:18 +0200

Hi GlusterFS Users,

I am evaluating GlusterFS as cloud storage for a new project. Creating and using GlusterFS volumes is really easy and stable.
But whenever I simulate a server crash, I get the strange results shown below.
In fact, I could not set up a single scenario where GlusterFS works as I expected.

##
##	Node OS => Ubuntu 12.04 LTS 64 Bit
##	Node HDD => 4x 1TB in one Single LVM Volume (No Raid)
##	GlusterFS => Gluster 3.2.5 from Ubuntu Default Repo
##

## 1. Peer Probe 
root at cl0001nd0001:~# gluster peer probe 10.0.0.2
root at cl0001nd0001:~# gluster peer probe 10.0.0.4
root at cl0001nd0001:~# gluster peer probe 10.0.0.5 

## 2. Create and start volume
root at cl0001nd0001:~# gluster volume create testvol replica 2 transport tcp 10.0.0.1:/media/gfsbrick 10.0.0.2:/media/gfsbrick
root at cl0001nd0001:~# gluster volume start testvol
root at cl0001nd0001:~# gluster volume add-brick testvol 10.0.0.4:/media/gfsbrick 10.0.0.5:/media/gfsbrick
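
As far as I understand it, with "replica 2" the bricks are grouped into replica sets in the order they are given, so the two create/add-brick commands above should yield two pairs. A minimal sketch of that grouping in plain shell; replica_sets is a hypothetical helper, not a gluster command:

```shell
#!/bin/sh
# Hypothetical helper: print how a brick list is split into replica sets,
# assuming consecutive bricks form one set (order as given on the command line).
# Usage: replica_sets REPLICA_COUNT BRICK...
replica_sets() {
    replica=$1
    shift
    set_no=1
    while [ "$#" -gt 0 ]; do
        line="set $set_no:"
        i=0
        # Take up to REPLICA_COUNT bricks for the current set.
        while [ "$i" -lt "$replica" ] && [ "$#" -gt 0 ]; do
            line="$line $1"
            shift
            i=$((i + 1))
        done
        echo "$line"
        set_no=$((set_no + 1))
    done
}

replica_sets 2 10.0.0.1:/media/gfsbrick 10.0.0.2:/media/gfsbrick \
               10.0.0.4:/media/gfsbrick 10.0.0.5:/media/gfsbrick
```

Under that assumption, 10.0.0.5 only holds a copy of what is on 10.0.0.4, so only the files distributed to that pair would be expected on the replaced brick.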

## 3. Assume node cl0001nd0005 crashed and has been replaced (new hardware, same hostname and IP)
root at cl0001nd0005:~# service glusterfs-server stop

Copy the UUID from the old GlusterFS installation to the new one
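
A minimal sketch of that UUID step, assuming the UUID lives in a line "UUID=..." in glusterd.info (to my knowledge /etc/glusterd/glusterd.info on 3.2, /var/lib/glusterd/glusterd.info on later releases). set_glusterd_uuid is a hypothetical helper, and the old UUID has to be looked up first, for example in "gluster peer status" on one of the surviving nodes:

```shell
#!/bin/sh
# Hypothetical helper: write UUID into a glusterd.info-style file.
# Usage (glusterd stopped; the path is an assumption for Gluster 3.2):
#   set_glusterd_uuid /etc/glusterd/glusterd.info <old-uuid-of-this-node>
set_glusterd_uuid() {
    info_file=$1
    uuid=$2
    # Refuse anything that does not look like a lowercase UUID.
    echo "$uuid" | grep -Eq '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$' || {
        echo "not a UUID: $uuid" >&2
        return 1
    }
    tmp="$info_file.tmp.$$"
    {
        # Keep any other lines, drop the old UUID line.
        if [ -f "$info_file" ]; then
            grep -v '^UUID=' "$info_file" || true
        fi
        echo "UUID=$uuid"
    } > "$tmp"
    mv "$tmp" "$info_file"
}
```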

root at cl0001nd0005:~# service glusterfs-server start
root at cl0001nd0005:~# gluster peer probe 10.0.0.1
Probe successful
root at cl0001nd0005:~# service glusterfs-server restart
glusterfs-server stop/waiting
glusterfs-server start/running, process 3003
root at cl0001nd0005:~# gluster peer status
Number of Peers: 3

Hostname: 10.0.0.1
Uuid: 707eebff-42dc-4829-99b5-9df4a42a1547
State: Peer in Cluster (Connected)

Hostname: 10.0.0.2
Uuid: caece58d-4ad7-4666-8a40-e7d0f0354e8e
State: Peer in Cluster (Connected)

Hostname: 10.0.0.4
Uuid: c2a9aad9-6c29-43b3-880d-c835c089b3ef
State: Peer in Cluster (Connected)
root at cl0001nd0005:~# gluster volume info                                                             

Volume Name: testvol                                                                                                    
Type: Distributed-Replicate                                                                                          
Status: Started                                                                                                                 
Number of Bricks: 2 x 2 = 4                                                                                           
Transport-type: tcp                                                                                                          
Bricks:                                                                                                                               
Brick1: 10.0.0.1:/media/gfsbrick                                                                                   
Brick2: 10.0.0.2:/media/gfsbrick                                                                                   
Brick3: 10.0.0.5:/media/gfsbrick                                                                                   
Brick4: 10.0.0.4:/media/gfsbrick

root at cl0001nd0005:~# service glusterfs-server restart                                         
glusterfs-server stop/waiting                                                                                         
glusterfs-server start/running, process 3084
root at cl0001nd0005:~# cd /media/gfsbrick/                                                              
root at cl0001nd0005:/media/gfsbrick# ll                                                                    
total 8
drwxr-xr-x 2 root root 4096 Aug  9 09:52 ./                                                                 
drwxr-xr-x 3 root root 4096 Aug  9 09:52 ../

## 4. Trigger self-heal 
root at cl0001nd0001:/mnt# find /mnt -noleaf -print0
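
For comparison: as far as I know, the Gluster 3.2 documentation pipes the find output into stat when triggering self-heal on a replicated volume, so that every entry is actually stat'ed through the client mount. A minimal sketch, assuming GNU find and xargs; trigger_self_heal is a hypothetical wrapper name:

```shell
#!/bin/sh
# Hypothetical wrapper: walk a mounted volume and stat every entry.
# Usage: trigger_self_heal /mnt
trigger_self_heal() {
    mount_point=$1
    # -print0 / --null keep file names with spaces or newlines intact.
    find "$mount_point" -noleaf -print0 | xargs --null stat > /dev/null
}
```

The path has to be a client mount of the volume (here /mnt), not the brick directory itself.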

## 5. No content in node cl0001nd0005 brick
root at cl0001nd0005:/media/gfsbrick# ll
total 8
drwxr-xr-x 2 root root 4096 Aug  9 09:52 ./
drwxr-xr-x 3 root root 4096 Aug  9 09:52 ../

## 6. Starting rebalance
root at cl0001nd0005:/media/gfsbrick# gluster volume rebalance testvol start
starting rebalance on volume testvol has been successful
root at cl0001nd0005:/media/gfsbrick# gluster volume rebalance testvol status
rebalance completed

## 7. Listing content of brick on node cl0001nd0005 ==> Corrupted data
root at cl0001nd0005:/media/gfsbrick# ll
total 242660
drwxr-xr-x 2 root root     4096 Aug  8 15:22 ./
drwxr-xr-x 3 root root     4096 Aug  9 09:52 ../
-rw-r--r-- 1 root root 56623104 Aug  9 10:02 datei1
-rw-r--r-- 1 root root 26214400 Aug  9 10:02 datei4
-rw-r--r-- 1 root root 15728640 Aug  9 10:02 datei6
-rw-r--r-- 1 root root 36700160 Aug  9 10:02 datei7
-rw-r--r-- 1 root root 66650112 Aug  9 10:02 datei8
-rw-r--r-- 1 root root 46530560 Aug  9 10:02 datei9

## 8. Triggering self-heal again
root at cl0001nd0001:/mnt# find /mnt -noleaf -print0

## 9. No changes in brick on node cl0001nd0005
root at cl0001nd0005:/media/gfsbrick# ll
total 242660
drwxr-xr-x 2 root root     4096 Aug  8 15:22 ./
drwxr-xr-x 3 root root     4096 Aug  9 09:52 ../
-rw-r--r-- 1 root root 56623104 Aug  9 10:02 datei1
-rw-r--r-- 1 root root 26214400 Aug  9 10:02 datei4
-rw-r--r-- 1 root root 15728640 Aug  9 10:02 datei6
-rw-r--r-- 1 root root 36700160 Aug  9 10:02 datei7
-rw-r--r-- 1 root root 66650112 Aug  9 10:02 datei8
-rw-r--r-- 1 root root 46530560 Aug  9 10:02 datei9

## 10. Mounted volume shows correct file sizes
root at cl0001nd0001:/mnt# ll                                                                                        
total 921664
drwxr-xr-x  2 root root      8192 Aug  9 09:57 ./                                                           
drwxr-xr-x 23 root root      4096 Aug  8 14:39 ../                                                        
-rw-r--r--  1 root root 104857600 Aug  8 14:54 datei1                                              
-rw-r--r--  1 root root 104857600 Aug  8 14:54 datei2                                              
-rw-r--r--  1 root root 104857600 Aug  8 14:54 datei3                                              
-rw-r--r--  1 root root 104857600 Aug  8 14:55 datei4                                              
-rw-r--r--  1 root root 104857600 Aug  8 14:55 datei5                                              
-rw-r--r--  1 root root 104857600 Aug  8 14:55 datei6                                              
-rw-r--r--  1 root root 104857600 Aug  8 14:55 datei7                                              
-rw-r--r--  1 root root 104857600 Aug  8 15:24 datei8                                              
-rw-r--r--  1 root root 104857600 Aug  8 15:24 datei9                                              
-rw-r--r--  1 root root       417 Aug  8 17:27 peeruuid

## 11. Starting rebalance again
root at cl0001nd0004:~# gluster volume rebalance testvol start
starting rebalance on volume testvol has been successful
root at cl0001nd0004:~# gluster volume rebalance testvol status
rebalance step 1: layout fix in progress
root at cl0001nd0004:~# gluster volume rebalance testvol status
rebalance completed

## 12. Triggering self-heal again
root at cl0001nd0001:/mnt# find /mnt -noleaf -print0

## 13. No changes in brick on replaced node. Still corrupted data
root at cl0001nd0005:/media/gfsbrick# ll
total 242660
drwxr-xr-x 2 root root     4096 Aug  8 15:22 ./
drwxr-xr-x 3 root root     4096 Aug  9 09:52 ../
-rw-r--r-- 1 root root 56623104 Aug  9 10:02 datei1
-rw-r--r-- 1 root root 26214400 Aug  9 10:02 datei4
-rw-r--r-- 1 root root 15728640 Aug  9 10:02 datei6
-rw-r--r-- 1 root root 36700160 Aug  9 10:02 datei7
-rw-r--r-- 1 root root 66650112 Aug  9 10:02 datei8
-rw-r--r-- 1 root root 46530560 Aug  9 10:02 datei9
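
To avoid comparing the listings by eye, the sizes on the brick can be checked against the mounted volume. A minimal sketch, assuming GNU stat and that the volume is also mounted on the node that holds the brick; compare_sizes is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: list files whose size on the brick differs from the
# size seen through the mounted volume.
# Usage: compare_sizes /media/gfsbrick /mnt
compare_sizes() {
    brick_dir=$1
    mount_dir=$2
    for f in "$brick_dir"/*; do
        # Only regular files are compared; directories are skipped.
        [ -f "$f" ] || continue
        name=$(basename "$f")
        brick_size=$(stat -c %s "$f")
        if [ -f "$mount_dir/$name" ]; then
            mount_size=$(stat -c %s "$mount_dir/$name")
        else
            mount_size=missing
        fi
        if [ "$brick_size" != "$mount_size" ]; then
            echo "$name brick=$brick_size mount=$mount_size"
        fi
    done
}
```

With the listings above, datei1 would be reported with brick=56623104 and mount=104857600.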

Why do the files on the replaced brick have a different size than on the mounted volume?
Is this the correct / default behavior of GlusterFS?
Am I doing something wrong?

Thanks for your answers!

Regards,
Björn

-- 
b j o e r n    b a e r