2016-08-11 13:08 GMT+02:00 Lindsay Mathieson <lindsay.mathieson@xxxxxxxxx>:
> Also "gluster volume status" lists the pid's of all the bricks processes.

Ok, let's break everything, just to try. This is a working cluster.
I have 3 servers with 1 brick each, in replica 3, so all files are
replicated on all hosts.

# gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: 2a36dc0f-1d9b-469c-82de-9d8d98321b83
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 1.2.3.112:/export/sdb1/brick
Brick2: 1.2.3.113:/export/sdb1/brick
Brick3: 1.2.3.114:/export/sdb1/brick
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.shard: off
features.shard-block-size: 10MB
performance.write-behind-window-size: 1GB
performance.cache-size: 1GB

I did this on a client:

# echo 'hello world' > hello
# md5sum hello
6f5902ac237024bdd0c176cb93063dc4  hello

Obviously, on node 1.2.3.112 I have it:

# cat /export/sdb1/brick/hello
hello world
# md5sum /export/sdb1/brick/hello
6f5902ac237024bdd0c176cb93063dc4  /export/sdb1/brick/hello

Let's break everything, this is fun. I take the brick pid from here:

# gluster volume status | grep 112
Brick 1.2.3.112:/export/sdb1/brick          49152     0          Y       14315

# kill -9 14315

# gluster volume status | grep 112
Brick 1.2.3.112:/export/sdb1/brick          N/A       N/A        N       N/A

This should be like a degraded cluster, right?

Now I add a new file from the client:

echo "hello world, i'm degraded" > degraded

Obviously, this file is not replicated on node 1.2.3.112:

# gluster volume heal gv0 info
Brick 1.2.3.112:/export/sdb1/brick
Status: Transport endpoint is not connected
Number of entries: -

Brick 1.2.3.113:/export/sdb1/brick
/degraded
/
Status: Connected
Number of entries: 2

Brick 1.2.3.114:/export/sdb1/brick
/degraded
/
Status: Connected
Number of entries: 2

This means that the "/" dir and the "/degraded" file should be healed from
.113 and .114?

Let's format the disk on .112:

# umount /dev/sdb1
# mkfs.xfs /dev/sdb1 -f
meta-data=/dev/sdb1              isize=256    agcount=4, agsize=122094597 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=488378385, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=238466, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Now I mount it again in the old place:

# mount /dev/sdb1 /export/sdb1

It's empty:

# ls /export/sdb1/ -la
total 4
drwxr-xr-x 2 root root    6 Aug 11 15:37 .
drwxr-xr-x 3 root root 4096 Jul  5 17:03 ..

I create the "brick" directory used by gluster:

# mkdir /export/sdb1/brick

Now I run the volume start force:

# gluster volume start gv0 force
volume start: gv0: success

But the brick process is still down:

# gluster volume status | grep 112
Brick 1.2.3.112:/export/sdb1/brick          N/A       N/A        N       N/A

And now?

What I really don't like is the use of "force" in "gluster volume start".
Usually (in all software) force is used when "bad things" are needed.
In this case the volume start is mandatory, so why do I have to use force?
If the volume is already started, gluster should be smart enough to start
only the missing processes, without force. Or, better, another command
should be created, something like "gluster bricks start". Using force means
running a dangerous operation, not a common administrative task.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
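
P.S. The "is the brick online?" check I keep repeating with grep can be
scripted. This is only a rough sketch built on the "gluster volume status"
output format shown in this mail (Online column "Y"/"N", PID in the last
column); the script name is made up and it is not part of the gluster CLI:

#!/bin/bash
# check-bricks.sh -- hypothetical helper, prints the bricks that
# "gluster volume status" reports as offline for the given volume.
VOL=${1:-gv0}
# Brick lines look like:
#   Brick 1.2.3.112:/export/sdb1/brick  N/A  N/A  N  N/A
# so the next-to-last field is the Online flag and $2 is the brick path.
gluster volume status "$VOL" | awk '/^Brick/ && $(NF-1) == "N" { print $2 }'

If this prints anything, the only thing the CLI offers today is the same
"gluster volume start gv0 force" shown above, which in my case did not even
bring the reformatted brick back online. That is exactly the behaviour I am
questioning.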