On 06/18/2013 11:43 AM, elvinas.piliponis at barclays.com wrote:
> Hello,
>
> While trying to recover from a failed node and replace a brick with a spare
> one, I have trashed my cluster and it is now stuck.
>
> Any ideas how to reintroduce/remove those nodes and bring peace and
> order back to the cluster?
>
> There was a pending replace-brick operation from 0031 to 0028 (it is
> still not committed according to the rbstate file).
>
> There was a hardware failure on node 0022.
>
> I was not able to commit the replace-brick from 0031 because 0022 was not
> responding and not releasing the cluster lock to the requesting node.
>
> I was not able to start a replacement from 0022 to 0028 because of the
> pending replace-brick.
>
> I forced peer removal from the cluster, hoping that afterwards I would
> be able to complete the operations. Unfortunately I removed not only
> 0022 but 0031 as well.
>
> I have peer probed 0031 successfully, and gluster volume info and
> volume status both now list the 0031 node. But when I attempt a brick
> operation I get:
>
> gluster volume remove-brick glustervmstore 0031:/mnt/vmstore/brick 0036:/mnt/vmstore/brick force
>
> Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
>
> Incorrect brick 0031:/mnt/vmstore/brick for volume glustervmstore
>
> gluster volume replace-brick glustervmstore 0031:/mnt/vmstore/brick 0028:/mnt/vmstore/brick commit force
>
> brick: 0031:/mnt/vmstore/brick does not exist in volume: glustervmstore

Looks like these commands are being rejected by a node whose volume
information is not current. Can you please provide the glusterd logs from the
node where these commands were issued?

Thanks,
Vijay
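
For reference, one way to check whether the rejecting node's view of the
volume is stale is to compare glusterd's on-disk configuration with that of a
healthy peer, and to grab the glusterd log requested above. The commands
below are only a sketch, assuming default paths (/var/lib/glusterd,
/var/log/glusterfs) and a placeholder peer name "good-peer"; adjust them to
your deployment and GlusterFS version.

# On the node where remove-brick/replace-brick was rejected, and on a
# known-good peer, compare the volume configuration glusterd has on disk:
cat /var/lib/glusterd/vols/glustervmstore/info
ls /var/lib/glusterd/vols/glustervmstore/bricks/

# Confirm all peers are connected from the rejecting node's point of view:
gluster peer status

# Default location of the glusterd log asked for above:
less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log

# If the rejecting node's configuration really is stale, volume sync can
# pull the configuration from a peer that still has the correct view
# ("good-peer" is a placeholder hostname):
gluster volume sync good-peer glustervmstore

If the on-disk brick list on the rejecting node is missing 0031, re-issuing
the brick operations from a peer whose configuration is current, or syncing
the configuration as above, may be enough to get unstuck.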