On 11/06/2013 10:53 AM, B.K.Raghuram wrote:
> Here are the steps that I did to reproduce the problem. Essentially,
> if you try to remove a brick that is not on the localhost, it seems to
> migrate the files on the localhost brick instead, and hence there is a
> lot of data loss. If instead I try to remove the localhost brick, it
> works fine. Can we try to get this fix into 3.4.2, as this seems to be
> the only way to replace a brick, given that replace-brick is being
> removed!
>
> [root@s5n9 ~]# gluster volume create v1 transport tcp s5n9.testing.lan:/data/v1 s5n10.testing.lan:/data/v1
> volume create: v1: success: please start the volume to access data
> [root@s5n9 ~]# gluster volume start v1
> volume start: v1: success
> [root@s5n9 ~]# gluster volume info v1
>
> Volume Name: v1
> Type: Distribute
> Volume ID: 6402b139-2957-4d62-810b-b70e6f9ba922
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: s5n9.testing.lan:/data/v1
> Brick2: s5n10.testing.lan:/data/v1
>
> *********** Now NFS-mounted the volume onto my laptop and created 300
> files in the mount with a script. Distribution results below: ***********
> [root@s5n9 ~]# ls -l /data/v1 | wc -l
> 160
> [root@s5n10 ~]# ls -l /data/v1 | wc -l
> 142
>
> [root@s5n9 ~]# gluster volume add-brick v1 s6n11.testing.lan:/data/v1
> volume add-brick: success
> [root@s5n9 ~]# gluster volume remove-brick v1 s5n10.testing.lan:/data/v1 start
> volume remove-brick start: success
> ID: 8f3c37d6-2f24-4418-b75a-751dcb6f2b98
> [root@s5n9 ~]# gluster volume remove-brick v1 s5n10.testing.lan:/data/v1 status
>                  Node   Rebalanced-files     size   scanned   failures   skipped        status   run-time in secs
>             ---------   ----------------   ------   -------   --------   -------   -----------   ----------------
>             localhost                  0   0Bytes         0          0             not started               0.00
>     s6n12.testing.lan                  0   0Bytes         0          0             not started               0.00
>     s6n11.testing.lan                  0   0Bytes         0          0             not started               0.00
>     s5n10.testing.lan                  0   0Bytes       300          0               completed               1.00
>
> [root@s5n9 ~]# gluster volume remove-brick v1 s5n10.testing.lan:/data/v1 commit
> Removing brick(s) can result in data loss. Do you want to Continue?
> (y/n) y
> volume remove-brick commit: success
>
> [root@s5n9 ~]# gluster volume info v1
>
> Volume Name: v1
> Type: Distribute
> Volume ID: 6402b139-2957-4d62-810b-b70e6f9ba922
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: s5n9.testing.lan:/data/v1
> Brick2: s6n11.testing.lan:/data/v1
>
> [root@s5n9 ~]# ls -l /data/v1 | wc -l
> 160
> [root@s5n10 ~]# ls -l /data/v1 | wc -l
> 142
> [root@s6n11 ~]# ls -l /data/v1 | wc -l
> 160
> [root@s5n9 ~]# ls /data/v1
> file10   file110  file131  file144  file156  file173  file19   file206  file224  file238  file250  file264  file279  file291  file31  file44  file62  file86
> file100  file114  file132  file146  file159  file174  file192  file209  file225  file24   file252  file265  file28   file292  file32  file46  file63  file87
> file101  file116  file134  file147  file16   file18   file196  file210  file228  file240  file254  file266  file281  file293  file37  file47  file66  file9
> file102  file12   file135  file148  file161  file181  file198  file212  file229  file241  file255  file267  file284  file294  file38  file48  file69  file91
> file103  file121  file136  file149  file165  file183  file200  file215  file231  file243  file256  file268  file285  file295  file4   file50  file7   file93
> file104  file122  file137  file150  file17   file184  file201  file216  file233  file245  file258  file271  file286  file296  file40  file53  file71  file97
> file105  file124  file138  file152  file170  file186  file202  file218  file234  file246  file261  file273  file287  file297  file41  file54  file73
> file107  file125  file140  file153  file171  file188  file203  file220  file236  file248  file262  file275  file288  file298  file42  file55  file75
> file11   file13   file141  file154  file172  file189  file204  file222  file237  file25   file263  file278  file290  file3    file43  file58  file80
>
> [root@s6n11 ~]# ls /data/v1
> file10   file110  file131  file144  file156  file173  file19   file206  file224  file238  file250  file264  file279  file291  file31  file44  file62  file86
> file100  file114  file132  file146  file159  file174  file192  file209  file225  file24   file252  file265  file28   file292  file32  file46  file63  file87
> file101  file116  file134  file147  file16   file18   file196  file210  file228  file240  file254  file266  file281  file293  file37  file47  file66  file9
> file102  file12   file135  file148  file161  file181  file198  file212  file229  file241  file255  file267  file284  file294  file38  file48  file69  file91
> file103  file121  file136  file149  file165  file183  file200  file215  file231  file243  file256  file268  file285  file295  file4   file50  file7   file93
> file104  file122  file137  file150  file17   file184  file201  file216  file233  file245  file258  file271  file286  file296  file40  file53  file71  file97
> file105  file124  file138  file152  file170  file186  file202  file218  file234  file246  file261  file273  file287  file297  file41  file54  file73
> file107  file125  file140  file153  file171  file188  file203  file220  file236  file248  file262  file275  file288  file298  file42  file55  file75
> file11   file13   file141  file154  file172  file189  file204  file222  file237  file25   file263  file278  file290  file3    file43  file58  file80
>
> ******* An ls of the mountpoint after this whole process only shows
> 159 files - the ones that are on s5n9. So everything that was on s5n10
> is gone!! *******

This matches the description in bug
https://bugzilla.redhat.com/show_bug.cgi?id=1024369. The bug comments also
confirm that the issue is not present in upstream master, but we need to
back-port the fix/fixes to the 3.4 branch.

-Lala
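
For anyone reproducing this on 3.4.x before a back-port lands, a small
pre-commit sanity check along the lines below can catch a run where the
wrong brick was drained. This is only a sketch; the volume name, brick path
and host names are taken from the transcript above and will differ in other
setups.

    #!/bin/bash
    # Before running "remove-brick ... commit", confirm that the brick being
    # decommissioned is the one whose data actually moved.
    VOL=v1
    BRICK_DIR=/data/v1
    DECOM_BRICK=s5n10.testing.lan:$BRICK_DIR
    NODES="s5n9.testing.lan s5n10.testing.lan s6n11.testing.lan"

    # The node hosting the decommissioned brick should report "completed",
    # and the rebalanced/scanned counts should match the data it held.
    gluster volume remove-brick "$VOL" "$DECOM_BRICK" status

    # Per-brick file counts: the decommissioned brick should be (nearly)
    # empty, and its files should now sit on the remaining bricks rather
    # than a duplicate of the localhost brick's files.
    for node in $NODES; do
        count=$(ssh "$node" "ls -1 $BRICK_DIR | wc -l")
        echo "$node: $count files"
    done

If, as in the run above, the brick being removed still holds all of its files
while the new brick merely received a copy of the localhost brick, do not run
the final "commit".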