Re: distribute remove-brick has started migrating the wrong brick (glusterfs 3.8.13)

Nithya,

You are correct, but as you stated earlier, it also has to migrate data from the other bricks on the same host - so does that mean another 74TB on dc4-03 /dev/md0 needs to be migrated as well?

> This is the current behaviour of rebalance and nothing to be concerned about - it will migrate data on all bricks on the nodes which host the bricks being removed
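
A rough way to gauge how much of md0 actually moves would presumably be to watch the per-node totals from the status command alongside df on dc4-03 (only a sketch, assuming the standard status output):

gluster volume remove-brick video-backup 10.0.0.43:/export/md1/brick status
# watch the source filesystems on dc4-03 while the migration runs
df -h /export/md0 /export/md1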


Steve

On Tue, 18 Dec 2018 at 15:37, Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:


On Tue, 18 Dec 2018 at 14:56, Stephen Remde <stephen.remde@xxxxxxxxxxx> wrote:
Nithya,

I've realised I will not have enough space on the other bricks in my cluster to migrate the data off this server so that I can remove the single brick - is there a workaround?

As you can see below, the new brick was created with the wrong RAID configuration, so I want to remove it, recreate the RAID, and re-add it (a rough sketch of that cycle follows the df output below).
xxxxxx Filesystem      Size  Used Avail Use% Mounted on
dc4-01 /dev/md0         95T   87T  8.0T  92% /export/md0
dc4-01 /dev/md1         95T   87T  8.4T  92% /export/md1
dc4-01 /dev/md2         95T   86T  9.3T  91% /export/md2
dc4-01 /dev/md3         95T   86T  8.9T  91% /export/md3
dc4-02 /dev/md0         95T   89T  6.5T  94% /export/md0
dc4-02 /dev/md1         95T   87T  8.4T  92% /export/md1
dc4-02 /dev/md2         95T   87T  8.6T  91% /export/md2
dc4-02 /dev/md3         95T   86T  8.8T  91% /export/md3
dc4-03 /dev/md0         95T   74T   21T  78% /export/md0
dc4-03 /dev/md1        102T  519G  102T   1% /export/md1
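
If the free space does work out, the remove/recreate/re-add cycle I have in mind looks roughly like this (only a sketch - the umount/mdadm/mkfs lines are placeholders and the RAID level and device list would need to match the intended layout):

gluster volume remove-brick video-backup 10.0.0.43:/export/md1/brick start
gluster volume remove-brick video-backup 10.0.0.43:/export/md1/brick status   # wait until completed
gluster volume remove-brick video-backup 10.0.0.43:/export/md1/brick commit

# on dc4-03: rebuild the array with the intended RAID configuration (placeholder values)
umount /export/md1
mdadm --stop /dev/md1
mdadm --create /dev/md1 --level=6 --raid-devices=N /dev/sd[b-m]
mkfs.xfs /dev/md1
mount /dev/md1 /export/md1

gluster volume add-brick video-backup 10.0.0.43:/export/md1/brick
gluster volume rebalance video-backup start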

I believe this is the brick being removed - the one that has about 519G of data? If I have understood the scenario properly, there seems to be plenty of free space on the other bricks (most seem to have terabytes free). Is there something I am missing?

Regards,
Nithya
 
This is the backup storage, so if I HAVE to lose the 519GB and resync, that's an acceptable worst-case.

gluster> v info video-backup
 
Volume Name: video-backup
Type: Distribute
Volume ID: 887bdc2a-ca5e-4ca2-b30d-86831839ed04
Status: Started
Snapshot Count: 0
Number of Bricks: 10
Transport-type: tcp
Bricks:
Brick1: 10.0.0.41:/export/md0/brick
Brick2: 10.0.0.42:/export/md0/brick
Brick3: 10.0.0.43:/export/md0/brick
Brick4: 10.0.0.41:/export/md1/brick
Brick5: 10.0.0.42:/export/md1/brick
Brick6: 10.0.0.41:/export/md2/brick
Brick7: 10.0.0.42:/export/md2/brick
Brick8: 10.0.0.41:/export/md3/brick
Brick9: 10.0.0.42:/export/md3/brick
Brick10: 10.0.0.43:/export/md1/brick
Options Reconfigured:
cluster.rebal-throttle: aggressive
cluster.min-free-disk: 1%
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

Best,
Steve

On Wed, 12 Dec 2018 at 03:07, Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:

This is the current behaviour of rebalance and nothing to be concerned about - it will migrate data on all bricks on the nodes which host the bricks being removed. The data on the removed bricks will be moved to other bricks; some of the data on the other bricks on the node will simply be moved to other bricks based on the new directory layouts.
I will fix this in the near future, but you don't need to stop the remove-brick operation.
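
For example (just a sketch - run as root on the brick servers), the layout ranges that drive this can be read directly off the brick roots; removing a brick rewrites these ranges, so files whose hash now falls outside their current brick's range get migrated even though that brick is staying:

# trusted.glusterfs.dht encodes the hash range a brick owns for a directory
getfattr -n trusted.glusterfs.dht -e hex /export/md0/brick
getfattr -n trusted.glusterfs.dht -e hex /export/md1/brick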

Regards,
Nithya

On Wed, 12 Dec 2018 at 06:36, Stephen Remde <stephen.remde@xxxxxxxxxxx> wrote:
I requested a brick be removed from a distribute-only volume and it seems to be migrating data from the wrong brick... unless I am reading this wrong, which I doubt, because the disk usage is definitely decreasing on the wrong brick.

gluster> volume status
Status of volume: video-backup
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.0.0.41:/export/md0/brick           49172     0          Y       5306 
Brick 10.0.0.42:/export/md0/brick           49172     0          Y       3651 
Brick 10.0.0.43:/export/md0/brick           49155     0          Y       2826 
Brick 10.0.0.41:/export/md1/brick           49173     0          Y       5311 
Brick 10.0.0.42:/export/md1/brick           49173     0          Y       3656 
Brick 10.0.0.41:/export/md2/brick           49174     0          Y       5316 
Brick 10.0.0.42:/export/md2/brick           49174     0          Y       3662 
Brick 10.0.0.41:/export/md3/brick           49175     0          Y       5322 
Brick 10.0.0.42:/export/md3/brick           49175     0          Y       3667 
Brick 10.0.0.43:/export/md1/brick           49156     0          Y       4836 
 
Task Status of Volume video-backup
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : 7895be7c-4ab9-440d-a301-c11dae0dd9e1
Status               : completed           
 
gluster> volume remove-brick video-backup 10.0.0.43:/export/md1/brick start
volume remove-brick start: success
ID: f666a196-03c2-4940-bd38-45d8383345a4
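
Progress can then be followed with the status form of the same command (and, I believe, in the rebalance log under /var/log/glusterfs/, though the exact file name may vary):

gluster volume remove-brick video-backup 10.0.0.43:/export/md1/brick status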

gluster> volume status 
Status of volume: video-backup
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.0.0.41:/export/md0/brick           49172     0          Y       5306 
Brick 10.0.0.42:/export/md0/brick           49172     0          Y       3651 
Brick 10.0.0.43:/export/md0/brick           49155     0          Y       2826 
Brick 10.0.0.41:/export/md1/brick           49173     0          Y       5311 
Brick 10.0.0.42:/export/md1/brick           49173     0          Y       3656 
Brick 10.0.0.41:/export/md2/brick           49174     0          Y       5316 
Brick 10.0.0.42:/export/md2/brick           49174     0          Y       3662 
Brick 10.0.0.41:/export/md3/brick           49175     0          Y       5322 
Brick 10.0.0.42:/export/md3/brick           49175     0          Y       3667 
Brick 10.0.0.43:/export/md1/brick           49156     0          Y       4836 
 
Task Status of Volume video-backup
------------------------------------------------------------------------------
Task                 : Remove brick        
ID                   : f666a196-03c2-4940-bd38-45d8383345a4
Removed bricks:     
10.0.0.43:/export/md1/brick
Status               : in progress         


But when I check the rebalance log on the host with the brick being removed, it is actually migrating data from the other brick on the same host, 10.0.0.43:/export/md0/brick:


.....
[2018-12-11 11:59:52.572657] I [MSGID: 109086] [dht-shared.c:297:dht_parse_decommissioned_bricks] 0-video-backup-dht: decommissioning subvolume video-backup-client-9
....
 29: volume video-backup-client-2
 30:     type protocol/client
 31:     option clnt-lk-version 1
 32:     option volfile-checksum 0
 33:     option volfile-key rebalance/video-backup
 34:     option client-version 3.8.15
 35:     option process-uuid node-dc4-03-25536-2018/12/11-11:59:47:551328-video-backup-client-2-0-0
 36:     option fops-version 1298437
 37:     option ping-timeout 42
 38:     option remote-host 10.0.0.43
 39:     option remote-subvolume /export/md0/brick
 40:     option transport-type socket
 41:     option transport.address-family inet
 42:     option username 9e7fe743-ecd7-40aa-b3db-e112086b2fc7
 43:     option password dab178d6-ecb4-4293-8c1d-6281ec2cafc2
 44: end-volume
...
112: volume video-backup-client-9
113:     type protocol/client
114:     option ping-timeout 42
115:     option remote-host 10.0.0.43
116:     option remote-subvolume /export/md1/brick
117:     option transport-type socket
118:     option transport.address-family inet
119:     option username 9e7fe743-ecd7-40aa-b3db-e112086b2fc7
120:     option password dab178d6-ecb4-4293-8c1d-6281ec2cafc2
121: end-volume
...
[2018-12-11 11:59:52.608698] I [dht-rebalance.c:3668:gf_defrag_start_crawl] 0-video-backup-dht: gf_defrag_start_crawl using commit hash 3766302106
[2018-12-11 11:59:52.609478] I [MSGID: 109081] [dht-common.c:4198:dht_setxattr] 0-video-backup-dht: fixing the layout of /
[2018-12-11 11:59:52.615348] I [MSGID: 0] [dht-rebalance.c:3746:gf_defrag_start_crawl] 0-video-backup-dht: local subvols are video-backup-client-2
[2018-12-11 11:59:52.615378] I [MSGID: 0] [dht-rebalance.c:3746:gf_defrag_start_crawl] 0-video-backup-dht: local subvols are video-backup-client-9
...
[2018-12-11 11:59:52.616554] I [dht-rebalance.c:2652:gf_defrag_process_dir] 0-video-backup-dht: migrate data called on /
[2018-12-11 11:59:54.000363] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /symlinks.txt: attempting to move from video-backup-client-2 to video-backup-client-4
[2018-12-11 11:59:55.110549] I [MSGID: 109022] [dht-rebalance.c:1703:dht_migrate_file] 0-video-backup-dht: completed migration of /symlinks.txt from subvolume video-backup-client-2 to video-backup-client-4
[2018-12-11 11:59:58.100931] I [MSGID: 109081] [dht-common.c:4198:dht_setxattr] 0-video-backup-dht: fixing the layout of /A6
[2018-12-11 11:59:58.107389] I [dht-rebalance.c:2652:gf_defrag_process_dir] 0-video-backup-dht: migrate data called on /A6
[2018-12-11 11:59:58.132138] I [dht-rebalance.c:2866:gf_defrag_process_dir] 0-video-backup-dht: Migration operation on dir /A6 took 0.02 secs
[2018-12-11 11:59:58.330393] I [MSGID: 109081] [dht-common.c:4198:dht_setxattr] 0-video-backup-dht: fixing the layout of /A6/2017
[2018-12-11 11:59:58.337601] I [dht-rebalance.c:2652:gf_defrag_process_dir] 0-video-backup-dht: migrate data called on /A6/2017
[2018-12-11 11:59:58.493906] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/57c81ed09f31cd6c1c8990ae20160908101048: attempting to move from video-backup-client-2 to video-backup-client-4
[2018-12-11 11:59:58.706068] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/57c81ed09f31cd6c1c8990ae20160908120734132317: attempting to move from video-backup-client-2 to video-backup-client-4
[2018-12-11 11:59:58.783952] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/584a8bcdaca0515f595dff8820161124091841: attempting to move from video-backup-client-2 to video-backup-client-4
[2018-12-11 11:59:58.843315] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/584a8bcdaca0515f595dff8820161124135453: attempting to move from video-backup-client-2 to video-backup-client-4
[2018-12-11 11:59:58.951637] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/584a8bcdaca0515f595dff8820161122111252: attempting to move from video-backup-client-2 to video-backup-client-4
[2018-12-11 11:59:59.005324] I [dht-rebalance.c:2866:gf_defrag_process_dir] 0-video-backup-dht: Migration operation on dir /A6/2017 took 0.67 secs
[2018-12-11 11:59:59.005362] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/58906aaaaca0515f5994104d20170213154555: attempting to move from video-backup-client-2 to video-backup-client-4

etc...
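
The video-backup-client-N names map to the bricks in volume-info order, so client-2 is the third brick (10.0.0.43:/export/md0/brick) and client-9 is the tenth (10.0.0.43:/export/md1/brick) - which matches the volfile excerpts above. A quick way to count migrations per source subvolume, assuming the default rebalance log path:

grep 'attempting to move from' /var/log/glusterfs/video-backup-rebalance.log | sed 's/.*move from \([^ ]*\) to.*/\1/' | sort | uniq -c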

Can I stop/cancel it without data loss? How can I make gluster remove the correct brick? 
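
(I assume the way to stop it, if that is indeed safe, would be something like the following - sketch only - with files that have already been migrated simply staying where they were moved to:)

gluster volume remove-brick video-backup 10.0.0.43:/export/md1/brick stop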

Thanks




--

Dr Stephen Remde
Director, Innovation and Research


T: 01535 280066
M: 07764 740920
E: stephen.remde@xxxxxxxxxxx
W: www.gaist.co.uk


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
