Hi all,

I have a 4-server system running a distributed-replicate setup, 4 x (2 + 1) = 12. Bricks are staggered across the servers. Sharding is enabled. (Volume info shown below.)

Now, the storage on these servers is slow and not really up to the job, so we have 4 new servers with SSDs. I have to move everything over to the new servers without taking down the storage. The four old servers are running Gluster 6.4 and the new ones 6.5.

So, having read tons of docs, mailing-list threads, etc., I think I ought to be able to use add-brick and remove-brick to get everything moved safely, like so:

# gluster volume add-brick iscsi replica 3 arbiter 1 srv{13..15}:/brick1
# gluster volume remove-brick iscsi replica 3 srv{1..3}:/brick1 start

Then, once complete, do:

# gluster volume remove-brick iscsi replica 3 srv{1..3}:/brick1 commit

So I created a test volume to try this out. On the third of the four add/remove cycles, I get a 'failed' on the remove-brick status. The rebalance log shows:

[2020-02-28 22:25:28.133902] I [dht-rebalance.c:1589:dht_migrate_file] 0-testmigrate-dht: /linux-5.4.22/arch/arm/boot/dts/exynos4412-itop-scp-core.dtsi: attempting to move from testmigrate-replicate-0 to testmigrate-replicate-2
[2020-02-28 22:25:28.144258] W [MSGID: 108015] [afr-self-heal-name.c:138:__afr_selfheal_name_expunge] 0-testmigrate-replicate-0: expunging file a75a83b7-2c34-4077-b4fc-3126a9d6058a/exynos4210-smdkv310.dts (11a47b1f-2c24-4d4b-9402-9130125cf953) on testmigrate-client-6
[2020-02-28 22:25:28.146321] E [MSGID: 109023] [dht-rebalance.c:1707:dht_migrate_file] 0-testmigrate-dht: Migrate file failed:/linux-5.4.22/arch/arm/boot/dts/exynos4210-smdkv310.dts: lookup failed on testmigrate-replicate-0 [No such file or directory]
[2020-02-28 22:25:28.149104] E [MSGID: 109023] [dht-rebalance.c:2874:gf_defrag_migrate_single_file] 0-testmigrate-dht: migrate-data failed for /linux-5.4.22/arch/arm/boot/dts/exynos4210-smdkv310.dts [No such file or directory]

This is shown for 4 files.
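The per-replica-set procedure above (add-brick a new set, remove-brick start on an old set, wait, then commit) can be written out as a dry run before touching the live volume. This is a minimal sketch, not a tested migration tool: it only echoes the same gluster commands used in the post, and both the plan_migration helper and the srvN:/brick1 names are hypothetical placeholders. Nothing is executed; each printed command would still be run by hand, checking the remove-brick status between steps.

```bash
#!/usr/bin/env bash
# Dry run only: prints the add-brick / remove-brick sequence from the post
# for each replica set, without executing anything.
# plan_migration and the srvN:/brick1 names are hypothetical placeholders.

plan_migration() {
  local volume=$1
  shift
  local pair old new
  # Each remaining argument is "OLD_SET|NEW_SET", where each set is a
  # space-separated list of three bricks (data, data, arbiter).
  for pair in "$@"; do
    old=${pair%%|*}   # text before the "|": bricks being retired
    new=${pair##*|}   # text after the "|": bricks being added
    echo "gluster volume add-brick $volume replica 3 arbiter 1 $new"
    echo "gluster volume remove-brick $volume replica 3 $old start"
    echo "# repeat until every node shows 'completed' (not 'failed'):"
    echo "gluster volume remove-brick $volume replica 3 $old status"
    echo "gluster volume remove-brick $volume replica 3 $old commit"
  done
}

# Bash brace expansion turns srv{1..3}:/brick1 into three brick arguments.
plan_migration iscsi "$(echo srv{1..3}:/brick1)|$(echo srv{13..15}:/brick1)"
```

The design choice is simply to keep one replica set in flight at a time, matching the post, so that a 'failed' status like the one seen on the test volume is caught before the next set is touched.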
When I look at the FUSE-mounted volume, the file is there and correct, but the permissions on this file and lots of others are screwed up: lots of directories with d--------- permissions and lots of files owned by root:root.

So, any advice on how to proceed from here? I forced the remove-brick, as the data seemed to be in place, which is fine, but now I can't do an add-brick because Gluster seems to think a rebalance is taking place:

---
volume add-brick: failed: Pre Validation failed on terek-stor.amazing-internet.net. Volume name testmigrate rebalance is in progress. Please retry after completion
---

$ sudo gluster volume rebalance testmigrate status
volume rebalance: testmigrate: failed: Rebalance not started for volume testmigrate.

Thanks for any insight anyone can offer.

Ronny

$ sudo gluster volume info iscsi

Volume Name: iscsi
Type: Distributed-Replicate
Volume ID: 40ff42a7-5dee-4a98-991b-c4ba5bc50438
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: ahren-stor.amazing-internet.net:/data/glusterfs/iscsi/brick1/brick
Brick2: mareth-stor.amazing-internet.net:/data/glusterfs/iscsi/brick1/brick
Brick3: terek-stor.amazing-internet.net:/data/glusterfs/iscsi/brick1a/brick (arbiter)
Brick4: walker-stor.amazing-internet.net:/data/glusterfs/iscsi/brick2/brick
Brick5: ahren-stor.amazing-internet.net:/data/glusterfs/iscsi/brick2/brick
Brick6: mareth-stor.amazing-internet.net:/data/glusterfs/iscsi/brick2a/brick (arbiter)
Brick7: terek-stor.amazing-internet.net:/data/glusterfs/iscsi/brick3/brick
Brick8: walker-stor.amazing-internet.net:/data/glusterfs/iscsi/brick3/brick
Brick9: ahren-stor.amazing-internet.net:/data/glusterfs/iscsi/brick3a/brick (arbiter)
Brick10: mareth-stor.amazing-internet.net:/data/glusterfs/iscsi/brick4/brick
Brick11: terek-stor.amazing-internet.net:/data/glusterfs/iscsi/brick4/brick
Brick12: walker-stor.amazing-internet.net:/data/glusterfs/iscsi/brick4a/brick (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.open-behind: off
performance.readdir-ahead: off
performance.strict-o-direct: on
network.remote-dio: disable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
features.shard-block-size: 64MB
user.cifs: off
server.allow-insecure: on
cluster.choose-local: off
auth.allow: 127.0.0.1,172.16.36.*,172.16.40.*
ssl.cipher-list: HIGH:!SSLv2
server.ssl: on
client.ssl: on
ssl.certificate-depth: 1
performance.cache-size: 1GB
client.event-threads: 4
server.event-threads: 4

-- 
Ronny Adsetts
Technical Director
Amazing Internet Ltd, London
t: +44 20 8977 8943
w: www.amazinginternet.com

Registered office: 85 Waldegrave Park, Twickenham, TW1 4TJ
Registered in England. Company No. 4042957
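To see which bricks form each replica set (and therefore which three bricks a single remove-brick has to name), the Bricks list from gluster volume info can be grouped in order: in a distributed-replicate volume, consecutive bricks form a replica set, so with 4 x (2 + 1) = 12 above, Brick1-3 are the first set, Brick4-6 the second, and so on. This sketch assumes one "BrickN: host:/path" line per brick and a replica count of 3 (two data bricks plus the arbiter). replica_sets is a hypothetical helper, and its replicate-N labels are only intended to mirror the zero-based subvolume names (testmigrate-replicate-0, testmigrate-replicate-2) seen in the rebalance log.

```bash
#!/usr/bin/env bash
# Group "BrickN: host:/path" lines (as printed by `gluster volume info`)
# into replica sets of consecutive bricks.
# replica_sets is a hypothetical helper; the replica count defaults to 3.
# Usage: gluster volume info iscsi | replica_sets 3

replica_sets() {
  local replica=${1:-3}
  awk -v r="$replica" '
    /^Brick[0-9]+:/ {
      sub(/^Brick[0-9]+: */, "")          # drop the "BrickN: " prefix
      set = int(n / r)                    # consecutive bricks share a set
      line[set] = line[set] (n % r ? " " : "") $0
      n++
    }
    END {
      # Print each set, labelled from 0 like the replicate-N subvolumes.
      for (i = 0; i < n / r; i++)
        printf "replicate-%d: %s\n", i, line[i]
    }'
}
```

Any "(arbiter)" suffix is kept on its brick, so each output line shows the two data bricks and the arbiter that would be retired together.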
________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users