Hi Sunny,

Where would I find the changes-<brick-path>.log files? Is there anything else I can provide to help diagnose this?
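In case it helps, this is roughly how I've been searching for them so far, assuming they would land somewhere under the default /var/log/glusterfs tree (nothing has turned up yet):

    # look for any changelog-related log files on a master node
    find /var/log/glusterfs -type f -name 'changes-*.log'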
Thanks,
 -Matthew

--

On 7/29/19 9:46 AM, Matthew Benstead wrote:
Hi Sunny,

Yes, I have attached the gsyncd.log file. I couldn't find any changes-<brick-path>.log files... Trying to start replication goes faulty right away:

[root@gluster01 ~]# rpm -q glusterfs
glusterfs-5.6-1.el7.x86_64

[root@gluster01 ~]# uname -r
3.10.0-957.21.3.el7.x86_64

[root@gluster01 ~]# cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)

[root@gluster01 ~]# gluster volume geo-replication storage root@10.0.231.81::pcic-backup start
Starting geo-replication session between storage & 10.0.231.81::pcic-backup has been successful

[root@gluster01 ~]# gluster volume geo-replication storage root@10.0.231.81::pcic-backup status

MASTER NODE    MASTER VOL    MASTER BRICK                  SLAVE USER    SLAVE                       SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
-------------------------------------------------------------------------------------------------------------------------------------------------------
10.0.231.50    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.52    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.54    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.51    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.53    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.55    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.56    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A

[root@gluster01 ~]# gluster volume geo-replication storage root@10.0.231.81::pcic-backup stop
Stopping geo-replication session between storage & 10.0.231.81::pcic-backup has been successful

This is the primary cluster:

[root@gluster01 ~]# gluster volume info storage

Volume Name: storage
Type: Distribute
Volume ID: 6f95525a-94d7-4174-bac4-e1a18fe010a2
Status: Started
Snapshot Count: 0
Number of Bricks: 7
Transport-type: tcp
Bricks:
Brick1: 10.0.231.50:/mnt/raid6-storage/storage
Brick2: 10.0.231.51:/mnt/raid6-storage/storage
Brick3: 10.0.231.52:/mnt/raid6-storage/storage
Brick4: 10.0.231.53:/mnt/raid6-storage/storage
Brick5: 10.0.231.54:/mnt/raid6-storage/storage
Brick6: 10.0.231.55:/mnt/raid6-storage/storage
Brick7: 10.0.231.56:/mnt/raid6-storage/storage
Options Reconfigured:
features.read-only: off
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
nfs.disable: on
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
transport.address-family: inet
features.quota-deem-statfs: on
changelog.changelog: on
diagnostics.client-log-level: INFO

And this is the cluster I'm trying to replicate to:

[root@pcic-backup01 ~]# gluster volume info pcic-backup

Volume Name: pcic-backup
Type: Distribute
Volume ID: 2890bcde-a023-4feb-a0e5-e8ef8f337d4c
Status: Started
Snapshot Count: 0
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.0.231.81:/pcic-backup01-zpool/brick
Brick2: 10.0.231.82:/pcic-backup02-zpool/brick
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
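In case it's useful while I keep looking for those files, this is roughly how I've been pulling recent errors out of gsyncd.log on each master node. It assumes the session logs sit somewhere under the default /var/log/glusterfs/geo-replication/ directory; the exact subdirectory name is just whatever gsyncd created for this session, so I glob over it:

    for h in 10.0.231.50 10.0.231.51 10.0.231.52 10.0.231.53 10.0.231.54 10.0.231.55 10.0.231.56; do
        echo "== $h =="
        ssh root@$h "grep -iE 'error|faulty' /var/log/glusterfs/geo-replication/*/gsyncd.log | tail -n 5"
    done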
Thanks,
 -Matthew

On 7/28/19 10:56 PM, Sunny Kumar wrote:

Hi Matthew,

Can you share the geo-rep logs and one more log file (changes-<brick-path>.log)? It will help to pinpoint the actual reason behind the failure.

/sunny

On Mon, Jul 29, 2019 at 9:13 AM Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:

On Sat, 27 Jul 2019 at 02:31, Matthew Benstead <matthewb@xxxxxxx> wrote:

Ok, thank-you for explaining everything - that makes sense. Currently the brick file systems are pretty evenly distributed, so I probably won't run the fix-layout right now.

Would this state have any impact on geo-replication? I'm trying to geo-replicate this volume, but am getting a weird error: "Changelog register failed error=[Errno 21] Is a directory"

It should not. Sunny, can you comment on this?

Regards,
Nithya

I assume this is related to something else, but I wasn't sure.

Thanks,
 -Matthew

--
Matthew Benstead
System Administrator
Pacific Climate Impacts Consortium
University of Victoria, UH1
PO Box 1800, STN CSC
Victoria, BC, V8W 2Y2
Phone: +1-250-721-8432
Email: matthewb@xxxxxxx

On 7/26/19 12:02 AM, Nithya Balachandran wrote:

On Fri, 26 Jul 2019 at 01:56, Matthew Benstead <matthewb@xxxxxxx> wrote:

Hi Nithya,

Hmm... I don't remember if I did, but based on what I'm seeing it sounds like I probably didn't run rebalance or fix-layout. It looks like folders that haven't had any new files created have a dht of 0, while other folders have non-zero values.

[root@gluster07 ~]# getfattr --absolute-names -m . -d -e hex /mnt/raid6-storage/storage/ | grep dht

[root@gluster07 ~]# getfattr --absolute-names -m . -d -e hex /mnt/raid6-storage/storage/home | grep dht
trusted.glusterfs.dht=0x00000000000000000000000000000000

[root@gluster07 ~]# getfattr --absolute-names -m . -d -e hex /mnt/raid6-storage/storage/home/matthewb | grep dht
trusted.glusterfs.dht=0x00000001000000004924921a6db6dbc7

If I just run the fix-layout command will it re-create all of the dht values or just the missing ones?

A fix-layout will recalculate the layouts entirely, so all the values will change. No files will be moved. A rebalance will recalculate the layouts like the fix-layout but will also move files to their new locations based on the new layout ranges. This could take a lot of time depending on the number of files/directories on the volume. If you do this, I would recommend that you turn off lookup-optimize until the rebalance is over.

Since the brick is already fairly size-balanced, could I get away with running fix-layout but not rebalance? Or would the new dht layout mean slower accesses, since the files may be expected on different bricks?

The first access for a file will be slower. The next one will be faster as the location will be cached in the client's in-memory structures. You may not need to run either a fix-layout or a rebalance if new file creations will be in directories created after the add-brick. Gluster will automatically include all 7 bricks for those directories.

Regards,
Nithya

Thanks,
 -Matthew

--
Matthew Benstead
System Administrator
Pacific Climate Impacts Consortium
University of Victoria, UH1
PO Box 1800, STN CSC
Victoria, BC, V8W 2Y2
Phone: +1-250-721-8432
Email: matthewb@xxxxxxx

On 7/24/19 9:30 PM, Nithya Balachandran wrote:

On Wed, 24 Jul 2019 at 22:12, Matthew Benstead <matthewb@xxxxxxx> wrote:

So looking more closely at the trusted.glusterfs.dht attributes from the bricks it looks like they cover the entire range... and there is no range left for gluster07. The first 6 bricks range from 0x00000000 to 0xffffffff - so... is there a way to re-calculate what the dht values should be? Each of the bricks should have a gap:

Gluster05  00000000 -> 2aaaaaa9
Gluster06  2aaaaaaa -> 55555553
Gluster01  55555554 -> 7ffffffd
Gluster02  7ffffffe -> aaaaaaa7
Gluster03  aaaaaaa8 -> d5555551
Gluster04  d5555552 -> ffffffff
Gluster07  None

If we split the range into 7 servers that would be a gap of about 0x24924924 for each server.
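(As a quick sanity check on that arithmetic, here is a small bash sketch that just splits the 32-bit hash space into 7 equal ranges, with the last range padded out to 0xffffffff. The numbering below is purely illustrative and not the order gluster would actually assign:)

    bricks=7
    gap=$(( 0x100000000 / bricks ))          # 613566756 = 0x24924924
    printf 'gap per brick: 0x%08x\n' "$gap"
    for i in $(seq 0 $(( bricks - 1 ))); do
        start=$(( i * gap ))
        end=$(( (i + 1) * gap - 1 ))
        # last range takes the remainder so the layout ends at 0xffffffff
        [ "$i" -eq $(( bricks - 1 )) ] && end=$(( 0xffffffff ))
        printf 'range %d: 0x%08x -> 0x%08x\n' "$(( i + 1 ))" "$start" "$end"
    done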
Now, in terms of the gluster07 brick: about 2 years ago the RAID array the brick was stored on became corrupted. I ran the remove-brick force command, then provisioned a new server, ran the add-brick command, and then restored the missing files from backup by copying them back to the main gluster mount (not the brick).

Did you run a rebalance after performing the add-brick? Without a rebalance/fix-layout, the layout for existing directories on the volume will not be updated to use the new brick as well. That the layout does not include the new brick in the root dir is in itself not a problem. Do you create a lot of files directly in the root of the volume? If yes, you might want to run a rebalance. Otherwise, if you mostly create files in newly added directories, you can probably ignore this. You can check the layout for directories on the volume and see if they incorporate brick7. I would expect a lookup on the root to have set an xattr on the brick with an empty layout range. The fact that the xattr does not exist at all on the brick is what I am looking into.

It looks like prior to that event this was the layout - which would make sense given the equal size of the 7 bricks:

gluster02.pcic.uvic.ca | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x000000010000000048bfff206d1ffe5f

gluster05.pcic.uvic.ca | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000b5dffce0da3ffc1f

gluster04.pcic.uvic.ca | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000917ffda0b5dffcdf

gluster03.pcic.uvic.ca | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x00000001000000006d1ffe60917ffd9f

gluster01.pcic.uvic.ca | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000245fffe048bfff1f

gluster07.pcic.uvic.ca | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x000000010000000000000000245fffdf

gluster06.pcic.uvic.ca | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000da3ffc20ffffffff

Which yields the following:

00000000 -> 245fffdf  Gluster07
245fffe0 -> 48bfff1f  Gluster01
48bfff20 -> 6d1ffe5f  Gluster02
6d1ffe60 -> 917ffd9f  Gluster03
917ffda0 -> b5dffcdf  Gluster04
b5dffce0 -> da3ffc1f  Gluster05
da3ffc20 -> ffffffff  Gluster06

Is there some way to get back to this?
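(A quick way to spot-check whether individual directories have picked up a range on the gluster07 brick, along the lines suggested above - the directory names below are only examples:)

    for d in home home/matthewb projects; do    # "projects" is just a placeholder path
        getfattr --absolute-names -n trusted.glusterfs.dht -e hex "/mnt/raid6-storage/storage/$d"
    done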
Thanks,
 -Matthew

--
Matthew Benstead
System Administrator
Pacific Climate Impacts Consortium
University of Victoria, UH1
PO Box 1800, STN CSC
Victoria, BC, V8W 2Y2
Phone: +1-250-721-8432
Email: matthewb@xxxxxxx

On 7/18/19 7:20 AM, Matthew Benstead wrote:

Hi Nithya,

No - it was added about a year and a half ago. I have tried re-mounting the volume on the server, but it didn't add the attr:

[root@gluster07 ~]# umount /storage/
[root@gluster07 ~]# cat /etc/fstab | grep "/storage"
10.0.231.56:/storage /storage glusterfs defaults,log-level=WARNING,backupvolfile-server=10.0.231.51 0 0
[root@gluster07 ~]# mount /storage/
[root@gluster07 ~]# df -h /storage/
Filesystem            Size  Used  Avail  Use%  Mounted on
10.0.231.56:/storage  255T  194T   62T   77%   /storage

[root@gluster07 ~]# getfattr --absolute-names -m . -d -e hex /mnt/raid6-storage/storage/
# file: /mnt/raid6-storage/storage/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.6f95525a-94d7-4174-bac4-e1a18fe010a2.xtime=0x5d307baa00023ec0
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.2=0x00001b71d5279e000000000000763e32000000000005cd53
trusted.glusterfs.volume-id=0x6f95525a94d74174bac4e1a18fe010a2

Thanks,
 -Matthew

On 7/17/19 10:04 PM, Nithya Balachandran wrote:

Hi Matthew,

Was this node/brick added to the volume recently? If yes, try mounting the volume on a fresh mount point - that should create the xattr on this as well.

Regards,
Nithya

On Wed, 17 Jul 2019 at 21:01, Matthew Benstead <matthewb@xxxxxxx> wrote:

Hello,

I've just noticed one brick in my 7 node distribute volume is missing the trusted.glusterfs.dht xattr...? How can I fix this?

I'm running glusterfs-5.3-2.el7.x86_64 on CentOS 7. All of the other nodes are fine, but gluster07 from the list below does not have the attribute.

$ ansible -i hosts gluster-servers[0:6] ... -m shell -a "getfattr -m . --absolute-names -n trusted.glusterfs.dht -e hex /mnt/raid6-storage/storage"
...
gluster05 | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000000000002aaaaaa9

gluster03 | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000aaaaaaa8d5555551

gluster04 | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000d5555552ffffffff

gluster06 | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x00000001000000002aaaaaaa55555553

gluster02 | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x00000001000000007ffffffeaaaaaaa7

gluster07 | FAILED | rc=1 >>
/mnt/raid6-storage/storage: trusted.glusterfs.dht: No such attribute
non-zero return code

gluster01 | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000555555547ffffffd
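(To make those values easier to compare, the start/end of each range can be pulled out with a little bash - this assumes the last 16 hex characters of trusted.glusterfs.dht are the start and end of the range, which matches how the values are read elsewhere in this thread:)

    val=0x0000000100000000000000002aaaaaa9   # example: the gluster05 value above
    hex=${val#0x}
    echo "start: 0x${hex:16:8}  end: 0x${hex:24:8}"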
Here are all of the attr's from the brick:

[root@gluster07 ~]# getfattr --absolute-names -m . -d -e hex /mnt/raid6-storage/storage/
# file: /mnt/raid6-storage/storage/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.6f95525a-94d7-4174-bac4-e1a18fe010a2.xtime=0x5d2dee800001fdf9
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.2=0x00001b69498a1400000000000076332e000000000005cd03
trusted.glusterfs.volume-id=0x6f95525a94d74174bac4e1a18fe010a2

And here is the volume information:

[root@gluster07 ~]# gluster volume info storage

Volume Name: storage
Type: Distribute
Volume ID: 6f95525a-94d7-4174-bac4-e1a18fe010a2
Status: Started
Snapshot Count: 0
Number of Bricks: 7
Transport-type: tcp
Bricks:
Brick1: 10.0.231.50:/mnt/raid6-storage/storage
Brick2: 10.0.231.51:/mnt/raid6-storage/storage
Brick3: 10.0.231.52:/mnt/raid6-storage/storage
Brick4: 10.0.231.53:/mnt/raid6-storage/storage
Brick5: 10.0.231.54:/mnt/raid6-storage/storage
Brick6: 10.0.231.55:/mnt/raid6-storage/storage
Brick7: 10.0.231.56:/mnt/raid6-storage/storage
Options Reconfigured:
changelog.changelog: on
features.quota-deem-statfs: on
features.read-only: off
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
nfs.disable: on
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
transport.address-family: inet

Thanks,
 -Matthew
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users