Hi Sunny,

Where would I find the changes-<brick-path>.log files? Is there anything else I can provide to help diagnose this?
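In case it helps, this is roughly how I've been searching for them so far, assuming they would land somewhere under the default /var/log/glusterfs tree (nothing has turned up yet):

    # look for any changelog-related log files on a master node
    find /var/log/glusterfs -type f -name 'changes-*.log'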
Thanks,
 -Matthew

--

On 7/29/19 9:46 AM, Matthew Benstead wrote:
Hi Sunny,

Yes, I have attached the gsyncd.log file. I couldn't find any changes-<brick-path>.log files... Trying to start replication goes faulty right away:

[root@gluster01 ~]# rpm -q glusterfs
glusterfs-5.6-1.el7.x86_64

[root@gluster01 ~]# uname -r
3.10.0-957.21.3.el7.x86_64

[root@gluster01 ~]# cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)

[root@gluster01 ~]# gluster volume geo-replication storage root@10.0.231.81::pcic-backup start
Starting geo-replication session between storage & 10.0.231.81::pcic-backup has been successful

[root@gluster01 ~]# gluster volume geo-replication storage root@10.0.231.81::pcic-backup status

MASTER NODE    MASTER VOL    MASTER BRICK                  SLAVE USER    SLAVE                       SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
-------------------------------------------------------------------------------------------------------------------------------------------------------
10.0.231.50    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.52    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.54    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.51    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.53    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.55    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.56    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A

[root@gluster01 ~]# gluster volume geo-replication storage root@10.0.231.81::pcic-backup stop
Stopping geo-replication session between storage & 10.0.231.81::pcic-backup has been successful

This is the primary cluster:

[root@gluster01 ~]# gluster volume info storage

Volume Name: storage
Type: Distribute
Volume ID: 6f95525a-94d7-4174-bac4-e1a18fe010a2
Status: Started
Snapshot Count: 0
Number of Bricks: 7
Transport-type: tcp
Bricks:
Brick1: 10.0.231.50:/mnt/raid6-storage/storage
Brick2: 10.0.231.51:/mnt/raid6-storage/storage
Brick3: 10.0.231.52:/mnt/raid6-storage/storage
Brick4: 10.0.231.53:/mnt/raid6-storage/storage
Brick5: 10.0.231.54:/mnt/raid6-storage/storage
Brick6: 10.0.231.55:/mnt/raid6-storage/storage
Brick7: 10.0.231.56:/mnt/raid6-storage/storage
Options Reconfigured:
features.read-only: off
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
nfs.disable: on
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
transport.address-family: inet
features.quota-deem-statfs: on
changelog.changelog: on
diagnostics.client-log-level: INFO

And this is the cluster I'm trying to replicate to:

[root@pcic-backup01 ~]# gluster volume info pcic-backup

Volume Name: pcic-backup
Type: Distribute
Volume ID: 2890bcde-a023-4feb-a0e5-e8ef8f337d4c
Status: Started
Snapshot Count: 0
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.0.231.81:/pcic-backup01-zpool/brick
Brick2: 10.0.231.82:/pcic-backup02-zpool/brick
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
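In case it's useful while I keep looking for those files, this is roughly how I've been pulling recent errors out of gsyncd.log on each master node. It assumes the session logs sit somewhere under the default /var/log/glusterfs/geo-replication/ directory; the exact subdirectory name is just whatever gsyncd created for this session, so I glob over it:

    for h in 10.0.231.50 10.0.231.51 10.0.231.52 10.0.231.53 10.0.231.54 10.0.231.55 10.0.231.56; do
        echo "== $h =="
        ssh root@$h "grep -iE 'error|faulty' /var/log/glusterfs/geo-replication/*/gsyncd.log | tail -n 5"
    done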
Thanks,
 -Matthew

On 7/28/19 10:56 PM, Sunny Kumar wrote:

Hi Matthew,

Can you share the geo-rep logs and one more log file (changes-<brick-path>.log)? It will help to pinpoint the actual reason behind the failure.

/sunny

On Mon, Jul 29, 2019 at 9:13 AM Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:

On Sat, 27 Jul 2019 at 02:31, Matthew Benstead <matthewb@xxxxxxx> wrote:

Ok, thank-you for explaining everything - that makes sense. Currently the brick file systems are pretty evenly distributed, so I probably won't run the fix-layout right now.

Would this state have any impact on geo-replication? I'm trying to geo-replicate this volume, but am getting a weird error: "Changelog register failed error=[Errno 21] Is a directory"

It should not. Sunny, can you comment on this?

Regards,
Nithya

I assume this is related to something else, but I wasn't sure.

Thanks,
 -Matthew

--
Matthew Benstead
System Administrator
Pacific Climate Impacts Consortium
University of Victoria, UH1
PO Box 1800, STN CSC
Victoria, BC, V8W 2Y2
Phone: +1-250-721-8432
Email: matthewb@xxxxxxx

On 7/26/19 12:02 AM, Nithya Balachandran wrote:

On Fri, 26 Jul 2019 at 01:56, Matthew Benstead <matthewb@xxxxxxx> wrote:

Hi Nithya,

Hmm... I don't remember if I did, but based on what I'm seeing it sounds like I probably didn't run rebalance or fix-layout. It looks like folders that haven't had any new files created have a dht of 0, while other folders have non-zero values.

[root@gluster07 ~]# getfattr --absolute-names -m . -d -e hex /mnt/raid6-storage/storage/ | grep dht

[root@gluster07 ~]# getfattr --absolute-names -m . -d -e hex /mnt/raid6-storage/storage/home | grep dht
trusted.glusterfs.dht=0x00000000000000000000000000000000

[root@gluster07 ~]# getfattr --absolute-names -m . -d -e hex /mnt/raid6-storage/storage/home/matthewb | grep dht
trusted.glusterfs.dht=0x00000001000000004924921a6db6dbc7

If I just run the fix-layout command will it re-create all of the dht values or just the missing ones?

A fix-layout will recalculate the layouts entirely, so all the values will change. No files will be moved. A rebalance will recalculate the layouts like the fix-layout but will also move files to their new locations based on the new layout ranges. This could take a lot of time depending on the number of files/directories on the volume. If you do this, I would recommend that you turn off lookup-optimize until the rebalance is over.

Since the brick is already fairly size-balanced, could I get away with running fix-layout but not rebalance? Or would the new dht layout mean slower accesses, since the files may be expected on different bricks?

The first access for a file will be slower. The next one will be faster as the location will be cached in the client's in-memory structures. You may not need to run either a fix-layout or a rebalance if new file creations will be in directories created after the add-brick. Gluster will automatically include all 7 bricks for those directories.

Regards,
Nithya

Thanks,
 -Matthew

--
Matthew Benstead
System Administrator
Pacific Climate Impacts Consortium
University of Victoria, UH1
PO Box 1800, STN CSC
Victoria, BC, V8W 2Y2
Phone: +1-250-721-8432
Email: matthewb@xxxxxxx

On 7/24/19 9:30 PM, Nithya Balachandran wrote:

On Wed, 24 Jul 2019 at 22:12, Matthew Benstead <matthewb@xxxxxxx> wrote:

So looking more closely at the trusted.glusterfs.dht attributes from the bricks it looks like they cover the entire range... and there is no range left for gluster07. The first 6 bricks range from 0x00000000 to 0xffffffff - so... is there a way to re-calculate what the dht values should be? Each of the bricks should have a gap:

Gluster05  00000000 -> 2aaaaaa9
Gluster06  2aaaaaaa -> 55555553
Gluster01  55555554 -> 7ffffffd
Gluster02  7ffffffe -> aaaaaaa7
Gluster03  aaaaaaa8 -> d5555551
Gluster04  d5555552 -> ffffffff
Gluster07  None

If we split the range into 7 servers that would be a gap of about 0x24924924 for each server.
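(As a quick sanity check on that arithmetic, here is a small bash sketch that just splits the 32-bit hash space into 7 equal ranges, with the last range padded out to 0xffffffff. The numbering below is purely illustrative and not the order gluster would actually assign:)

    bricks=7
    gap=$(( 0x100000000 / bricks ))          # 613566756 = 0x24924924
    printf 'gap per brick: 0x%08x\n' "$gap"
    for i in $(seq 0 $(( bricks - 1 ))); do
        start=$(( i * gap ))
        end=$(( (i + 1) * gap - 1 ))
        # last range takes the remainder so the layout ends at 0xffffffff
        [ "$i" -eq $(( bricks - 1 )) ] && end=$(( 0xffffffff ))
        printf 'range %d: 0x%08x -> 0x%08x\n' "$(( i + 1 ))" "$start" "$end"
    done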
Now, in terms of the gluster07 brick: about 2 years ago the RAID array the brick was stored on became corrupted. I ran the remove-brick force command, then provisioned a new server, ran the add-brick command, and then restored the missing files from backup by copying them back to the main gluster mount (not the brick).

Did you run a rebalance after performing the add-brick? Without a rebalance/fix-layout, the layout for existing directories on the volume will not be updated to use the new brick as well. That the layout does not include the new brick in the root dir is in itself not a problem. Do you create a lot of files directly in the root of the volume? If yes, you might want to run a rebalance. Otherwise, if you mostly create files in newly added directories, you can probably ignore this. You can check the layout for directories on the volume and see if they incorporate brick7. I would expect a lookup on the root to have set an xattr on the brick with an empty layout range. The fact that the xattr does not exist at all on the brick is what I am looking into.

It looks like prior to that event this was the layout - which would make sense given the equal size of the 7 bricks:

gluster02.pcic.uvic.ca | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x000000010000000048bfff206d1ffe5f

gluster05.pcic.uvic.ca | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000b5dffce0da3ffc1f

gluster04.pcic.uvic.ca | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000917ffda0b5dffcdf

gluster03.pcic.uvic.ca | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x00000001000000006d1ffe60917ffd9f

gluster01.pcic.uvic.ca | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000245fffe048bfff1f

gluster07.pcic.uvic.ca | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x000000010000000000000000245fffdf

gluster06.pcic.uvic.ca | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000da3ffc20ffffffff

Which yields the following:

00000000 -> 245fffdf  Gluster07
245fffe0 -> 48bfff1f  Gluster01
48bfff20 -> 6d1ffe5f  Gluster02
6d1ffe60 -> 917ffd9f  Gluster03
917ffda0 -> b5dffcdf  Gluster04
b5dffce0 -> da3ffc1f  Gluster05
da3ffc20 -> ffffffff  Gluster06

Is there some way to get back to this?
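(A quick way to spot-check whether individual directories have picked up a range on the gluster07 brick, along the lines suggested above - the directory names below are only examples:)

    for d in home home/matthewb projects; do    # "projects" is just a placeholder path
        getfattr --absolute-names -n trusted.glusterfs.dht -e hex "/mnt/raid6-storage/storage/$d"
    done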
Thanks,
 -Matthew

--
Matthew Benstead
System Administrator
Pacific Climate Impacts Consortium
University of Victoria, UH1
PO Box 1800, STN CSC
Victoria, BC, V8W 2Y2
Phone: +1-250-721-8432
Email: matthewb@xxxxxxx

On 7/18/19 7:20 AM, Matthew Benstead wrote:

Hi Nithya,

No - it was added about a year and a half ago. I have tried re-mounting the volume on the server, but it didn't add the attr:

[root@gluster07 ~]# umount /storage/
[root@gluster07 ~]# cat /etc/fstab | grep "/storage"
10.0.231.56:/storage /storage glusterfs defaults,log-level=WARNING,backupvolfile-server=10.0.231.51 0 0
[root@gluster07 ~]# mount /storage/
[root@gluster07 ~]# df -h /storage/
Filesystem            Size  Used  Avail  Use%  Mounted on
10.0.231.56:/storage  255T  194T   62T   77%   /storage

[root@gluster07 ~]# getfattr --absolute-names -m . -d -e hex /mnt/raid6-storage/storage/
# file: /mnt/raid6-storage/storage/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.6f95525a-94d7-4174-bac4-e1a18fe010a2.xtime=0x5d307baa00023ec0
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.2=0x00001b71d5279e000000000000763e32000000000005cd53
trusted.glusterfs.volume-id=0x6f95525a94d74174bac4e1a18fe010a2

Thanks,
 -Matthew

On 7/17/19 10:04 PM, Nithya Balachandran wrote:

Hi Matthew,

Was this node/brick added to the volume recently? If yes, try mounting the volume on a fresh mount point - that should create the xattr on this as well.

Regards,
Nithya

On Wed, 17 Jul 2019 at 21:01, Matthew Benstead <matthewb@xxxxxxx> wrote:

Hello,

I've just noticed one brick in my 7 node distribute volume is missing the trusted.glusterfs.dht xattr...? How can I fix this?

I'm running glusterfs-5.3-2.el7.x86_64 on CentOS 7. All of the other nodes are fine, but gluster07 from the list below does not have the attribute.

$ ansible -i hosts gluster-servers[0:6] ... -m shell -a "getfattr -m . --absolute-names -n trusted.glusterfs.dht -e hex /mnt/raid6-storage/storage"
...
gluster05 | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000000000002aaaaaa9

gluster03 | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000aaaaaaa8d5555551

gluster04 | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000d5555552ffffffff

gluster06 | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x00000001000000002aaaaaaa55555553

gluster02 | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x00000001000000007ffffffeaaaaaaa7

gluster07 | FAILED | rc=1 >>
/mnt/raid6-storage/storage: trusted.glusterfs.dht: No such attribute
non-zero return code

gluster01 | SUCCESS | rc=0 >>
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000555555547ffffffd
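(To make those values easier to compare, the start/end of each range can be pulled out with a little bash - this assumes the last 16 hex characters of trusted.glusterfs.dht are the start and end of the range, which matches how the values are read elsewhere in this thread:)

    val=0x0000000100000000000000002aaaaaa9   # example: the gluster05 value above
    hex=${val#0x}
    echo "start: 0x${hex:16:8}  end: 0x${hex:24:8}"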
Here are all of the attr's from the brick:

[root@gluster07 ~]# getfattr --absolute-names -m . -d -e hex /mnt/raid6-storage/storage/
# file: /mnt/raid6-storage/storage/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.6f95525a-94d7-4174-bac4-e1a18fe010a2.xtime=0x5d2dee800001fdf9
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.2=0x00001b69498a1400000000000076332e000000000005cd03
trusted.glusterfs.volume-id=0x6f95525a94d74174bac4e1a18fe010a2

And here is the volume information:

[root@gluster07 ~]# gluster volume info storage

Volume Name: storage
Type: Distribute
Volume ID: 6f95525a-94d7-4174-bac4-e1a18fe010a2
Status: Started
Snapshot Count: 0
Number of Bricks: 7
Transport-type: tcp
Bricks:
Brick1: 10.0.231.50:/mnt/raid6-storage/storage
Brick2: 10.0.231.51:/mnt/raid6-storage/storage
Brick3: 10.0.231.52:/mnt/raid6-storage/storage
Brick4: 10.0.231.53:/mnt/raid6-storage/storage
Brick5: 10.0.231.54:/mnt/raid6-storage/storage
Brick6: 10.0.231.55:/mnt/raid6-storage/storage
Brick7: 10.0.231.56:/mnt/raid6-storage/storage
Options Reconfigured:
changelog.changelog: on
features.quota-deem-statfs: on
features.read-only: off
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
nfs.disable: on
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
transport.address-family: inet

Thanks,
 -Matthew
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users