I read on another thread about checking the getfattr output for each
brick, but it tailed off before any explanation of what to do with this
information.

We have 8 bricks in the volume. Config is:

g1:~ # gluster volume info glustervol1

Volume Name: glustervol1
Type: Distributed-Replicate
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: g1:/mnt/glus1
Brick2: g2:/mnt/glus1
Brick3: g3:/mnt/glus1
Brick4: g4:/mnt/glus1
Brick5: g1:/mnt/glus2
Brick6: g2:/mnt/glus2
Brick7: g3:/mnt/glus2
Brick8: g4:/mnt/glus2
Options Reconfigured:
performance.write-behind-window-size: 100mb
performance.cache-size: 512mb
performance.stat-prefetch: on

and the getfattr outputs are:

g1:~ # getfattr -d -e hex -m trusted.afr /mnt/glus1
getfattr: Removing leading '/' from absolute path names
# file: mnt/glus1
trusted.afr.glustervol1-client-0=0x000000000000000000000000
trusted.afr.glustervol1-client-1=0x000000000000000000000000

g1:~ # getfattr -d -e hex -m trusted.afr /mnt/glus2
getfattr: Removing leading '/' from absolute path names
# file: mnt/glus2
trusted.afr.glustervol1-client-4=0x000000000000000000000000
trusted.afr.glustervol1-client-5=0x000000000000000000000000

g2:~ # getfattr -d -e hex -m trusted.afr /mnt/glus1
getfattr: Removing leading '/' from absolute path names
# file: mnt/glus1
trusted.afr.glustervol1-client-0=0x000000000000000000000000
trusted.afr.glustervol1-client-1=0x000000000000000000000000

g2:~ # getfattr -d -e hex -m trusted.afr /mnt/glus2
getfattr: Removing leading '/' from absolute path names
# file: mnt/glus2
trusted.afr.glustervol1-client-4=0x000000000000000000000000
trusted.afr.glustervol1-client-5=0x000000000000000000000000

g3:~ # getfattr -d -e hex -m trusted.afr /mnt/glus1
getfattr: Removing leading '/' from absolute path names
# file: mnt/glus1
trusted.afr.glustervol1-client-2=0x000000000000000000000000
trusted.afr.glustervol1-client-3=0x000000000000000100000000

g3:~ # getfattr -d -e hex -m trusted.afr /mnt/glus2
getfattr: Removing leading '/' from absolute path names
# file: mnt/glus2
trusted.afr.glustervol1-client-6=0x000000000000000000000000
trusted.afr.glustervol1-client-7=0x000000000000000000000000

g4:~ # getfattr -d -e hex -m trusted.afr /mnt/glus1
getfattr: Removing leading '/' from absolute path names
# file: mnt/glus1
trusted.afr.glustervol1-client-2=0x000000000000000100000000
trusted.afr.glustervol1-client-3=0x000000000000000000000000

g4:~ # getfattr -d -e hex -m trusted.afr /mnt/glus2
getfattr: Removing leading '/' from absolute path names
# file: mnt/glus2
trusted.afr.glustervol1-client-6=0x000000000000000000000000
trusted.afr.glustervol1-client-7=0x000000000000000000000000
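From what I've pieced together so far (my own reading of the AFR docs and
source, so treat this as an assumption rather than gospel): each trusted.afr
value is 24 hex digits, i.e. three 32-bit counters of pending data, metadata
and entry operations, in that order, and trusted.afr.<vol>-client-N on a
brick is that brick's record of operations still owed to the brick behind
client-N (client-0 = Brick1, client-1 = Brick2, and so on). If that's right,
the two non-zero values above mean g3:/mnt/glus1 and g4:/mnt/glus1 (the
Brick3/Brick4 replica pair) each hold a pending metadata changelog of 1
against the other - they blame each other, which I gather is exactly what
AFR calls metadata split-brain, and it's on the brick roots, matching the
errors James is seeing on '/'.

To avoid eyeballing hex, I'm scanning the bricks with this little script
(my own, adjust the brick paths to your layout):

  for b in /mnt/glus1 /mnt/glus2; do
    getfattr -d -e hex -m trusted.afr "$b" 2>/dev/null |
    awk -F'=0x' -v brick="$b" '/^trusted.afr/ {
      d = substr($2, 1, 8); m = substr($2, 9, 8); e = substr($2, 17, 8)
      if ($2 != "000000000000000000000000")
        printf "%s %s: data=%s metadata=%s entry=%s <- pending\n", brick, $1, d, m, e
    }'
  done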
Hope someone can help. Things still seem to be working, but slowed down.

Cheers

David

On 26 January 2011 17:07, David Lloyd <david.lloyd at v-consultants.co.uk> wrote:

> We started getting the same problem at almost exactly the same time.
>
> I get one of these messages every time I access the root of the mounted
> volume (and nowhere else, I think).
> This is also 3.1.1.
>
> I'm just starting to look into it, I'll let you know if I get anywhere.
>
> David
>
> On 26 January 2011 16:38, Burnash, James <jburnash at knight.com> wrote:
>
>> These errors are appearing in the file /var/log/glusterfs/<mountpoint>.log
>>
>> [2011-01-26 11:02:10.342349] I [afr-common.c:672:afr_lookup_done]
>> pfs-ro1-replicate-5: split brain detected during lookup of /.
>> [2011-01-26 11:02:10.342366] I [afr-common.c:716:afr_lookup_done]
>> pfs-ro1-replicate-5: background meta-data data self-heal triggered.
>> path: /
>> [2011-01-26 11:02:10.342502] E
>> [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] pfs-ro1-replicate-2:
>> Unable to self-heal permissions/ownership of '/' (possible split-brain).
>> Please fix the file on all backend volumes
>>
>> Apparently the issue is the root of the storage pool, which in my case on
>> the backend storage servers is this path:
>>
>> /export/read-only - permissions are:
>> drwxr-xr-x 12 root root 4096 Dec 28 12:09 /export/read-only/
>>
>> Installation is GlusterFS 3.1.1 on servers and clients, servers running
>> CentOS 5.5, clients running CentOS 5.2.
>>
>> The volume info header is below:
>>
>> Volume Name: pfs-ro1
>> Type: Distributed-Replicate
>> Status: Started
>> Number of Bricks: 10 x 2 = 20
>> Transport-type: tcp
>>
>> Any ideas? I don't see a permission issue on the directory or its subs
>> themselves.
>>
>> James Burnash, Unix Engineering
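P.S. For anyone finding this in the archives later: the manual fix I intend
to try, based on the 3.x split-brain notes I could find (again my reading,
not an official procedure - corrections welcome), is:

1. Compare the directory's permissions/ownership on both bricks of the
   affected pair and make them identical, keeping whichever copy is good.
2. On the brick whose copy you're discarding, zero the changelog entry that
   blames the good brick, so the good brick becomes the only accuser and
   therefore the self-heal source (see the sketch below).
3. Do a lookup from a client so the background self-heal runs again.

In our case, keeping g3's copy, I believe that would mean running this
on g4 (client-2 being g3:/mnt/glus1, the brick we keep):

  setfattr -n trusted.afr.glustervol1-client-2 -v 0x000000000000000000000000 /mnt/glus1

and then, on any client (substitute your actual mount point):

  stat /your/client/mountpoint

I'd save the current xattrs first (getfattr -d -m . -e hex /mnt/glus1, output
redirected somewhere safe) in case this reasoning turns out to be wrong.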