Re: Problems since 3.12.7: invisible files, strange rebalance size, setxattr failed during rebalance and broken unix rights

Frank Ruehlemann <ruehlemann@xxxxxxxxxxxxxxxxxxx> · Mon, 23 Apr 2018 16:06:05 +0200

Hi,

here it is.

# gluster volume info $myvolume

Volume Name: $myvolume
Type: Distribute
Volume ID: 0d210c70-e44f-46f1-862c-ef260514c9f1
Status: Started
Snapshot Count: 0
Number of Bricks: 23
Transport-type: tcp
Bricks:
Brick1: gluster02:/srv/glusterfs/bricks/DATA201/data
Brick2: gluster02:/srv/glusterfs/bricks/DATA202/data
Brick3: gluster02:/srv/glusterfs/bricks/DATA203/data
Brick4: gluster02:/srv/glusterfs/bricks/DATA204/data
Brick5: gluster02:/srv/glusterfs/bricks/DATA205/data
Brick6: gluster02:/srv/glusterfs/bricks/DATA206/data
Brick7: gluster02:/srv/glusterfs/bricks/DATA207/data
Brick8: gluster02:/srv/glusterfs/bricks/DATA208/data
Brick9: gluster01:/srv/glusterfs/bricks/DATA110/data
Brick10: gluster01:/srv/glusterfs/bricks/DATA111/data
Brick11: gluster01:/srv/glusterfs/bricks/DATA112/data
Brick12: gluster01:/srv/glusterfs/bricks/DATA113/data
Brick13: gluster01:/srv/glusterfs/bricks/DATA114/data
Brick14: gluster02:/srv/glusterfs/bricks/DATA209/data
Brick15: gluster01:/srv/glusterfs/bricks/DATA101/data
Brick16: gluster01:/srv/glusterfs/bricks/DATA102/data
Brick17: gluster01:/srv/glusterfs/bricks/DATA103/data
Brick18: gluster01:/srv/glusterfs/bricks/DATA104/data
Brick19: gluster01:/srv/glusterfs/bricks/DATA105/data
Brick20: gluster01:/srv/glusterfs/bricks/DATA106/data
Brick21: gluster01:/srv/glusterfs/bricks/DATA107/data
Brick22: gluster01:/srv/glusterfs/bricks/DATA108/data
Brick23: gluster01:/srv/glusterfs/bricks/DATA109/data
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
auth.allow: $myipspace
performance.readdir-ahead: on
diagnostics.brick-log-level: WARNING
nfs.disable: on
transport.address-family: inet
nfs.addr-namelookup: off
diagnostics.brick-sys-log-level: WARNING
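
With 23 bricks the list is hard to take in at a glance. As a rough sketch, the bricks can be counted per server as shown here. The function name bricks_per_host is made up for illustration, and the sketch assumes the "BrickN: host:/path" line format shown above.

```shell
# Count bricks per server from `gluster volume info` output read on stdin.
# Only lines like "Brick9: gluster01:/srv/glusterfs/bricks/DATA110/data"
# are counted; the "Bricks:" header and all other lines are ignored.
bricks_per_host() {
    awk -F': *' '
        /^Brick[0-9]+:/ { count[$2]++ }            # $2 is the host name
        END { for (h in count) print h, count[h] }
    ' | sort
}
```

Usage would be `gluster volume info $myvolume | bricks_per_host`. For the listing above it should print "gluster01 14" and "gluster02 9".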

Well, at least one thing got fixed by this reboot: "df -h" now reports a
realistic size for the volume again. This wasn't the case after our update
to 3.12.7.

Best Regards,

-- 
Frank Rühlemann
   IT-Systemtechnik

UNIVERSITÄT ZU LÜBECK
    IT-Service-Center

    Ratzeburger Allee 160
    23562 Lübeck
    Tel +49 451 3101 2034
    Fax +49 451 3101 2004
    ruehlemann@xxxxxxxxxxxxxxxxxxx
    www.itsc.uni-luebeck.de

On Monday, 23.04.2018 at 19:12 +0530, Nithya Balachandran wrote:
> Hi,
> 
> What is the output of 'gluster volume info' for this volume?
> 
> 
> Regards,
> Nithya
> 
> On 23 April 2018 at 18:52, Frank Ruehlemann <ruehlemann@xxxxxxxxxxxxxxxxxxx>
> wrote:
> 
> > Hi,
> >
> > after 2 years of running GlusterFS without major problems, we've been
> > facing some strange errors lately.
> >
> > After updating to 3.12.7, some users reported at least 4 broken
> > directories containing invisible files. The files exist on the bricks and
> > don't start with a dot, but they aren't visible in "ls". Clients can still
> > interact with them by using the explicit path.
> > More information: https://bugzilla.redhat.com/show_bug.cgi?id=1564071
> >
> > Since this update, "gluster volume rebalance $myvolume status" has also
> > reported >16900 PB (petabytes!) of rebalanced data for one of our 2
> > servers. The time looks right, but the size of the transferred files is
> > absurd. That rebalance ran with 3.12.6 in March 2018. Its log file listed
> > no errors and showed a realistic size at the end.
> >
> > We started a new rebalance today during a downtime of the corresponding
> > compute cluster, since these errors started to spread and a rebalance
> > might help. So far, "gluster volume rebalance $myvolume status" doesn't
> > list any errors and the numbers look realistic.
> > But every few minutes we see strange error reports like this one in
> > journald:
> > "[2018-04-23 12:31:24.942377] E [MSGID: 113001]
> > [posix.c:5983:_posix_handle_xattr_keyvalue_pair] 0-$myvolume-posix:
> > setxattr failed
> > on /srv/glusterfs/bricks/DATA112/data/.glusterfs/e6/a8/
> > e6a8ce50-fda5-4bad-8d4d-acd25dafcaa2 while doing xattrop:
> > key=trusted.glusterfs.quota.1ce02d3b-b7ae-4485-903c-2991de5350b6.contri.1
> > [No such file or directory]"
> > The rebalance log file lists no errors.
> >
> > Has anybody seen similar error messages during a rebalance?
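
To see whether the setxattr failures are limited to single bricks, the messages can be tallied per brick path. This is a minimal sketch under two assumptions: each log entry arrives as one unwrapped line on stdin (the mail wrapped the entry quoted above), and the brick path is everything between "setxattr failed on " and "/.glusterfs/". The function name is hypothetical.

```shell
# Tally "setxattr failed" log entries per brick path.
# Reads log text on stdin, prints "<brick path> <count>" per brick.
# Assumes one log entry per line and brick paths without whitespace.
setxattr_failures_per_brick() {
    grep 'setxattr failed on ' \
        | sed -n 's|.*setxattr failed on \(.*\)/\.glusterfs/.*|\1|p' \
        | sort | uniq -c \
        | awk '{ print $2, $1 }'
}
```

It could be fed from the journal, for example with `journalctl --no-pager --since today | setxattr_failures_per_brick`. For the single entry quoted above it would print "/srv/glusterfs/bricks/DATA112/data 1".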
> >
> > We also see some duplicated files: there are two copies on different
> > bricks (we're running a distributed volume).
> > One copy looks like this:
> > $ ls -lah
> > -rwxr--r--  2 $user $group  293 May 11  2017 config
> >
> > The other one looks rather strange:
> > $ ls -lah
> > ---------T  2 root    $group    0 May 11  2017 config
> >
> > Has anybody seen similar broken files?
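
Zero-byte entries with mode ---------T on a brick are usually DHT link files, which only point to the brick that holds the data. A minimal sketch for listing such candidates under one brick path follows. It assumes GNU find, the function name is hypothetical, and it only matches on mode and size, so it cannot tell a valid link file from a stale one.

```shell
# List zero-byte regular files whose mode is exactly 1000 (---------T)
# below the brick path given as $1, skipping the internal .glusterfs tree.
list_linkto_candidates() {
    find "$1" -path "$1/.glusterfs" -prune -o \
        -type f -perm 1000 -size 0 -print
}
```

Usage would be, for example, `list_linkto_candidates /srv/glusterfs/bricks/DATA112/data`. For a single hit, `getfattr -n trusted.glusterfs.dht.linkto -e text <file>`, run as root on the brick, should show where the file points if it really is a link file.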
> >
> > We're using gluster 3.12 from the gluster.org repositories on a standard
> > Debian 9 with XFS-formatted bricks.
> >
> > Hopefully somebody has an idea how to fix this.
> >
> > At least somebody in the future might find this, since we didn't find
> > anything while searching for these errors. If you're from the future:
> > Good luck! (^_^)
> >
> > So far,
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users@xxxxxxxxxxx
> > http://lists.gluster.org/mailman/listinfo/gluster-users
