Re: Weird glusterfs behaviour after add-bricks and fix-layout

Greetings Thomas,

I'll try to help determine the root cause and resolve the issue.
The following would help me assist you:
  1. Please create an issue on GitHub with all the relevant details; it will be easier to track there.
  2. Please provide all client-side, brick-side and fix-layout logs.
With that information I can begin an initial assessment of the situation.
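
To make gathering the logs on each node a bit easier, here is a minimal sketch that bundles them into one archive. It assumes the logs live under the usual /var/log/glusterfs directory, with brick logs in a bricks/ subdirectory and the client (mount) and rebalance/fix-layout logs at the top level; the function name and the archive path are only illustrative, so adjust them to your setup.

```python
import glob
import os
import tarfile


def collect_gluster_logs(log_dir, archive_path):
    """Bundle top-level and brick logs from log_dir into a tar.gz archive.

    Returns the list of collected files, relative to log_dir.
    """
    patterns = [
        # Client (mount) logs and the rebalance / fix-layout log (assumed location).
        os.path.join(log_dir, "*.log*"),
        # Brick-side logs (assumed location).
        os.path.join(log_dir, "bricks", "*.log*"),
    ]
    files = sorted(
        {p for pattern in patterns for p in glob.glob(pattern) if os.path.isfile(p)}
    )
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in files:
            # Store paths relative to log_dir so archives from several nodes unpack cleanly.
            tar.add(path, arcname=os.path.relpath(path, log_dir))
    return [os.path.relpath(p, log_dir) for p in files]


if __name__ == "__main__":
    # Hypothetical output path; run once per server and once per affected client.
    for name in collect_gluster_logs("/var/log/glusterfs", "/var/tmp/gluster-logs.tar.gz"):
        print(name)
```

Rotated logs (for example *.log.1 or *.log.gz) are included as well, since the crashes and the restarted fix-layout happened over several days.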

Thank you,

On Sun, Nov 8, 2020 at 5:35 PM Thomas Bätzler <t.baetzler@xxxxxxxxxx> wrote:
Hi,

the other day we decided to expand our gluster storage by adding two
bricks, going from a 3x2 to a 4x2 distributed-replicated setup. In order
to get to this point we had done rolling upgrades from 3.something to
5.13 to 7.8, all without issues. We ran into a spot of trouble during
the fix-layout when both of the new nodes crashed in the space of two
days. We rebooted them and the fix-layout process seemed to cope by
starting itself again on these nodes.

Now weird things are happening. We noticed that we can't access some
files directly anymore. Doing a stat on them returns "file not found".
However, if we list the directory containing the files, the file is
shown as present and subsequently it can be accessed on the client that
ran the list, but not on other clients! Also, if we umount and remount,
the file is inaccessible again.
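
One check that may help narrow this down is whether the directory layouts are complete on all bricks after the interrupted fix-layout. The following is only a sketch under stated assumptions, not a diagnosis: it assumes the trusted.glusterfs.dht xattr of a directory (read on each brick with something like `getfattr -n trusted.glusterfs.dht -e hex <brick>/<dir>`) is 16 bytes holding four big-endian 32-bit words, of which the last two are the start and stop of the hash range. It further assumes that a (0, 0) entry means "no range assigned". The function names are made up for illustration.

```python
import struct


def parse_dht_layout(hex_value):
    """Return (start, stop) of the hash range stored in a trusted.glusterfs.dht value.

    Assumption: the value is four big-endian uint32 words and the last two
    are the start and stop of the range; the first two are ignored here.
    """
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 16:
        raise ValueError("expected 16 bytes, got %d" % len(raw))
    _word0, _word1, start, stop = struct.unpack(">4I", raw)
    return start, stop


def find_layout_problems(ranges):
    """Report holes and overlaps in (start, stop) ranges meant to cover 0..0xffffffff.

    Pass one range per replica pair for the same directory.
    """
    problems = []
    expected = 0  # next hash value that still needs to be covered
    # Assumption: (0, 0) means the subvolume has no range for this directory.
    for start, stop in sorted(r for r in ranges if r != (0, 0)):
        if start > expected:
            problems.append(("hole", expected, start - 1))
        elif start < expected:
            problems.append(("overlap", start, min(stop, expected - 1)))
        expected = max(expected, stop + 1)
    if expected <= 0xFFFFFFFF:
        problems.append(("hole", expected, 0xFFFFFFFF))
    return problems
```

For a 4x2 volume one would collect four values for the same directory (one per replica pair), parse each with parse_dht_layout and pass the resulting list to find_layout_problems; an empty result means the ranges tile the whole 32-bit hash space under the assumptions above.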

Does anybody have any ideas what's going on here? Is there any way to
fix the volume without taking it offline for days? We have about 60T of
data online and we need that data to be consistent and available.

OS is Debian 10 with glusterfs-server 7.8-3.

Volume configuration:

Volume Name: hotcache
Type: Distributed-Replicate
Volume ID: 4c006efa-6fd6-4809-93b0-28dd33fee2d2
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: hotcache1:/data/glusterfs/drive1/hotcache
Brick2: hotcache2:/data/glusterfs/drive1/hotcache
Brick3: hotcache3:/data/glusterfs/drive1/hotcache
Brick4: hotcache4:/data/glusterfs/drive1/hotcache
Brick5: hotcache5:/data/glusterfs/drive1/hotcache
Brick6: hotcache6:/data/glusterfs/drive1/hotcache
Brick7: hotcache7:/data/glusterfs/drive1/hotcache
Brick8: hotcache8:/data/glusterfs/drive1/hotcache
Options Reconfigured:
performance.readdir-ahead: off
diagnostics.client-log-level: INFO
diagnostics.brick-log-level: INFO
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
server.statedump-path: /var/tmp
diagnostics.brick-sys-log-level: ERROR
storage.linux-aio: on
performance.read-ahead: off
performance.write-behind-window-size: 4MB
performance.cache-max-file-size: 200kb
nfs.disable: on
performance.cache-refresh-timeout: 1
performance.io-cache: on
performance.stat-prefetch: off
performance.quick-read: on
performance.io-thread-count: 16
auth.allow: *
cluster.readdir-optimize: on
performance.flush-behind: off
transport.address-family: inet
cluster.self-heal-daemon: enable

TIA!

Best regards,
Thomas Bätzler
--
BRINGE Informationstechnik GmbH
Zur Seeplatte 12
D-76228 Karlsruhe
Germany

Fon: +49 721 94246-0
Fon: +49 171 5438457
Fax: +49 721 94246-66
Web: http://www.bringe.de/

Geschäftsführer: Dipl.-Ing. (FH) Martin Bringe
Ust.Id: DE812936645, HRB 108943 Mannheim                       

________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users


--
Barak Sason Rofman

Gluster Storage Development

Red Hat Israel

34 Jerusalem rd. Ra'anana, 43501

bsasonro@redhat.com    T: +972-9-7692304
M: +972-52-4326355
