Re: Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed

For arb_0 I see only 8 clients, while there should be 12:

Brick : 192.168.0.40:/var/bricks/0/brick
Clients connected : 12

Brick : 192.168.0.41:/var/bricks/0/brick
Clients connected : 12

Brick : 192.168.0.80:/var/bricks/arb_0/brick
Clients connected : 8

Can you try to reconnect them? The simplest way is to kill the arbiter brick process and run 'gluster volume start <volname> force', but always verify that both data bricks are up and running first.
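
A minimal sketch of that sequence, using arb_0 on 192.168.0.80 as the example (double-check the brick PID via 'gluster volume status' before killing anything):

  # get the PID of the affected arbiter brick process, then kill only that process
  gluster volume status gv0 192.168.0.80:/var/bricks/arb_0/brick
  kill <arbiter-brick-pid>

  # restart the killed brick process; bricks that are already running are not touched
  gluster volume start gv0 force

  # afterwards, re-check the per-brick client counts
  gluster volume status gv0 clients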


Yet, this doesn't explain why the heal daemon is not able to replicate properly.


Best Regards,
Strahil Nikolov

Meanwhile I tried reset-brick on one of the failing arbiters on node2, but with the same result. The behaviour is reproducible: the arbiter stays empty.
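
For reference, the reset-brick sequence looked roughly like this (shown here for arb_0 on 192.168.0.80 as an example, resetting onto the same path):

  gluster volume reset-brick gv0 192.168.0.80:/var/bricks/arb_0/brick start
  gluster volume reset-brick gv0 192.168.0.80:/var/bricks/arb_0/brick 192.168.0.80:/var/bricks/arb_0/brick commit force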


node0: 192.168.0.40

node1: 192.168.0.41

node2: 192.168.0.80


volume info:

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 9bafc4d2-d9b6-4b6d-a631-1cf42d1d2559
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x (2 + 1) = 18
Transport-type: tcp
Bricks:
Brick1: 192.168.0.40:/var/bricks/0/brick
Brick2: 192.168.0.41:/var/bricks/0/brick
Brick3: 192.168.0.80:/var/bricks/arb_0/brick (arbiter)
Brick4: 192.168.0.40:/var/bricks/2/brick
Brick5: 192.168.0.80:/var/bricks/2/brick
Brick6: 192.168.0.41:/var/bricks/arb_1/brick (arbiter)
Brick7: 192.168.0.40:/var/bricks/1/brick
Brick8: 192.168.0.41:/var/bricks/1/brick
Brick9: 192.168.0.80:/var/bricks/arb_1/brick (arbiter)
Brick10: 192.168.0.40:/var/bricks/3/brick
Brick11: 192.168.0.80:/var/bricks/3/brick
Brick12: 192.168.0.41:/var/bricks/arb_0/brick (arbiter)
Brick13: 192.168.0.41:/var/bricks/3/brick
Brick14: 192.168.0.80:/var/bricks/4/brick
Brick15: 192.168.0.40:/var/bricks/arb_0/brick (arbiter)
Brick16: 192.168.0.41:/var/bricks/2/brick
Brick17: 192.168.0.80:/var/bricks/5/brick
Brick18: 192.168.0.40:/var/bricks/arb_1/brick (arbiter)
Options Reconfigured:
cluster.min-free-inodes: 6%
cluster.min-free-disk: 2%
performance.md-cache-timeout: 600
cluster.rebal-throttle: lazy
features.scrub-freq: monthly
features.scrub-throttle: lazy
features.scrub: Inactive
features.bitrot: off
cluster.server-quorum-type: none
performance.cache-refresh-timeout: 10
performance.cache-max-file-size: 64MB
performance.cache-size: 781901824
auth.allow: /(192.168.0.*),/usr/andreas(192.168.0.120),/usr/otis(192.168.0.168),/usr/otis(192.168.0.111),/usr/otis(192.168.0.249),/media(192.168.0.*),/virt(192.168.0.*),/cloud(192.168.0.247),/zm(192.168.0.136)
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
cluster.quorum-type: auto
features.cache-invalidation: on
nfs.disable: on
transport.address-family: inet
cluster.self-heal-daemon: on
cluster.server-quorum-ratio: 51%

volume status:

Status of volume: gv0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 192.168.0.40:/var/bricks/0/brick 49155 0 Y 713066
Brick 192.168.0.41:/var/bricks/0/brick 49152 0 Y 2082
Brick 192.168.0.80:/var/bricks/arb_0/brick 49152 0 Y 26186
Brick 192.168.0.40:/var/bricks/2/brick 49156 0 Y 713075
Brick 192.168.0.80:/var/bricks/2/brick 49154 0 Y 325
Brick 192.168.0.41:/var/bricks/arb_1/brick 49157 0 Y 1746903
Brick 192.168.0.40:/var/bricks/1/brick 49157 0 Y 713084
Brick 192.168.0.41:/var/bricks/1/brick 49153 0 Y 14104
Brick 192.168.0.80:/var/bricks/arb_1/brick 49159 0 Y 2314
Brick 192.168.0.40:/var/bricks/3/brick 49153 0 Y 2978692
Brick 192.168.0.80:/var/bricks/3/brick 49155 0 Y 23269
Brick 192.168.0.41:/var/bricks/arb_0/brick 49158 0 Y 1746942
Brick 192.168.0.41:/var/bricks/3/brick 49155 0 Y 897058
Brick 192.168.0.80:/var/bricks/4/brick 49156 0 Y 27433
Brick 192.168.0.40:/var/bricks/arb_0/brick 49152 0 Y 3561115
Brick 192.168.0.41:/var/bricks/2/brick 49156 0 Y 902602
Brick 192.168.0.80:/var/bricks/5/brick 49157 0 Y 29522
Brick 192.168.0.40:/var/bricks/arb_1/brick 49154 0 Y 3561159
Self-heal Daemon on localhost N/A N/A Y 26199
Self-heal Daemon on 192.168.0.41 N/A N/A Y 2240635
Self-heal Daemon on 192.168.0.40 N/A N/A Y 3912810

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks

volume heal info summary:

Brick 192.168.0.40:/var/bricks/0/brick <--- contains 100177 files in 25015 dirs
Status: Connected
Total Number of entries: 1006
Number of entries in heal pending: 1006
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.41:/var/bricks/0/brick
Status: Connected
Total Number of entries: 1006
Number of entries in heal pending: 1006
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.80:/var/bricks/arb_0/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.40:/var/bricks/2/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.80:/var/bricks/2/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.41:/var/bricks/arb_1/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.40:/var/bricks/1/brick
Status: Connected
Total Number of entries: 1006
Number of entries in heal pending: 1006
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.41:/var/bricks/1/brick
Status: Connected
Total Number of entries: 1006
Number of entries in heal pending: 1006
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.80:/var/bricks/arb_1/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.40:/var/bricks/3/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.80:/var/bricks/3/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.41:/var/bricks/arb_0/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.41:/var/bricks/3/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.80:/var/bricks/4/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.40:/var/bricks/arb_0/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.41:/var/bricks/2/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.80:/var/bricks/5/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.40:/var/bricks/arb_1/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

client-list:

Client connections for volume gv0
Name count
----- ------
fuse 5
gfapi.ganesha.nfsd 3
glustershd 3

total clients for volume gv0 : 11
-----------------------------------------------------------------
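
(For reference, the summary above should correspond to the output of the command below; the full per-client listing is in the link that follows.)

  gluster volume status gv0 client-list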

all clients: https://pro.hostit.de/nextcloud/index.php/s/tWdHox3aqb3qqbG

failing mnt.log: https://pro.hostit.de/nextcloud/index.php/s/2E2NLnXNsTy7EQe


Thank you.

A.


"Strahil Nikolov" hunter86_bg@xxxxxxxxx – 31. Mai 2021 05:23
> Can you provide gluster volume info, gluster volume status, gluster volume heal info summary and, most probably, gluster volume status all clients/client-list?

>
>
> Best Regards,
> Strahil Nikolov
>
> > On Sun, May 30, 2021 at 15:17, a.schwibbe@xxxxxxx wrote:
> >
> > I am seeking help here after looking for solutions on the web for my distributed-replicated volume.
> >
> > My volume has been in operation since v3.10, and I have upgraded through to 7.9, replaced nodes and replaced bricks without a problem. I love it.
> >
> > Finally I wanted to extend my 6x2 distributed replicated volume with arbiters for better split-brain protection.
> >
> >
> > So I ran add-brick with replica 3 arbiter 1 (as I had a 6x2 volume, I added 6 arbiter bricks) and it successfully converted to 6 x (2 + 1); self-heal started immediately. Looking good.
> >
> >
> >
> > Version: 7.9
> >
> >
> > Number of Bricks: 6 x (2 + 1) = 18
> >
> >
> > cluster.max-op-version: 70200
> >
> >
> > Peers: 3 (node[0..2])
> >
> >
> > Layout
> >
> >
> > |node0 |node1 |node2
> >
> > |brick0 |brick0 |arbit0
> >
> >
> > |arbit1 |brick1 |brick1
> >
> >
> > ....
> >
> >
> >
> > I then noticed that the arbiter bricks on node0 & node1 have been healed successfully.
> >
> > Unfortunately, none of the arbiter bricks on node2 have been healed!
> >
> > I realized that the top-level dir on my arbiter mount point has been created (mount point /var/bricks/arb_0 now contains the dir "brick"); however, this dir is owned by numeric UID 33 on _all_ other bricks, but by 0 on this one. The brick dir on the faulty arbiter bricks does contain ".glusterfs", but it has only very few entries. Other than that, "brick" is empty.
> >
> > At that point I changed the owner of the brick dir with chown to 33:33 and hoped for self-heal to work. It did not.
> >
> > I hoped a rebalance fix-layout would fix things. It did not.
> >
> > I hoped a glusterd restart on node2 (as this is happening exclusively to both arbiter bricks on that node) would help. It did not.
> >
> >
> > Active mount points via nfs-ganesha or fuse continue to work.
> >
> > Existing clients cause errors in the arbiter brick logs on node2 for missing files or dirs, but the clients themselves seem unaffected; r/w operations work.
> >
> >
> > New clients are not able to fuse-mount the volume due to an "authentication error".
> >
> > heal statistics heal-count shows several hundred files needing healing, and this count is rising. Watching df on the arbiter brick mount point on node2 shows a few bytes written every now and then, but they are removed immediately afterwards.
> >
> >
> > Any help/recommendation from you is highly appreciated.
> >
> > Thank you!
> >
> >
> > A.
> >
________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
