I would avoid shrinking the volume. An oVirt user reported issues after volume shrinking.
Did you try to format the arbiter brick and 'replace-brick'?
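
Roughly along these lines (only a sketch; double-check the syntax against your 7.9 docs and adjust host/path, I'm reusing the arb_0 brick on 192.168.0.80 from your volume info, and 'arb_0_new' is a made-up path):

# wipe and re-seed the failing arbiter in place
gluster volume reset-brick gv0 192.168.0.80:/var/bricks/arb_0/brick start
rm -rf /var/bricks/arb_0/brick
mkdir /var/bricks/arb_0/brick
gluster volume reset-brick gv0 192.168.0.80:/var/bricks/arb_0/brick 192.168.0.80:/var/bricks/arb_0/brick commit force

# or point the arbiter at a brand-new directory instead
gluster volume replace-brick gv0 192.168.0.80:/var/bricks/arb_0/brick 192.168.0.80:/var/bricks/arb_0_new/brick commit force

Either way the self-heal daemon has to repopulate the arbiter afterwards, so keep an eye on 'gluster volume heal gv0 statistics heal-count'.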
Best Regards,
Strahil Nikolov
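
For reference, a side-by-side check of a working versus a failing arbiter can look like this (a rough sketch: UID/GID 33 and the arb_0 path come from the thread below, and the getfattr line is an extra check that is not mentioned there):

ls -ln /var/bricks/arb_0/                          # "brick" should be owned 33:33, not 0:0
getfattr -d -m . -e hex /var/bricks/arb_0/brick    # trusted.glusterfs.volume-id should match the data bricks
find /var/bricks/arb_0/brick -not -user 33 -print
find /var/bricks/arb_0/brick -not -group 33 -print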
I can't find anything suspicious in the brick logs other than authentication refused to clients trying to mount a dir that does not exist on the arb_n, because the self-heal isn't working.
I tried to add another node and replace-brick a faulty arbiter, however this new arbiter sees the same error.
Last idea is to completely remove the first subvolume, then re-add it as new and hope it will work.

A.

> Ok, will do.
>
> working arbiter:
> ls -ln /var/bricks/arb_0/
>
> drwxr-xr-x 13 33 33 146 May 29 22:38 brick
>
> ls -lna /var/bricks/arb_0/brick
>
> drw------- 262 0 0 8192 May 29 22:38 .glusterfs
> + all data-brick dirs ...
>
> affected arbiter:
> ls -ln /var/bricks/arb_0/
>
> drwxr-xr-x 3 0 0 24 May 30 16:23 brick
>
> ls -lna /var/bricks/arb_0/brick
>
> drw------- 7 0 0 99 May 30 16:23 .glusterfs
> nothing else here
>
> find /var/bricks/arb_0/brick -not -user 33 -print
>
> /var/bricks/arb_0/brick/.glusterfs
> /var/bricks/arb_0/brick/.glusterfs/indices
> /var/bricks/arb_0/brick/.glusterfs/indices/xattrop
> /var/bricks/arb_0/brick/.glusterfs/indices/dirty
> /var/bricks/arb_0/brick/.glusterfs/indices/entry-changes
> /var/bricks/arb_0/brick/.glusterfs/changelogs
> /var/bricks/arb_0/brick/.glusterfs/changelogs/htime
> /var/bricks/arb_0/brick/.glusterfs/changelogs/csnap
> /var/bricks/arb_0/brick/.glusterfs/00
> /var/bricks/arb_0/brick/.glusterfs/00/00
> /var/bricks/arb_0/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001
> /var/bricks/arb_0/brick/.glusterfs/landfill
> /var/bricks/arb_0/brick/.glusterfs/unlink
> /var/bricks/arb_0/brick/.glusterfs/health_check
>
> find /var/bricks/arb_0/brick -not -group 33 -print
>
> /var/bricks/arb_0/brick/.glusterfs
> /var/bricks/arb_0/brick/.glusterfs/indices
> /var/bricks/arb_0/brick/.glusterfs/indices/xattrop
> /var/bricks/arb_0/brick/.glusterfs/indices/dirty
> /var/bricks/arb_0/brick/.glusterfs/indices/entry-changes
> /var/bricks/arb_0/brick/.glusterfs/changelogs
> /var/bricks/arb_0/brick/.glusterfs/changelogs/htime
> /var/bricks/arb_0/brick/.glusterfs/changelogs/csnap
> /var/bricks/arb_0/brick/.glusterfs/00
> /var/bricks/arb_0/brick/.glusterfs/00/00
> /var/bricks/arb_0/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001
> /var/bricks/arb_0/brick/.glusterfs/landfill
> /var/bricks/arb_0/brick/.glusterfs/unlink
> /var/bricks/arb_0/brick/.glusterfs/health_check
>
> The output of the group check is identical, as all of these entries have UID:GID 0:0; but these files are 0:0 on the working arbiters as well.
> And this is all the files/dirs that exist on the affected arbs. Nothing more on them. There should be much more, but that seems to be the missing self-heal.
>
> Thanks.
>
> A.
>
> "Strahil Nikolov" hunter86_bg@xxxxxxxxx – 31 May 2021 13:11
> > Hi,
> >
> > I think that the best way is to go through the logs on the affected arbiter brick (maybe even temporarily increase the log level).
> >
> > What is the output of:
> >
> > find /var/bricks/arb_0/brick -not -user 36 -print
> > find /var/bricks/arb_0/brick -not -group 36 -print
> >
> > Maybe there are some files/dirs with wrong ownership.
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > > Thanks Strahil,
> > >
> > > unfortunately I cannot connect, as the mount is denied as shown in the mount.log provided.
> > > IPs > n.n.n.100 are clients and simply cannot mount the volume. When killing the arb pids on node2, new clients can mount the volume.
> > > When bringing them up again I experience the same problem.
> > >
> > > I wonder why the root dir on the arb bricks has the wrong UID:GID.
> > > I added regular data bricks on node2 before without any problems.
> > >
> > > Also, when executing "watch df" I see
> > >
> > > /dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0
> > > ..
> > > /dev/md50 11700224 33128 11667096 1% /var/bricks/arb_0
> > > ..
> > > /dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0
> > >
> > > So the heal daemon might be trying to do something which isn't working. Thus I chowned the UID:GID of ../arb_0/brick manually to match, but that did not work either.
> > >
> > > As I added all 6 arbs at once and 4 are working as expected, I really don't get what's wrong with these...
> > >
> > > A.
> > >
> > > "Strahil Nikolov" hunter86_bg@xxxxxxxxx – 31 May 2021 11:12
> > > > For arb_0 I see only 8 clients, while there should be 12:
> > > >
> > > > Brick : 192.168.0.40:/var/bricks/0/brick
> > > > Clients connected : 12
> > > >
> > > > Brick : 192.168.0.41:/var/bricks/0/brick
> > > > Clients connected : 12
> > > >
> > > > Brick : 192.168.0.80:/var/bricks/arb_0/brick
> > > > Clients connected : 8
> > > >
> > > > Can you try to reconnect them? The simplest way is to kill the arbiter process and run 'gluster volume start force', but always verify that you have both data bricks up and running.
> > > >
> > > > Yet, this doesn't explain why the heal daemon is not able to replicate properly.
> > > >
> > > > Best Regards,
> > > > Strahil Nikolov
> > > >
> > > > > Meanwhile I tried reset-brick on one of the failing arbiters on node2, but with the same result. The behaviour is reproducible; the arbiter stays empty.
> > > > >
> > > > > node0: 192.168.0.40
> > > > > node1: 192.168.0.41
> > > > > node2: 192.168.0.80
> > > > >
> > > > > volume info:
> > > > >
> > > > > Volume Name: gv0
> > > > > Type: Distributed-Replicate
> > > > > Volume ID: 9bafc4d2-d9b6-4b6d-a631-1cf42d1d2559
> > > > > Status: Started
> > > > > Snapshot Count: 0
> > > > > Number of Bricks: 6 x (2 + 1) = 18
> > > > > Transport-type: tcp
> > > > > Bricks:
> > > > > Brick1: 192.168.0.40:/var/bricks/0/brick
> > > > > Brick2: 192.168.0.41:/var/bricks/0/brick
> > > > > Brick3: 192.168.0.80:/var/bricks/arb_0/brick (arbiter)
> > > > > Brick4: 192.168.0.40:/var/bricks/2/brick
> > > > > Brick5: 192.168.0.80:/var/bricks/2/brick
> > > > > Brick6: 192.168.0.41:/var/bricks/arb_1/brick (arbiter)
> > > > > Brick7: 192.168.0.40:/var/bricks/1/brick
> > > > > Brick8: 192.168.0.41:/var/bricks/1/brick
> > > > > Brick9: 192.168.0.80:/var/bricks/arb_1/brick (arbiter)
> > > > > Brick10: 192.168.0.40:/var/bricks/3/brick
> > > > > Brick11: 192.168.0.80:/var/bricks/3/brick
> > > > > Brick12: 192.168.0.41:/var/bricks/arb_0/brick (arbiter)
> > > > > Brick13: 192.168.0.41:/var/bricks/3/brick
> > > > > Brick14: 192.168.0.80:/var/bricks/4/brick
> > > > > Brick15: 192.168.0.40:/var/bricks/arb_0/brick (arbiter)
> > > > > Brick16: 192.168.0.41:/var/bricks/2/brick
> > > > > Brick17: 192.168.0.80:/var/bricks/5/brick
> > > > > Brick18: 192.168.0.40:/var/bricks/arb_1/brick (arbiter)
> > > > > Options Reconfigured:
> > > > > cluster.min-free-inodes: 6%
> > > > > cluster.min-free-disk: 2%
> > > > > performance.md-cache-timeout: 600
> > > > > cluster.rebal-throttle: lazy
> > > > > features.scrub-freq: monthly
> > > > > features.scrub-throttle: lazy
> > > > > features.scrub: Inactive
> > > > > features.bitrot: off
> > > > > cluster.server-quorum-type: none
> > > > > performance.cache-refresh-timeout: 10
> > > > > performance.cache-max-file-size: 64MB
> > > > > performance.cache-size: 781901824
> > > > > auth.allow: /(192.168.0.*),/usr/andreas(192.168.0.120),/usr/otis(192.168.0.168),/usr/otis(192.168.0.111),/usr/otis(192.168.0.249),/media(192.168.0.*),/virt(192.168.0.*),/cloud(192.168.0.247),/zm(192.168.0.136)
> > > > > performance.cache-invalidation: on
> > > > > performance.stat-prefetch: on
> > > > > features.cache-invalidation-timeout: 600
> > > > > cluster.quorum-type: auto
> > > > > features.cache-invalidation: on
> > > > > nfs.disable: on
> > > > > transport.address-family: inet
> > > > > cluster.self-heal-daemon: on
> > > > > cluster.server-quorum-ratio: 51%
> > > > >
> > > > > volume status:
> > > > >
> > > > > Status of volume: gv0
> > > > > Gluster process                              TCP Port  RDMA Port  Online  Pid
> > > > > ------------------------------------------------------------------------------
> > > > > Brick 192.168.0.40:/var/bricks/0/brick       49155     0          Y       713066
> > > > > Brick 192.168.0.41:/var/bricks/0/brick       49152     0          Y       2082
> > > > > Brick 192.168.0.80:/var/bricks/arb_0/brick   49152     0          Y       26186
> > > > > Brick 192.168.0.40:/var/bricks/2/brick       49156     0          Y       713075
> > > > > Brick 192.168.0.80:/var/bricks/2/brick       49154     0          Y       325
> > > > > Brick 192.168.0.41:/var/bricks/arb_1/brick   49157     0          Y       1746903
> > > > > Brick 192.168.0.40:/var/bricks/1/brick       49157     0          Y       713084
> > > > > Brick 192.168.0.41:/var/bricks/1/brick       49153     0          Y       14104
> > > > > Brick 192.168.0.80:/var/bricks/arb_1/brick   49159     0          Y       2314
> > > > > Brick 192.168.0.40:/var/bricks/3/brick       49153     0          Y       2978692
> > > > > Brick 192.168.0.80:/var/bricks/3/brick       49155     0          Y       23269
> > > > > Brick 192.168.0.41:/var/bricks/arb_0/brick   49158     0          Y       1746942
> > > > > Brick 192.168.0.41:/var/bricks/3/brick       49155     0          Y       897058
> > > > > Brick 192.168.0.80:/var/bricks/4/brick       49156     0          Y       27433
> > > > > Brick 192.168.0.40:/var/bricks/arb_0/brick   49152     0          Y       3561115
> > > > > Brick 192.168.0.41:/var/bricks/2/brick       49156     0          Y       902602
> > > > > Brick 192.168.0.80:/var/bricks/5/brick       49157     0          Y       29522
> > > > > Brick 192.168.0.40:/var/bricks/arb_1/brick   49154     0          Y       3561159
> > > > > Self-heal Daemon on localhost                N/A       N/A        Y       26199
> > > > > Self-heal Daemon on 192.168.0.41             N/A       N/A        Y       2240635
> > > > > Self-heal Daemon on 192.168.0.40             N/A       N/A        Y       3912810
> > > > >
> > > > > Task Status of Volume gv0
> > > > > ------------------------------------------------------------------------------
> > > > > There are no active volume tasks
> > > > >
> > > > > volume heal info summary:
> > > > >
> > > > > Brick 192.168.0.40:/var/bricks/0/brick   <--- contains 100177 files in 25015 dirs
> > > > > Status: Connected
> > > > > Total Number of entries: 1006
> > > > > Number of entries in heal pending: 1006
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.41:/var/bricks/0/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 1006
> > > > > Number of entries in heal pending: 1006
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.80:/var/bricks/arb_0/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.40:/var/bricks/2/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > > Brick 192.168.0.80:/var/bricks/2/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.41:/var/bricks/arb_1/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.40:/var/bricks/1/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 1006
> > > > > Number of entries in heal pending: 1006
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.41:/var/bricks/1/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 1006
> > > > > Number of entries in heal pending: 1006
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.80:/var/bricks/arb_1/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.40:/var/bricks/3/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.80:/var/bricks/3/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.41:/var/bricks/arb_0/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.41:/var/bricks/3/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.80:/var/bricks/4/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.40:/var/bricks/arb_0/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.41:/var/bricks/2/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.80:/var/bricks/5/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.40:/var/bricks/arb_1/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > client-list:
> > > > >
> > > > > Client connections for volume gv0
> > > > > Name                 count
> > > > > -----                ------
> > > > > fuse                 5
> > > > > gfapi.ganesha.nfsd   3
> > > > > glustershd           3
> > > > >
> > > > > total clients for volume gv0 : 11
> > > > > -----------------------------------------------------------------
> > > > >
> > > > > all clients: pro.hostit.de/nextcloud/index.php/s/tWdHox3aqb3qqbG
> > > > >
> > > > > failing mnt.log: pro.hostit.de/nextcloud/index.php/s/2E2NLnXNsTy7EQe
> > > > >
> > > > > Thank you.
> > > > >
> > > > > A.
> > > > >
> > > > > "Strahil Nikolov" hunter86_bg@xxxxxxxxx – 31 May 2021 05:23
> > > > > > Can you provide gluster volume info, gluster volume status, gluster volume heal info summary and, most probably, gluster volume status all clients/client-list?
> > > > > >
> > > > > > Best Regards,
> > > > > > Strahil Nikolov
> > > > > >
> > > > > > > On Sun, May 30, 2021 at 15:17, a.schwibbe@xxxxxxx wrote:
> > > > > > >
> > > > > > > I am seeking help here after looking for solutions on the web for my distributed-replicated volume.
> > > > > > >
> > > > > > > My volume has been in operation since v3.10 and I have upgraded through to 7.9, replaced nodes and replaced bricks without a problem. I love it.
> > > > > > >
> > > > > > > Finally I wanted to extend my 6x2 distributed-replicated volume with arbiters for better split-brain protection.
> > > > > > >
> > > > > > > So I ran add-brick with replica 3 arbiter 1 (as I had a 6x2, I obviously added 6 arbiter bricks) and it successfully converted to 6 x (2 + 1); self-heal immediately started. Looking good.
> > > > > > >
> > > > > > > Version: 7.9
> > > > > > > Number of Bricks: 6 x (2 + 1) = 18
> > > > > > > cluster.max-op-version: 70200
> > > > > > > Peers: 3 (node[0..2])
> > > > > > >
> > > > > > > Layout
> > > > > > >
> > > > > > > |node0  |node1  |node2
> > > > > > > |brick0 |brick0 |arbit0
> > > > > > > |arbit1 |brick1 |brick1
> > > > > > > ....
> > > > > > >
> > > > > > > I then recognized that the arbiter bricks on node0 & node1 have been healed successfully.
> > > > > > > Unfortunately all arbiter bricks on node2 have not been healed!
> > > > > > >
> > > > > > > I realized that the main dir on my arb mount point has been created (mount point /var/bricks/arb_0 now contains the dir "brick"); however, this dir has numeric UID 33 on _all_ other bricks, but 0 on this one. The "brick" dir on the faulty arbiters does contain ".glusterfs", however it has only very few entries. Other than that, "brick" is empty.
> > > > > > >
> > > > > > > At that point I changed the brick dir owner with chown to 33:33 and hoped for self-heal to work. It did not.
> > > > > > > I hoped a rebalance fix-layout would fix things. It did not.
> > > > > > > I hoped a glusterd restart on node2 (as this is happening exclusively to both arb bricks on this node) would help. It did not.
> > > > > > >
> > > > > > > Active mount points via nfs-ganesha or fuse continue to work.
> > > > > > > Existing clients cause errors in the arb-brick logs on node2 for missing files or dirs, but the clients seem not affected.
> > > > > > > r/w operations work.
> > > > > > >
> > > > > > > New clients are not able to fuse-mount the volume due to an "authentication error".
> > > > > > >
> > > > > > > heal statistics heal-count shows several hundred files needing healing, and this count is rising. Watching df on the arb-brick mount point on node2 shows a few bytes written every now and then, but they are removed immediately after that.
> > > > > > >
> > > > > > > Any help/recommendation from you is highly appreciated.
> > > > > > >
> > > > > > > Thank you!
> > > > > > >
> > > > > > > A.
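
To sum up the reconnect and heal checks suggested in this thread as a rough sketch (<arbiter-brick-pid> is a placeholder for whatever PID 'gluster volume status gv0' reports for the arbiter brick on node2; do this only while both data bricks of that subvolume are up):

# how many clients each brick currently sees
gluster volume status gv0 clients

# restart only the arbiter brick so clients and the self-heal daemon can reconnect
kill <arbiter-brick-pid>
gluster volume start gv0 force

# then watch whether the pending-heal count actually drops
gluster volume heal gv0 statistics heal-count
gluster volume heal gv0 full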
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users