Hello all,

I have a mail store on a replica 3 volume with no arbiter. A while ago the disk behind one of the bricks failed, and I was several days late in noticing it. When I did, I removed that brick from the volume, replaced the failed disk, updated the OS on that machine from el8 to el9, upgraded Gluster on all three nodes from 10.3 to 11.1, added the brick back and started a heal. Things appeared to work out OK, but they actually did not, and this is what I have now.

# gluster volume info gv0

Volume Name: gv0
Type: Replicate
Volume ID: 1e3ca399-8e57-4ee8-997f-f64479199d23
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: zephyrosaurus:/vol/gfs/gv0
Brick2: alvarezsaurus:/vol/gfs/gv0
Brick3: nanosaurus:/vol/gfs/gv0
Options Reconfigured:
cluster.entry-self-heal: on
cluster.metadata-self-heal: on
cluster.data-self-heal: on
<snip>

On all three hosts:

# ls /vol/vmail/net/provocation/oracle/Maildir/cur
ls: reading directory '/vol/vmail/net/provocation/oracle/Maildir/cur': Invalid argument

That is my mail inbox on the GlusterFS mount. If I list the bricks instead, I get different results on each host:

HostA # ls /vol/gfs/gv0/net/provocation/oracle/Maildir/cur/ | wc -l
4848

HostB # ls /vol/gfs/gv0/net/provocation/oracle/Maildir/cur/ | wc -l
522

HostC # ls /vol/gfs/gv0/net/provocation/oracle/Maildir/cur/ | wc -l
4837

However:

# gluster volume heal gv0 info
Brick zephyrosaurus:/vol/gfs/gv0
/net/provocation/oracle/Maildir/cur
/net/provocation/oracle/Maildir/cur/1701712419.M379665P902306V000000000000002DI8264026770F33CFF_1.zephyrosaurus.nettheatre.org,S=14500:2,RS
/net/provocation/oracle/Maildir/cur/1701712390.M212926P902294V000000000000002DIA089A37BF7E58BB4_1.zephyrosaurus.nettheatre.org,S=19286:2,S
Status: Connected
Number of entries: 3

Brick alvarezsaurus:/vol/gfs/gv0
/net/provocation/oracle/Maildir/cur
Status: Connected
Number of entries: 1

Brick nanosaurus:/vol/gfs/gv0
/net/provocation/oracle/Maildir/cur
Status: Connected
Number of entries: 1

That is definitely not what it should be. There are at least 4,300 files missing from HostB and about a dozen from HostC that are not queued for healing. And nothing is in split-brain.

So I check the attributes of that Maildir/cur directory:

HostA # getfattr -d -m . -e hex /vol/gfs/gv0/net/provocation/oracle/Maildir/cur
getfattr: Removing leading '/' from absolute path names
# file: vol/gfs/gv0/net/provocation/oracle/Maildir/cur
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-1=0x000000000000000000002394
trusted.afr.gv0-client-2=0x0000000000000000000000a0
trusted.afr.gv0-client-4=0x00000001000000000000001a
trusted.gfid=0xbf3ed8b7b2a8457d88f19482ae1ce73d
trusted.glusterfs.dht=0x000000000000000000000000ffffffff
trusted.glusterfs.mdata=0x010000000000000000520318006fbb5600000000001c00ba4800000000667c1663000000002e7f0c84a2925c3455bf9e00000000001098f13e

HostB # getfattr -d -m . -e hex /vol/gfs/gv0/net/provocation/oracle/Maildir/cur
getfattr: Removing leading '/' from absolute path names
# file: vol/gfs/gv0/net/provocation/oracle/Maildir/cur
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-2=0x0000000000000000000000a0
trusted.gfid=0xbf3ed8b7b2a8457d88f19482ae1ce73d
trusted.glusterfs.dht=0x000000000000000000000000ffffffff
trusted.glusterfs.mdata=0x010000000000000000520318006fbb5600000000001c00ba4800000000667c1663000000002e7f0c84a2925c3455bf9e00000000001098f13e

HostC # getfattr -d -m . -e hex /vol/gfs/gv0/net/provocation/oracle/Maildir/cur
getfattr: Removing leading '/' from absolute path names
# file: vol/gfs/gv0/net/provocation/oracle/Maildir/cur
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-1=0x000000000000000000002394
trusted.afr.gv0-client-3=0x000000000000000000000000
trusted.afr.gv0-client-4=0x000000010000000000000020
trusted.gfid=0xbf3ed8b7b2a8457d88f19482ae1ce73d
trusted.glusterfs.dht=0x000000000000000000000000ffffffff
trusted.glusterfs.mdata=0x010000000000000000520318006fbb5600000000001c00ba4800000000667c1663000000002e7f0c84a2925c3455bf9e00000000001098f13e

There is the explanation of this mess: in a replica 3 volume, where I should have exactly three clients, the trusted.afr xattrs reference four (gv0-client-1 through gv0-client-4), and no two bricks carry the same set of peers.
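If I understand the AFR bookkeeping right, each trusted.afr.gv0-client-N corresponds to a protocol/client translator in the volfiles that glusterd generates, so it should be possible to map the surviving indices back to actual bricks by reading the fuse volfile under /var/lib/glusterd/vols/gv0/. A rough sketch of what I mean, assuming the usual gv0.tcp-fuse.vol file name:

# awk '
    /^volume .*-client-/       { name = $2 }    # e.g. gv0-client-4
    /option remote-host/       { host = $3 }    # brick host
    /option remote-subvolume/  { path = $3 }    # brick path on that host
    /^end-volume/ && name      { print name, "->", host ":" path; name = "" }
  ' /var/lib/glusterd/vols/gv0/gv0.tcp-fuse.vol    # volfile name is my guess; adjust if different

That should print one "gv0-client-N -> host:/vol/gfs/gv0" line per brick defined in the current volfile, and I guess any trusted.afr name that does not show up there is a leftover from the brick that was removed.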
In a recent thread about similar problems, Ilias used the word "clueless". It applies to me just as well: I have zero clue where to begin or what to do.

Any ideas, anyone?

Cheers,
Z

--
Слава Україні!
Путлер хуйло!

________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users